Parsing large XML with Nokogiri and Ruby
If you try to parse a large XML file (more than a couple of gigabytes) with Ruby and Nokogiri in the usual way, the whole document is loaded into memory at once, so your laptop will grind to a halt or simply run out of memory.
What you have to do in cases like this is parse the file one node at a time, using Nokogiri's SAX parser:
#!/usr/bin/env ruby
# encoding: utf-8
require 'nokogiri'
class MyDocument < Nokogiri::XML::SAX::Document
  # Called once, when the parser reaches the end of the document.
  def end_document
    puts "the document has ended"
  end

  # Called for every opening tag; attribs is an array of [name, value] pairs.
  def start_element(elem, attribs = [])
    @attry = {}
    attribs.each do |attrib|
      @attry[attrib[0]] = attrib[1]
    end
    puts "do something with your #{elem}"
    puts "you can display your attributes like this #{@attry['attribute_name']}"
  end
end
parser = Nokogiri::XML::SAX::Parser.new(MyDocument.new)
# The block form closes the file handle once parsing is done.
File.open("import.xml") { |file| parser.parse(file) }