Parsing large XML with Nokogiri and Ruby

If you try to parse large XML files (larger than a couple of gigabytes) with Ruby and Nokogiri by loading the whole document at once, your laptop will grind to a halt or simply run out of memory.

What you have to do in cases like this is parse the file one node at a time with a SAX parser…

#!/usr/bin/env ruby
# encoding: utf-8

require 'nokogiri'

# SAX handler: Nokogiri calls these methods as it streams through the file,
# so the whole document is never held in memory.
class MyDocument < Nokogiri::XML::SAX::Document
  def end_document
    puts "the document has ended"
  end

  # attribs is an array of [name, value] pairs
  def start_element(elem, attribs = [])
    @attry = {}
    attribs.each do |attrib|
      @attry[attrib[0]] = attrib[1]
    end
    puts "do something with your #{elem}"
    puts "you can display your attributes like this #{@attry['attribute_name']}"
  end
end

parser = Nokogiri::XML::SAX::Parser.new(MyDocument.new)
File.open("import.xml") { |file| parser.parse(file) }