ruby: performance comparison of rexml and libxml

Posted by phillip Sun, 18 Mar 2007 10:20:00 GMT

update: here’s the same for PHP’s XML Parser.

a quick comparison of the two libraries available for processing XML in ruby shows dramatic performance differences.

am i missing something? is there a fundamental flaw in the test? of course REXML is pure ruby while libxml is a C library, but can the difference really be this huge?

loading an xml file

file size   libxml (ms)   REXML (ms)   factor
10KB               0.83        39.17     47.0
100KB              6.67       306.56     46.0
1.6MB             71.88      3954.21     55.0

simple xpath expression

file size   libxml (ms)   REXML (ms)   factor
10KB               0.12       124.68   1004.7
100KB              0.67       678.11   1016.8
1.6MB              6.21     22578.18   3633.6
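the factor column is just the REXML time divided by the libxml time. a minimal sketch of that arithmetic, using the rounded values from the loading table (the `factor` helper name is made up for illustration); recomputed numbers may differ slightly from the posted column, which was presumably calculated from unrounded timings:

```ruby
# slowdown factor: how many times longer REXML took than libxml
# (hypothetical helper, not part of the original test code)
def factor(libxml_ms, rexml_ms)
  rexml_ms / libxml_ms
end

# [file size, libxml ms, REXML ms] copied from the loading table
loading = [
  ["10KB",  0.83,  39.17],
  ["100KB", 6.67,  306.56],
  ["1.6MB", 71.88, 3954.21],
]

loading.each do |size, libxml_ms, rexml_ms|
  puts format("%-6s %.1f", size, factor(libxml_ms, rexml_ms))
end
```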

the test code

# runs the given block 10 times and prints the average time per run in milliseconds
def benchmark
   start = Time.now.to_f
   10.times { yield }
   puts(((Time.now.to_f - start) / 10) * 1000)
end

doc = nil

# read the file once before timing so that it is already in the filesystem
# cache and the first timed run is not penalized by disk access (makes sense?)
File.read('products.xml')

#
# libxml
#
require 'rubygems'
require 'xml/libxml'

benchmark do
   doc = XML::Document.file("products.xml")
end

benchmark do
   doc.find('//articles/article/shortdesc').each do |node|
      #puts node.content
   end
end

#
# rexml
#
require "rexml/document"

benchmark do
   doc = REXML::Document.new File.read("products.xml")
end

benchmark do
   doc.elements.each("//articles/article/shortdesc") do |node| 
      #puts node.text
   end
end
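one way to double-check the harness is to redo the measurement with ruby's standard `Benchmark` module. the following is only a sketch under some assumptions: it uses a small synthetic document built in memory instead of products.xml (the `sample_xml` and `average_ms` helpers and the document layout are made up for illustration), and it covers REXML only, since libxml needs a native extension to be installed.

```ruby
require "benchmark"
require "rexml/document"

# build a synthetic document in memory (hypothetical data, NOT the
# products.xml used in the post) so the sketch needs no external file
def sample_xml(article_count)
  articles = (1..article_count).map do |i|
    "<article><shortdesc>item #{i}</shortdesc></article>"
  end
  "<products><articles>#{articles.join}</articles></products>"
end

# average wall-clock time of the block in milliseconds over `runs` runs
def average_ms(runs = 10)
  total = Benchmark.realtime { runs.times { yield } }
  (total / runs) * 1000
end

xml = sample_xml(200)
doc = nil

# parsing from a string keeps file I/O out of the measurement
parse_ms = average_ms { doc = REXML::Document.new(xml) }

# count the matches so a silently empty xpath result would be noticed
matches = 0
xpath_ms = average_ms(3) do
  matches = 0
  doc.elements.each("//articles/article/shortdesc") { |_node| matches += 1 }
end

puts format("parse: %.2f ms, xpath: %.2f ms, matches: %d",
            parse_ms, xpath_ms, matches)
```

counting the matched nodes is a cheap sanity check: if one library returned zero nodes for the expression, its xpath timing would look much better than it deserves.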
