ruby / jruby xml stream parser

June 20, 2009

to give sonar an xml input format, for content provided via a rest api or files, i looked around for a ruby xml parser oriented towards parsing large documents : perhaps too large to fit in memory

there are plenty of low-level stream parsers around e.g. in rexml, but they stop some way short of allowing the solution to be expressed in a natural way
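
to make that concrete, here is a rough sketch of pulling names and text out of a small made-up people document with rexml's pull parser used directly. the document, the PEOPLE_XML constant and the pull_people method are invented for illustration; only the REXML::Parsers::PullParser calls are real. note how the state ( which person are we inside ? ) has to be tracked by hand

```ruby
require 'rexml/parsers/pullparser'

# made-up sample document, for illustration only
PEOPLE_XML = <<~DOC
  <people>
    <person name="alice">likes cheese</person>
    <person name="bob">likes music</person>
    <person name="charles">likes alice</person>
  </people>
DOC

# hypothetical helper: returns a hash of person name => text content
def pull_people(xml)
  people  = {}
  current = nil # name of the <person> we are currently inside, if any
  parser  = REXML::Parsers::PullParser.new(xml)

  while parser.has_next?
    event = parser.pull
    if event.start_element? && event[0] == "person"
      current = event[1]["name"]   # event[1] is the attribute hash
      people[current] = +""
    elsif event.text? && current
      people[current] << event[0]  # event[0] is the text itself
    elsif event.end_element? && event[0] == "person"
      current = nil
    end
  end

  people
end

people = pull_people(PEOPLE_XML)
```

it works in constant memory, but the shape of the document is buried in the if / elsif chain rather than visible in the code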

here's a parser, which sits atop rexml's pull parser, and allows you to formulate your parse in ruby blocks which straightforwardly process xml elements. what you keep and how you convert is completely specified by those blocks, so you can happily parse an unending document in constant memory

  <people>
    <person name="alice">likes cheese</person>
    <person name="bob">likes music</person>
    <person name="charles">likes alice</person>
  </people>

can be parsed with :

require 'rubygems'
require 'xml_stream_parser'

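people = {}
# assumption: XmlStreamParser.new and the doc argument ( the xml source ) are inferred; only parse_dsl is named in this post
XmlStreamParser.new.parse_dsl(doc) do
  element "people" do |name,attrs|
    elements "person" do |name, attrs|
      people[attrs["name"]] = text
    end
  end
end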
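
as an illustration of how block-driven parsing can sit atop a pull parser, here is a toy sketch. it is not the xml_stream_parser code: ToyStreamParser, its element / elements / text methods and toy_parse_people are all invented here, and only the REXML::Parsers::PullParser calls ( pull, peek ) are real. it ignores cdata, comments and error recovery

```ruby
require 'rexml/parsers/pullparser'

# toy illustration only: NOT the xml_stream_parser gem's implementation
class ToyStreamParser
  def initialize(xml)
    @pull = REXML::Parsers::PullParser.new(xml)
  end

  # consume one <name> element, passing its name and attributes to the block
  def element(name)
    skip_whitespace
    ev = @pull.pull
    raise "expected <#{name}>" unless ev.start_element? && ev[0] == name
    result = yield ev[0], ev[1]
    skip_to_end_tag
    result
  end

  # consume consecutive <name> elements, calling the block once for each
  def elements(name, &block)
    loop do
      skip_whitespace
      break unless @pull.peek.start_element? && @pull.peek[0] == name
      element(name, &block)
    end
  end

  # text content at the current position
  def text
    buf = +""
    buf << @pull.pull[0] while @pull.peek.text?
    buf
  end

  private

  # drop whitespace-only text events between elements
  def skip_whitespace
    @pull.pull while @pull.peek.text? && @pull.peek[0].strip.empty?
  end

  # discard whatever the block left unread, up to the enclosing end tag
  def skip_to_end_tag
    depth = 0
    loop do
      ev = @pull.pull
      raise "unexpected end of document" if ev.event_type == :end_document
      depth += 1 if ev.start_element?
      next unless ev.end_element?
      return if depth.zero?
      depth -= 1
    end
  end
end

# hypothetical usage: returns a hash of person name => text
def toy_parse_people(xml)
  people = {}
  parser = ToyStreamParser.new(xml)
  parser.element("people") do |_name, _attrs|
    parser.elements("person") do |_name, attrs|
      people[attrs["name"]] = parser.text
    end
  end
  people
end
```

only the current event is ever held, so memory use depends on what the blocks choose to keep, not on the size of the document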

a plainer api is also supported, allowing a parse to be split over multiple methods [ parse_dsl uses instance_exec to call blocks, so self inside a block is no longer the calling object, and its methods and instance variables are out of reach ]

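people = {}
# assumption: XmlStreamParser.new, the parse method name and the doc argument ( the xml source ) are inferred, not confirmed by this post
XmlStreamParser.new.parse(doc) do |p|
  p.element( "people" ) do |name,attrs|
    p.elements( "person" ) do |name, attrs|
      people[attrs["name"]] = p.text
    end
  end
end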
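
the instance_exec point can be seen without any xml at all. a minimal sketch, in which TinyDsl and Caller are made-up classes standing in for the parser and for your own code : inside an instance_exec'd block, self is the dsl object, so a helper method on the caller cannot be found, while the plain yielding form keeps the caller's self

```ruby
# made-up stand-in for a parser offering both styles
class TinyDsl
  def text
    "likes cheese"
  end

  # dsl style: the block runs with self set to this TinyDsl
  def run_dsl(&block)
    instance_exec(&block)
  end

  # plain style: the block keeps its own self and gets the TinyDsl as an argument
  def run
    yield self
  end
end

# made-up stand-in for code that wants to split a parse over several methods
class Caller
  def shout(s)
    s.upcase
  end

  def with_dsl
    # shout is looked up on TinyDsl, not on Caller, so this raises
    TinyDsl.new.run_dsl { shout(text) }
  rescue NoMethodError
    :lost_context
  end

  def with_plain
    # self is still the Caller here, so shout is found
    TinyDsl.new.run { |p| shout(p.text) }
  end
end

dsl_result   = Caller.new.with_dsl
plain_result = Caller.new.with_plain
```

local variables such as the people hash are still visible in both styles, since blocks are closures; it is only self that changes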
