Where Are The Wise Men?

Mike's Ramblings

Another Shot of XML + AWK


[It's been a while][] since I've done anything with xmlgawk, but a sticky problem had to rear its ugly head at me. The problem was fairly simple -- I had an XML file with some tags that contain CDATA, and I simply wanted to extract all of that CDATA. The kicker -- the XML file was a bit shy of 500MB. My standard [xmllint][] tool couldn't even get through the file. I didn't even attempt it with Python or Java.

So I dusted off my xmlgawk and came up with this simple script:

@load xml

XMLSTARTELEM=="text" { data="" }

XMLCHARDATA { data=$0 }

XMLENDELEM=="text" {
print data
}

And then I redirected the output to a file. Easy ... and fast. It only took 3-5 minutes to finish, and that was in Cygwin. So it's handy to know [xmlgawk][] once in a while.
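One caveat with the script as written: it keeps only the most recent piece of character data it saw. As far as I know, the underlying parser is allowed to hand over the text of a single element in several pieces, so a long CDATA section could come out truncated. Here is a minimal sketch of a variant that appends instead of overwriting. The `intext` flag is a name I made up for illustration, and the load line may need to be written as `@load "xml"` on newer releases; treat both as assumptions rather than the one right way to do it.

```awk
@load xml

# Entering a <text> element: start collecting and clear the buffer.
# (intext is a hypothetical flag name, not part of xmlgawk.)
XMLSTARTELEM == "text" { intext = 1; data = "" }

# Character data may arrive in more than one piece, so append each
# piece, and only while we are inside a <text> element.
XMLCHARDATA && intext { data = data $0 }

# Leaving the element: print what was collected and stop collecting.
XMLENDELEM == "text" { print data; intext = 0 }
```

Note that this also picks up text from any elements nested inside `text`, which was fine for my file but may not be what you want for yours.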