User:Scsbot/xmlsed

xmlsed is (intended to be) a simple, general-purpose tool for parsing, analyzing, extracting data from, and modifying XML, HTML, and SGML files.

It is a work in progress and is not yet complete. As of this writing, it has only a couple of features: just those needed by "wikised", the bot script run by User:Scsbot. But for that application (its only one so far) it works just fine.

(Yes, I know, I should have used an off-the-shelf XML tool, such as XSLT or Xerces, to perform these tasks, rather than reinventing the wheel. But the off-the-shelf tools I've looked at are Just Too Complicated.)

Invocation:

xmlsed [flags] inputfile [tag]

At this stage there are only two useful option flags:

-t
print a "table of contents" of the input file, showing the nesting structure of the tags. Also, each tag is given a unique identifying number.
-x
extract the contents of the requested tag. Tags can be identified in two ways: by their path, or by the unique identifying number listed by -t. Tag attributes can be extracted as well, using the syntax path/@tag or #uniqueid/@tag.
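
To make the two flags concrete, here is a rough Python sketch of the same ideas: numbering every tag in document order, printing an indented "table of contents", and looking a tag up either by path or by #uniqueid, optionally followed by /@attribute. This is not xmlsed's actual code or output format. The function names (number_tags, table_of_contents, extract), the sample document, the slash-separated path form, and the choice to return only the tag's direct text are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, not taken from any real xmlsed run.
SAMPLE = (
    '<page><title>foo</title>'
    '<revision id="42"><text>bar</text></revision></page>'
)


def number_tags(root):
    """Walk the tree in document order.

    Returns a list of (uid, depth, path, element) tuples, where uid is a
    1-based number assigned in the order the tags are encountered.
    """
    entries = []

    def walk(elem, depth, parent_path):
        path = f"{parent_path}/{elem.tag}" if parent_path else elem.tag
        entries.append((len(entries) + 1, depth, path, elem))
        for child in elem:
            walk(child, depth + 1, path)

    walk(root, 0, "")
    return entries


def table_of_contents(entries):
    """Roughly what -t is described as doing: one line per tag,
    indented by nesting depth and prefixed with its number."""
    return "\n".join(
        f"{uid:3d} {'  ' * depth}{elem.tag}"
        for uid, depth, _path, elem in entries
    )


def extract(entries, selector):
    """Roughly what -x is described as doing.

    selector is one of: 'a/b', '#3', 'a/b/@attr', '#3/@attr'.
    Returns the first match, or None if nothing matches.
    """
    target, has_attr, attr = selector.partition("/@")
    for uid, _depth, path, elem in entries:
        if target == f"#{uid}" or target == path:
            if has_attr:
                return elem.get(attr)
            # Assumption: "contents" means the tag's direct text only.
            return (elem.text or "").strip()
    return None


if __name__ == "__main__":
    entries = number_tags(ET.fromstring(SAMPLE))
    print(table_of_contents(entries))
    print(extract(entries, "page/revision/text"))
    print(extract(entries, "#3/@id"))
```

In this sketch the numbers printed by table_of_contents are the same ones accepted after # in extract, which mirrors how the identifying numbers listed by -t can be fed back to -x.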

Source code: ftp://ftp.eskimo.com/u/s/scs/src/xmlsed.tar.gz