Skip to content

Fast and lightweight event-driven streaming XML parser in pure JavaScript

License

Notifications You must be signed in to change notification settings

TriplyDB/async-saxophone

 
 

Repository files navigation

async-saxophone

Not Maintained

async-saxophone is no longer being actively developed and has been archived.

It was developed as a asynchronous replacement for saxophone, but has not kept up with changes to saxophone. If this is of interest, you probably need to examine how saxophone was made asynchronous and apply similar changes to saxophone's current code. You are still welcome to use it but you may want to check for any forks that are more actively maintained. If it is of value to you, you are welcome to fork it or start a new project and incorporate its code into your project.

Description

Fast and lightweight asynchonous XML parser in pure JavaScript.

Async-saxophone is based upon Saxophone, which, in turn, is inspired by SAX parsers such as sax-js and EasySax: unlike most XML parsers, but like Saxophone, async-saxophone does not create a Document Object Model (DOM) tree as a result of parsing documents.

Instead, it implements an async iterator. Async-saxophone takes as input an XML document in the form of a string or any iterable, including a stream. It parses the XML and then outputs the nodes (tags, text, comments, etc) encountered as they are parsed. As an async iterator, it is suitable for iteration using for await...of.

Async-saxophone was developed to assure that a new chunk of XML is not taken from its input until all nodes encountered have been processed, even if there is delay in processing. The asynchronous design assures synchronization of input and output.

The async-saxophone parser is based upon the Saxophone parser and inherits its light weight and speed. It does not maintain document state nor check the validity of the document. Modifications to the Saxophone parser include structuring it as an async generator function, substituting yield for emit, expecting an input string or iterable as an argument, rather than being piped to, and representing each node as a tuple-like array.

The parser does not parse the attribute string in a tag nor does it parse entities in text. Saxophone's parseAttrs and parseEntities functions may be used to parse the attribute string or entities. To avoid unnecessary dependencies, Saxophone must be installed seperately if these functions are needed.

Note that if a generator function is passed as the iteratable, a "sourceIterator is not async iterable" error will be thrown. Consider a generator function to be a iterable factory. Call it to get the iterator. If a generator async function* genxml() {...} is defined, pass genxml() to the parser rather than genxml.

Installation

This package requires Node.JS 10.0 or later. It may also work in recent browsers that support async generator functions and for await...of. To install with npm:

$ npm install async-saxophone

Tests

To run tests, use the following commands:

$ git clone https://github.com/randymized/async-saxophone.git
$ cd async-saxophone
$ npm install
$ npm test

Example

const {makeAsyncXMLParser} = require('async-saxophone');
const delay = ms => new Promise(_ => setTimeout(_, ms));

const xml = '<root><example id="1" /><example id="2" /></root>'

async function main() {
    const parser = makeAsyncXMLParser();
    for await (let node of parser(xml)) {
        console.dir(node);
        await delay(500);
    }
    console.log('done')
}
main().catch(console.error)

Output:

[ 'tagopen', 'root', '', '' ]
[ 'tagopen', 'example', 'id="1"', '/' ]
[ 'tagopen', 'example', 'id="2"', '/' ]
[ 'tagclose', 'root' ]
done

Exports:

const {makeAsyncXMLParser} = require('async-saxophone');

  • makeAsyncXMLParser(options) takes parser options and returns a generator function that will parse an XML document.

    • options are detailed below.
    • parser(iterable) is the async generator function returned from makeAsyncXMLParser.
      • It takes as an argument any iterable of an XML document.
      • It returns an async iterator over the nodes encountered as the document is parsed.
  • options

    • include: a list of node types to be output. See AvailableNodes above for a complete list. If option = {include:['tagopen','tagclose']}, for example, only opening and closing tags will be output. If include is not specified, all nodes will be output.
    • alwaysTagClose: If a self-closing tag is encountered a tagclose node will be output
    • noEmptyText: If truish, empty text nodes, or text that is all whitespace will not be output.

Output:

The parser returned from makeAsyncXMLParser is an async generator function. It takes an iterable as an argument and returns an async iterator over the nodes encountered during parsing. The types of nodes and their representation is as follows:

  • tagopen: ['tagopen', tag-name, attr-string, is-self-closing].
    • tag-name the tag's name, as found in the XML: <tag-name ...>
    • attr-string everything between the tag name and > or />. This string may be parsed with Saxophone.parseAttrs to convert it into a key/value object. Any leading or trailing whitespace will be trimmed off.
    • is-self-closing will be either '/' (truish) if the tag is self-closing or '' (falsish) if it is not.
  • tagclose: ['tagclose', tag-name]
  • text: ['text',content]. Entities in the text may be parsed with the Saxophone.parseEntities function.
  • cdata: ['cdata',content]
  • commment: ['comment',content]
  • processinginstruction: ['processinginstruction',content]. Content of the processing instruction is not parsed.

Contributions

This is free and open source software. All contributions (even small ones) are welcome. Check out the contribution guide to get started!

Thanks to:

  • Mattéo Delabre for Saxophone. The (modified) Saxophone parser is at the heart of this package.
  • Norman Rzepka for the check in Saxophone for opening and closing tags mismatch.
  • winston01 for spotting and fixing an error in the Saxophone parser when a tag sits astride two chunks.

License

Released under the MIT license. See the full license text.

About

Fast and lightweight event-driven streaming XML parser in pure JavaScript

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • JavaScript 100.0%