C XML Parser and Query
A simple C program for parsing XML files (with partial validation - see below) and querying them.
This is still very much a work in progress: parsing is not 100% complete and querying is slowly underway. This also means the git history will most likely be messy (random commits to store progress, the occasional commit with broken code).
For Nix users, there is a flake.nix for setting up a dev-shell.
A simple nix develop should probably just work, or just use direnv.
Otherwise, having GNU Make and a C compiler (clang, gcc, ...; make sure $CC is set) should be enough.
If you use clang, make sure to set ASAN_SYMBOLIZER_PATH to the path printed by which llvm-symbolizer (it needs to be installed, usually via libllvm).
$ make debug
$ ./debug/cxpq_debug ./examples/bookstore.xml
$ make release
$ ./release/cxpq ./examples/bookstore.xml
- Parsing
- Prolog
- DTD
- (Root) Node
- Attributes
- Namespaces
- Content
- CDATA (parsed as text content, with a flag)
- Comments
- Processing Instructions
- Validation (should be reworked and under a flag)
- Valid file (no unclosed tags, comments, etc.) - should be done?
- Valid Prolog
- Valid Root Node name
- Valid Node names (partially, missing the xml "ban")
- Namespaces (currently they just get included in node name)
- Valid document (has prolog & a single root node)
- Querying
- XPath?
- Custom query language (CSS selector like)
- Tests
- Individual node parsers
- Entire file parser
- Validator (after rework)
- Query parsing
- Query execution
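To make the "Valid Node names" item and the missing xml "ban" more concrete, here is a minimal sketch of what such a check could look like. It is not taken from cxpq's source: the function names are hypothetical, and it only handles the ASCII subset of the name characters allowed by the XML specification (the full rules also allow many Unicode ranges). The "ban" reflects that the specification reserves names beginning with "xml" in any letter case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers; ASCII-only approximation of the XML name rules. */
static bool is_ascii_letter(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* First character of a name: letter, underscore or colon. */
static bool is_name_start_char(unsigned char c) {
    return is_ascii_letter(c) || c == '_' || c == ':';
}

/* Remaining characters: start chars plus digits, '-' and '.'. */
static bool is_name_char(unsigned char c) {
    return is_name_start_char(c) || (c >= '0' && c <= '9')
        || c == '-' || c == '.';
}

/* True when the name begins with "xml" in any letter case.
 * Short-circuit evaluation stops at the terminating '\0'. */
static bool has_reserved_xml_prefix(const char *name) {
    return (name[0] == 'x' || name[0] == 'X')
        && (name[1] == 'm' || name[1] == 'M')
        && (name[2] == 'l' || name[2] == 'L');
}

/* Assumed entry point: returns true for an acceptable node name. */
bool is_valid_node_name(const char *name) {
    if (name == NULL || !is_name_start_char((unsigned char)name[0])) {
        return false; /* NULL, empty, or bad first character */
    }
    for (const char *p = name + 1; *p != '\0'; ++p) {
        if (!is_name_char((unsigned char)*p)) {
            return false;
        }
    }
    return !has_reserved_xml_prefix(name);
}
```

With this sketch, is_valid_node_name("book") and is_valid_node_name("ns:book") are accepted, while "1book", "bo ok" and "xmlFoo" are rejected. Note that a name such as "ns:book" passes only because the colon is treated as an ordinary name character, which mirrors the current behaviour where namespaces are simply part of the node name.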
Simply parse and print back the XML document:
$ cxpq ./examples/valid/bookstore.xml
Query all books anywhere in the xml document:
$ cxpq -Q xpath --query "//book" ./examples/valid/bookstore.xml
Select all tags anywhere in the xml document:
$ cxpq -Q xpath --query "*" ./examples/valid/bookstore.xml
Complex selector with a sub-query and function filtering: selects the titles of books that have more than one price tag and are inside a bookstore tag.
$ cxpq -Q xpath --query "//bookstore//book[price/position() > 1]/title" ./examples/valid/bookstore.xml
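For readers unfamiliar with the `//` step used in the queries above, the following is a rough sketch of what "anywhere in the document" means in terms of tree traversal. The Node struct and find_descendants function are hypothetical and are not cxpq's actual data structures or query engine; the sketch only shows a depth-first walk that collects every element with a given name in document order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal element node; not cxpq's real data structure. */
typedef struct Node {
    const char *name;
    struct Node **children; /* array of child pointers, may be NULL */
    size_t child_count;
} Node;

/*
 * Depth-first, document-order search for elements named `name`,
 * starting at `node` itself and descending through all children.
 * Stores at most `cap` matches in `out` and returns the running
 * total of matches, which may exceed `cap`. Pass 0 as `found`
 * on the initial call.
 */
size_t find_descendants(const Node *node, const char *name,
                        const Node **out, size_t cap, size_t found) {
    if (node == NULL) {
        return found;
    }
    if (strcmp(node->name, name) == 0) {
        if (found < cap) {
            out[found] = node;
        }
        ++found;
    }
    for (size_t i = 0; i < node->child_count; ++i) {
        found = find_descendants(node->children[i], name, out, cap, found);
    }
    return found;
}
```

Calling find_descendants(root, "book", hits, cap, 0) on the root element would correspond roughly to the "//book" query: it visits the root and every nested element, regardless of depth. Predicates such as the one in the last example would then be applied as a filter over the collected matches.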