A lua library for compiling a subset of PEGs to FSTs. Requires fst-fast-system (an NFST interpreter) and fst-fast (a lua library wrapping fst-fast-system).
For arbitrary regular expressions or string literals A and B, this library turns A/B, AB, and A* into DFAs quite well. However, this library aims to make A/B and A* possessive, and allow linear-time matching and capture. Therefore, there are still two major items on the roadmap before this library can be made a part of Rosie:
- Possessiveness. This will require queries that can find the following:
- The DFA of an NFA
- The states that ought to be demoted (their outgoing arrows ignored)
- The arrows that ought to be ignored
- The NFA resulting from removing those arrows
And interpreters (AST transformers) that can generate
- NFAs from each subexpression of the AST
- DFAs from each sub-nfa
- B substates from each subexpression
- B states from each subexpression
- B subarrows from each subexpression
- B arrows from each subexpression
- The possessive NFA that results from removing the forbidden b arrows.
- Matching subexpressions This library intends to use Danny Dubé and Marc Feely's method of extracting matches from DFAs, described in these two papers: Efficiently building a parse tree from a regular expression Automatic construction of parse trees for lexemes And implemented in SILex
to do this, we need
- A backend that can accept
push
,snoc
, andsel
, instead of DFSTs operating from chars to char - A frontend that emits those instructions instead of DFSTs from char to char
Each interpreter in PEGREG's interpreter chain forms a tagless final interpreter.
And the FST uses a version of finite state transducers.