Txt (tɛkst) is a collection of code that provides a foundation for text parsers.
This repository contains two libraries:
- Txt.Core
- Txt.ABNF
Txt is still under active development. Pre-release versions are available on MyGet.
Building a parser using Txt begins with defining a text source. A text source can be any class that implements `ITextSource` and `IDisposable`.
Txt includes `ITextSource` implementations for `System.String` and `System.IO.Stream`.
```csharp
ITextSource stringTextSource = new StringTextSource("<c>Hello World</c>");
ITextSource streamTextSource = new StreamTextSource(File.OpenRead("data.txt"), Encoding.UTF8);
```
A text source object is passed to a text scanner, which provides methods for reading and matching character data. A text scanner can be any class that implements `ITextScanner` and `IDisposable`.
Txt includes a `TextScanner` class that is intended to fulfill every need.
```csharp
ITextScanner textScanner = new TextScanner(stringTextSource);
```
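The scanner's job of "reading and matching character data" can be pictured with the following minimal sketch. `ToyScanner`, `TryMatch`, `Mark` and `Reset` are hypothetical names made up for this illustration and are not the `ITextScanner` API; the sketch only shows the general idea of consuming a character when it matches and rewinding when a larger rule fails.

```csharp
using System;

// Hypothetical toy scanner (not Txt's TextScanner): it walks over a string
// and lets callers try a match, then back up if a larger rule fails.
public sealed class ToyScanner
{
    private readonly string input;

    public ToyScanner(string input) => this.input = input;

    // Current position in the input, in UTF-16 code units.
    public int Offset { get; private set; }

    public bool EndOfInput => Offset >= input.Length;

    // Consumes one character if it satisfies `predicate`; otherwise consumes nothing.
    public bool TryMatch(Func<char, bool> predicate, out char matched)
    {
        if (!EndOfInput && predicate(input[Offset]))
        {
            matched = input[Offset];
            Offset++;
            return true;
        }
        matched = default;
        return false;
    }

    // Remembers the current position so a failed rule can rewind to it.
    public int Mark() => Offset;

    // Rewinds to a position previously returned by Mark().
    public void Reset(int mark) => Offset = mark;
}
```

Rewinding to a mark is what lets a lexer try one alternative, fail, and try another from the same position.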
A text scanner is passed to a Lexer, which reads grammar elements from the text source and converts them to `Element` objects. A Lexer can be any class that implements `ILexer<T>`. Your program should have one Lexer for every rule in a grammar.
Txt includes an abstract `Lexer` class that you should derive from.
See the samples directory for concrete code samples.
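As a rough, self-contained picture of "one Lexer for every rule", the sketch below builds a lexer for `INTEGER = 1*10DIGIT` (the rule used in the example later on) by delegating to a separate `DIGIT` lexer. `ToyElement`, `IToyLexer`, `ToyDigitLexer` and `ToyIntegerLexer` are hypothetical stand-ins, not Txt's `Lexer` or `Element` classes, and they work on a plain string instead of a text scanner.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration only; these are NOT Txt's Lexer/Element types.
public sealed record ToyElement(string Rule, string Text, int Offset, IReadOnlyList<ToyElement> Children);

public interface IToyLexer
{
    // Returns an element matched at `offset`, or null when the rule does not match.
    ToyElement? Read(string input, int offset);
}

// DIGIT = %x30-39
public sealed class ToyDigitLexer : IToyLexer
{
    public ToyElement? Read(string input, int offset) =>
        offset < input.Length && input[offset] >= '0' && input[offset] <= '9'
            ? new ToyElement("DIGIT", input[offset].ToString(), offset, Array.Empty<ToyElement>())
            : null;
}

// INTEGER = 1*10DIGIT, built by delegating to the DIGIT lexer (one lexer per rule).
public sealed class ToyIntegerLexer : IToyLexer
{
    private readonly IToyLexer digitLexer;

    public ToyIntegerLexer(IToyLexer digitLexer) => this.digitLexer = digitLexer;

    public ToyElement? Read(string input, int offset)
    {
        var digits = new List<ToyElement>();
        int position = offset;
        // Collect at most 10 consecutive DIGIT elements.
        while (digits.Count < 10 && digitLexer.Read(input, position) is { } digit)
        {
            digits.Add(digit);
            position += digit.Text.Length;
        }
        if (digits.Count == 0)
        {
            return null; // fewer than the minimum of 1 DIGIT
        }
        string text = string.Concat(digits.Select(d => d.Text));
        return new ToyElement("INTEGER", text, offset, digits);
    }
}
```

Because the `INTEGER` lexer reaches the `DIGIT` lexer only through an interface, each rule's lexer can be written and tested on its own.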
An Element is a class representation of a grammar rule. An instance of an Element contains a substring that matches its grammar rule. Your program should have one Element for every rule in a grammar.
Txt includes an abstract `Element` class that you must derive from.
See the samples directory for concrete code samples.
An Element is passed to a Parser. A Parser can be any class that implements `IParser<TElement, TResult>`.
Txt includes an abstract `Parser` class that you may derive from.
See the samples directory for concrete code samples.
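One thing a parser has to deal with is that text accepted by the grammar is not necessarily a valid value of the target type. With the `INTEGER = 1*10DIGIT` rule used later in this document, the lexer accepts `9999999999`, which is larger than `int.MaxValue` (2147483647), so `int.Parse` would throw an `OverflowException`. The hypothetical helper below (not part of Txt) uses `int.TryParse` with `NumberStyles.None` so that an out-of-range value is reported instead of thrown.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of Txt: turns the text of an INTEGER element
// (1 to 10 ASCII digits) into an int, reporting overflow instead of throwing.
public static class ToyIntegerParser
{
    public static bool TryParseInteger(string elementText, out int value)
    {
        // NumberStyles.None accepts digits only (no sign, whitespace or separators),
        // which is all the INTEGER rule can produce. Values above int.MaxValue return false.
        return int.TryParse(elementText, NumberStyles.None, CultureInfo.InvariantCulture, out value);
    }
}
```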
One or more parsers are passed to a Walker. A Walker is passed to an Element. A Walker knows how to make sense of an element tree: it walks the tree and parses individual elements along the way.
Txt includes an abstract `Walker` class that you must derive from.
See the samples directory for concrete code samples.
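The walker idea can be sketched as a depth-first traversal that calls a hook on the way into an element, a hook for the element itself, and a hook on the way out. `Node` and `RecordingWalker` are hypothetical types for this illustration only; in particular, the rule that returning `false` from `Walk` skips the element's children is an assumption of this sketch, not documented Txt behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy element tree, not Txt's Element class.
public sealed class Node
{
    public Node(string rule, string text, params Node[] children)
    {
        Rule = rule;
        Text = text;
        Children = children;
    }

    public string Rule { get; }
    public string Text { get; }
    public IReadOnlyList<Node> Children { get; }
}

// Hypothetical walker, not Txt's Walker class: records every hook call in order.
public class RecordingWalker
{
    public List<string> Events { get; } = new List<string>();

    // Depth-first traversal: Enter, Walk, then children (only if Walk returned true), then Exit.
    public void Visit(Node node)
    {
        Events.Add("Enter " + node.Rule);
        if (Walk(node))
        {
            foreach (Node child in node.Children)
            {
                Visit(child);
            }
        }
        Events.Add("Exit " + node.Rule);
    }

    // Assumption for this sketch: returning false skips the node's children.
    protected virtual bool Walk(Node node)
    {
        Events.Add("Walk " + node.Rule + " '" + node.Text + "'");
        return true;
    }
}
```

For an `INTEGER` node with the text `42` and two `DIGIT` children, `Visit` records `Enter INTEGER`, `Walk INTEGER '42'`, then enter, walk and exit for each digit, and finally `Exit INTEGER`.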
This example uses a grammar that contains two rules: INTEGER and DIGIT. DIGIT is a core ABNF rule.
```
INTEGER = 1*10DIGIT
```
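In ABNF (RFC 5234, section 3.6), `<a>*<b>element` means at least `a` and at most `b` occurrences of `element`; a missing `a` defaults to 0 and a missing `b` to infinity. So `1*10DIGIT` matches 1 to 10 digits. The hypothetical helper below illustrates that rule for single-character elements only, using a greedy match; it is not how Txt's repetition lexer is implemented.

```csharp
using System;

// Hypothetical helper illustrating ABNF "<a>*<b>element" for single-character elements:
// at least `min` and at most `max` occurrences; `max == null` means unbounded.
public static class AbnfRepeat
{
    // Greedily counts how many consecutive characters starting at `offset` satisfy `element`,
    // stopping once `max` is reached. Returns the match length, or -1 if fewer than `min` matched.
    public static int Match(string input, int offset, int min, int? max, Func<char, bool> element)
    {
        int count = 0;
        while ((max == null || count < max.Value)
               && offset + count < input.Length
               && element(input[offset + count]))
        {
            count++;
        }
        return count >= min ? count : -1;
    }
}
```

A bounded repetition does not require the rest of the input to be consumed: for `12345678901`, `1*10DIGIT` matches the first 10 digits and leaves the last one unread, so checking that nothing remains is a separate step. A leading space, on the other hand, makes the match fail outright because a space is not a `DIGIT`.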
A parser implementation would have five important classes:
- An Integer element
- An Integer element lexer
- An Integer element lexer factory
- An Integer element parser
- An Integer element walker
```csharp
// Element for the INTEGER rule: a repetition of DIGIT elements.
public class Integer : Repetition
{
    public Integer(Repetition repetition)
        : base(repetition)
    {
    }
}

// Lexer for INTEGER; the actual matching is done by the inner repetition lexer.
public class IntegerLexer : CompositeLexer<Repetition, Integer>
{
    public IntegerLexer(ILexer<Repetition> innerLexer)
        : base(innerLexer)
    {
    }
}

public class IntegerLexerFactory : ILexerFactory<Integer>
{
    private readonly ILexer<Digit> digitLexer;

    private readonly IRepetitionLexerFactory repetitionLexerFactory;

    public IntegerLexerFactory(IRepetitionLexerFactory repetitionLexerFactory, ILexer<Digit> digitLexer)
    {
        this.repetitionLexerFactory = repetitionLexerFactory;
        this.digitLexer = digitLexer;
    }

    public ILexer<Integer> Create()
    {
        // 1*10DIGIT: at least 1 and at most 10 DIGIT elements
        return new IntegerLexer(repetitionLexerFactory.Create(digitLexer, 1, 10));
    }
}

public class IntegerParser : Parser<Integer, int>
{
    protected override int ParseImpl(Integer integer)
    {
        // Note: int.Parse throws OverflowException for values above int.MaxValue (2147483647),
        // which 1*10DIGIT allows (up to 9999999999).
        return int.Parse(integer.Text);
    }
}

public class IntegerWalker : Walker
{
    public void Enter(Integer integer)
    {
        Console.WriteLine("Entering the Integer");
    }

    public void Enter(Digit digit)
    {
        Console.WriteLine("Entering a Digit at position " + digit.Context.Offset);
    }

    public void Exit(Integer integer)
    {
        Console.WriteLine("Exiting the Integer");
    }

    public void Exit(Digit digit)
    {
        Console.WriteLine("Exiting the Digit");
    }

    public bool Walk(Integer integer)
    {
        var parser = new IntegerParser();
        Console.WriteLine("The integer is " + parser.Parse(integer));
        return base.Walk(integer);
    }

    public bool Walk(Digit digit)
    {
        var parser = new DigitParser();
        Console.WriteLine("The digit is " + parser.Parse(digit));
        return base.Walk(digit);
    }
}
```
StringTextSource example:

```csharp
string input = "2147483647";
var repetitionLexerFactory = new RepetitionLexerFactory();
var valueRangeLexerFactory = new ValueRangeLexerFactory();
var digitLexerFactory = new DigitLexerFactory(valueRangeLexerFactory);
var digitLexer = digitLexerFactory.Create();
var integerLexerFactory = new IntegerLexerFactory(repetitionLexerFactory, digitLexer);
var integerLexer = integerLexerFactory.Create();
var integerParser = new IntegerParser();
using (ITextSource textSource = new StringTextSource(input))
using (ITextScanner textScanner = new TextScanner(textSource))
{
    var result = integerLexer.Read(textScanner);
    if (!result.Success)
    {
        throw new FormatException();
    }
    int value = integerParser.Parse(result.Element);
}
```
StreamTextSource example:

```csharp
// The input must start with a digit: INTEGER does not allow leading whitespace.
File.WriteAllText("input.txt", "2147483647");
var repetitionLexerFactory = new RepetitionLexerFactory();
var valueRangeLexerFactory = new ValueRangeLexerFactory();
var digitLexerFactory = new DigitLexerFactory(valueRangeLexerFactory);
var digitLexer = digitLexerFactory.Create();
var integerLexerFactory = new IntegerLexerFactory(repetitionLexerFactory, digitLexer);
var integerLexer = integerLexerFactory.Create();
var integerParser = new IntegerParser();
using (Stream fileStream = File.OpenRead("input.txt"))
using (PushbackInputStream inputStream = new PushbackInputStream(fileStream))
using (ITextSource textSource = new StreamTextSource(inputStream, Encoding.UTF8))
using (ITextScanner textScanner = new TextScanner(textSource))
{
    var result = integerLexer.Read(textScanner);
    if (!result.Success)
    {
        throw new FormatException();
    }
    int value = integerParser.Parse(result.Element);
}
```
Notes:
- In any real application, you should use a dependency injection (DI) container to manage all the dependencies between lexer and lexer factory classes.
- The `PushbackInputStream` wrapper class exists to enable support for forward-only streams like `System.Net.Sockets.NetworkStream`.
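A lexer sometimes reads ahead and then has to give characters back, for example when one alternative fails and another is tried from the same position. Seekable streams can simply rewind, but a `NetworkStream` cannot (its `CanSeek` property is `false`). A pushback wrapper, presumably the role of `PushbackInputStream` here, keeps the returned bytes in a buffer and serves them again before reading from the underlying stream. The sketch below shows that idea in its simplest form; `ToyPushbackReader` is hypothetical and is not Txt's `PushbackInputStream`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch of the pushback idea (not Txt's PushbackInputStream):
// bytes that were read but turned out not to be needed are pushed back and
// returned again by the next ReadByte call, so the underlying stream never has to seek.
public sealed class ToyPushbackReader
{
    private readonly Stream inner;
    private readonly Stack<byte> pushedBack = new Stack<byte>();

    public ToyPushbackReader(Stream inner) => this.inner = inner;

    // Returns the next byte, or -1 at end of stream (same convention as Stream.ReadByte).
    public int ReadByte() => pushedBack.Count > 0 ? pushedBack.Pop() : inner.ReadByte();

    // Pushes a byte back; pushed-back bytes are replayed last-in, first-out.
    public void Unread(byte value) => pushedBack.Push(value);
}
```

Because pushed-back bytes come out in last-in, first-out order, a caller that reads `a` then `b` and wants to rewind both should unread `b` first and then `a`.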
The `ReadResult<>` object contains properties that describe the read operation:
- `Success` indicates whether the read operation succeeded
- `Element` contains the grammar element if `Success` is `true`
- `EndOfInput` indicates whether enough characters were available before the end of input
- `Text` contains the matched text
  - If `Success` is `false`, then this is only a partial match.
- `ErrorText` contains the mismatched text if `Success` is `false` and `EndOfInput` is `false`
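The rules above can be turned into a small decision procedure. The sketch below uses `ToyReadResult<T>`, a hypothetical record that only mirrors the property names listed above (it is not Txt's `ReadResult<>`), and it assumes that `EndOfInput` being `true` means the input ran out before the rule could be matched, which is the reading consistent with `ErrorText` being set only when `EndOfInput` is `false`.

```csharp
#nullable enable
using System;

// Hypothetical stand-in that mirrors the properties listed above; it is not Txt's ReadResult<>.
public sealed record ToyReadResult<T>(bool Success, T? Element, bool EndOfInput, string Text, string? ErrorText)
    where T : class;

public static class ReadResultReport
{
    // Assumption: EndOfInput == true means the input ran out before the rule could be matched.
    public static string Describe<T>(ToyReadResult<T> result) where T : class
    {
        if (result.Success)
        {
            return "matched '" + result.Text + "'";
        }
        if (result.EndOfInput)
        {
            // No mismatched character exists; the input simply ended too early.
            return "input ended after partial match '" + result.Text + "'";
        }
        // Text holds the partial match, ErrorText the text that did not fit the rule.
        return "partial match '" + result.Text + "', then unexpected '" + result.ErrorText + "'";
    }
}
```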