A .NET library for text matching using a SQL-like syntax with boolean operators.
var query = @"('foo' AND 'ba\'r') OR NOT 'baz'";
var matcher = TextMatcher.New(query, StringComparison.Ordinal);
// True
Console.WriteLine(matcher.Matches("test tost tast tust tist"));
// False
Console.WriteLine(matcher.Matches("test tost tast tust baz tist"));
// False
Console.WriteLine(matcher.Matches("test foo tost tast tust baz tist"));
// False
Console.WriteLine(matcher.Matches("test bar tost tast tust baz foo tist"));
// True
Console.WriteLine(matcher.Matches("test ba'r tost tast tust baz foo tist"));
var query = "('foo' AND error 'bar') OR NOT 'baz'";
// Throws UnexpectedCharacterException("Unexpected character 'e' at position '11'")
var matcher = TextMatcher.New(query, StringComparison.Ordinal);
var query = "('foo' AND 'bar') OR AND NOT 'baz'";
// Throws UnexpectedTokenException("Unexpected token 'And' at position '21'")
var matcher = TextMatcher.New(query, StringComparison.Ordinal);
var query = "('foo' AND 'bar') OR";
// Throws IncompleteExpressionException()
var matcher = TextMatcher.New(query, StringComparison.Ordinal);
The syntax is comparable to that of SQL. Currently the boolean AND, OR and NOT operators are implemented, and logical grouping is done with parentheses ( ). Literals are enclosed in single quotes ', and a single quote can be used inside a literal when escaped with a backslash: \'.
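The syntax rules above can be illustrated with a small tokenizer. This is only a sketch, not the library's lexer: the LexerSketch class, the string-based token format, the use of FormatException and the assumption that keywords are upper case are all hypothetical choices made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LexerSketch
{
    // Hypothetical token format: "(", ")", "AND", "OR", "NOT" or "LITERAL:<text>".
    public static List<string> Tokenize(string query)
    {
        var tokens = new List<string>();
        var i = 0;
        while (i < query.Length)
        {
            var c = query[i];
            if (char.IsWhiteSpace(c))
            {
                i++;
            }
            else if (c == '(' || c == ')')
            {
                tokens.Add(c.ToString());
                i++;
            }
            else if (c == '\'')
            {
                var literal = new StringBuilder();
                i++; // skip the opening quote
                while (i < query.Length && query[i] != '\'')
                {
                    // A backslash followed by a quote yields a quote inside the literal.
                    if (query[i] == '\\' && i + 1 < query.Length && query[i + 1] == '\'')
                    {
                        i++;
                    }
                    literal.Append(query[i]);
                    i++;
                }
                if (i >= query.Length)
                {
                    throw new FormatException("Unterminated literal");
                }
                i++; // skip the closing quote
                tokens.Add("LITERAL:" + literal);
            }
            else
            {
                // Anything else must be one of the three operator keywords.
                var start = i;
                while (i < query.Length && char.IsLetter(query[i]))
                {
                    i++;
                }
                var word = query.Substring(start, i - start);
                if (word != "AND" && word != "OR" && word != "NOT")
                {
                    throw new FormatException(
                        $"Unexpected character '{c}' at position '{start}'");
                }
                tokens.Add(word);
            }
        }
        return tokens;
    }
}
```

Calling LexerSketch.Tokenize(@"('foo' AND 'ba\'r') OR NOT 'baz'") yields eight tokens, with the fourth one carrying the unescaped literal ba'r.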
The interpreter is implemented as a simple LR parser fed by a basic lexer, which converts the raw input string into tokens.
The parser applies its reduction rules to reduce the tokens into a LINQ expression tree. When parsing is done, the GetCompiledExpression method can be used to compile the LINQ expression tree and return it as a Func<string, StringComparison, bool>.
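To make the end result of parsing more concrete, here is a sketch that builds such an expression tree by hand for the query ('foo' AND 'bar') OR NOT 'baz' and compiles it. The ExpressionTreeSketch class and its members are hypothetical; the tree the library actually produces may be shaped differently.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class ExpressionTreeSketch
{
    // The string.IndexOf(string, StringComparison) overload, resolved once via reflection.
    private static readonly MethodInfo IndexOfMethod =
        typeof(string).GetMethod(
            nameof(string.IndexOf),
            new[] { typeof(string), typeof(StringComparison) })
        ?? throw new InvalidOperationException("IndexOf overload not found");

    // Builds the node: text.IndexOf(literal, comparison) >= 0
    private static Expression Contains(
        ParameterExpression text, ParameterExpression comparison, string literal)
    {
        return Expression.GreaterThanOrEqual(
            Expression.Call(text, IndexOfMethod, Expression.Constant(literal), comparison),
            Expression.Constant(0));
    }

    // Hand-built tree for: ('foo' AND 'bar') OR NOT 'baz'
    public static Func<string, StringComparison, bool> Build()
    {
        var text = Expression.Parameter(typeof(string), "text");
        var comparison = Expression.Parameter(typeof(StringComparison), "comparison");

        // AndAlso and OrElse are the short-circuiting forms of AND and OR.
        var body = Expression.OrElse(
            Expression.AndAlso(
                Contains(text, comparison, "foo"),
                Contains(text, comparison, "bar")),
            Expression.Not(Contains(text, comparison, "baz")));

        return Expression
            .Lambda<Func<string, StringComparison, bool>>(body, text, comparison)
            .Compile();
    }
}
```

The returned delegate can then be invoked repeatedly, for example ExpressionTreeSketch.Build()("test foo bar baz", StringComparison.Ordinal), without parsing the query again.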
The expression tree is a simple tree of boolean operators and IndexOf() calls. Every IndexOf() call can potentially cause a full text scan, but by ordering the query to make the most of short-circuit evaluation it is possible to minimize the number of IndexOf() calls.
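The effect of operand order can be shown with a small model of an AND chain that counts how many IndexOf() calls are made before the result is known. The ShortCircuitSketch class and its MatchAll method are hypothetical and exist only for this illustration.

```csharp
using System;

public static class ShortCircuitSketch
{
    // Models 'a' AND 'b' AND ... evaluated left to right with short-circuiting.
    // Returns the result together with the number of IndexOf() calls performed.
    public static (bool Matched, int Scans) MatchAll(string text, params string[] literals)
    {
        var scans = 0;
        foreach (var literal in literals)
        {
            scans++;
            if (text.IndexOf(literal, StringComparison.Ordinal) < 0)
            {
                // The first missing literal decides the result; the rest are skipped.
                return (false, scans);
            }
        }
        return (true, scans);
    }
}
```

For the text "a common word", MatchAll(text, "rare", "common") stops after one scan, while MatchAll(text, "common", "rare") needs two. Putting the literal that is least likely to occur first in an AND (or the most likely first in an OR) therefore tends to save scans.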
- A more optimized implementation, such as a single text scan instead of an IndexOf() call per literal
- Support for regex-like wildcards, for example: . for any character or * for any number of characters
- An order operator, for example: 'literal a' BEFORE 'literal b'