Most of this document is concerned with the mechanics of raising issues and posting Pull Requests to offer improvements to Quamina. Following this, there is a section entitled Developing that describes technology issues that potential contributors will face and tools that might be helpful.
Quamina is hosted in this GitHub repository at github.com/timbray/quamina and welcomes contributions.
Typically, the first step in making a change is to raise an Issue to allow for discussion of the idea. This is important because possibly Quamina already does what you want, in which case perhaps what’s needed is a documentation fix. Possibly the idea has been raised before but failed to convince Quamina’s maintainers. (Doesn't mean it won’t find favor now; times change.)
Assuming there is agreement that a change in Quamina is a good idea, the mechanics of forking the repository, committing changes, and submitting a pull request are well-described in many places; there is nothing unusual about Quamina.
The coding style suggested by the Go community is used in Quamina. See the style doc for details.
Try to limit column width to 120 characters for both code and Markdown documents such as this one.
We follow the conventions described in How to Write a Git Commit Message.
Be sure to include any related GitHub issue references in the commit message, e.g. Closes: #<number>.
The CHANGELOG.md and release page use commit message prefixes for grouping and highlighting. A commit message that starts with [prefix:] will place this commit under the respective section in the CHANGELOG.
The following example creates a commit referencing issue 1234 and puts the commit message in the pat CHANGELOG section:
git commit -s -m "pat: Add complex-number predicate" -m "Closes: #1234"
Currently the following prefixes are used:
- api: Use for API-related changes
- pat: Use for changes to the Quamina pattern language
- chore: Use for repository related activities
- fix: Use for bug fixes
- kaizen: Use for code improvements or performance optimization
- docs: Use for changes to the documentation
If your contribution falls into multiple categories, e.g. api and pat, it is recommended to break up your commits using distinct prefixes.
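As an illustration of the prefix convention only, the following Go sketch extracts the prefix from a commit subject line and checks it against the prefixes listed above. The helper name changelogSection is hypothetical and is not part of Quamina or its release tooling; how the CHANGELOG is actually generated is not described here.

```go
package main

import (
	"fmt"
	"strings"
)

// knownPrefixes lists the commit prefixes named in this document.
var knownPrefixes = []string{"api", "pat", "chore", "fix", "kaizen", "docs"}

// changelogSection is a hypothetical helper (not part of Quamina). It returns
// the prefix of a commit subject line if the text before the first colon is
// one of the known prefixes.
func changelogSection(subject string) (string, bool) {
	prefix, _, found := strings.Cut(subject, ":")
	if !found {
		return "", false
	}
	for _, known := range knownPrefixes {
		if prefix == known {
			return known, true
		}
	}
	return "", false
}

func main() {
	section, ok := changelogSection("pat: Add complex-number predicate")
	fmt.Println(section, ok) // pat true
}
```

A subject line without a recognized prefix simply yields no section in this sketch.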
Commits should be signed (not just the -s “signed off on”) with any of the styles GitHub supports. Note that you can use git config to arrange that your commits are automatically signed with the right key.
In any repo subdirectory, go test runs unit tests with all the defaults, which is a decent check for basic sanity and correctness.
Running the following command in the repository root runs all the available tests with race detection enabled, and is an essential step before submitting any changes:
go test -race -v -count 1
The following command runs the Go linter; submissions need to be free of lint errors.
golangci-lint run
At the moment we don’t have a script for running this in all the Quamina subdirectories so you’ll have to do this by hand. golangci-lint has a home page with instructions for installing it.
Quamina's ignore-case patterns rely on mappings found in the generated source file case_folding.go. Quamina includes a program called code_gen in the code_gen/ directory. There is a Makefile whose only function is to check the mapping file and rebuild it if it is older than three months, because a Unicode version release may have added mappings.
As a result, it is good practice, at some point in the process of building and submitting a PR, to type make, which will rebuild and re-run code_gen; that program will display a message saying whether or not it rebuilt the case-folding mappings. If it did rebuild those mappings, please include the generated case_folding.go source in your commit and PR.
When opening a new issue, try to roughly follow the commit message format conventions above.
Quamina works by compiling the Patterns together into a Nondeterministic Finite Automaton (NFA) which proceeds byte-at-a-time through the UTF-8-encoded fields and values. NFAs are nondeterministic in the sense that a byte value may cause multiple transitions to different states.
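To make the idea of multiple transitions on one byte concrete, here is a minimal, generic Go sketch of byte-at-a-time NFA traversal. It is not Quamina's implementation; the state type, the accepts function, and the example automaton (which accepts any input ending in "ab") are all assumptions made for illustration.

```go
package main

import "fmt"

// state is an illustrative NFA state, not a Quamina type.
type state struct {
	on       map[byte][]int // byte-specific transitions; may list several targets
	fallback []int          // targets used when the byte has no specific transition
	accept   bool
}

// endsInAB accepts any input that ends in "ab". In state 0, the byte 'a'
// leads to both state 0 and state 1, which is the nondeterminism.
var endsInAB = []state{
	{on: map[byte][]int{'a': {0, 1}}, fallback: []int{0}},
	{on: map[byte][]int{'b': {2}}},
	{accept: true},
}

// accepts walks the input one byte at a time, tracking the set of states
// the automaton could be in after each byte.
func accepts(nfa []state, input []byte) bool {
	current := map[int]bool{0: true}
	for _, b := range input {
		next := map[int]bool{}
		for s := range current {
			targets, ok := nfa[s].on[b]
			if !ok {
				targets = nfa[s].fallback
			}
			for _, t := range targets {
				next[t] = true
			}
		}
		current = next
	}
	for s := range current {
		if nfa[s].accept {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(accepts(endsInAB, []byte("xxab"))) // true
	fmt.Println(accepts(endsInAB, []byte("abx")))  // false
}
```

Tracking a set of current states is one common way to traverse an NFA; the cost grows with the number of states that are simultaneously active.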
The general workflow, for some specific pattern type, is to write code to build an automaton that matches that type. Examples are the functions makeStringFA() in value_matcher.go and makeShellStyleAutomaton() in shell_style.go. Then, insert calls to the automaton builder in value_matcher.go, which is reasonably straightforward code. It takes care of merging new automata with existing ones as required.
A straightforward way to test a new feature is exemplified by TestLongCase() in shell_style_test.go:
- Make a coreMatcher by calling newCoreMatcher()
- Add patterns to it by calling addPattern()
- Make test data and examine matching behavior by calling matchesForJSONEvent()
NFAs can be difficult to build and to debug. For this reason, code is provided in prettyprinter.go which produces human-readable NFA representations.
To use the prettyprinter, make an instance with newPrettyPrinter(), whose only argument is a seed used to generate state numbers. Then, instead of calling addPattern(), call addPatternWithPrinter(), passing your prettyprinter into the automaton-building code. New automata are created by valueMatcher calls; see value_matcher.go. Ensure that the prettyprinter is passed to your automaton-matching code; an example of this is in the makeShellStyleAutomaton() function. Then, in your automaton-building code, use prettyprinter.labelTable() to attach meaningful labels to the states of your automaton. Then at some convenient point, call prettyprinter.printNFA() to generate the NFA printout; real programmers debug with Print statements.
The makeShellStyleAutomaton() code has prettyprinter call-outs to label the states and transitions it creates, and the TestPP() test in prettyprinter_test.go uses this. The pattern being matched is "x*9" and the prettyprinter output is:
758 [START HERE] '"' → [910 on " at 0]
910 [on " at 0] 'x' → [821 gS at 2]
821 [gS at 2] '9' → [551 gX on 9 at 3] / ★ → [821 gS at 2]
551 [gX on 9 at 3] '"' → [937 on " at 4] / '9' → [551 gX on 9 at 3] / ★ → [821 gS at 2]
937 [on " at 4] '9' → [551 gX on 9 at 3] / 'ℵ' → [820 last step at 5] / ★ → [821 gS at 2]
820 [last step at 5] [1 transition(s)]
Each line represents one state. Each state gets a 3-digit number and a text description. The construct ★ → represents a default transition, which occurs in the case that none of the other transitions match. The symbol ℵ represents the end of the input value.
In this particular NFA, the makeShellStyleAutomaton code labels states corresponding to the * "glob" character with text including gS for "glob spin" and states that escape the "glob spin" state with gX for "glob exit".
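One way to check your reading of such a printout is to transcribe it by hand and walk sample values through it. The Go sketch below does this for the "x*9" printout above. It does not use Quamina's data structures: the state type, the table, the matches function, and the byte chosen to stand in for ℵ are all assumptions made for illustration. Each state in this printout lists at most one target per byte, so the sketch tracks a single current state.

```go
package main

import "fmt"

// endOfValue stands in for the ℵ symbol. The byte value is an assumption for
// this sketch; it only needs to be a byte that cannot occur in UTF-8 text.
const endOfValue byte = 0xF5

// state mirrors one line of the printout (illustrative, not a Quamina type).
type state struct {
	on         map[byte]int // byte-specific transitions
	fallback   int          // target of the ★ default transition
	hasDefault bool         // whether the line shows a ★ transition
}

// table is transcribed by hand from the printout for the pattern "x*9".
var table = map[int]state{
	758: {on: map[byte]int{'"': 910}},
	910: {on: map[byte]int{'x': 821}},
	821: {on: map[byte]int{'9': 551}, fallback: 821, hasDefault: true},
	551: {on: map[byte]int{'"': 937, '9': 551}, fallback: 821, hasDefault: true},
	937: {on: map[byte]int{'9': 551, endOfValue: 820}, fallback: 821, hasDefault: true},
	820: {},
}

const (
	startState = 758
	lastState  = 820
)

// matches walks a JSON string value (including its quotes), followed by the
// end-of-value marker, through the table and reports whether it ends in 820.
func matches(value string) bool {
	input := append([]byte(value), endOfValue)
	current := startState
	for _, b := range input {
		s := table[current]
		next, ok := s.on[b]
		if !ok {
			if !s.hasDefault {
				return false // no transition for this byte
			}
			next = s.fallback
		}
		current = next
	}
	return current == lastState
}

func main() {
	fmt.Println(matches(`"xab99"`)) // true
	fmt.Println(matches(`"x9a"`))   // false
}
```

Walking `"xab99"` by hand visits 758, 910, 821, 821, 821, 551, 551, 937 and finally 820, which shows the "glob spin" state 821 absorbing bytes until a 9 leads to the "glob exit" state 551.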
Most of the NFA-building code does not exercise the prettyprinter. Normally, you would insert such code while debugging a particular builder and remove it after completion. Since the shell-style builder is unusually complex, the prettyprinting code is retained in anticipation of future issues and progress to full regular-expression NFAs.