Contributing to Quamina

Basics

Most of this document is concerned with the mechanics of raising issues and posting Pull Requests to offer improvements to Quamina. Following this, there is a section entitled Developing that describes technology issues that potential contributors will face and tools that might be helpful.

Quamina is hosted in this GitHub repository at github.com/timbray/quamina and welcomes contributions.

Typically, the first step in making a change is to raise an Issue to allow for discussion of the idea. This is important because possibly Quamina already does what you want, in which case perhaps what’s needed is a documentation fix. Possibly the idea has been raised before but failed to convince Quamina’s maintainers. (Doesn't mean it won’t find favor now; times change.)

Assuming there is agreement that a change in Quamina is a good idea, the mechanics of forking the repository, committing changes, and submitting a pull request are well-described in many places; there is nothing unusual about Quamina.

Code Style

The coding style suggested by the Go community is used in Quamina. See the style doc for details.

Try to limit column width to 120 characters for both code and Markdown documents such as this one.

Format of the Commit Message

We follow the conventions described in How to Write a Git Commit Message.

Be sure to include any related GitHub issue references in the commit message, e.g. Closes: #<number>.

The CHANGELOG.md and release page use commit message prefixes for grouping and highlighting. A commit message that starts with [prefix:] will place this commit under the respective section in the CHANGELOG.

The following example creates a commit that references issue #1234 and puts the commit message in the pat CHANGELOG section:

git commit -s -m "pat: Add complex-number predicate" -m "Closes: #1234"

Currently the following prefixes are used:

  • api: - Use for API-related changes
  • pat: - Use for changes to the Quamina pattern language
  • chore: - Use for repository related activities
  • fix: - Use for bug fixes
  • kaizen: - Use for code improvements or performance optimization
  • docs: - Use for changes to the documentation

If your contribution falls into multiple categories, e.g. api and pat, it is recommended to break up your commits using distinct prefixes.
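To illustrate the prefix rule, here is a minimal Go sketch that maps a commit subject line to a CHANGELOG section. It is not the tooling that actually produces the CHANGELOG or release page; the name sectionFor and the exact matching rule (text before the first colon must equal a listed prefix) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// knownPrefixes lists the prefixes named in this document.
var knownPrefixes = []string{"api", "pat", "chore", "fix", "kaizen", "docs"}

// sectionFor is a hypothetical helper: it returns the CHANGELOG section
// implied by a commit subject line, or "" when the subject does not start
// with one of the known prefixes followed by a colon.
func sectionFor(subject string) string {
	prefix, _, found := strings.Cut(subject, ":")
	if !found {
		return ""
	}
	for _, p := range knownPrefixes {
		if prefix == p {
			return p
		}
	}
	return ""
}

func main() {
	fmt.Println(sectionFor("pat: Add complex-number predicate")) // pat
	fmt.Println(sectionFor("Add complex-number predicate") == "") // true
}
```

A commit that touches both the API and the pattern language would match only one prefix under this rule, which is one reason to split such work into separate commits.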

Signing commits

Commits should be signed (not just the -s “signed off on”) with any of the styles GitHub supports. Note that you can use git config to arrange that your commits are automatically signed with the right key.

Running Tests

In any repo subdirectory, go test runs unit tests with all the defaults, which is a decent check for basic sanity and correctness.

Running the following command in the repository root runs all the available tests with race detection enabled, and is an essential step before submitting any changes:

go test -race -v -count 1

The following command runs the Go linter; submissions need to be free of lint errors.

golangci-lint run  

At the moment we don’t have a script for running this in all the Quamina subdirectories, so you’ll have to do this by hand. golangci-lint has a home page with instructions for installing it.

Rebuilding the Case-folding Table

Quamina's ignore-case patterns rely on mappings found in the generated source file case_folding.go. Quamina includes a program called code_gen, in the code_gen/ directory, that rebuilds these mappings. There is a Makefile whose only function is to check the mapping file and rebuild it if it is older than three months, because a new Unicode release may have added mappings.

As a result, it is good practice to type make at some point in the process of building and submitting a PR. This will rebuild and re-run code_gen; that program will display a message saying whether or not it rebuilt the case-folding mappings. If it did rebuild those mappings, please include the generated case_folding.go source in your commit and PR.

Reporting Bugs and Creating Issues

When opening a new issue, try to roughly follow the commit message format conventions above.

Developing

Automata

Quamina works by compiling the Patterns together into a Nondeterministic Finite Automaton (NFA) which proceeds byte-at-a-time through the UTF-8-encoded fields and values. NFAs are nondeterministic in the sense that a single byte value may cause transitions to multiple different states.
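For readers new to NFAs, here is a small self-contained Go sketch of the idea: an automaton stepped one byte at a time, where a single byte can lead to more than one next state. The types and functions here (state, run, buildEndsWithAB) are invented for illustration and do not reflect Quamina's actual data structures.

```go
package main

import "fmt"

// state is one toy NFA state. A byte may map to several next states, which
// is what makes the automaton nondeterministic.
type state struct {
	name  string
	next  map[byte][]*state
	final bool
}

// run steps through input one byte at a time, tracking every state the NFA
// could be in, and reports whether a final state is active at the end.
func run(start *state, input []byte) bool {
	current := []*state{start}
	for _, b := range input {
		var following []*state
		seen := map[*state]bool{}
		for _, s := range current {
			for _, n := range s.next[b] {
				if !seen[n] {
					seen[n] = true
					following = append(following, n)
				}
			}
		}
		current = following
	}
	for _, s := range current {
		if s.final {
			return true
		}
	}
	return false
}

// buildEndsWithAB builds a toy NFA over the bytes 'a' and 'b' that accepts
// inputs ending in "ab".
func buildEndsWithAB() *state {
	s0 := &state{name: "start", next: map[byte][]*state{}}
	s1 := &state{name: "saw a", next: map[byte][]*state{}}
	s2 := &state{name: "saw ab", next: map[byte][]*state{}, final: true}
	// On 'a', s0 both stays put and advances: two transitions for one byte.
	s0.next['a'] = []*state{s0, s1}
	s0.next['b'] = []*state{s0}
	s1.next['b'] = []*state{s2}
	return s0
}

func main() {
	nfa := buildEndsWithAB()
	for _, in := range []string{"ab", "aab", "abb", "ba"} {
		fmt.Printf("%q -> %v\n", in, run(nfa, []byte(in)))
	}
}
```

Tracking the full set of active states, rather than a single current state, is the standard way to simulate an NFA without backtracking.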

The general workflow, for some specific pattern type, is to write code to build an automaton that matches that type. Examples are the functions makeStringFA() in value_matcher.go and makeShellStyleAutomaton() in shell_style.go. Then, insert calls to the automaton builder in value_matcher.go, which is reasonably straightforward code. It takes care of merging new automata with existing ones as required.

Testing

A straightforward way to test a new feature is exemplified by TestLongCase() in shell_style_test.go:

  1. Make a coreMatcher by calling newCoreMatcher()
  2. Add patterns to it by calling addPattern()
  3. Make test data and examine matching behavior by calling matchesForJSONEvent()

Prettyprinting NFAs

NFAs can be difficult to build and to debug. For this reason, code is provided in prettyprinter.go which produces human-readable NFA representations.

To use the prettyprinter, make an instance with newPrettyPrinter(); the only argument is a seed used to generate state numbers. Then, instead of calling addPattern(), call addPatternWithPrinter(), passing in your prettyprinter so that it reaches the automaton-building code. New automata are created by valueMatcher calls (see value_matcher.go), so ensure that the prettyprinter is passed through to your automaton-building code; makeShellStyleAutomaton() is an example of this. In that code, use prettyprinter.labelTable() to attach meaningful labels to the states of your automaton. Finally, at some convenient point, call prettyprinter.printNFA() to generate the NFA printout; real programmers debug with Print statements.

Prettyprinter output

makeShellStyleAutomaton() code has prettyprinter call-outs to label the states and transitions it creates, and the TestPP() test in prettyprinter_test.go uses this. The pattern being matched is "x*9" and the prettyprinter output is:

 758 [START HERE] '"' → [910 on " at 0]
 910 [on " at 0] 'x' → [821 gS at 2]
 821 [gS at 2] '9' → [551 gX on 9 at 3] / ★ → [821 gS at 2]
 551 [gX on 9 at 3] '"' → [937 on " at 4] / '9' → [551 gX on 9 at 3] / ★ → [821 gS at 2]
 937 [on " at 4] '9' → [551 gX on 9 at 3] / 'ℵ' → [820 last step at 5] / ★ → [821 gS at 2]
 820 [last step at 5]  [1 transition(s)]

Each line represents one state.

Each state gets a 3-digit number and a text description. The construct ★ → represents a default transition, which occurs in the case that none of the other transitions match. The symbol ℵ represents the end of the input value.

In this particular NFA, the makeShellStyleAutomaton code labels states corresponding to the * "glob" character with text including gS for "glob spin" and states that escape the "glob spin" state with gX for "glob exit".
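To make the default-transition rule concrete, the following Go sketch models the state numbered 821 in the printout above: an explicit transition on '9' to the "glob exit" state, and a default transition back to itself for any other byte. The toyState type and its step method are hypothetical and are not part of Quamina's code.

```go
package main

import "fmt"

// toyState is a hypothetical stand-in for an automaton state with explicit
// per-byte transitions plus an optional default transition.
type toyState struct {
	label    string
	onByte   map[byte]*toyState
	fallback *toyState // the "★ →" default transition; nil if there is none
}

// step returns the explicit transition for b if one exists, and otherwise
// the default transition.
func (s *toyState) step(b byte) *toyState {
	if next, ok := s.onByte[b]; ok {
		return next
	}
	return s.fallback
}

func main() {
	spin := &toyState{label: "gS at 2", onByte: map[byte]*toyState{}}
	exit := &toyState{label: "gX on 9 at 3", onByte: map[byte]*toyState{}}
	spin.onByte['9'] = exit // '9' → [gX on 9 at 3]
	spin.fallback = spin    // ★ → [gS at 2]

	fmt.Println(spin.step('9').label) // gX on 9 at 3
	fmt.Println(spin.step('q').label) // gS at 2
}
```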

Most of the NFA-building code does not exercise the prettyprinter. Normally, you would insert such code while debugging a particular builder and remove it after completion. Since the shell-style builder is unusually complex, the prettyprinting code is retained in anticipation of future issues and progress to full regular-expression NFAs.