Welcome to the DoltgreSQL repository! This guide outlines how to get started, in addition to how to modify the code. We have a standard guideline that we try to adhere to for the project, and this guide will go over all of those. I will assume familiarity with the Go toolchain.
- Star the repository: Since you're about to contribute to the project, why not give us a star to show your support? �
- Install the latest Go: We generally attempt to stay up-to-date with the latest version of Go. For the specific version in use, you may check the go.mod file.
- Install PSQL 15: Although we do not use PostgreSQL, we do use the PSQL client (for PostgreSQL 15) for testing. In addition, it is worthwhile to have a local PostgreSQL instance installed to cross-check behavior, especially when writing tests (which should always be verified by a PostgreSQL 15 instance).
- Clone the repository: Clone the repository to a local directory of your choice.
- Build the parser: Run the
doltgresql/postgres/parser/build.sh
script. This creates a few files within thedoltgresql/postgres/parser/parser
directory that are necessary for parsing PostgreSQL statements. It is recommended to run this file every time you pull changes into your local repository, as these generated files are not included since they would cause near guaranteed merge conflicts. - Run Go tests: Before building the project, you should always run all of the tests, which can be done by running
go test ./... --count=1
from the source root directory. This ensures that all Go tests pass, which also ensures that your Go environment is installed and configured correctly. - Build the binary: From the source root directory, run
go build -o <bin_name> ./cmd/doltgres
, where<bin_name>
is the name of the binary (usuallydoltgres
ordoltgres.exe
). To run the program without creating an executable, rungo run ./cmd/doltgres
. - Run Bats tests: We make use of Bats for all end-user-style tests.
Assuming you have NPM installed, you can install Bats by running
npm install -g bats
. Then, navigate todoltgresql/testing/bats
and runbats .
, which will run all of the Bats tests in the directory. An alternative is to use BashSupport Pro, which is cross-platform and used by several developers. Additionally, our Bats tests assume that you have adoltgresql
(notdoltgres
) binary on your PATH. For Windows users, this means that the binary should not end with the.exe
file extension. Remember to recompile the executable on your PATH whenever you want to re-test any Bats tests. - Change the data directory: This is optional but recommended.
By default, we create databases within the
~/doltgres/databases
directory. For developmental purposes, you may want to change this behavior. You have two options:- Set the
DOLTGRES_DATA_DIR
environment variable to a different directory. A value of.
causes DoltgreSQL to use the current directory as the data directory, so you can have multiple data directories simply by running the program in different directories. This behavior is more consistent with Dolt's behavior. This is the recommended option for development. - Specify the directory in the
--data-dir
argument. This overrides the environment variable if it is present.
- Set the
All of the tooling in this repository is designed for a Unix-like environment. Some of the lead project developers are on Windows, however they all prefer the Unix commandline environment over Command Prompt and Powershell. These are a few alternatives that are used to provide a Unix environment:
- Git Bash: The Mintty client that may be installed alongside Git.
- WSL 1: The Linux subsystem for Windows, specifically version 1. We've attempted to use version 2 with Dolt, however we ran into a few issues. It's possible that it works, but since we do not test or support it, we heavily suggest sticking to version 1 instead.
- Cygwin: Cygwin is a full Unix-like environment for Windows. Unlike WSL, which runs a full Linux kernel within Windows, Cygwin emulates Unix through native Windows processes. This has not been extensively tested, however some have had success using it for development.
- Format the repository: You can do this by running
doltgresql/scripts/format_repo.sh
. This will reformat any changes to adhere to our preferred style. This is required, otherwise the pull request's checks will not pass. - Run all tests: You should ensure that all tests are running correctly. All changes are required to have passing tests before they're merged. Additionally, most changes should have tests. We will not accept submissions that do not properly test to verify their behavior.
- Create a Pull Request: Create your pull request and base it against the main branch. All pull requests should have a description stating what was fixed, and possibly why it was broken. If a specific issue is being fixed, then that issue should be linked to from the PR. In some cases, the PR is not properly tagged from within the issue, so refer to this link for manually linking the PR within the issue.
- Address Feedback: If any feedback was provided on the PR, then it must be addressed before it may be merged. In some cases, an approving review will not be given until we've re-examined the pull request.
- Merge the Pull Request: Congratulations! You're now an official contributor of the DoltgreSQL project.
Here are a few rules that apply to the entire codebase:
- Comment your code as thoroughly as possible. It is better to over-comment your code than under-comment. All functions require a function comment, even if it's just stating that it implements an interface.
- Remove dead code. Don't just put it in a comment block. It will still be in the source history, so it can be retrieved at a later time if needed. Commenting it out will potentially confuse another developer, who may think that it has some contextual significance. If it does, then it should be a proper comment rather than just commented-out code.
- File names use snake case.
- Leave a
//TODO:
comment when there is more work left to be done in an area. This allows us to keep track of incomplete implementations, so that we don't assume a function is "complete". A common source of bugs is when another integrator assumes that a function is completely implemented, and interacts with that function under that assumption. If they find a//TODO:
, then it may clue them in, and/or assist with debugging efforts.
Some packages have rules that only apply to those packages. This is generally due to the size of those packages being larger than average, or some other unique reason that doesn't necessarily apply to the rest of the codebase.
The nested parser
naming may be confusing, but they're two distinct levels of the parser.
The first level, postgres/parser
, contains all of the code for parsing a SQL statement into an AST.
The second level, postgres/parser/parser
, specifically contains the YACC file, along with the generated build files to support the grammar contained in the YACC file.
The parser has been adapted from CockroachDB, using the most recent commit that falls under the Apache 2.0 License (see the BSL).
As such, their README goes over how it's all structured, and we intend to continue in their style.
This means that we should continue to add the %HELP
comments for statements, mark statements as errors when needed, etc.
This differs from our Vitess fork, which is used by GMS and Dolt.
All Bats tests must follow this general structure:
@test "file-name: test name" {
# Test Contents
}
file-name
is the name of the file, without the .bats
file extension.
test name
is the unique name of the test within that file.
You may surround the name portion with either single quotes '
or double quotes "
, although double quotes may cause issues if the test name includes a backslash.
Tests within testing/go
are modeled after engine tests in GMS.
One key deviation is the Focus field.
Unlike in GMS, we do not have a skipped test that a developer would locally unskip and fill in with their test details.
This is by design, as such tests may never become an actual test within a file.
To support this workflow, the Focus field was added.
Whenever this field is set to true
, only focused tests are run.
This means that the developer is required to write a valid test, and then they may focus on that test as though it were the only test.
Once the developer is done, they'll simply delete the Focus line (defaults to false
), and all of the tests will run.
This extends beyond adding new tests, as it allows a developer to focus on failing tests too. Let's say that two specific tests are failing, you can apply Focus to both of those tests, and only those tests will run. Yes, Focus works on more than a single test. In fact, if Focus is present on any tests, then it changes the testing mode such that only tests with Focus are run.
This has a major benefit over commenting out other tests too, in that it's safe for use with GitHub Actions. If a developer accidentally forgets to disable Focus, then the actions will fail, and report that Focus must be disabled. Compare this with commenting out tests, where a developer forgetting to uncomment the tests will rely on their code reviewers to catch the mistake. If this occurs after the code review, then it may make it into the main branch, where it may remain undiscovered for years. This has almost happened multiple times in GMS, so this method of testing is a far safer alternative.