We are always looking for ways to improve the Cognite Toolkit CLI. You can report bugs and ask questions in our Cognite Hub group.
We are also looking for contributions to new modules (content) and the Toolkit codebase that make the configuration of Cognite Data Fusion easier, faster and more reliable.
If you want to contribute to the codebase, create a new branch and open a pull request. Prefix the PR title with the Jira issue number in the form `[CDF-12345]`. A good PR includes a clear description of the change to help the reviewer understand its nature and context.
The Cognite Toolkit CLI and modules have an extensive test and linting suite to ensure quality and speed of development. See `pyproject.toml` for the linting and testing configuration, and the `tests/` directory for more information on how to run and maintain tests. The `cdf_`-prefixed modules are tested as part of product development.
Your local environment needs a working Python installation and a virtual environment. We use `poetry` to manage the environment and its dependencies. Install the pre-commit hooks by running `poetry run pre-commit install` in the root of the repository.
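If you are setting up from scratch, a typical sequence looks roughly like the following (a sketch, assuming `poetry` is already installed and you are in the repository root; adjust to your own shell and setup):

```sh
# Create the virtual environment and install all dependencies.
poetry install

# Install the git pre-commit hooks.
poetry run pre-commit install

# Optionally run all linters once across the whole repository.
poetry run pre-commit run --all-files

# Run the test suite locally.
poetry run pytest tests
```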
When developing in VS Code, the `cdf-tk-dev.py` file is useful for running the toolkit. This script sets the environment and paths correctly (to avoid conflicts with an installed `cdf` package) and also sets the `SENTRY_ENABLED` environment variable to `false` to avoid sending errors to Sentry. In `.vscode/launch.json` you will find a number of example debugging configurations that you can use. Useful locations in the repository:
- Main app entry point: `cognite_toolkit/_cdf.py`
- App subcommands: `cognite_toolkit/_cdf_tk/commands`
- Resource loaders: `cognite_toolkit/_cdf_tk/loaders`
- Tests: `tests/`
- CI/CD: `.github/workflows`
When you develop the Cognite Toolkit, you should avoid sending errors to Sentry. You can control Sentry reporting by setting the environment variable `SENTRY_ENABLED=false`. This is set automatically when you use `cdf-tk-dev.py`.
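If you run the toolkit outside `cdf-tk-dev.py`, you can disable Sentry yourself (a sketch for a POSIX shell; the last line is just an example invocation):

```sh
# Disable Sentry error reporting for the rest of this shell session.
export SENTRY_ENABLED=false

# Or disable it for a single invocation only.
SENTRY_ENABLED=false poetry run cdf --help
```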
The official `cdf_*` modules are owned by the respective teams in Cognite. Any changes to these modules will be reviewed by the owning teams to ensure that nothing breaks; if you open a PR on one of these modules, it will be reviewed by the team owning that module. For example, `cdf_infield_location` is a team-owned module.
Adding a new module consists of the following steps:
- Determine where to put it (core, common, modules, examples, or experimental).
- Create a new directory for the module with sub-directories per configuration type the module needs. See the YAML reference documentation.
- Add a `default.config.yaml` file to the module root directory if you have variables in the templates.
- Add a `README.md` file to the module root directory with a description of the module and its variables.
- Update `default.packages.yaml` in the `cognite_toolkit` root with the new module if it is part of a package.
- If this is an official module, add a description of the module in the module and package documentation.
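As an illustration, scaffolding a new module might look roughly like this (a sketch only; the module name, paths, variables, and sub-directory names are hypothetical and should be adapted to the resource types your module actually configures):

```sh
# Hypothetical module named "my_location_module" with a few typical sub-directories.
mkdir -p modules/my_location_module/data_sets
mkdir -p modules/my_location_module/raw
mkdir -p modules/my_location_module/transformations

# Variables referenced by the templates go in default.config.yaml in the module root.
cat > modules/my_location_module/default.config.yaml <<'EOF'
# Hypothetical variables; use the variables your templates actually reference.
location_name: my_location
source_db: my_source_db
EOF
```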
If you are not a Cognite employee and would like to contribute a module, please open an issue, so we can get in touch with you.
Each module should be as standalone as possible, but they can be dependent on other modules. If you need to deploy a data model as a foundational element for both transformations and applications to work, you may add a module with the data model. However, a better module would be one that includes all the elements needed to get data from the source system, through RAW (if necessary), into a source data model, and then transformed by one or more transformations into a domain data model. The solution data models can then be a separate module that relies on the ingestion module.
Please take care to think about the best grouping of modules to make them easy to deploy and maintain. We aim to standardize as much as possible, so we do not optimize for customer-specific changes and naming conventions except where we explicitly design to support them.
NOTE! Customer-specific projects should be able to use these templates directly, and also adopt new changes from this repository as they are released. Configurations that contain defaults that are meant to be changed by the customer, e.g. mapping of properties from source systems to CDF, should be contained in separate modules.
All configurations should be kept in camelCase YAML, in a format that is compatible with the CDF API. The configuration files are loaded directly into the Python SDK's supporting data classes for use against the CDF API. Client-side schema validation should be done in the Python SDK and not in `cdf-tk`, so that support for a new YAML configuration property only requires updating the version of the Python SDK.
NOTE! As of now, any non-recognised properties are silently ignored by the Python SDK. If you don't get the desired configuration deployed, check your spelling.
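For example, a resource configuration file is essentially the API request body written as camelCase YAML. The snippet below is a sketch only: the file path and values are hypothetical, and the exact fields are those defined by the CDF API and the Python SDK data classes:

```sh
# Hypothetical data set configuration; the keys mirror the camelCase fields of
# the CDF data set resource.
cat > modules/my_location_module/data_sets/my_data_set.yaml <<'EOF'
externalId: ds_my_location
name: My location data set
description: Governed data for my_location
EOF
```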
The scripts currently support many resource types, such as RAW, data models, time series, groups, and transformations. They also have some support for loading data that can be used as example data in CDF projects. However, as a general rule, templates should contain the governed configurations necessary to set up ingest, data pipelines, and contextualisations, but not the actual data itself. Of course, where populating e.g. a data model with data is part of the configuration, that is fine. The scripts are continuously under development to simplify management of configurations, and we push functionality into the Python SDK when that makes sense.
The templates are bundled with the `cdf-tk` tool, so they are released together. To release a new version of the `cdf-tk` tool and the templates, you need to do the following:
- Create a new preparation branch from `main` where you can make the final changes and do the version bumping, e.g. `prepare_for_0_1_0b3`. Use `aX` for alpha, `bX` for beta, and `rcX` for release candidate. (A command sketch for this step follows after this list.)
  - Update the `CHANGELOG.cdf-tk.md` file with a header, e.g. `## [0.1.0b3] - 2024-01-12`, and review the change comments since the previous release. Ensure that the changes are correctly reflected in the comments and that they can be easily understood. Also verify that any breaking changes are clearly marked as such (`**BREAKING**`).
  - Do the same update to the `CHANGELOG.templates.md` file.
  - Update the files below with the new version number. This is done with the `cdf bump --patch` (or `--minor`, `--major`, `--alpha`, `--beta`) command:
    - `cognite_toolkit/_version.py`
    - `pyproject.toml`
    - `_system.yaml` (multiple)

    You can use the `python bump --minor --alpha` command to bump the version in all files.
  - Run `poetry lock` to update the `poetry.lock` file.
  - Run `pytest tests` locally to ensure that the tests pass.
  - Run `python module_upgrade/run_check.py` to ensure that the `cdf-tk modules upgrade` command works as expected against previous versions. See Module Upgrade for more information. If a check fails due to a missing package:
    - `source .venv/.../bin/activate`
    - `pip install <dependency>`
    - `deactivate`
    - run the script again
- Get approval to squash merge the branch into `main`:
  - Verify that all GitHub actions pass.
- Create a release branch `release-x.y.z` from `main`:
  - Create a new tag on the branch with the version number, e.g. `v0.1.0b3`.
  - Open a PR with the existing `release` branch as base, comparing to your new `release-x.y.z` branch.
  - Get approval and merge (do not squash).
  - Verify that the GitHub action `release` passes and pushes to PyPI.
- Create a new release on github.com with the tag and release notes:
  - Find the tag you created and create the new release.
  - Copy the release notes from the `CHANGELOG.cdf-tk.md` file and add a `# cdf-tk` header.
  - Below that, copy the release notes from the `CHANGELOG.templates.md` file and add a `# Templates` header.
  - Remember to mark the release as a pre-release if this is not a final release.
- Evaluate necessary announcements:
  - Create a new post in the Cognite Hub group.
  - As part of product releases, evaluate what to include.
  - Cognite-internal announcements.
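The preparation step above maps roughly to the following command sequence (a sketch only; the branch name and version are examples, and your exact flow may differ):

```sh
# Create the preparation branch from an up-to-date main.
git checkout main && git pull
git checkout -b prepare_for_0_1_0b3

# Bump the version in cognite_toolkit/_version.py, pyproject.toml, and the
# _system.yaml files; pick the flag(s) matching the release type.
cdf bump --patch

# Refresh the lock file and run the checks described above.
poetry lock
pytest tests
python module_upgrade/run_check.py

# Commit, push, and open a PR against main.
git commit -am "Prepare release 0.1.0b3"
git push -u origin prepare_for_0_1_0b3
```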