This section covers how to set up your development environment as a contributor to OS-Climate Data Commons who is developing a data ingestion or processing pipeline. The setup is based on OS-Climate GitHub and a Jupyter Hub / Elyra service, provided as a managed development platform built to support the needs of our contributors.
To keep a standardized structure that data scientists, DevOps engineers, and developers can easily understand, repositories should be created from one of the project templates:
Click the Use this template button provided in the template repository to create the structure for your repo. Take care to select OS-Climate as the owner; the default is to create the repository under your own GitHub ID, which may not be your intention if you are contributing to OS-Climate.
A defined project structure ensures that all the pieces required for the ML and DevOps lifecycles are present and easily discoverable, and it makes it possible to manage library dependencies, notebooks, test data, documentation, and so on. For more information on this topic, we recommend reading the Cookiecutter Data Science documentation, which inspired our standard repository template.
- Add the APL 2.0 Open Source license to your repository. This is done by going into your repository, creating a new file called "LICENSE", clicking the Choose a license template button on the right, selecting the Apache License 2.0 template, and then committing the proposed change.
- Explain what the repository is about in the "README.md" file created in the repository root.
- Document the contribution structure of your repository for you and your trusted collaborators in the "OWNERS" file found in the repository root.
- Confirm whether DCO / CLA covers this repository.
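The file-related items in this checklist can be verified with a small script. The following is a minimal sketch, not an official OS-Climate tool: the helper names `missing_checklist_files` and `has_apache_license` are hypothetical, and the license check is only a text heuristic that assumes the standard Apache License 2.0 wording.

```python
from pathlib import Path

# Files the checklist above expects in the repository root.
REQUIRED_FILES = ("LICENSE", "README.md", "OWNERS")


def missing_checklist_files(repo_root):
    """Return the required files that are absent from repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def has_apache_license(repo_root):
    """Heuristic check: does LICENSE look like the Apache License 2.0?

    This only searches for two phrases from the license header; it does
    not validate the full license text.
    """
    license_path = Path(repo_root) / "LICENSE"
    if not license_path.is_file():
        return False
    text = license_path.read_text(encoding="utf-8", errors="replace")
    return "Apache License" in text and "Version 2.0" in text
```

Calling `missing_checklist_files(".")` from the repository root returns an empty list when all three files are present. The DCO / CLA item still needs to be confirmed manually.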
Once you are a member of the odh-env-users team, you can access the development environment with your GitHub credentials.
- Click this link to access the environment, and select githubidp for authentication.
- Select the image called Elyra Notebook Image and Large for container size.
- Your server should start automatically after a couple of minutes, and the Jupyter launcher should appear.
From the File menu, create a new text file called credentials.env. You can copy this file from this link. The example file includes a link to the JWT token retrieval client for Trino access.
To secure and restrict access to data based on user profiles, we have defined role-based access controls for specific schemas in Trino, based on your team assignments. Authentication with the Trino service is therefore federated with GitHub SSO, and you will need to retrieve a new JWT token from this Token Retrieval Client every week. Get the token, then copy and paste the token string as your TRINO_PASSWD in the credentials file.
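Because the token has to be renewed regularly, it can help to check it before starting a long-running notebook. The sketch below reads `KEY=VALUE` pairs from credentials.env using only the standard library and inspects the token's `exp` claim. It rests on assumptions: that the file uses plain `KEY=VALUE` lines, and that the token carries the standard JWT `exp` claim, which the text above does not confirm. The helper names are hypothetical, and the token signature is not verified.

```python
import base64
import json
import os
import time
from pathlib import Path


def load_env_file(path="credentials.env"):
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def export_credentials(path="credentials.env"):
    """Load the file and copy its entries into this process's environment."""
    values = load_env_file(path)
    os.environ.update(values)
    return values


def jwt_expiry(token):
    """Return the `exp` claim (Unix seconds) of a JWT, or None if absent.

    Only the payload is decoded; the signature is NOT verified.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated parts")
    payload_b64 = parts[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")


def token_is_expired(token, now=None):
    """Return True if the token's `exp` time has passed.

    A token without an `exp` claim is reported as not expired, because
    its lifetime cannot be determined from the payload.
    """
    exp = jwt_expiry(token)
    if exp is None:
        return False
    current = time.time() if now is None else now
    return current >= exp
```

A notebook could call `creds = export_credentials()` and then `token_is_expired(creds["TRINO_PASSWD"])` to decide whether a fresh token needs to be retrieved. The python-dotenv package provides `load_dotenv` as an alternative to the hand-written parser.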
Once you are in the JupyterLab UI, you can use the provided Git extension to clone this repo.
- Click the Git extension button in the JupyterLab UI and select Clone a repository.
- Enter the HTTPS address of the repository you want to clone. If it is private and you have access, enter your credentials when requested.
- You are ready to go!
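The same clone can also be performed from a notebook cell or script instead of the Git extension. This is a minimal sketch that assumes the `git` command-line client is available in the notebook image, which the steps above do not state. The helper names are hypothetical, and any repository address passed in is your own.

```python
import subprocess
from urllib.parse import urlsplit


def build_clone_command(repo_url, destination=None):
    """Build the `git clone` argument list for an HTTPS repository address."""
    if urlsplit(repo_url).scheme != "https":
        raise ValueError("expected an HTTPS repository address")
    command = ["git", "clone", repo_url]
    if destination is not None:
        command.append(destination)
    return command


def clone_repository(repo_url, destination=None):
    """Run `git clone`; raises CalledProcessError if git reports a failure."""
    subprocess.run(build_clone_command(repo_url, destination), check=True)
```

Passing the command as a list rather than a single shell string avoids shell quoting problems with unusual repository names. For a private repository, git prompts for credentials in the same way the extension does.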