OS-Climate Data Commons Developer Guide: Setup Initial Environment

This section covers how to setup your development environment as a contributor to OS-Climate Data Commons developing a data ingestion or processing pipeline. The setup is based around use of OS-Climate GitHub, and a Jupyter Hub / Elyra service provided as a management development platform built to support the needs of our contributors.

1. Create new GitHub repository for pipeline development

In order to have a standardized structure that can be easily understood by data scientists, devops engineers and developers, repositories should be created by using one of the project templates:

Template for Data Science projects
Template for Data Pipeline projects

You can click the Use the template button provided in the repository and create the structure for your repo this way. Take care to select OS-Climate as the owner; the default is to create under your own GitHub ID, which may not be your intention if you are contributing to OS-Climate.

Having a defined structure in a project ensures all the pieces required for the ML and DevOps lifecycles are present and easily discoverable and allows managing library dependencies, notebooks, test data, documentation, etc. For more information on this topic, we recommend reading and understanding the Cookiecutter Data Science documentation, on which our standard repository template is inspired.

2. Make your repository open-source friendly

Add the APL 2.0 Open Source license to your repository. This is done by going into your repository, creating a new file called "LICENSE", clicking the button Choose a license template on the right, selecting the Apache License 2.0 template and then committing the proposed change.

Explain what the repository is about in the "README.md" file created in the repository root.
Document the contribution structure of your repository for you and your trusted collaborators in the "OWNERS" file found in the repository root.
Confirm whether DCO / CLA covers this repository.

3. Access the development environment with Elyra images on JupyterHub

With your GitHub credentials and once you are part of the team odh-env-users, you will be able to access the development environment.

Click this link to access and select githubidp for authentication.

Select the image called Elyra Notebook Image and Large for container size.

Your server should start automatically after a couple of minutes and the Jupyter launcher appear.

4. Set your credentials environment variables

From the File menu, create a new text file called credentials.env. You can copy this file from this link. This example file includes links the JWT token retrieval client for Trino access.

To secure and restrict the access to data based on user profiles, we have defined role-based accessc controls to specific schemas in Trino based on your team assignments. Therefore, authentication with the Trino service has been federated with GitHub SSO and on a weekly basis you will need to retrieve a JWT token from this Token Retrieval Client. Get the token and cut / paste the token string as your TRINO_PASSWD in the credentials file.

5. Access your repo using Jupyterlab Git Extension

Once you are in the Jupyterlab UI, you can use the Git extension provided to clone this repo.

Click the Git extension button from Jupyterlab UI and select Clone a repository:

Enter the HTTPS address of the repository you want to clone. If it is private and you have access, enter your credentials when requested.
You are ready to go!

Next Step

Explore notebooks and manage dependencies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

setup-initial-environment.md

setup-initial-environment.md

OS-Climate Data Commons Developer Guide: Setup Initial Environment

1. Create new GitHub repository for pipeline development

2. Make your repository open-source friendly

3. Access the development environment with Elyra images on JupyterHub

4. Set your credentials environment variables

5. Access your repo using Jupyterlab Git Extension

Next Step

Files

setup-initial-environment.md

Latest commit

History

setup-initial-environment.md

File metadata and controls

OS-Climate Data Commons Developer Guide: Setup Initial Environment

1. Create new GitHub repository for pipeline development

2. Make your repository open-source friendly

3. Access the development environment with Elyra images on JupyterHub

4. Set your credentials environment variables

5. Access your repo using Jupyterlab Git Extension

Next Step