This document will guide you through setting up the yaml generator in your environment.
Create your working environment and load the required packages.
- Create a virtual environment. For python 2.7, try
virtualenv <env_folder_name>
- Source your virtual environment.
source <env_folder_name>/bin/activate
- Ensure pip is up to date.
pip install --upgrade pip
- Install the requirements from the requirements file.
pip install -r requirements.txt
Set up your google account credentials and API access. This document will go through the process of setting up the google drive requirements. You may study google's documentation for more in-depth knowledge.
- Ensure you are logged into the Google project that will host the spreadsheet.
- Using a web browser, navigate to google libraries at: https://console.developers.google.com/apis/library.
- Search for
Google Sheets API
. - Select the Google Sheets API and enable it.
- Using a web browser, navigate to https://console.developers.google.com/apis/credentials
- Select the project owning the Google Sheet.
- Use the
Create credentials
button to create anOAuth Client ID
. - You may have to create a consent screen for your project. Provide a name and any other requirements you would like to place on the screen. Then go back to the main credentials screen.
- Select a type of Other and give it a name.
- Download the client secret key.
Alternatively, you may use a service account key if you prefer.
To execute the yaml generator:
- Ensure your virtual environment is active.
source <env_name>/bin/activate
- To create a yaml file for a single spreadsheet, in the root directory, type:
PYTHONIOENCODING=UTF-8 PYTHONPATH=./cdr_data_dictionary:$PYTHONPATH python cdr_data_dictionary/generate_yaml.py -k <path_to_client_secret_key> -i <URL identifier of the Google Sheet> -s "<Sheet Name>" --cdr-version <CDR version spreadsheet correlates with> -c
- To create a yaml file for all spreadsheets, type:
PYTHONIOENCODING=UTF-8 PYTHONPATH=./cdr_data_dictionary:$PYTHONPATH python cdr_data_dictionary/generate_yaml.py -k <path_to_client_secret_key> -i <URL identifier of the Google Sheet> --cdr-version <CDR version spreadsheet correlates with> -c
This will generate a yaml file for the sheet identified by the URL identifier and the sheet name. For example, if the URL of the sheet is: https://docs.google.com/spreadsheets/d/1qDTcz5tUnEGFoZfyPPtp-2mF6U/edit, then the identifier is: 1qDTcz5tUnEGFoZfyPPtp-2mF6U. If the document contained sheets named: Field Gen
, Row Sup
, you would generate a yaml file for the Row Sup
sheet for CDR version R1 with the command line: PYTHONIOENCODING=UTF-8 PYTHONPATH=./cdr_data_dictionary:$PYTHONPATH python cdr_data_dictionary/generate_yaml.py -k <path_to_client_secret_key> -i 1qDTcz5tUnEGFoZfyPPtp-2mF6U -s "Row Sup" --cdr-version R1
Other optional command line arguments exist and can be viewed with: PYTHONPATH=./cdr_data_dictionary:$PYTHONPATH python cdr_data_dictionary/generate_yaml.py -h
. These optional arguments allow you to define a range of cells in the sheet to read and the column to use when grouping.
To execute the yaml generator:
- Ensure your virtual environment is active.
source <virtual_env_name>/Scripts/activate
- To create a yaml file for a single spreadsheet, in the root directory, type:
PYTHONIOENCODING=UTF-8 PYTHONPATH=./;./cdr_data_dictionary;%PYTHONPATH% python cdr_data_dictionary/generate_yaml.py -k <path_to_client_secret_key> -i <URL identifier of the Google Sheet> -s "<Sheet Name>" --cdr-version <CDR version spreadsheet correlates with> -c
- To create a yaml file for all spreadsheets, type:
PYTHONIOENCODING=UTF-8 PYTHONPATH=./;./cdr_data_dictionary;%PYTHONPATH% python cdr_data_dictionary/generate_yaml.py -k <path_to_client_secret_key> -i <URL identifier of the Google Sheet> --cdr-version <CDR version spreadsheet correlates with> -c
This will generate a yaml file for the sheet identified by the URL identifier and the sheet name. For example, if the URL of the sheet is: https://docs.google.com/spreadsheets/d/1qDTcz5tUnEGFoZfyPPtp-2mF6U/edit, then the identifier is: 1qDTcz5tUnEGFoZfyPPtp-2mF6U. If the document contained sheets named: Field Gen
, Row Sup
, you would generate a yaml file for the Row Sup
sheet for CDR version R1 with the command line: PYTHONIOENCODING=UTF-8 PYTHONPATH=./;./cdr_data_dictionary;%PYTHONPATH% python cdr_data_dictionary/generate_yaml.py -k <path_to_client_secret_key> -i 1qDTcz5tUnEGFoZfyPPtp-2mF6U -s "Row Sup" --cdr-version R1
Other optional command line arguments exist and can be viewed with: PYTHONPATH=./;./cdr_data_dictionary;%PYTHONPATH% python cdr_data_dictionary/generate_yaml.py -h
. These optional arguments allow you to define a range of cells in the sheet to read and the column to use when grouping.
(yaml_env) $ python cdr_data_dictionary/generate_yaml.py -h
usage: generate_yaml.py [-h] -k KEY_FILE -i SPREADSHEET_ID [-s SHEET_NAME]
[-r RANGE] [-o OUTPUT_FILE] [-g COLUMN_ID]
[-d SCHEMA_FILE] [-l LOG_PATH] [-c] --cdr-version
CDR_VERSION
Google drive yaml prototype. Automatically generates a yaml file from the read
only version of the identified file. Defaults exist for choosing the value to
group around. See https://developers.google.com/sheets/api/quickstart/python
and https://developers.google.com/sheets/api/guides/concepts for information
on configuring required credentials for reading Google Drive files.
optional arguments:
-h, --help show this help message and exit
-k KEY_FILE, --key-file KEY_FILE
Filepath to your service account or client secret key
-i SPREADSHEET_ID, --spreadsheet-id SPREADSHEET_ID
Google spreadsheet ID (as seen in URL)
-s SHEET_NAME, --sheet-name SHEET_NAME
Name of the sheet in the spreadsheet to parse. Enclose
in quotes if the name contains spaces. If not
provided, defaults to all sheets.
-r RANGE, --range RANGE
Range of cells to select in the sheet. If not
specified, returns all cell values.
-o OUTPUT_FILE, --output-file OUTPUT_FILE
Name to give the produced yaml file. Always
overridden, to create CDR_<version>_YYYYMMDD.yaml
-g COLUMN_ID, --group-by COLUMN_ID
Used to create groupings around fields in a tab. Tabs
with 'Field' in the title default to Column B. Tabs
with 'Row' in the title default to Column D. Tabs with
neither in the title default to Column A. Valid
choices are letters A-Z.
-d SCHEMA_FILE, --schema-file SCHEMA_FILE
Path to the schema yaml file. If not provided,
defaults to 'schema.yaml'.
-l LOG_PATH, --log-path LOG_PATH
Specify the log file path and/or name. File name
should end in '.log'. Defaults to
LOGS/generate_yaml.log
-c, --console-log Print logs to the console, in addition to the log
file.
--cdr-version CDR_VERSION
CDR version number. Used as part of output file name.
- CircleCI will run integration tests using
./run_unit_tests.sh
. - CircleCI will run pylint error checks. If errors are encountered, the build will fail.
pylint **/*py --errors-only --persistent=no
- CircleCI will run all pylint checks. If the pylint score drops below 6, code refactoring is recommended.
pylint **/*py --persistent=no
- When adding a tab or updating a tab name, update
cdr_data_dictionary/schema.yaml
to use a list whose name matches the programmatically generated list name. This generally means all characters are lower cased, spaces are replaced with underscores, and(row)_
and(col)_
are replaced with empty strings.