This repo contains configuration and code for Prefect workflows to move data and run computing tasks. Many of the workflows use Globus for data movement between local servers and back and forth to NERSC.
These flows can be run from the command line or built into a self-contained docker container.
$ git clone [email protected]:als-computing/splash_flows_globus.git
$ cd splash_flows_globus
$ pip3 install -e .
Use .env.example
as a template.
GLOBUS_CLIENT_ID=<globus_client_id>
GLOBUS_CLIENT_SECRET=<globus_client_secret>
PREFECT_API_URL=<url_of_prefect_server>
PREFECT_API_KEY=<prefect_client_secret>
Name | Description | Status | Notes |
---|---|---|---|
move.py |
Move data from spot832 to data832, schedule pruning, and ingest into scicat | Deployed | Details |
prune.py |
Run data pruning flows as scheduled | Deployed | |
alcf.py |
Run tomography reconstruction Globus Compute Flows at ALCF | Deployed | |
nersc.py |
Run tomography reconstruction using SFAPI at NERSC | WIP | |
dispatcher.py |
Dispatch flow to control beamline subflows | WIP | |
create_deployments_<bl>.sh |
Deploy functions as prefect flows. Run once, or after updating underlying flow code | ||
globus/flows.py and globus/transfer.py |
Connect Python with globus API – could use better error handling | ||
scripts/check_globus_compute.py |
Check if CC has access to compute endpoint and if it is available | ||
scripts/check_globus_transfer.py |
Check if CC has r/w/d access to an endpoint. Also ability to delete data | ||
source scripts/login_to_globus_and_prefect.py |
Login to globus/prefect in current terminal from .env file | ||
init_<data_task>_globus_flow.py |
Register and update globus flow UUIDs in Prefect | ||
init_<data_task>_globus_flow.py |
Register and update globus flow UUIDs in Prefect | ||
orchestration/tests/<pytest_scripts>.py |
Test scripts using pytest |
globus:
globus_endpoints:
spot832:
root_path: /
uri: spot832.lbl.gov
uuid: 44ae904c-ab64-4145-a8f0-7287de38324d
These are meant to be run on flow-prd
, in bl832_agent
with properly set .env
variables (i.e. prefect id/secret, globus id/secret, ..)
General anatomy of a file:
prefect deployment build <path_of_file>:<prefect_function> -n 'name_of_the_workflow' -q <tag>
prefect deployment apply <prefect_function>-deployment.yaml
Example: The following creates a Prefect workflow for the function of process_new_file
in file of ./orchestration/flows/bl7012/move.py
prefect deployment build ./orchestration/flows/bl7012/move.py:process_new_file -n 'process_newdata7012' -q bl7012
prefect deployment apply process_new_file-deployment.yaml
Below is the command to start the Prefect workflow:
python -m orchestration.flows.bl7012.move <Relative path of file respect to the root_path defined in Globus endpoint>
An example is shown example.ipynb
to submit a PREFECT workflow to PREFECT server.
Once the job is submitted, a workflow agent is needed to work on jobs in queue. A workflow agent can be launched by:
prefect agent start -q <name-of-work-queue>
It requires to have the PREFECT_API_URL
and PREFECT_API_KEY
stored as an environment variables, such that the agent knows where to get the work queue. Once the agent is launched, the following message indicates where the agent is currently listening to.
Starting v2.7.9 agent connected to https://.../api...
___ ___ ___ ___ ___ ___ _____ _ ___ ___ _ _ _____
| _ \ _ \ __| __| __/ __|_ _| /_\ / __| __| \| |_ _|
| _/ / _|| _|| _| (__ | | / _ \ (_ | _|| .` | | |
|_| |_|_\___|_| |___\___| |_| /_/ \_\___|___|_|\_| |_|
Agent started! Looking for work from queue(s): <name-of-work-queue>...