This repository contains WDL scripts that will be used to run cell segmentation workflows on Terra.
WDL is a language for describing computational workflows. It allows you to define tasks, inputs, outputs, and dependencies. WDL files usually end with a .wdl extension.
Docker is a platform for developing, shipping, and running applications in containers. Containers provide a consistent environment for running software, regardless of the underlying infrastructure. Dockerfiles are used to create Docker images, which are portable snapshots of an application and its dependencies.
MiniWDL is a lightweight, user-friendly implementation of the Workflow Description Language (WDL) for running scientific workflows. It's designed for ease of use and efficiency, suitable for both small and large-scale data processing. MiniWDL supports local and cloud-based execution, making it versatile and scalable for various scientific computing tasks.
A WDL script is built from three components: workflow, task, and call. Import any component scripts before drafting the workflow block. Declare required inputs by type alone; to make an input optional, append a "?" right after the input type (e.g., Float?). Use scatter-gather to run jobs in parallel.
To use an optional input with the scatter-gather approach, this pipeline first resolves it to a concrete value with a statement such as "if defined(image_mpp) then select_first([image_mpp]) else 0.0", where "image_mpp" is an optional Float input. The "defined" function checks whether the input was supplied. The "select_first" function returns the first defined element of an array; because it accepts only an array, image_mpp is wrapped in square brackets to form a one-element array. The "else" branch is required because if-then-else in WDL is an expression that must always produce a value, unlike the if statement of many other languages, where "else" can be omitted.
Each algorithm (cellpose, deepcell and baysor) is kept in its own script, which makes the pipeline more modular.
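The optional-input statement above can be tried in isolation. The following is a minimal sketch under stated assumptions, not a file from this repository: the workflow name optional_demo, the output name resolved_mpp, and the scratch file location are all hypothetical; only image_mpp and the if-then-else expression come from the text. The script writes the workflow to a temporary file and validates it only if miniwdl happens to be installed.

```shell
# Scratch directory so the demo does not touch the repository.
DEMO_DIR="$(mktemp -d)"
DEMO_WDL="${DEMO_DIR}/optional_demo.wdl"

# optional_demo and resolved_mpp are hypothetical names used for illustration.
cat > "${DEMO_WDL}" <<'EOF'
version 1.0

workflow optional_demo {
    input {
        Float? image_mpp
    }

    # Resolve the optional input to a plain Float, falling back to 0.0.
    Float mpp = if defined(image_mpp) then select_first([image_mpp]) else 0.0

    output {
        Float resolved_mpp = mpp
    }
}
EOF

# Validate the file only when miniwdl is on the PATH.
if command -v miniwdl >/dev/null 2>&1; then
    miniwdl check "${DEMO_WDL}"
fi
```

Running the file with miniwdl run, with and without a value for image_mpp, is one way to see both branches of the expression. A shorter form that should behave the same way is select_first([image_mpp, 0.0]), which puts the fallback value in the array itself.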
Follow the steps mentioned here to publish the WDL workflow on Dockstore.
Each WDL "task" block requires a Docker image path. The images are available on Docker Hub under the OPP and jishar accounts. WDL only tells Terra what to run and how to run it. The operations themselves can be carried out by Python scripts, which are called from bash or from smaller Python wrapper scripts. These Python scripts are stored in specified locations within the respective Docker images.
To build a Docker image locally, you'll need to follow these general steps:
- Install Docker: Make sure Docker is installed on your local machine. Download Docker Desktop for Windows or Mac from the Docker website, or install Docker Engine on Linux.
- Create a Dockerfile: Create a file named Dockerfile (without any file extension) in your project directory. This file contains the instructions for building your Docker image. On a Mac, make sure no ".txt" extension remains: open "Get Info" for the file and delete the extension in the "Name & Extension" field.
- Build the Docker Image: Open a terminal or command prompt and run:
cd {directory_having_dockerfile}
docker build -f Dockerfile -t image_name:tag .
Replace image_name with the desired name for your Docker image and tag with a version or tag for the image ("latest" is the usual choice). The dot (.) at the end specifies the build context (the current directory).
To build for a specific target platform, use docker buildx with the --platform option:
template:
docker buildx build --platform {platform} -t {image_name}:{tag} .
example:
docker buildx build --platform linux/amd64 -t test_docker_for_tile_version2 .
- Verify the Image: After the build completes, confirm that the Docker image was created by running:
docker images
- Tag the Image Locally: Use the docker tag command to assign a specific tag to the image:
docker tag image_name:latest your_username/repository:tag
Replace image_name:latest with the name and tag of your locally built image, and your_username/repository:tag with the desired repository name and tag on Docker Hub.
- Push the Tagged Image: Once the image is tagged, push it to Docker Hub with the docker push command:
docker push your_username/repository:tag
This command pushes the image with the specified tag (tag) to your Docker Hub repository (your_username/repository).
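The build, verify, tag, and push steps above can be chained in one script. The following is a rough sketch, assuming placeholder names: my_segmentation_image, your_username/repository, and the run helper with its DRY_RUN switch are hypothetical and not part of this repository. By default the script only prints each command; setting DRY_RUN=0 executes them, which requires Docker, a Dockerfile in the current directory, and a prior docker login.

```shell
# Hypothetical names; substitute your own image, tag, and Docker Hub repository.
IMAGE_NAME="my_segmentation_image"
TAG="latest"
REMOTE="your_username/repository"
PLATFORM="linux/amd64"

# run: print a command, and execute it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "${DRY_RUN}" = "0" ]; then
        "$@"
    fi
}

# Build for the target platform, using the current directory as build context.
run docker buildx build --platform "${PLATFORM}" -t "${IMAGE_NAME}:${TAG}" .
# List local images with this name to confirm the build.
run docker images "${IMAGE_NAME}"
# Give the local image its Docker Hub name, then push it.
run docker tag "${IMAGE_NAME}:${TAG}" "${REMOTE}:${TAG}"
run docker push "${REMOTE}:${TAG}"
```

Depending on which buildx driver is active, the build step may need the --load option before the image shows up in docker images; the default driver on a standard Docker installation usually does not.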
To set up miniwdl, follow the miniwdl installation guidelines, then run an example WDL to confirm that the setup works.
To test this WDL workflow on your local machine or on a cluster, get a copy of this repository using either of the following methods:
- Direct Download: Navigate to the repository and click on "Code" at the upper right corner, then select "Download ZIP". Move the downloaded .zip file to your desired directory on your local machine/cluster and extract its contents.
- Git Clone: Execute the command
git clone https://github.com/broadinstitute/stp_segmentation_wdl.git
to clone the repository and extract the WDL scripts onto your local machine/cluster.
Additionally, if you wish to experiment with toy MERSCOPE and Xenium datasets (10,000x10,000 pixels) designed to validate the functionality of the STP cell segmentation pipeline, please contact the STP computational team: STP Computational Team GitHub.
Navigate to the "local_test" directory within the stp_segmentation_wdl repository.
Open a terminal window and enter the following command:
miniwdl run "test_main_script.wdl" --input "inputs.json" --cfg "default.cfg"
- test_main_script.wdl is the workflow script. It calls the individual tasks, which keeps the overall script clean.
- inputs.json contains the default inputs for all variables used in the workflow. Modify this file to suit your test case.
- default.cfg provides the configuration that enables call caching, which saves machine time during debugging by reusing the results of calls that have already run.
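For orientation, a call-caching configuration for miniwdl can look like the sketch below. This is an assumption about the general shape of such a file, not a copy of the default.cfg shipped in local_test; the file name example_call_cache.cfg is hypothetical. The [call_cache] section with its put and get options is taken from the miniwdl runner configuration.

```shell
# Write an illustrative configuration to a scratch directory.
CFG_DIR="$(mktemp -d)"
EXAMPLE_CFG="${CFG_DIR}/example_call_cache.cfg"

cat > "${EXAMPLE_CFG}" <<'EOF'
[call_cache]
# Store the outputs of each completed call in the cache.
put = true
# Reuse cached outputs when a call repeats with identical inputs.
get = true
EOF

# Show how the file would be passed to miniwdl (printed, not executed).
echo "miniwdl run test_main_script.wdl --input inputs.json --cfg ${EXAMPLE_CFG}"
```

With both options enabled, re-running the workflow after fixing a later task should skip earlier calls whose inputs have not changed.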
After the workflow runs successfully, the terminal window displays the paths of all output files. The inputs, outputs, log files, and other artifacts of each task call are saved in their own folders within the current working directory, which in this case is "local_test".
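The saved outputs can also be read back after the run. The helper below is a small sketch, assuming miniwdl's usual run-directory layout, in which each run gets its own folder containing an outputs.json file and a _LAST symlink points at the most recent run; show_outputs is a hypothetical name, not a miniwdl command.

```shell
# Print the outputs.json of a miniwdl run directory (default: the _LAST symlink).
show_outputs() {
    run_dir="${1:-_LAST}"
    if [ -f "${run_dir}/outputs.json" ]; then
        cat "${run_dir}/outputs.json"
    else
        echo "no outputs.json under ${run_dir}" >&2
        return 1
    fi
}
```

From inside local_test, calling show_outputs with no argument should print the outputs of the latest run; passing a run folder name selects an earlier one.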
-- Celldega visualization tool --