---
title: Viash project template
format: gfm
engine: knitr
toc: false
---
<!-- README.md is generated from README.qmd using quarto render. Please edit that file -->
This repository is a template for setting up a new Viash project. It accompanies the [Quickstart](https://viash.io/quickstart) tutorial, which walks you through getting started with this repository.
## What is Viash?
**Viash** is your go-to script wrapper for building data pipelines from modular software components. All you need is your trusty script and a metadata file to embark on this journey.
Check out some of Viash's key features:
- Code in your [favorite scripting language](https://viash.io/guide/component/create-component.html). Mix and match scripting languages across components to suit your needs. Viash supports a wide range of languages, including Bash, Python, R, Scala, JS, and C#.
- A **custom Docker container** is auto-generated based on the dependencies you've outlined in your metadata, meaning you don't need to be a Docker expert.
- Viash also generates a **Nextflow module** from your script, so no need to be a Nextflow guru either.
- Effortlessly combine Nextflow modules to design and run scalable, reproducible data pipelines.
- Test every component on your local workstation using the convenient built-in development kit.
```{mermaid}
graph LR
subgraph component [Viash component]
subgraph script [Script]
rlang[R script]
python[Python script]
bash[Bash script]
scriptetc[...]
end
config[Viash config]
end
viash_build[Viash build]
docker_image[Docker image]
executable[Executable]
nextflow[Nextflow workflow]
component --- viash_build --> executable & docker_image & nextflow
docker_image -.-> executable & nextflow
nextflow --dependency--> nextflow
subgraph compute [Compute environment]
direction LR
local[Local execution]
awsbatch[AWS Batch]
googlebatch[Google Cloud Batch]
hpc[HPC]
infraetc[...]
end
nextflow --> compute
```
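As a rough illustration of the "script" half of a component, here is a minimal Bash sketch in the spirit of the `remove_comments` component used later in this template. It assumes the Viash convention that arguments declared in the config are exposed to Bash scripts as `par_<name>` variables, with a `## VIASH START` / `## VIASH END` block holding placeholder values for running the script on its own. The argument names `input` and `output`, the temporary files, and the sample data are hypothetical and not taken from the template's actual source.

```bash
#!/usr/bin/env bash
set -euo pipefail

## VIASH START
# Placeholder values for standalone runs (hypothetical names and sample data);
# Viash is assumed to substitute the real argument values for this block.
par_input="$(mktemp)"
par_output="$(mktemp)"
printf '# a comment line\na\tb\n# another comment\nc\td\n' > "$par_input"
## VIASH END

# Drop every line that starts with '#', keep everything else.
grep -v '^#' "$par_input" > "$par_output"

# Show the filtered result.
cat "$par_output"
```

The script itself knows nothing about Docker or Nextflow; in a Viash project that wiring would come from the accompanying config file.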
## Requirements
This guide assumes you've already installed [Viash](https://viash.io/installation), [Docker](https://docs.docker.com/engine/install), and [Nextflow](https://www.nextflow.io/index.html#GetStarted).
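If you are unsure whether all three tools are available, a quick check along the following lines may help. This is only a sketch: the helper name `missing_tools` is made up for this example, and it verifies only that the commands are on your `PATH`, not that the Docker daemon is running or that the versions are compatible.

```bash
#!/usr/bin/env bash

# Print the name of every given command that is not found on the PATH.
# (missing_tools is a hypothetical helper, not part of Viash.)
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing="$(missing_tools viash docker nextflow)"
if [ -n "$missing" ]; then
  echo "Not found on PATH: $(echo "$missing" | tr '\n' ' ')"
else
  echo "viash, docker and nextflow are all on the PATH"
fi
```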
## Structure of this template project
To get up and running fast, we provide a [template project](https://github.com/viash-io/viash_project_template) for you to use. It contains four components from the same package, which are combined into a Nextflow pipeline as follows:
```{mermaid}
graph TD
input1(file1.tsv) --> B1[/remove_comments/] --> C1[/take_column/] --> Y
input2(file2.tsv)--> B2[/remove_comments/] --> C2[/take_column/] --> Y
Y[combine] --> D[/combine_columns/]
D --> output(output.tsv)
```
This pipeline takes one or more TSV files as input and stores its output in an output folder.
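To make the data flow in the diagram concrete, the sketch below mimics the three steps with standard Unix tools instead of the actual Viash components. The sample files, the choice of column 2, and the use of `grep`, `cut`, and `paste` are assumptions made for illustration; the real components may behave differently.

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"

# Two hypothetical tab-separated inputs, each with a comment line.
printf '# this is a header\na\tb\tc\nd\te\tf\n' > "$workdir/file1.tsv"
printf '# another header\n1\t2\t3\n4\t5\t6\n' > "$workdir/file2.tsv"

for name in file1 file2; do
  # remove_comments: drop lines starting with '#'
  # take_column: keep only the second column
  grep -v '^#' "$workdir/$name.tsv" | cut -f 2 > "$workdir/$name.col"
done

# combine_columns: place the extracted columns side by side
paste "$workdir/file1.col" "$workdir/file2.col" > "$workdir/output.tsv"

cat "$workdir/output.tsv"
```

With these sample inputs, the result has two rows, each pairing the second column of `file1.tsv` with the second column of `file2.tsv`.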
## Example usage
To run the pipeline, first create example input files.
Contents of `resources_test/file1.tsv`:
```{bash echo=FALSE}
cat resources_test/file1.tsv
```
Contents of `resources_test/file2.tsv`:
```{bash echo=FALSE}
cat resources_test/file2.tsv
```
Finally, we also need to create a `params.yaml` file to specify the input files for the pipeline:
Contents of `resources_test/params.yaml`:
```{bash echo=FALSE}
cat resources_test/params.yaml
```
Now run the pipeline:
```bash
nextflow run viash-io/viash_project_template \
-main-script target/nextflow/template/workflow/main.nf \
-r build/main \
-latest \
-profile docker \
-params-file resources_test/params.yaml \
--publish_dir output
```
<details>
<summary>Output</summary>
```{bash echo=FALSE}
# avoid ansi log for proper static output using -ansi-log false
nextflow run viash-io/viash_project_template \
-main-script target/nextflow/template/workflow/main.nf \
-r build/main \
-latest \
-profile docker \
--publish_dir output \
-params-file resources_test/params.yaml \
-ansi-log false
```
</details>
If you have a [Seqera Cloud](https://cloud.seqera.io) compute environment already set up, you can also launch the workflow there:
```bash
cat > params.yaml <<EOF
param_list:
  - id: file1
    input: s3://my-bucket/file1.tsv
  - id: file2
    input: s3://my-bucket/file2.tsv
publish_dir: s3://my-bucket/output
EOF
tw launch viash-io/viash_project_template \
--main-script target/nextflow/template/workflow/main.nf \
--revision build/main \
--pull-latest \
--workspace 123456789 \
--compute-env ABCDEFGHIJKLMNOP \
--params-file params.yaml
```
## Extending this template
This template is a great starting point for building your own Viash project. Here's how you can extend it.
### Step 1: Get the template
First create a new repository by clicking the "Use this template" button. If you can't see the "Use this template" button, log into GitHub first.
Next, clone the repository using the following command.
```bash
git clone https://github.com/youruser/my_first_pipeline.git && cd my_first_pipeline
```
Your new repository should contain the following files:
```bash
tree .
```
<details>
<summary>Output</summary>
```{bash echo=FALSE}
tree .
```
</details>
### Step 2: Build the Viash components
With Viash you can turn the components in `src/` into Dockerized Nextflow modules by running:
```bash
viash ns build --setup cachedbuild --parallel
```
<details>
<summary>Output</summary>
```{bash echo=FALSE}
viash ns build --setup cachedbuild --parallel
```
</details>
This command not only transforms the Viash components in `src/` into Nextflow modules, it also builds the containers where appropriate (with the `cachedbuild` setup strategy, builds start from the Docker cache when available). Once everything is built, a new **target** directory contains the executables and modules, grouped per platform:
```bash
tree target
```
<details>
<summary>Output</summary>
```{bash echo=FALSE}
tree target
```
</details>
### Step 3: Run the pipeline
You can now run the locally built pipeline using the following command:
```bash
nextflow run . \
-main-script target/nextflow/template/workflow/main.nf \
-profile docker \
-params-file resources_test/params.yaml \
--publish_dir output
```
<details>
<summary>Output</summary>
```{bash echo=FALSE}
# avoid ansi log for proper static output using -ansi-log false
nextflow run . \
-main-script target/nextflow/template/workflow/main.nf \
-profile docker \
-params-file resources_test/params.yaml \
--publish_dir output \
-ansi-log false
```
</details>
This runs the different stages of the workflow, with the final result stored in a file named **combined.workflow.output.tsv** in the output directory `output`:
```bash
cat output/combined.workflow.output.tsv
```
<details>
<summary>Output</summary>
```{bash echo=FALSE}
cat output/combined.workflow.output.tsv
```
</details>
## What's next?
Congratulations, you've reached the end of this quickstart tutorial, and we're excited for you to delve deeper into the world of Viash!
Our comprehensive [guide](https://viash.io/guide) and [reference documentation](https://viash.io/reference) are here to help you explore various topics, such as:
* [Creating a Viash component and converting it into a standalone executable](https://viash.io/guide/component/create-component)
* [Ensuring reproducibility and designing customised Docker images](https://viash.io/guide/component/add-dependencies)
* [Ensuring code reliability with unit testing for Viash](https://viash.io/guide/component/unit-testing)
* [Streamlining your workflow by performing batch operations on Viash projects](https://viash.io/guide/project/batch-processing)
* [Building Nextflow pipelines using Viash components](https://viash.io/guide/nextflow_vdsl3)
So, get ready to enhance your skills and create outstanding solutions with Viash!