From 2860da4fd07b2e1322977e9684c379b6af7206a6 Mon Sep 17 00:00:00 2001 From: Robrecht Cannoodt Date: Wed, 22 Nov 2023 11:07:06 +0100 Subject: [PATCH] update paper --- docs/paper/paper.md | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/docs/paper/paper.md b/docs/paper/paper.md index 7be60ad12..5c5f4c643 100644 --- a/docs/paper/paper.md +++ b/docs/paper/paper.md @@ -68,7 +68,16 @@ In this work we introduce Viash, a tool for speeding up pipeline prototyping thr # Core features and functionality Viash is an open-source embodiment of a ‘code-first’ concept for pipeline development. Many bioinformatics research projects (and other software development projects) start with prototyping functionality in small scripts or notebooks in order to then migrate the functionality to software packages or pipeline frameworks. By adding some metadata to a code block or script (\autoref{fig-overview}A), Viash can turn a (small) code block into a highly malleable object. By encapsulating core functionality in modular building blocks, a Viash component can be used in a myriad of ways (\autoref{fig-overview}B-C): export it as a standalone command-line tool; create a highly intuitive and modular Nextflow component; ensure reproducibility by building, pulling, or pushing Docker containers; or running one or more unit tests to verify that the component works as expected. Integration with CI tools such as GitHub Actions, Jenkins or Travis CI allows for automation of unit testing, rolling releases and versioned releases. -The definition of a Viash component -- a config and a code block -- can be implemented quite concisely (\autoref{fig-cheatsheet}left). Viash currently supports different scripting languages, including Bash, JavaScript, Python and R. Through the use of several subcommands (\autoref{fig-cheatsheet}right), Viash can build the component into a standalone script using one of three backend platforms -- native, Docker, or Nextflow. Additional commands allow processing one or more Viash components simultaneously, e.g. for executing a unit test suite or (re-)building component-specific Docker images. + + +![Viash allows easy prototyping of reusable pipeline components. **A:** Viash requires two main inputs, a script (or code block) and a Viash config file. A Viash config file is a YAML file with metadata describing the functionality provided by the component (e.g. a name and description of the component and its parameters), and platform-specific metadata (e.g. the base Docker container to use, which software packages are required by the component). Optionally, the quality of the component can be improved by defining one or more unit tests with which the component functionality can be tested. **B:** Viash allows transforming a given config to a variety of different outputs. **C:** Viash supports robust pipeline development by allowing users to build their component as a standalone executable (with auto-generated CLI), build a Docker image to run the script inside, or turn the component into a standalone Nextflow module or workflow. If unit tests were defined, Viash can also run all of the unit tests and provide users with a report.\label{fig-overview}](figures/figure_2.pdf) + +The definition of a Viash component -- a config and a code block -- can be implemented quite concisely (\autoref{fig-cheatsheet} left). Viash currently supports different scripting languages, including Bash, JavaScript, Python and R. Through the use of several subcommands (\autoref{fig-cheatsheet} right), Viash can build the component into a standalone script using one of three backend platforms -- native, Docker, or Nextflow. Additional commands allow processing one or more Viash components simultaneously, e.g. for executing a unit test suite or (re-)building component-specific Docker images. + + + + +![Cheat sheet for developing modular pipeline components with Viash, including a sample Viash component (**left**) and common commands used throughout the various stages of a development cycle (**right**).\label{fig-cheatsheet}](figures/figure_3.pdf) One major benefit of using code regeneration is that best practices in pipeline development can automatically be applied, whereas otherwise this would be left up to the developer to develop and maintain. For instance, all standalone executables, Nextflow modules and Docker images are automatically versioned. When parsing command-line arguments, checking for the availability of required parameters, the existence of required input files, or the type-checking of command-line arguments is also automated. Another example is helper functions for installing software through tools such as apt, apk, yum, pip or R devtools, as these sometimes require additional pre-install commands to update package registries or post-install commands to clean up the installation cache to reduce image size of the resulting image. Here, Viash could be the technical basis for a community of people committed to sharing components that everybody can benefit from. @@ -97,20 +106,17 @@ Ultimately, Viash aims to support pan-organisational and interdisciplinary resea The OpenProblems-NeurIPS2021 organised by OpenProblems demonstrates the practical value of Viash [@luecken_sandboxpredictionintegration_2021]. As part of the preparation for the competition, a pilot benchmark was implemented to evaluate and compare the performance of a few baseline methods (\autoref{fig-usecase}A). By pre-defining the input-output interfaces of several types of components (e.g. dataset loaders, baseline methods, control methods, metrics), developers from different organisations across the globe could easily contribute Viash components to the pipeline (\autoref{fig-usecase}B). Since Viash automatically generates Docker containers and Nextflow pipelines from the meta-data provided by component developers, developers could contribute components whilst making use of their programming environment of choice without needing to have any expert knowledge of Nextflow or Amazon EC2 (\autoref{fig-usecase}C). Thanks to the modularity of Viash components, the same components used in running a pilot benchmark are also used by the evaluation worker of the competition website itself. As such, the pilot benchmark also serves as an integration test of the evaluation worker. -# Discussion -Viash is under active development. Active areas of development include expanded compatibility between Viash and other technologies (i.e. additional scripting languages, containerisation frameworks and pipeline frameworks), and ease-of-use functionality for developing and managing large catalogues of Viash components (e.g. simplified continuous integration, allowing project-wide settings, automating versioned releases). - -We appreciate and encourage contributions to or extensions of Viash. All source code is available under a GPL-3 licence on Github at [github.com/viash-io/viash](https://github.com/viash-io/viash). Extensive user documentation is available at [viash.io](https://viash.io). Requests for support or expanded functionality can be addressed to the corresponding authors. -![Viash allows easy prototyping of reusable pipeline components. **A:** Viash requires two main inputs, a script (or code block) and a Viash config file. A Viash config file is a YAML file with metadata describing the functionality provided by the component (e.g. a name and description of the component and its parameters), and platform-specific metadata (e.g. the base Docker container to use, which software packages are required by the component). Optionally, the quality of the component can be improved by defining one or more unit tests with which the component functionality can be tested. **B:** Viash allows transforming a given config to a variety of different outputs. **C:** Viash supports robust pipeline development by allowing users to build their component as a standalone executable (with auto-generated CLI), build a Docker image to run the script inside, or turn the component into a standalone Nextflow module or workflow. If unit tests were defined, Viash can also run all of the unit tests and provide users with a report.\label{fig-overview}](figures/figure_2.pdf) +![A recent NeurIPS competition for multimodal data integration [@luecken_sandboxpredictionintegration_2021] demonstrates the practical value of Viash by using Bash, R, Python, Docker, Nextflow, Viash, and Amazon EC2 as core technologies to run a pilot benchmark. **A:** The pilot benchmark pipeline consists of several types of components, each of which had strict predefined input-output interfaces. **B:** Comparing which organisations contributed one or more Viash components to the pipeline demonstrates that Viash allows multiple organisations to participate in developing a pipeline collaboratively. Note: this visualisation pertains to one aspect of organising the NeurIPS competition, and does not at all reflect the overall efforts made by any party. **C:** Developers are encouraged to implement components in their preferred scripting language. Thanks to the modularity provided by Viash, sewing together multiple components into a Nextflow pipeline can be left up to a few developers, without requiring all collaborators to have expert knowledge regarding infrastructure-specific technologies.\label{fig-usecase}](figures/figure_4.pdf) -![Cheat sheet for developing modular pipeline components with Viash, including a sample Viash component (**left**) and common commands used throughout the various stages of a development cycle (**right**).\label{fig-cheatsheet}](figures/figure_3.pdf) - +# Discussion +Viash is under active development. Active areas of development include expanded compatibility between Viash and other technologies (i.e. additional scripting languages, containerisation frameworks and pipeline frameworks), and ease-of-use functionality for developing and managing large catalogues of Viash components (e.g. simplified continuous integration, allowing project-wide settings, automating versioned releases). +We appreciate and encourage contributions to or extensions of Viash. All source code is available under a GPL-3 licence on Github at [github.com/viash-io/viash](https://github.com/viash-io/viash). Extensive user documentation is available at [viash.io](https://viash.io). Requests for support or expanded functionality can be addressed to the corresponding authors. -![A recent NeurIPS competition for multimodal data integration [@luecken_sandboxpredictionintegration_2021] demonstrates the practical value of Viash by using Bash, R, Python, Docker, Nextflow, Viash, and Amazon EC2 as core technologies to run a pilot benchmark. **A:** The pilot benchmark pipeline consists of several types of components, each of which had strict predefined input-output interfaces. **B:** Comparing which organisations contributed one or more Viash components to the pipeline demonstrates that Viash allows multiple organisations to participate in developing a pipeline collaboratively. Note: this visualisation pertains to one aspect of organising the NeurIPS competition, and does not at all reflect the overall efforts made by any party. **C:** Developers are encouraged to implement components in their preferred scripting language. Thanks to the modularity provided by Viash, sewing together multiple components into a Nextflow pipeline can be left up to a few developers, without requiring all collaborators to have expert knowledge regarding infrastructure-specific technologies.\label{fig-usecase}](figures/figure_4.pdf) + # References