This repository has been archived by the owner on Nov 24, 2024. It is now read-only.

Strategy to update workflows needed #3

Open
bgruening opened this issue Jul 8, 2018 · 25 comments
Comments

@bgruening
Member

External workflows, e.g. the ones from the training material, should be updated regularly. Even better, they should be linked to the training material. Maybe we can pull them down before running the tests, as we currently do with data sets.

@cat-bro
Contributor

cat-bro commented Jun 4, 2020

Hi @bgruening, continuing discussion from galaxyproject/training-material#1896

A way forward could be to rewrite the update_workflows function in update-training.sh so that it updates each workflow within training/* (possibly with wget https://raw.githubusercontent.com/galaxyproject/training-material/master/topics/<topic_name>/tutorials/<tutorial_name>/workflows/<workflow_name> for each individual file rather than cloning training-material). It would copy the workflows only, not the tests. This could be run by Jenkins prior to running the workflow tests, with any changes committed into workflow-testing.

Prior to being able to implement this there would need to be a clean-up of the paths (in workflow-testing) so that any training/<topic_name>/<tutorial_name>/<workflow_name> is consistent with a corresponding topics/<topic_name>/tutorials/<tutorial_name>/workflows/<workflow_name> path in training-material. (for example training/statistics/machine_learning/classification/linear_SVC_classification.ga would become training/statistics/classification_regression/classification_LSVC.ga)
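The path mapping described above could be sketched roughly as follows. This is only an illustration, not part of update-training.sh: the function name gtn_workflow_url is hypothetical, and it assumes the files are served as raw content from raw.githubusercontent.com on the master branch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a workflow-testing path of the form
#   training/<topic_name>/<tutorial_name>/<workflow_name>
# to the raw URL of the matching file in training-material.
# Assumption: raw files are served from raw.githubusercontent.com, branch master.
gtn_workflow_url() {
    local rel="${1#training/}"      # strip the leading training/
    local topic="${rel%%/*}"        # first path component
    rel="${rel#*/}"
    local tutorial="${rel%%/*}"     # second path component
    local workflow="${rel#*/}"      # remainder: the workflow file name
    printf 'https://raw.githubusercontent.com/galaxyproject/training-material/master/topics/%s/tutorials/%s/workflows/%s\n' \
        "$topic" "$tutorial" "$workflow"
}
```

A single file could then be refreshed with something like wget -q -O "$path" "$(gtn_workflow_url "$path")". This only works once the paths in both repositories line up as described.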

I could make a start on this.

@bgruening
Member Author

I like that, this would also save the long cloning step.

Thanks @cat-bro for working on that.

@hexylena
Member

hexylena commented Jun 4, 2020

for what it's worth: My future plan was to just merge this back into the training material. I only created the separate repo to let things move a bit faster while we were getting testing worked out.

Things seem stable enough now (and with a second site!) that we should make this a standard part of the GTN. I'm happy to help with this too, for what it's worth. It would not be too difficult to merge back + update the tests.

@bgruening
Member Author

We would still need this repo for other, non-training workflows, I think.

@hexylena
Member

hexylena commented Jun 4, 2020

Ahh I hadn't considered that use case. Would it make sense to keep both repos then? I would really love to see the training tests as close to the training materials as possible (and kept to high standards), and then this repository could give more freedom, whatever test you want to write, or so?

@cat-bro
Contributor

cat-bro commented Jul 23, 2020

What if everything under training were kept in galaxyproject/training-material (single source of truth) and in this repo there was just a yaml listing the .ga files and test files for any workflows that had tests? Jenkins could get these with wget rather than cloning the repo. The thing I suggested in June about syncing them over would work, but even then the double-ups are problematic. Non-GTN workflows would still live here.

@hexylena changed the title from "stradegy to update workflows needed" to "Strategy to update workflows needed" on Jul 23, 2020
@hexylena
Member

That could work @cat-bro, sounds good.

@bgruening
Member Author

I assume developing the workflow tests would then not be that intuitive, as you don't have the workflows next to the tests.
What about adding the test files to the workflows in GTN?

@hexylena
Member

What about adding the test files to the workflows in GTN?

Oh, I assumed that's what @cat-bro meant, both WFs + tests live in the GTN.

@cat-bro
Contributor

cat-bro commented Jul 23, 2020

yes, what @hexylena said

@bgruening
Member Author

Cool, having everything in GTN sounds great!

@bgruening
Member Author

So we need some logic to traverse GTN, find workflow-tests and run them.

What do you think about adding a Makefile that clones the GTN (only the latest HEAD), traverses the tree, and somehow returns the paths to the workflow-test files?

We could also keep the list of workflows (outside of GTN) in this Makefile and finally trigger a run. This would also simplify the Jenkins setup.

The current list from Jenkins is:

training/transcriptomics/rna-seq-viz-with-volcanoplot/rna-seq-viz-with-volcanoplot.ga
training/transcriptomics/rna-seq-viz-with-heatmap2/rna-seq-viz-with-heatmap2.ga
raceid3/raceid3_workflow.ga
example1/wf3-shed-tools.ga
example2/wf4-shed-tools.ga
GraphClust2/GC-lite.ga
training/transcriptomics/small_ncrna_clustering/blockclust_workflow.ga
sklearn/adaboost/adaboost.ga
sklearn/ard/ard.ga
training/variant-analysis/microbial-variants/microbial_variant_calling.ga
training/variant-analysis/dip/diploid.ga
training/variant-analysis/mapping-by-sequencing/mapping_by_sequencing.ga
training/proteomics/protein-id-sg-ps/protein-id-sg-ps.ga
training/proteomics/protein_quant_sil/protein_quant_sil.ga
training/proteomics/metaproteomics/metaproteomics.ga
training/sequence-analysis/ref-based-rad-seq/rad_seq_ref_based.ga
training/sequence-analysis/quality-control/quality_control.ga
training/sequence-analysis/mapping/mapping.ga
training/chip-seq/formation_of_super-structures_on_xi/formation_of_super_structures_on_xi.ga
training/epigenetics/methylation-seq/methylation-seq.ga
training/assembly/general-introduction/assembly-general-introduction.ga
training/assembly/unicycler-assembly/unicycler.ga
training/metagenomics/general-tutorial/amplicon.ga
training/transcriptomics/ref-based/ref_based.ga
training/transcriptomics/small_ncrna_clustering/blockclust_workflow.ga
training/statistics/classification_regression/regression_GradientBoosting.ga
training/statistics/classification_regression/classification_LSVC.ga
training/proteomics/F1000_Metaproteomics_QueryTabular/F1000_Metaproteomics_QueryTabular.ga
training/proteomics/F1000_Proteogenomics_QueryTabular/F1000_Proteogenomics_QueryTabular.ga
training/computational-chemistry/bio3danalysis/MD_Analysis_using_Bio3D.ga
training/computational-chemistry/bio3danalysis/gromacs.ga
training/metabolomics/F1000_Metabolomics_Query_Tabular_Mass_Adjustment.ga
training/statistics/machinelearning/machine_learning.ga

@cat-bro
Contributor

cat-bro commented Aug 3, 2020

Yes, that sounds good.

Something like

git clone https://github.com/galaxyproject/training-material.git
for f in $(find 'training-material' -path '*-test.yml' ); do echo "${f/-test.yml/.ga}" >> list_of_workflows.txt; done
<script that runs the tests>
rm -rf training-material

Some of the tutorials listed above will not have equivalents in GTN (sklearn I think?) so there would need to be a separate list of these
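One way to combine the list discovered in GTN with a separately maintained list of non-GTN workflows is sketched below. It is an illustration only: the function name merge_workflow_lists and the idea of keeping each list as a plain text file with one path per line are assumptions. Duplicates are dropped, which matters because the Jenkins list above already contains blockclust_workflow.ga twice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge any number of list files (one workflow path per
# line) into a single sorted list, dropping blank lines and duplicate entries.
merge_workflow_lists() {
    cat "$@" | grep -v '^[[:space:]]*$' | sort -u
}
```

For example, merge_workflow_lists list_of_workflows.txt local_workflows.txt > all_workflows.txt, where local_workflows.txt is a hypothetical hand-maintained file for workflows such as the sklearn ones.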

@hexylena
Member

hexylena commented Aug 3, 2020

unnecessary, small optimisation:

find 'training-material' -path '*-test.yml' | sed 's/-test.yml/.ga/' > list_of_workflows.txt

otherwise sounds good!
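The rename from test file to workflow file can also be checked in isolation. The sketch below uses a hypothetical function name, workflow_for_test, and rewrites only a trailing -test.yml, so a path that merely contains that string somewhere in the middle is left alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "<name>-test.yml" into "<name>.ga".
# Only a trailing "-test.yml" is rewritten; other paths pass through unchanged.
workflow_for_test() {
    case "$1" in
        *-test.yml) printf '%s.ga\n' "${1%-test.yml}" ;;
        *)          printf '%s\n' "$1" ;;
    esac
}
```

Anchoring the match at the end of the path is a small safety margin over a plain substitution, which would rewrite the first occurrence wherever it appears.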

@bgruening
Member Author

Some of the tutorials listed above will not have equivalents in GTN (sklearn I think?) so there would need to be a separate list of these

Yes, I think there is still a need to have workflows here, e.g. user workflows that will not make it into GTN anytime soon, or a really strange workflow that just utilizes a bunch of functionality, etc.

@bgruening
Member Author

@cat-bro do we have a plan to move forward? Are you planning to work on this? Can we help somehow?
Next release is coming and we could test those workflows against it.

@cat-bro
Contributor

cat-bro commented Sep 14, 2020

Hi @bgruening, this is on my trello but I keep getting distracted by other things.

I would really love to be able to test all of the workflows that run on Galaxy Australia. It would be a great comfort to people who run training sessions. A short term goal would be to have tests for as many of the workflows in GTN as possible.

Is it OK for me to update some of the GTN tutorials to use more up-to-date versions of the tools than they currently do? I think there are some that still use tool versions that break on Python 3, but there are also some that use tool versions from 2017/2018 that I don't believe anybody actually teaching or learning from the tutorial will be using: in general they will be using the latest version available.

A long term goal in my mind would be to have a strategy of workflow testing that accounts for the fact that most of the time, tool versions used in tutorials will be the latest available versions and not necessarily the versions listed on the workflows. I think that the tests as they currently are are useful, but that sometimes they are testing flows that would be unlikely in a teaching setting. For example if Galaxy has grappa version 1.2a and grappa version 3.4b, the workflow might contain grappa 1.2a but in a real world teaching scenario students will be using 3.4b, if this makes any sense.
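To illustrate the grappa example, a check for whether a workflow pins an older version than the newest one installed could look roughly like this. Both function names are hypothetical, the versions are passed in by hand rather than read from a workflow or a Galaxy server, and the ordering relies on sort -V (version sort), which may not match how Galaxy itself orders tool versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the highest of the version strings given as
# arguments. Relies on `sort -V` (version sort) from GNU coreutils.
latest_version() {
    printf '%s\n' "$@" | sort -V | tail -n 1
}

# Hypothetical helper: succeed (exit 0) when the version pinned in the workflow
# (first argument) is not the newest among the installed versions (the rest).
is_outdated() {
    local pinned="$1"; shift
    [ "$(latest_version "$pinned" "$@")" != "$pinned" ]
}
```

With the versions from the example, is_outdated 1.2a 1.2a 3.4b succeeds, flagging the workflow as testing a version students are unlikely to use.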

@bgruening
Member Author

Is it OK for me to update some of the GTN tutorials to use more up-to-date versions of the tools than they currently do?

Of course!

A long term goal in my mind would be to have a strategy of workflow testing that accounts for the fact that most of the time, tool versions used in tutorials will be the latest available versions and not necessarily the versions listed on the workflows. I think that the tests as they currently are are useful, but that sometimes they are testing flows that would be unlikely in a teaching setting. For example if Galaxy has grappa version 1.2a and grappa version 3.4b, the workflow might contain grappa 1.2a but in a real world teaching scenario students will be using 3.4b, if this makes any sense.

In theory, our training materials should all be updated to the latest versions. We are just not able to do this currently. I guess one step in this direction is to inform the training author: have something automatic that bumps the workflow to the latest versions, runs the tests, and asks the author of the training to check the update PR.

@cat-bro
Contributor

cat-bro commented Sep 14, 2020

  • training/statistics/machinelearning/machine_learning-test.yml
  • training/statistics/classification_regression/regression_GradientBoosting-test.yml
  • training/statistics/classification_regression/classification_LSVC-test.yml
  • training/metabolomics/F1000_Proteogenomics_QueryTabular-test.yml # no equivalent in GTN?
  • training/sequence-analysis/mapping/mapping-test.yml
  • training/sequence-analysis/ref-based-rad-seq/rad_seq_ref_based-test.yml
  • training/sequence-analysis/quality-control/quality_control-test.yml
  • training/epigenetics/methylation-seq/methylation-seq-test.yml
  • training/epigenetics/hicexplorer/hicexplorer-test.yml
  • training/transcriptomics/rna-seq-viz-with-heatmap2/rna-seq-viz-with-heatmap2-test.yml
  • training/transcriptomics/small_ncrna_clustering/blockclust_workflow-test.yml
  • training/transcriptomics/rna-seq-viz-with-volcanoplot/rna-seq-viz-with-volcanoplot-test.yml
  • training/transcriptomics/ref-based/ref_based-test.yml
  • training/assembly/unicycler-assembly/unicycler-test.yml
  • training/assembly/general-introduction/assembly-general-introduction-test.yml
  • training/chip-seq/formation_of_super-structures_on_xi/formation_of_super_structures_on_xi-test.yml
  • training/variant-analysis/microbial-variants/microbial_variant_calling-test.yml
  • training/variant-analysis/dip/diploid-test.yml
  • training/variant-analysis/mapping-by-sequencing/mapping_by_sequencing-test.yml
  • training/computational-chemistry/bio3danalysis/MD_Analysis_using_Bio3D-test.yml
  • training/computational-chemistry/gromacs/gromacs-test.yml
  • training/proteomics/protein_quant_sil/protein_quant_sil-test.yml
  • training/proteomics/protein-id-sg-ps/protein-id-sg-ps-test.yml
  • training/proteomics/F1000_Metaproteomics_QueryTabular/F1000_Metaproteomics_QueryTabular-test.yml # no equivalent in GTN?
  • training/proteomics/F1000_Proteogenomics_QueryTabular/F1000_Proteogenomics_QueryTabular-test.yml # no equivalent in GTN?
  • training/proteomics/metaproteomics/metaproteomics-test.yml
  • training/metagenomics/general-tutorial/amplicon-test.yml

@cat-bro
Contributor

cat-bro commented Sep 14, 2020

^ boxes are ticked if the tutorial has an equivalent working test in GTN. I think that the sequence-analysis tutorials do too but I'd need to run the tests again to be sure.

As a first step it would be good to have equivalent tests in GTN for all of these, so that no value is lost moving to running the GTN tests instead of the tests in this repo.

@bgruening
Copy link
Member Author

@cat-bro; @malloryfreeberg has started to add several new tests to GTN in the last weeks. This is all super exciting! Thanks all!

@malloryfreeberg

@cat-bro; @malloryfreeberg has started to add several new tests to GTN in the last weeks. This is all super exciting! Thanks all!

@cat-bro the list of workflows and workflow tests I've been adding to the GTN material are all referenced in this ticket: galaxyproject/training-material#1459

@cat-bro
Contributor

cat-bro commented Oct 20, 2020

That's great! There are now more tests for training material in GTN than in this repo. There are still a few in this repo that do not have equivalent tests in GTN but they may need a bit more work and can be added to GTN over time. It's probably time to abandon the training tests in this repo and run the tests that are in GTN.

A script to run the tests could be something like

# file that collects the paths of all workflows with tests (assumed name)
workflow_list=workflow_list.txt

# get list of local workflows with tests (ignoring training folder)
find . -name '*-test.yml' ! -path './training*' | sed -e 's|^\./||' -e 's|-test\.yml$|.ga|' > "$workflow_list"

# clone training-material repo
git clone https://github.com/galaxyproject/training-material.git

# get list of training-material workflows with tests
find training-material -name '*-test.yml' | sed 's|-test\.yml$|.ga|' >> "$workflow_list"

mkdir -p results

# run each workflow test and keep its report under a flattened file name
while IFS= read -r workflow_path; do
   ./run_galaxy_workflow_tests.sh "$workflow_path"
   cp tool_test_output.json "results/$(sed 's|/|_|g' <<< "$workflow_path").tool_test_output.json"
done < "$workflow_list"

# planemo merge_test_reports ...... ## merge all of the files in results and produce one html doc

The above is untested/unfinished.

Not sure if it's best to be cloning training-material each time or to have a clone somewhere on the Jenkins server that can have git pull run on it each time.

I really like the idea of having a merged test report that could be available on Jenkins.

@malloryfreeberg

@cat-bro I like your ideas, and am fully supportive of harmonising the tests and finding ways to keep them up-to-date. Let me know what would be helpful for you to support this. I'm happy to continue going through all the GTN materials and making sure all the tutorials have both a workflow and a workflow test.

@bgruening
Member Author

Not sure if it's best to be cloning training-material each time or to have a clone somewhere on the Jenkins server that can have git pull run on it each time.

Cloning is OK. We can use git clone --depth 1 to speed it up. @cat-bro can we have a Makefile target that runs your script and the cloning? This would make both local testing and the Jenkins integration easy.

This is what we do in tools-land to merge the planemo JSON reports: https://github.com/galaxyproject/tools-iuc/blob/master/.github/workflows/pr.yaml#L315
