Functional annotation of protein sequences - Workflow #228

Open
wants to merge 18 commits into
base: main

rlibouba
Contributor

Hello, I'd like to suggest this new protein sequence annotation workflow, using eggNOG Mapper and Interproscan.

At the same time, I'd like to flag a problem I'm having with the InterProScan test. I get this error:
Failed to find output [interproscan xml] in invocation outputs [{'eggNOG Mapper annotations': {'src': 'hda', 'id': '4bafbd75dc760dfd', 'workflow_step_id': '6a2bc09b040f62c5'}, 'eggNOG Mapper seed_orthologs': {'src': 'hda', 'id': '2e6846bf7c441bd6', 'workflow_step_id': '6a2bc09b040f62c5'}}]

Have a nice day!
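
The error above says the test expects an output labelled `interproscan xml`, while the invocation only exposes the two eggNOG Mapper outputs. A minimal Python sketch of that comparison is shown here. The helper name `missing_output_labels` is hypothetical, and this is not Planemo's actual implementation, only an illustration of why the lookup fails.

```python
def missing_output_labels(expected_labels, invocation_outputs):
    """Return the expected labels that are absent from the invocation outputs."""
    return sorted(set(expected_labels) - set(invocation_outputs))


# Invocation outputs copied from the error message above.
invocation_outputs = {
    "eggNOG Mapper annotations": {
        "src": "hda",
        "id": "4bafbd75dc760dfd",
        "workflow_step_id": "6a2bc09b040f62c5",
    },
    "eggNOG Mapper seed_orthologs": {
        "src": "hda",
        "id": "2e6846bf7c441bd6",
        "workflow_step_id": "6a2bc09b040f62c5",
    },
}

# Only "interproscan xml" is known from the error message; the rest of this
# expected list is an assumption made for the sake of the example.
expected_labels = ["interproscan xml", "eggNOG Mapper annotations"]

missing = missing_output_labels(expected_labels, invocation_outputs)
print(missing)
```

One likely cause of a label going missing this way is that the step output is not marked as a workflow output, or carries a different label in the `.ga` file than in the test file.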

@rlibouba rlibouba marked this pull request as draft October 11, 2023 08:04
@bgruening
Member

@rlibouba we need a .dockstore.yml file here.

@bgruening
Member

Ok, now the CI is properly running :)

@mvdbeek
Member

mvdbeek commented Oct 30, 2023

Hey @rlibouba, this is the error message:

[Screenshot 2023-10-30 at 08 04 00]

The reference data comes from CVMFS. Do you know where we can get 5.59-91.0 from?

@rlibouba
Contributor Author

Hi @mvdbeek, thanks for your feedback. Sorry for my late reply.

Checking with @abretaud, it should be linked to the data manager. Do you think we should use IDC (https://github.com/galaxyproject/idc) to handle this?

@abretaud
Collaborator

Yep, it uses this DM: https://github.com/galaxyproject/tools-iuc/tree/main/data_managers/data_manager_interproscan
It's a few tens of GB IIRC. If it could be managed by IDC, that would be great, but I'm not sure it's ready to handle non-genome data.

@mvdbeek
Member

mvdbeek commented Nov 29, 2023

I think the problem was that we don't have a great way to publish large datasets, but we can always rsync this onto cvmfs from a site that has the data available

@abretaud
Collaborator

Ok, the DM mostly downloads a big archive, but it also does some file indexing and writes a .properties file.
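
To illustrate the kind of `.properties` edit involved: the InterProScan tool's own command line (visible in the test logs later in this thread) uses `sed` to rewrite the `data.directory=` entry of `interproscan.properties`. Here is a Python equivalent of that substitution, as a sketch only. The function name `set_data_directory` and the sample properties text are hypothetical, and this is not the data manager's actual code.

```python
import re


def set_data_directory(properties_text, data_dir):
    """Point the data.directory entry at data_dir, leaving other lines alone."""
    pattern = re.compile(r"^(data\.directory=).*$", flags=re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in data_dir.
    return pattern.sub(lambda match: match.group(1) + data_dir, properties_text)


# Hypothetical sample content, not the real interproscan.properties file.
sample = "# sample properties\ndata.directory=data\nkeep.me=1\n"

# Path taken from the command line shown in the test logs of this thread.
updated = set_data_directory(
    sample,
    "/cvmfs/data.galaxyproject.org/byhand/interproscan/5.59-91.0/data",
)
print(updated)
```

Anchoring the pattern at the start of the line mirrors the `^` in the `sed` expression, so keys that merely contain `data.directory` elsewhere in their name are left untouched.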

Test Results (powered by Planemo)

Test Summary

Test State Count
Total 1
Passed 0
Error 1
Failure 0
Skipped 0
Errored Tests
  • ❌ Functional_annotation_of_protein_sequences.ga_0

    Execution Problem:

    • Failed to run workflow, at least one job is in [error] state.
      

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: input:

        • step_state: scheduled
      • Step 2: InterProScan:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is error

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "7f541560911b11efa0bb6fbe6244f820"
              applications ["TIGRFAM", "FunFam", "SFLD", "SUPERFAMILY", "PANTHER", "Gene3D", "Hamap", "PrositeProfiles", "Coils", "SMART", "CDD", "PRINTS", "PIRSR", "PrositePatterns", "AntiFam", "Pfam", "MobiDBLite", "PIRSF"]
              chromInfo "/tmp/tmpglv3_kjp/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              database "5.59-91.0"
              dbkey "?"
              goterms true
              iprlookup false
              licensed {"__current_case__": 1, "applications_licensed": ["Phobius", "SignalP_EUK", "TMHMM"], "use": "true"}
              oformat ["TSV", "XML"]
              pathways true
              seqtype "p"
      • Step 3: eggNOG Mapper:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is error

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "7f541560911b11efa0bb6fbe6244f820"
              annotation_options {"__current_case__": 0, "go_evidence": "non-electronic", "no_annot": "", "seed_ortholog_evalue": "0.001", "seed_ortholog_score": null, "target_orthologs": "all", "tax_scope": null}
              chromInfo "/tmp/tmpglv3_kjp/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              eggnog_data "5.0.2"
              ortho_method {"__current_case__": 0, "dmnd_ignore_warnings": false, "dmnd_iterate": false, "evalue": null, "input": {"values": [{"id": 1, "src": "hda"}]}, "input_trans": {"__current_case__": 0, "itype": "proteins"}, "m": "diamond", "matrix_gapcosts": {"__current_case__": 2, "gap_costs": "--gapopen 11 --gapextend 1", "matrix": "BLOSUM62"}, "pident": null, "query_cover": null, "score": "0.001", "sensmode": "sensitive", "subject_cover": null}
              output_options {"md5": false, "no_file_comments": false, "report_orthologs": false}
    • Other invocation details
      • error_message

        • Failed to run workflow, at least one job is in [error] state.
      • history_id

        • a75ad98d0072565b
      • history_state

        • error
      • invocation_id

        • a75ad98d0072565b
      • invocation_state

        • scheduled
      • workflow_id

        • a75ad98d0072565b

github-actions bot commented Nov 7, 2024

Test Results (powered by Planemo)

Test Summary

Test State Count
Total 1
Passed 0
Error 1
Failure 0
Skipped 0
Errored Tests
  • ❌ Functional_annotation_of_protein_sequences.ga_0

    Execution Problem:

    • Failed to run workflow, at least one job is in [error] state.
      

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: input:

        • step_state: scheduled
      • Step 2: InterProScan:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is error

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "981cf6669d1411efb86a952a98c6485c"
              applications ["TIGRFAM", "FunFam", "SFLD", "SUPERFAMILY", "PANTHER", "Gene3D", "Hamap", "PrositeProfiles", "Coils", "SMART", "CDD", "PRINTS", "PIRSR", "PrositePatterns", "AntiFam", "Pfam", "MobiDBLite", "PIRSF"]
              chromInfo "/tmp/tmpv6ekc37k/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              database "5.59-91.0"
              dbkey "?"
              goterms true
              iprlookup false
              licensed {"__current_case__": 1, "applications_licensed": ["Phobius", "SignalP_EUK", "TMHMM"], "use": "true"}
              oformat ["TSV", "XML"]
              pathways true
              seqtype "p"
      • Step 3: eggNOG Mapper:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is error

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "981cf6669d1411efb86a952a98c6485c"
              annotation_options {"__current_case__": 0, "go_evidence": "non-electronic", "no_annot": "", "seed_ortholog_evalue": "0.001", "seed_ortholog_score": null, "target_orthologs": "all", "tax_scope": null}
              chromInfo "/tmp/tmpv6ekc37k/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              eggnog_data "5.0.2"
              ortho_method {"__current_case__": 0, "dmnd_ignore_warnings": false, "dmnd_iterate": false, "evalue": null, "input": {"values": [{"id": 1, "src": "hda"}]}, "input_trans": {"__current_case__": 0, "itype": "proteins"}, "m": "diamond", "matrix_gapcosts": {"__current_case__": 2, "gap_costs": "--gapopen 11 --gapextend 1", "matrix": "BLOSUM62"}, "pident": null, "query_cover": null, "score": "0.001", "sensmode": "sensitive", "subject_cover": null}
              output_options {"md5": false, "no_file_comments": false, "report_orthologs": false}
    • Other invocation details
      • error_message

        • Failed to run workflow, at least one job is in [error] state.
      • history_id

        • 32d241b6562b0252
      • history_state

        • error
      • invocation_id

        • 32d241b6562b0252
      • invocation_state

        • scheduled
      • workflow_id

        • 32d241b6562b0252

Test Results (powered by Planemo)

Test Summary

Test State Count
Total 1
Passed 0
Error 1
Failure 0
Skipped 0
Errored Tests
  • ❌ Functional_annotation_of_protein_sequences.ga_0

    Execution Problem:

    • Failed to run workflow, at least one job is in [error] state.
      

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: input:

        • step_state: scheduled
      • Step 2: InterProScan:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is error

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "f260d004a73f11ef98d565f464fb6b9d"
              applications ["TIGRFAM", "FunFam", "SFLD", "SUPERFAMILY", "PANTHER", "Gene3D", "Hamap", "PrositeProfiles", "Coils", "SMART", "CDD", "PRINTS", "PIRSR", "PrositePatterns", "AntiFam", "Pfam", "MobiDBLite", "PIRSF"]
              chromInfo "/tmp/tmpgu23bwdj/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              database "5.59-91.0"
              dbkey "?"
              goterms true
              iprlookup false
              licensed {"__current_case__": 1, "applications_licensed": ["Phobius", "SignalP_EUK", "TMHMM"], "use": "true"}
              oformat ["TSV", "XML"]
              pathways true
              seqtype "p"
      • Step 3: eggNOG Mapper:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is queued

            Command Line:

            • emapper.py  --data_dir '/data/db/data_managers/eggnog_data/5.0.2'   -m 'diamond' -i '/tmp/tmpgu23bwdj/files/7/8/f/dataset_78f87717-518c-4697-8e2f-d43cf6764bf1.dat' --itype 'proteins'   --matrix 'BLOSUM62' --gapopen 11 --gapextend 1 --sensmode sensitive --dmnd_iterate no   --score 0.001   --seed_ortholog_evalue 0.001 --target_orthologs=all --go_evidence=non-electronic $EGGNOG_DBMEM     --output='results' --cpu "${GALAXY_SLOTS:-4}" --scratch_dir ${TEMP:-$_GALAXY_JOB_TMP_DIR} --temp_dir ${TEMP:-$_GALAXY_JOB_TMP_DIR}

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "f260d004a73f11ef98d565f464fb6b9d"
              annotation_options {"__current_case__": 0, "go_evidence": "non-electronic", "no_annot": "", "seed_ortholog_evalue": "0.001", "seed_ortholog_score": null, "target_orthologs": "all", "tax_scope": null}
              chromInfo "/tmp/tmpgu23bwdj/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              eggnog_data "5.0.2"
              ortho_method {"__current_case__": 0, "dmnd_ignore_warnings": false, "dmnd_iterate": false, "evalue": null, "input": {"values": [{"id": 1, "src": "hda"}]}, "input_trans": {"__current_case__": 0, "itype": "proteins"}, "m": "diamond", "matrix_gapcosts": {"__current_case__": 2, "gap_costs": "--gapopen 11 --gapextend 1", "matrix": "BLOSUM62"}, "pident": null, "query_cover": null, "score": "0.001", "sensmode": "sensitive", "subject_cover": null}
              output_options {"md5": false, "no_file_comments": false, "report_orthologs": false}
    • Other invocation details
      • error_message

        • Failed to run workflow, at least one job is in [error] state.
      • history_id

        • 84a218bda92fbd60
      • history_state

        • error
      • invocation_id

        • 84a218bda92fbd60
      • invocation_state

        • scheduled
      • workflow_id

        • 84a218bda92fbd60

@mvdbeek
Member

mvdbeek commented Nov 20, 2024

We'll also need the interproscan database. I did see this in the tool:

[Screenshot 2024-11-20 at 18 17 26]

I guess we at least need to make that a workflow parameter, and then also mention how that license can be obtained?

Is there an alternative we could use with a more open license?

Test Results (powered by Planemo)

Test Summary

Test State Count
Total 1
Passed 0
Error 1
Failure 0
Skipped 0
Errored Tests
  • ❌ Functional_annotation_of_protein_sequences.ga_0

    Execution Problem:

    • Failed to run workflow, at least one job is in [error] state.
      

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: input:

        • step_state: scheduled
      • Step 2: InterProScan:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is error

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "8c22039ea8c011ef829f6108412c6c96"
              applications ["TIGRFAM", "FunFam", "SFLD", "SUPERFAMILY", "PANTHER", "Gene3D", "Hamap", "PrositeProfiles", "Coils", "SMART", "CDD", "PRINTS", "PIRSR", "PrositePatterns", "AntiFam", "Pfam", "MobiDBLite", "PIRSF"]
              chromInfo "/tmp/tmp59gsvdmb/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              database "5.59-91.0"
              dbkey "?"
              goterms true
              iprlookup false
              licensed {"__current_case__": 1, "applications_licensed": ["Phobius", "SignalP_EUK", "TMHMM"], "use": "true"}
              oformat ["TSV", "XML"]
              pathways true
              seqtype "p"
      • Step 3: eggNOG Mapper:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is queued

            Command Line:

            • emapper.py  --data_dir '/data/db/data_managers/eggnog_data/5.0.2'   -m 'diamond' -i '/tmp/tmp59gsvdmb/files/7/a/e/dataset_7ae45ad2-dd48-42a0-934f-8233a1df599b.dat' --itype 'proteins'   --matrix 'BLOSUM62' --gapopen 11 --gapextend 1 --sensmode sensitive --dmnd_iterate no   --score 0.001   --seed_ortholog_evalue 0.001 --target_orthologs=all --go_evidence=non-electronic $EGGNOG_DBMEM     --output='results' --cpu "${GALAXY_SLOTS:-4}" --scratch_dir ${TEMP:-$_GALAXY_JOB_TMP_DIR} --temp_dir ${TEMP:-$_GALAXY_JOB_TMP_DIR}

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "8c22039ea8c011ef829f6108412c6c96"
              annotation_options {"__current_case__": 0, "go_evidence": "non-electronic", "no_annot": "", "seed_ortholog_evalue": "0.001", "seed_ortholog_score": null, "target_orthologs": "all", "tax_scope": null}
              chromInfo "/tmp/tmp59gsvdmb/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              eggnog_data "5.0.2"
              ortho_method {"__current_case__": 0, "dmnd_ignore_warnings": false, "dmnd_iterate": false, "evalue": null, "input": {"values": [{"id": 1, "src": "hda"}]}, "input_trans": {"__current_case__": 0, "itype": "proteins"}, "m": "diamond", "matrix_gapcosts": {"__current_case__": 2, "gap_costs": "--gapopen 11 --gapextend 1", "matrix": "BLOSUM62"}, "pident": null, "query_cover": null, "score": "0.001", "sensmode": "sensitive", "subject_cover": null}
              output_options {"md5": false, "no_file_comments": false, "report_orthologs": false}
    • Other invocation details
      • error_message

        • Failed to run workflow, at least one job is in [error] state.
      • history_id

        • 843f5186ea97db5f
      • history_state

        • error
      • invocation_id

        • 843f5186ea97db5f
      • invocation_state

        • scheduled
      • workflow_id

        • 843f5186ea97db5f

@mvdbeek
Member

mvdbeek commented Nov 22, 2024

Oops, failed to change the actual data location in the eggnog data table, doing that now.

Test Results (powered by Planemo)

Test Summary

Test State Count
Total 1
Passed 0
Error 1
Failure 0
Skipped 0
Errored Tests
  • ❌ Functional_annotation_of_protein_sequences.ga_0

    Execution Problem:

    • Failed to run workflow, at least one job is in [error] state.
      

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: input:

        • step_state: scheduled
      • Step 2: InterProScan:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is error

            Command Line:

            • mkdir -p $HOME/.interproscan-5 && sed 's|^\(data.directory=\).*$|\1/cvmfs/data.galaxyproject.org/byhand/interproscan/5.59-91.0/data|' $(dirname $(readlink -f $(command -v interproscan.sh)))/interproscan.properties > $HOME/.interproscan-5/interproscan.properties && export _JAVA_OPTIONS=-Duser.home=$HOME &&  interproscan.sh  -dp --input '/tmp/tmp2k6jlp1y/files/d/2/7/dataset_d27e9c6c-f288-4a6a-9b62-ff0a449757c8.dat' --seqtype p -f TSV,XML  --applications TIGRFAM,FunFam,SFLD,SUPERFAMILY,PANTHER,Gene3D,Hamap,PrositeProfiles,Coils,SMART,CDD,PRINTS,PIRSR,PrositePatterns,AntiFam,Pfam,MobiDBLite,PIRSF,Phobius,SignalP_EUK,TMHMM --tempdir ${TEMP:-$_GALAXY_JOB_TMP_DIR}  --pathways --goterms   --cpu ${GALAXY_SLOTS:-4}  --output-file-base 'output'

            Exit Code:

            • 1

            Standard Error:

            • Picked up _JAVA_OPTIONS: -Duser.home=/tmp/tmp2k6jlp1y/job_working_directory/000/2/home
              

            Standard Output:

            • 25/11/2024 11:35:51:740 Welcome to InterProScan-5.59-91.0
              25/11/2024 11:35:51:741 Running InterProScan v5 in STANDALONE mode... on Linux
              Invalid input specified for -appl/--applications parameter:
              Analysis Phobius does not exist or is deactivated.
              Analysis TMHMM does not exist or is deactivated.
              Analysis SignalP_EUK does not exist or is deactivated.
              
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "4a6d9e1eab2111efa444510c20e813f0"
              applications ["TIGRFAM", "FunFam", "SFLD", "SUPERFAMILY", "PANTHER", "Gene3D", "Hamap", "PrositeProfiles", "Coils", "SMART", "CDD", "PRINTS", "PIRSR", "PrositePatterns", "AntiFam", "Pfam", "MobiDBLite", "PIRSF"]
              chromInfo "/tmp/tmp2k6jlp1y/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              database "5.59-91.0"
              dbkey "?"
              goterms true
              iprlookup false
              licensed {"__current_case__": 1, "applications_licensed": ["Phobius", "SignalP_EUK", "TMHMM"], "use": "true"}
              oformat ["TSV", "XML"]
              pathways true
              seqtype "p"
      • Step 3: eggNOG Mapper:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is running

            Command Line:

            • emapper.py  --data_dir '/cvmfs/data.galaxyproject.org/byhand/eggnog_data/5.0.2'   -m 'diamond' -i '/tmp/tmp2k6jlp1y/files/d/2/7/dataset_d27e9c6c-f288-4a6a-9b62-ff0a449757c8.dat' --itype 'proteins'   --matrix 'BLOSUM62' --gapopen 11 --gapextend 1 --sensmode sensitive --dmnd_iterate no   --score 0.001   --seed_ortholog_evalue 0.001 --target_orthologs=all --go_evidence=non-electronic $EGGNOG_DBMEM     --output='results' --cpu "${GALAXY_SLOTS:-4}" --scratch_dir ${TEMP:-$_GALAXY_JOB_TMP_DIR} --temp_dir ${TEMP:-$_GALAXY_JOB_TMP_DIR}

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "4a6d9e1eab2111efa444510c20e813f0"
              annotation_options {"__current_case__": 0, "go_evidence": "non-electronic", "no_annot": "", "seed_ortholog_evalue": "0.001", "seed_ortholog_score": null, "target_orthologs": "all", "tax_scope": null}
              chromInfo "/tmp/tmp2k6jlp1y/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              eggnog_data "5.0.2"
              ortho_method {"__current_case__": 0, "dmnd_ignore_warnings": false, "dmnd_iterate": false, "evalue": null, "input": {"values": [{"id": 1, "src": "hda"}]}, "input_trans": {"__current_case__": 0, "itype": "proteins"}, "m": "diamond", "matrix_gapcosts": {"__current_case__": 2, "gap_costs": "--gapopen 11 --gapextend 1", "matrix": "BLOSUM62"}, "pident": null, "query_cover": null, "score": "0.001", "sensmode": "sensitive", "subject_cover": null}
              output_options {"md5": false, "no_file_comments": false, "report_orthologs": false}
    • Other invocation details
      • error_message

        • Failed to run workflow, at least one job is in [error] state.
      • history_id

        • 583fc7a87169bf5a
      • history_state

        • error
      • invocation_id

        • 583fc7a87169bf5a
      • invocation_state

        • scheduled
      • workflow_id

        • 583fc7a87169bf5a

@mvdbeek
Member

mvdbeek commented Nov 25, 2024

I think we've got everything on cvmfs now and the interproscan run fails with

25/11/2024 11:35:51:740 Welcome to InterProScan-5.59-91.0
25/11/2024 11:35:51:741 Running InterProScan v5 in STANDALONE mode... on Linux
Invalid input specified for -appl/--applications parameter:
Analysis Phobius does not exist or is deactivated.
Analysis TMHMM does not exist or is deactivated.
Analysis SignalP_EUK does not exist or is deactivated.

I assume these are the modules that require manual installation and a license change?
Is there any use in setting applications_licensed to a free subset?
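
As a rough illustration of the "free subset" idea, the sketch below parses the InterProScan output quoted above for analyses reported as missing or deactivated, then drops them from the requested application list. The regular expression and the variable names are assumptions made for this example; nothing here is part of the Galaxy tool wrapper.

```python
import re

# Standard output quoted above from the failing InterProScan job.
stdout = """\
Invalid input specified for -appl/--applications parameter:
Analysis Phobius does not exist or is deactivated.
Analysis TMHMM does not exist or is deactivated.
Analysis SignalP_EUK does not exist or is deactivated.
"""

# Application list as passed to --applications in the failing command line.
requested = (
    "TIGRFAM,FunFam,SFLD,SUPERFAMILY,PANTHER,Gene3D,Hamap,PrositeProfiles,"
    "Coils,SMART,CDD,PRINTS,PIRSR,PrositePatterns,AntiFam,Pfam,MobiDBLite,"
    "PIRSF,Phobius,SignalP_EUK,TMHMM"
).split(",")

# Collect the analysis names that InterProScan rejected.
deactivated = re.findall(
    r"^Analysis (\S+) does not exist or is deactivated\.$",
    stdout,
    flags=re.MULTILINE,
)

# Keep only the applications that were not rejected, preserving order.
kept = [app for app in requested if app not in deactivated]

print("deactivated:", deactivated)
print("--applications", ",".join(kept))
```

The remaining 18 applications match the list used in the run that later passes in this thread, where the licensed option is switched off entirely.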

@rlibouba
Contributor Author

Thanks for your feedback @mvdbeek
It might be interesting to update interproscan and fix the licensing issue. What do you think?
(ping @abretaud)

@rlibouba rlibouba marked this pull request as ready for review December 2, 2024 11:24
Member

@mvdbeek mvdbeek left a comment

Very cool and timely, thanks @rlibouba and @abretaud! I've just edited some minor things in the last commit; if it still looks good, I'm happy to merge this.

github-actions bot commented Dec 3, 2024

Test Results (powered by Planemo)

Test Summary

Test State Count
Total 1
Passed 1
Error 0
Failure 0
Skipped 0
Passed Tests
  • ✅ Functional_annotation_of_protein_sequences.ga_0

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: input:

        • step_state: scheduled
      • Step 2: InterProScan:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • mkdir -p $HOME/.interproscan-5 && sed 's|^\(data.directory=\).*$|\1/cvmfs/data.galaxyproject.org/byhand/interproscan/5.59-91.0/data|' $(dirname $(readlink -f $(command -v interproscan.sh)))/interproscan.properties > $HOME/.interproscan-5/interproscan.properties && export _JAVA_OPTIONS=-Duser.home=$HOME &&  interproscan.sh  -dp --input '/tmp/tmpu3yxq8a1/files/0/c/8/dataset_0c8bdf11-2196-45c8-8874-9e8ab5d47aed.dat' --seqtype p -f TSV,XML  --applications TIGRFAM,FunFam,SFLD,SUPERFAMILY,PANTHER,Gene3D,Hamap,PrositeProfiles,Coils,SMART,CDD,PRINTS,PIRSR,PrositePatterns,AntiFam,Pfam,MobiDBLite,PIRSF --tempdir ${TEMP:-$_GALAXY_JOB_TMP_DIR}  --pathways --goterms   --cpu ${GALAXY_SLOTS:-4}  --output-file-base 'output'

            Exit Code:

            • 0

            Standard Error:

            • Picked up _JAVA_OPTIONS: -Duser.home=/tmp/tmpu3yxq8a1/job_working_directory/000/2/home
              

            Standard Output:

            • 03/12/2024 17:42:01:064 Welcome to InterProScan-5.59-91.0
              03/12/2024 17:42:01:066 Running InterProScan v5 in STANDALONE mode... on Linux
              03/12/2024 17:42:16:978 RunID: 2f40f6204fc9_20241203_174216689_a8sj
              03/12/2024 17:42:36:848 Loading file /tmp/tmpu3yxq8a1/files/0/c/8/dataset_0c8bdf11-2196-45c8-8874-9e8ab5d47aed.dat
              03/12/2024 17:42:36:849 Running the following analyses:
              [AntiFam-7.0,CDD-3.18,Coils-2.2.1,FunFam-4.3.0,Gene3D-4.3.0,Hamap-2021_04,MobiDBLite-2.0,PANTHER-17.0,Pfam-35.0,PIRSF-3.10,PIRSR-2021_05,PRINTS-42.0,ProSitePatterns-2022_01,ProSiteProfiles-2022_01,SFLD-4,SMART-7.1,SUPERFAMILY-1.75,TIGRFAM-15.0]
              Pre-calculated match lookup service DISABLED.  Please wait for match calculations to complete...
              03/12/2024 17:47:49:552 25% completed
              03/12/2024 18:03:37:483 50% completed
              03/12/2024 18:05:39:094 75% completed
              03/12/2024 18:08:09:329 91% completed
              03/12/2024 18:08:26:165 100% done:  InterProScan analyses completed 
              
              2024-12-03 18:08:26,681 [main] [uk.ac.ebi.interpro.scan.jms.main.Run:1801] WARN - deleteWorkingDirectoryOnCompletion : false
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "c1fb8b44b19d11ef8cd15f2b55b1640d"
              applications ["TIGRFAM", "FunFam", "SFLD", "SUPERFAMILY", "PANTHER", "Gene3D", "Hamap", "PrositeProfiles", "Coils", "SMART", "CDD", "PRINTS", "PIRSR", "PrositePatterns", "AntiFam", "Pfam", "MobiDBLite", "PIRSF"]
              chromInfo "/tmp/tmpu3yxq8a1/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              database "5.59-91.0"
              dbkey "?"
              goterms true
              iprlookup false
              licensed {"__current_case__": 0, "use": "false"}
              oformat ["TSV", "XML"]
              pathways true
              seqtype "p"
      • Step 3: eggNOG Mapper:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • emapper.py  --data_dir '/cvmfs/data.galaxyproject.org/byhand/eggnog_data/5.0.2'   -m 'diamond' -i '/tmp/tmpu3yxq8a1/files/0/c/8/dataset_0c8bdf11-2196-45c8-8874-9e8ab5d47aed.dat' --itype 'proteins'   --matrix 'BLOSUM62' --gapopen 11 --gapextend 1 --sensmode sensitive --dmnd_iterate no   --score 0.001   --seed_ortholog_evalue 0.001 --target_orthologs=all --go_evidence=non-electronic $EGGNOG_DBMEM     --output='results' --cpu "${GALAXY_SLOTS:-4}" --scratch_dir ${TEMP:-$_GALAXY_JOB_TMP_DIR} --temp_dir ${TEMP:-$_GALAXY_JOB_TMP_DIR}

            Exit Code:

            • 0

            Standard Error:

            • Functional annotation of hits...
              0 0.0003800392150878906 0.00 q/s (% mem usage: 8.00, % mem avail: 91.99)
              24 178.59459018707275 0.13 q/s (% mem usage: 8.20, % mem avail: 91.81)
              

            Standard Output:

            • #  emapper-2.1.8
              # emapper.py  --data_dir /cvmfs/data.galaxyproject.org/byhand/eggnog_data/5.0.2 -m diamond -i /tmp/tmpu3yxq8a1/files/0/c/8/dataset_0c8bdf11-2196-45c8-8874-9e8ab5d47aed.dat --itype proteins --matrix BLOSUM62 --gapopen 11 --gapextend 1 --sensmode sensitive --dmnd_iterate no --score 0.001 --seed_ortholog_evalue 0.001 --target_orthologs=all --go_evidence=non-electronic --output=results --cpu 1 --scratch_dir /tmp/tmpu3yxq8a1/tmp --temp_dir /tmp/tmpu3yxq8a1/tmp
                /usr/local/bin/diamond blastp -d /cvmfs/data.galaxyproject.org/byhand/eggnog_data/5.0.2/eggnog_proteins.dmnd -q /tmp/tmpu3yxq8a1/files/0/c/8/dataset_0c8bdf11-2196-45c8-8874-9e8ab5d47aed.dat --threads 1 -o /tmp/tmpu3yxq8a1/tmp/results.emapper.hits --tmpdir /tmp/tmpu3yxq8a1/tmp/emappertmp_dmdn_qygb_zoj --sensitive -e 0.001 --min-score 0.001 --matrix BLOSUM62 --gapopen 11 --gapextend 1 --top 3  --outfmt 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qcovhsp scovhsp
               Copying result file /tmp/tmpu3yxq8a1/tmp/results.emapper.hits from scratch to /tmp/tmpu3yxq8a1/job_working_directory/000/3/working
               Copying result file /tmp/tmpu3yxq8a1/tmp/results.emapper.seed_orthologs from scratch to /tmp/tmpu3yxq8a1/job_working_directory/000/3/working
               Copying result file /tmp/tmpu3yxq8a1/tmp/results.emapper.annotations from scratch to /tmp/tmpu3yxq8a1/job_working_directory/000/3/working
              Data in /tmp/tmpu3yxq8a1/tmp will be not removed. Please, clear it manually.
              Done
              Result files:
                 /tmp/tmpu3yxq8a1/job_working_directory/000/3/working/results.emapper.hits
                 /tmp/tmpu3yxq8a1/job_working_directory/000/3/working/results.emapper.seed_orthologs
                 /tmp/tmpu3yxq8a1/job_working_directory/000/3/working/results.emapper.annotations
              
              ================================================================================
              CITATION:
              If you use this software, please cite:
              
              [1] eggNOG-mapper v2: functional annotation, orthology assignments, and domain 
                    prediction at the metagenomic scale. Carlos P. Cantalapiedra, 
                    Ana Hernandez-Plaza, Ivica Letunic, Peer Bork, Jaime Huerta-Cepas. 2021.
                    Molecular Biology and Evolution, msab293, https://doi.org/10.1093/molbev/msab293
              
              [2] eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated
                    orthology resource based on 5090 organisms and 2502 viruses. Jaime
                    Huerta-Cepas, Damian Szklarczyk, Davide Heller, Ana Hernandez-Plaza,
                    Sofia K Forslund, Helen Cook, Daniel R Mende, Ivica Letunic, Thomas
                    Rattei, Lars J Jensen, Christian von Mering and Peer Bork. Nucleic Acids
                    Research, Volume 47, Issue D1, 8 January 2019, Pages D309-D314,
                    https://doi.org/10.1093/nar/gky1085 
              
              [3] Sensitive protein alignments at tree-of-life scale using DIAMOND.
                     Buchfink B, Reuter K, Drost HG. 2021.
                     Nature Methods 18, 366–368 (2021). https://doi.org/10.1038/s41592-021-01101-x
              
              e.g. Functional annotation was performed using emapper-2.1.8 [1]
               based on eggNOG orthology data [2]. Sequence searches were performed using [3].
              
              
              ================================================================================
              
              Total hits processed: 24
              Total time: 2301 secs
              FINISHED
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "c1fb8b44b19d11ef8cd15f2b55b1640d"
              annotation_options {"__current_case__": 0, "go_evidence": "non-electronic", "no_annot": "", "seed_ortholog_evalue": "0.001", "seed_ortholog_score": null, "target_orthologs": "all", "tax_scope": null}
              chromInfo "/tmp/tmpu3yxq8a1/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              eggnog_data "5.0.2"
              ortho_method {"__current_case__": 0, "dmnd_ignore_warnings": false, "dmnd_iterate": false, "evalue": null, "input": {"values": [{"id": 1, "src": "hda"}]}, "input_trans": {"__current_case__": 0, "itype": "proteins"}, "m": "diamond", "matrix_gapcosts": {"__current_case__": 2, "gap_costs": "--gapopen 11 --gapextend 1", "matrix": "BLOSUM62"}, "pident": null, "query_cover": null, "score": "0.001", "sensmode": "sensitive", "subject_cover": null}
              output_options {"md5": false, "no_file_comments": false, "report_orthologs": false}
    • Other invocation details
      • history_id

        • bb0c4fc373c5ac99
      • history_state

        • ok
      • invocation_id

        • bb0c4fc373c5ac99
      • invocation_state

        • scheduled
      • workflow_id

        • bb0c4fc373c5ac99
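
For readers unfamiliar with the `sed` call at the start of the InterProScan command line in Step 2 above: it copies `interproscan.properties` into `$HOME/.interproscan-5/` while replacing the value of the `data.directory=` line with the CVMFS data path. The following is a minimal Python sketch of that substitution, not part of the tool wrapper. The function name and the sample properties text are hypothetical, and the dot in `data.directory` is escaped here, whereas the logged `sed` pattern leaves it unescaped.

```python
import re

# Data path taken from the logged InterProScan command line.
DATA_DIR = "/cvmfs/data.galaxyproject.org/byhand/interproscan/5.59-91.0/data"


def rewrite_data_directory(properties_text: str, data_dir: str = DATA_DIR) -> str:
    """Replace the value of each `data.directory=` line; other lines are unchanged.

    Hypothetical helper that mirrors the sed expression
    s|^\\(data.directory=\\).*$|\\1<data_dir>|
    """
    return re.sub(
        r"^(data\.directory=).*$",
        # A function replacement avoids backslash handling in the path string.
        lambda match: match.group(1) + data_dir,
        properties_text,
        flags=re.MULTILINE,
    )


# Hypothetical properties content, for illustration only.
sample = "# example comment\ndata.directory=data\nother.key=value\n"
rewritten = rewrite_data_directory(sample)
print(rewritten)
```

Only the line beginning with `data.directory=` changes, which is why the command can reuse the packaged properties file as a template.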

Collaborator

@abretaud abretaud left a comment

Ah, cool to see this is ready :) Just 2 typos in my email address.
For licensing: yep, the warnings are coming from these manual-only components, so that's the best we can do, I think.
And yep, we should update InterProScan soon, but it's always been tricky :D We can merge this version for now.

Co-authored-by: Anthony Bretaudeau <[email protected]>
@rlibouba
Contributor Author

rlibouba commented Dec 6, 2024

Hi @mvdbeek! Do you think everything is good and this can be merged?

4 participants