Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update BUSCO and fix --miniprot parameter #6153

Merged
merged 17 commits into from
Aug 30, 2024

Conversation

rlibouba
Copy link
Contributor

Hello, I've opened this PR to update the BUSCO version and fix an error on the --miniprot parameter (#6147).
Thank you! Have a good day!

@rlibouba rlibouba marked this pull request as draft July 15, 2024 14:21
@mvdbeek
Copy link
Member

mvdbeek commented Jul 15, 2024

busco: error: unrecognized arguments: --update-data, I guess this was removed at some point between version 5.5.0 and 5.7.1 ?

@Hymenium
Copy link

Yes, it seems that this command was removed in v5.6.1, although I can't easily find any documentation regarding the change.

@Hymenium
Copy link

Note that the Augustus/MetaEuk parameter cannot select for MetaEuk currently. The options are coded to yes and no for using Augustus as an alternative, and no --metaeuk parameter is ever issued.
e.g.

#if $busco_mode.mode == 'geno' and $busco_mode.use_augustus.use_augustus_selector == 'yes':

tools/busco/busco.xml Outdated Show resolved Hide resolved
<param name="use_augustus_selector" type="select" label="Use Augustus instead of Metaeuk">
<option value="yes">Yes, use Augustus</option>
<option value="no" selected="true">Use Metaeuk</option>
<param name="use_augustus_selector" type="select" label="Select a gene predictor">

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good.

It may help users to put some information somewhere about BUSCO using the Prodigal predictor for prokaryotic data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can add a helper to indicate this. I'm trying to fix the errors in the tests. I'll do a general commit with all the changes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I saw that some tests had failed but I can't see any error messages to see what failed. Other than the change of default predictor and some formatting changes between version, it's hard to understand why the tests would fail.

@bgruening
Copy link
Member

@rlibouba anything we can help here?

@rlibouba
Copy link
Contributor Author

Hi @bgruening, I've noticed two different errors.
The first relates to the use of miniprot. The error I got was error: local variable 'target_id' referenced before assigment. A issues is open (https://gitlab.com/ezlab/busco/-/issues/744). Three of us have reported this error.

For the second, the error is multiple repeat at position 351 (line 7, column 3). Perhaps this is linked to the Galaxy code? Do you have any ideas?

@bgruening
Copy link
Member

For the second, the error is multiple repeat at position 351 (line 7, column 3). Perhaps this is linked to the Galaxy code? Do you have any ideas?

This does not seem to be an error, but it fails because of:

2024-08-20 15:19:16 ERROR:	
Warning message:
package ‘ggplot2’ was built under R version 4.4.1 
Warning message:
The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
ℹ Please use the `linewidth` argument instead. 

@abretaud
Copy link
Contributor

I think it's not related to this warning, if you look at the test 1 you only have the multiple repeat at position. Google says it's a regex related error, but I don't see where it could come from in the tool code 🤔

@Hymenium
Copy link

Hymenium commented Aug 26, 2024

Could it be related to how the results files are compared in the tests? In the previous results files some special characters are escaped (e.g. \* vs *).

tools/busco/busco.xml Outdated Show resolved Hide resolved
@bgruening
Copy link
Member

Please take it out of draft if you are ready.

@rlibouba rlibouba marked this pull request as ready for review August 29, 2024 12:17
@Hymenium
Copy link

Is there any real test for the .gff output? The first line seems to be checked, but this is created in the tool script.

tools/busco/busco.xml Outdated Show resolved Hide resolved
tools/busco/busco.xml Show resolved Hide resolved
@bgruening
Copy link
Member

@Hymenium we could add more asserts but files like tools/busco/test-data/genome_results_metaeuk/out.gff3 do a diff on comparison and do produce some lines.

@Hymenium
Copy link

Hymenium commented Aug 29, 2024

Thanks, if this output is working now then that's fine.

@@ -40,8 +40,6 @@ busco
#if $lineage_conditional.selector == 'cached':
--offline
--download_path $lineage_conditional.cached_db.fields.path
#else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this removed?

We still offer the "download" option in the User Interface.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah found it, it's just the default now. and --offline still works

Copy link
Member

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ready to go. I just had one last question. Why do we remove the upload option? I agree that its not a good default, so maybe we should not select this option by default - rather offer the build-in databases, but removing it?

@Hymenium
Copy link

Hymenium commented Aug 30, 2024

The '--update-data' parameter has been removed from BUSCO, see the comments from 15 July.

Strangely this parameter is still mentioned in the user guide (https://busco.ezlab.org/busco_userguide.html#download-and-automated-update).

The user guide says that if a dataset name is provided (using -l or --lineage_dataset) then the associated file is automatically updated, and if a file path is given this is disabled but it seems like a warning will be issued. For automated lineage selection, it says the relevant files are automatically updated .

Copy link
Contributor

@abretaud abretaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, the --update-data is gone but it's just a simplification of options, it doesn't change the wrapper behavior, so I think it's ready

I'm going to merge in a few minutes, unless someone screams ;)

@@ -40,8 +40,6 @@ busco
#if $lineage_conditional.selector == 'cached':
--offline
--download_path $lineage_conditional.cached_db.fields.path
#else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah found it, it's just the default now. and --offline still works

@abretaud abretaud merged commit 21578f9 into galaxyproject:main Aug 30, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants