Microsatbed: new tool for reporting short tandem repeats as bed track features. #6145
…ased reporting and decorated sorting for bed output
…ormat output for all perfect STRs. This makes the tool more generic and makes better use of pytrf...
…rmatively about the provenance otherwise. Cannot pass the element_identifier from a test parameter it seems - it's a built-in so not surprisingly over-ridden.
fix dimer minimum of 1 fix tests to conform to new structures
…size for size selected and specified motifs not so easy for native pytrf operation
jbrowse2 simply will not read the bigwigs made using pybigtools :(
Co-authored-by: Björn Grüning <[email protected]>
Thanks @bgruening - lots of warts fixed!
tools/microsatbed/microsatbed.xml
<param name="tetramin" value="20"/>
<param name="pentamin" value="20"/>
<param name="hexamin" value="20"/>
<output name="bed" value="dibed_wig_sample" compare="sim_size" delta="10"/>
Could you add some asserts here, e.g. column_count, row_count, etc.?
Ping
@bgruening: Yes, thanks - good idea - sorry for the delay - am travelling and distracted - in marvelous Melbourne and beyond...
Have added assertions to all the tests and fixed the leftover merge conflict marker.
Also figured out how to implement your suggestion to make the command section less complex with "--foo" values for the di/tri... etc selector which is now a macro since it is needed twice.
add contents assertions to all tests
Awesome. Thanks @fubar2
… features. (galaxyproject#6145)
* Preparing draft PR
* typo
* typo
* redundant copy
* fix flakery
* sheesh. local flake8 was fine, I swear....
* Add comments explaining two non-obvious issues being dealt with - 1 based reporting and decorated sorting for bed output
* flake8 strikes again.
* flake8 a trailing space. Oy.
* added a native pytrf mode to run findstr and make csv or tsv or gtf format output for all perfect STRs. This makes the tool more generic and makes better use of pytrf...
* update readme with stuff from the tool help
* forgot the new test output for the new native pytrf findstr command line test
* make native test output smaller than 43MB. Eeesh.
* add missing smaller test fa
* remove bogus print
* Add test for built-in genome and paraphenaliae
* reverted the inbuilt test because cannot get the output labelled informatively about the provenance otherwise. Cannot pass the element_identifier from a test parameter it seems - it's a built-in so not surprisingly over-ridden.
* rationalise minima for everything. fix dimer minimum of 1 fix tests to conform to new structures
* make bed sample small enough
* add option for windowed density bigwig output with selectable window size for size selected and specified motifs not so easy for native pytrf operation
* fix flake8 issue
* more flakery
* remove pybigtools to use ucsc-bedgraphtobigwig on a bedgraph instead jbrowse2 simply will not read the bigwigs made using pybigtools :(
* Fixes suggested by Bjoern's review
* Update tools/microsatbed/microsatbed.xml Co-authored-by: Björn Grüning <[email protected]>
* Update microsatbed.xml
* add help for windowed density bigwig option
* fix logic for multiple flags add contents assertions to all tests
* remove bogus old merge mark
---------
Co-authored-by: Björn Grüning <[email protected]>
FOR CONTRIBUTOR:
This PR proposes a new tool, and suggestions from anyone interested in microsatellites and STRs would be welcome.
Motivation was to recreate some of the NIH MARBL T2T assembly polishing browser tracks in Galaxy workflows for the VGP. Those tracks display the density of specific dinucleotides such as CG in 128nt windows to help identify problematic regions where they are over-represented and may introduce technical errors in alignment for different kinds of sequencing methods.
The tool can be configured to output all motifs of one or more lengths from 1-6nt.
Alternatively, specific motifs can be provided as a comma-separated string.
Two or more sequential repeats can be required although dimers can be reported as singletons.
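For anyone wanting to see the windowed density idea in miniature, here is a rough sketch in plain Python. It is not the tool's code and does not use pytrf: the function names (`find_tandem_repeats`, `windowed_density`, `bedgraph_lines`) are made up for illustration, a regular expression stands in for the real repeat finder, and "density" is assumed to mean the fraction of bases in each window covered by perfect repeats of the motif. Coordinates are 0-based and half-open, as in BED and bedGraph.

```python
import re


def find_tandem_repeats(seq, motif, min_repeats=2):
    """Yield (start, end) for perfect tandem runs of `motif`.

    Coordinates are 0-based, half-open. min_repeats=1 also reports
    single occurrences (singletons).
    """
    pattern = re.compile(
        f"(?:{re.escape(motif)}){{{min_repeats},}}", re.IGNORECASE
    )
    for match in pattern.finditer(seq):
        yield match.start(), match.end()


def windowed_density(seq_len, intervals, window=128):
    """Return (start, end, fraction) for each fixed-size window.

    fraction = bases covered by repeats / bases in the window
    (an assumed definition of density, for illustration only).
    """
    n_windows = (seq_len + window - 1) // window
    covered = [0] * n_windows
    for start, end in intervals:
        for pos in range(start, end):
            covered[pos // window] += 1
    rows = []
    for i, count in enumerate(covered):
        w_start = i * window
        w_end = min(w_start + window, seq_len)
        rows.append((w_start, w_end, count / (w_end - w_start)))
    return rows


def bedgraph_lines(chrom, rows):
    """Format window rows as tab-separated bedGraph lines."""
    return [f"{chrom}\t{s}\t{e}\t{v:.4f}" for s, e, v in rows]


if __name__ == "__main__":
    # Toy sequence: one CG run of 4 repeats and one CG singleton.
    seq = "AT" * 10 + "CGCGCGCG" + "A" * 100 + "CG" + "T" * 126
    repeats = find_tandem_repeats(seq, "CG", min_repeats=2)
    for line in bedgraph_lines("chr1", windowed_density(len(seq), repeats)):
        print(line)
```

A bedGraph produced along these lines could then be converted to a bigwig with a separate utility. The per-base loop keeps the sketch short and would be too slow for a whole genome.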
This makes it potentially applicable for visualising the distribution of short tandem repeats and other kinds of microsatellites as bed or gff tracks on a reference fasta. For downstream processing of all exact STRs, the underlying Python tool pytrf can be run using the findstr option, producing either gff, tsv or csv outputs as described at the end of the documentation.
JBrowse2 bigwig sample outputs for the 4 dinucleotide density tracks using the HG002 assembly