Releases: viralemergence/virion
The VIRION preprint
Preprint-compatible version. Includes a handful of genus-level virus records added to CLOVER, a small fix to the PREDICT taxonomy, a small fix to the GenBank metadata, and a copy of the preprint as initially submitted to bioRxiv on August 6, 2021.
Stable release pre-preprint
This release contains ~6 months of new data from GenBank, which is now retrieved from an FTP server as part of the pipeline. It also includes:
- small improvements of the taxonomy pipeline, including handling of "cf." names
- some small cleanup of the PREDICT dataset
- metadata tidying including a full removal of the SRA-specific backbones
- several bug fixes (e.g., issues with sparse columns and vroom column identification led to data overwrites)
- a massive reduction in the dataset dimensions by collapsing NCBIAccession (much needed given SARS-CoV-2 influx)
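
As a rough illustration of what collapsing NCBIAccession could look like, the sketch below merges rows that are identical in every column except the accession and joins their accessions into one delimited string. It is not the VIRION pipeline code, which these notes do not show: the `Host` and `Virus` columns, the `", "` separator, the placeholder accession values, and the `collapse_accessions` helper are all assumptions made for the example.

```python
def collapse_accessions(rows, key="NCBIAccession", sep=", "):
    """Merge rows that differ only in `key`, joining their accession values.

    `rows` is a list of dicts with string values. The column names other
    than NCBIAccession and the separator are assumptions for this sketch.
    """
    groups = {}
    for row in rows:
        # Every column except the accession defines which group a row is in.
        group_id = tuple(sorted((k, v) for k, v in row.items() if k != key))
        groups.setdefault(group_id, []).append(row[key])

    collapsed = []
    for group_id, accessions in groups.items():
        merged = dict(group_id)
        # Drop repeated accessions while keeping first-seen order.
        merged[key] = sep.join(dict.fromkeys(accessions))
        collapsed.append(merged)
    return collapsed


# Placeholder records: the accession strings are made up for illustration.
rows = [
    {"Host": "host a", "Virus": "virus x", "NCBIAccession": "ACC0001"},
    {"Host": "host a", "Virus": "virus x", "NCBIAccession": "ACC0002"},
    {"Host": "host b", "Virus": "virus x", "NCBIAccession": "ACC0003"},
]

for record in collapse_accessions(rows):
    print(record)
```

With many sequences deposited for the same host-virus pair, a grouping of this kind turns one row per sequence into one row per distinct record, which is where the reduction in row count comes from.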
More from PREDICT
A handful of records have been added from the PREDICT PCR testing files available from USAID, and some viral genera have been added (especially for names of the form PREDICT_XX-123) based on the supplemental data from spillover.global.
A release without the Sequence Read Archive predictions
We have decided to re-envision the VIRION-SRA dataset from the ground up. This release does not include that dataset; it comes with much simpler guidelines for using VIRION, along with some temporary solutions for parts of the pipeline architecture.
A working beta release!
Various taxonomy fixes and a much smoother workflow. Get to work!
VIRION working prototype (May 2021)
This is a working version of VIRION with complete data integration and taxonomic reconciliation. Still no dynamic updating, but that will come later; this is a chance for folks to start hunting for bugs.