DEV: making the develop branch the default #7

Merged
merged 74 commits into from
Sep 19, 2023

Conversation

GavinHuttley

No description provided.

Partial download and install functions
[NEW] these are TSV files that contain a summary of the homology data.
    There are no associated checksums, so we cannot do integrity checks.
[NEW] other data, including gene trees, are made available in EMF
    format. We allow the parser to be applied to these by turning off
    format checking and by using callbacks to process data blocks.
[NEW] some files do not have checksums we can validate against, so we
    provide a flag for these
[NEW] multiple sequence alignment format that is more efficient for parsing

Current test very basic.
[NEW] not doing this step in parallel because the overhead from pickling
    the large tables is high, and the inflation in memory could cause
    problems on limited hardware. As we filter the tables to only the
    genomes being analysed, the process is fast enough.
[NEW] now include compara tree_names, which can be used to select
    species instead of enumerating all names

[CHANGED] changed name of sample cfg file to sample.cfg
[NEW] add new properties for staging and installing genomes

[CHANGED] use these attributes in download and install functions
[NEW] reports on what has been installed
[CHANGED] wakepy is not robust on Linux
[CHANGED] if no alignment is specified, there is no point trying to
    download one.
[CHANGED] move config related functions and classes to
    new _config.py module.
[CHANGED] mainly to allow a threaded and unthreaded version,
    as the latter is quite useful for debugging
[CHANGED] added download function for species trees and moved the
    species_from_ensembl_tree function to species
[NEW] trees_for_aligns() matches based on splitting file paths at
    the characters [_.- ]. As the names Ensembl gives the alignment
    directories only partly correspond to the names of the tree files
    associated with the alignments, these matches are incomplete. We select
    the tree with the maximum number of matches from the name start.
[NEW] merges in a new collection of species names
[NEW] function takes the list of alignment names and returns
    the corresponding species by identifying the alignment trees.
[NEW] use new capabilities to identify all the associated species
    and add those to the Config object.
[NEW] this and associated functions are used to write meta-data
    into the local install directory, or read that data from the
    local install directory.
[CHANGED] now specify the relationship type. Currently limited to
    ortholog_one2one and ortholog_one2many
GavinHuttley merged commit 13dbf47 into develop Sep 19, 2023
16 checks passed