DEV: making the develop branch the default #7

Merged
merged 74 commits into from
Sep 19, 2023
df0d769
Merge pull request #30 from cogent3/master
GavinHuttley Jul 30, 2023
89e68c8
MAINT: explicit name for download_aligns
GavinHuttley Aug 8, 2023
c3bd1d5
ENH: added homology download capabilities
GavinHuttley Aug 9, 2023
899ca95
ENH: generalise the emf parser
GavinHuttley Aug 9, 2023
bdbde05
MAINT: code tidy
GavinHuttley Aug 9, 2023
6867bdd
Merge pull request #31 from cogent3/develop
GavinHuttley Aug 9, 2023
dac7eff
MAINT: more specific validator class name
GavinHuttley Aug 8, 2023
17e96fa
ENH: added option for overwriting existing installation
GavinHuttley Aug 9, 2023
6dfe601
MAINT: code tidy and remove unused import
GavinHuttley Aug 9, 2023
1fa8a48
ENH: add option for doing checksum
GavinHuttley Aug 9, 2023
69657c1
ENH: add maf parser
GavinHuttley Aug 9, 2023
44d18ff
ENH: download alignment maf files instead of emf
GavinHuttley Aug 9, 2023
a014253
ENH: added maf name class
GavinHuttley Aug 9, 2023
8ab8c52
ENH: basic implementation of installing alignments
GavinHuttley Aug 9, 2023
3cf0525
MAINT: change aligndb test to using maf file
GavinHuttley Aug 10, 2023
f1bb08e
MAINT: make base Sqlite mixin class and convert AlignDb to use it
GavinHuttley Aug 10, 2023
7420f55
DOC: use cog for updating the readme
GavinHuttley Aug 10, 2023
57118c0
DOC: fixed docstring
GavinHuttley Aug 10, 2023
bbc8a47
MAINT: fix type hint
GavinHuttley Aug 10, 2023
befa83d
ENH: added Config properties for compara types
GavinHuttley Aug 10, 2023
7bf1759
ENH: use trogon for a nice terminal ui
GavinHuttley Aug 10, 2023
3932db0
ENH: show progress bar during validation of checksums
GavinHuttley Aug 10, 2023
670e46e
DOC: add detail to docs
GavinHuttley Aug 10, 2023
eb28aad
DOC: added demo config file to readme
GavinHuttley Aug 10, 2023
e5e214c
DOC: fixed typos
GavinHuttley Aug 10, 2023
e342444
DOC: better language
GavinHuttley Aug 10, 2023
276e39d
DOC: more accurate install time
GavinHuttley Aug 10, 2023
a9a3ee1
ENH: added homology DB module
GavinHuttley Aug 10, 2023
3cfc3df
MAINT: better progress text message
GavinHuttley Aug 10, 2023
5669c80
ENH: added functions, classes to support loading homology database
GavinHuttley Aug 10, 2023
064aa8f
TST: basic tests of homology db and loader
GavinHuttley Aug 10, 2023
b2ce788
ENH: added HomologyDb.get_related_to()
GavinHuttley Aug 11, 2023
6c1ab6e
ENH: added HomologyDb.get_related_groups()
GavinHuttley Aug 11, 2023
bc45472
ENH: added subcommand that exports all 1-to-1 ortholog groups
GavinHuttley Aug 11, 2023
4d5b4fc
MAINT: delete unused class
GavinHuttley Aug 11, 2023
eff3997
ENH: support specifying species using compara tree
GavinHuttley Aug 11, 2023
c9c844e
ENH: use remote_path option
GavinHuttley Aug 11, 2023
ad0fefb
MAINT: get updated species list and modified Species handling
GavinHuttley Aug 12, 2023
bce418b
ENH: added compara section tree_names option
GavinHuttley Aug 12, 2023
ed7df1d
MAINT: code tidy
GavinHuttley Aug 12, 2023
d857fc9
MAINT: code tidy
GavinHuttley Aug 14, 2023
d5166c3
MAINT: consistent type hints
GavinHuttley Aug 14, 2023
e895c8a
MAINT: fixed type hint error
GavinHuttley Aug 14, 2023
0e3fff5
MAINT: code tidy
GavinHuttley Aug 14, 2023
22452fb
ENH: added rich table display function
GavinHuttley Aug 17, 2023
d96b3ca
MAINT: click commands show help by default
GavinHuttley Aug 24, 2023
427d9ba
MAINT: updated sample config
GavinHuttley Aug 24, 2023
6a7fb35
MAINT: Species methods return value not always string
GavinHuttley Aug 24, 2023
037fabe
API: change install location of genome
GavinHuttley Aug 25, 2023
77cb76f
ENH: added cli command installed
GavinHuttley Aug 25, 2023
d9707c0
DEV: configure github actions
GavinHuttley Aug 25, 2023
92a6244
TST: skip cli tests on linux
GavinHuttley Aug 25, 2023
812e79f
ENH: more robust exportrc
GavinHuttley Aug 25, 2023
1c2ddb2
MAINT: still dealing with exportrc
GavinHuttley Aug 25, 2023
26d9545
DEV: added github action status badges
GavinHuttley Aug 25, 2023
1563ac1
DEV: connect with coveralls
GavinHuttley Aug 25, 2023
b1f4ab2
DEV: added code coverage badge
GavinHuttley Aug 25, 2023
50af260
ENH: simple Genome database
GavinHuttley Aug 25, 2023
4941f63
MAINT: exit download functions if not required
GavinHuttley Sep 6, 2023
80a151d
TST: mark the cli test_download as slow
GavinHuttley Sep 6, 2023
65bb1dd
MAINT: refactor related to configs
GavinHuttley Sep 7, 2023
86a972e
MAINT: wrap download into separate function
GavinHuttley Sep 7, 2023
e24fea1
MAINT: remove unused import
GavinHuttley Sep 7, 2023
0d65190
ENH: add the _config module
GavinHuttley Sep 7, 2023
3981f6e
DEV: added a sourcery config file
GavinHuttley Sep 7, 2023
8a32799
ENH: split species tree function
GavinHuttley Sep 7, 2023
a04c1d9
MAINT: test Species contains
GavinHuttley Sep 9, 2023
a366e75
ENH: added function for matching ensembl alignment and tree names
GavinHuttley Sep 9, 2023
1eac5ea
ENH: added Config.update_species() method
GavinHuttley Sep 9, 2023
6c4131d
MAINT: code tidy
GavinHuttley Sep 9, 2023
c8cbf5d
ENH: added download.get_species_for_alignments()
GavinHuttley Sep 9, 2023
a2067a5
ENH: can now just specify the alignment name in the config
GavinHuttley Sep 9, 2023
2d1c14a
ENH: added InstallConfig class
GavinHuttley Sep 9, 2023
cfd60f0
ENH: renamed ortholog1to1 command line function to homologs
GavinHuttley Sep 9, 2023
74 changes: 74 additions & 0 deletions .github/workflows/codeql.yml
@@ -0,0 +1,74 @@
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"

on:
  push:
    branches: [ "master" ]
  pull_request:
    # The branches below must be a subset of the branches above
    branches: [ "master" ]
  schedule:
    - cron: '39 20 * * 6'

jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
      security-events: write

    strategy:
      fail-fast: false
      matrix:
        language: [ 'python' ]
        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
        # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support

    steps:
    - name: Checkout repository
      uses: actions/checkout@v3

    # Initializes the CodeQL tools for scanning.
    - name: Initialize CodeQL
      uses: github/codeql-action/init@v2
      with:
        languages: ${{ matrix.language }}
        # If you wish to specify custom queries, you can do so here or in a config file.
        # By default, queries listed here will override any specified in a config file.
        # Prefix the list here with "+" to use these queries and those in the config file.

        # Details on CodeQL's query packs refer to : https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
        # queries: security-extended,security-and-quality


    # Autobuild attempts to build any compiled languages (C/C++, C#, Go, or Java).
    # If this step fails, then you should remove it and run the build manually (see below)
    - name: Autobuild
      uses: github/codeql-action/autobuild@v2

    # ℹ️ Command-line programs to run using the OS shell.
    # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun

    #   If the Autobuild fails above, remove it and uncomment the following three lines.
    #   modify them (or add more) to build your code if your project, please refer to the EXAMPLE below for guidance.

    # - run: |
    #     echo "Run, Build Application using script"
    #     ./location_of_script_within_repo/buildscript.sh

    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@v2
      with:
        category: "/language:${{matrix.language}}"
57 changes: 57 additions & 0 deletions .github/workflows/testing_develop.yml
@@ -0,0 +1,57 @@
name: CI

on:
  push:
    branches: [ "master"]
  pull_request:
    branches: [ "master"]

jobs:
  tests:
    name: "Python ${{ matrix.python-version }}"
    runs-on: ${{ matrix.os }}

    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python-version: ["3.10", "3.11"]

    steps:
      - uses: "actions/checkout@v2"
        with:
          fetch-depth: 0

      # Setup env
      - uses: "actions/setup-python@v2"
        with:
          python-version: "${{ matrix.python-version }}"

      - name: "Installs for ${{ matrix.python-version }}"
        run: |
          python --version
          pip install --upgrade pip wheel setuptools nox

      - name: "Run nox for ${{ matrix.python-version }}"
        run: "nox -s test-${{ matrix.python-version }}"

      - name: Coveralls Parallel
        uses: coverallsapp/github-action@master
        with:
          parallel: true
          github-token: ${{ secrets.github_token }}
          flag-name: run-${{ matrix.test_number }}
          path-to-lcov: "tests/lcov-${{ matrix.python-version }}.info"

  finish:
    needs: tests
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [macos-latest, ubuntu-latest, windows-latest]
        python-version: ["3.10", "3.11"]
    steps:
      - name: Coveralls Finished
        uses: coverallsapp/github-action@master
        with:
          github-token: ${{ secrets.github_token }}
          parallel-finished: true
2 changes: 2 additions & 0 deletions .sourcery.yaml
@@ -0,0 +1,2 @@
refactor:
  python_version: '3.8'
104 changes: 102 additions & 2 deletions README.md
@@ -1,6 +1,106 @@
[![CI](https://github.com/cogent3/ensembl_cli/actions/workflows/testing_develop.yml/badge.svg)](https://github.com/cogent3/ensembl_cli/actions/workflows/testing_develop.yml)
[![CodeQL](https://github.com/cogent3/ensembl_cli/actions/workflows/codeql.yml/badge.svg)](https://github.com/cogent3/ensembl_cli/actions/workflows/codeql.yml)
[![Coverage Status](https://coveralls.io/repos/github/cogent3/ensembl_cli/badge.svg?branch=master)](https://coveralls.io/github/cogent3/ensembl_cli?branch=master)

# ensembl_cli

## Installation

We suggest creating a conda environment or a Python virtual environment using Python 3.11. Then install directly into that environment from the GitHub repo:

```
$ python -m pip install "ensembl_cli @ git+https://github.com/cogent3/ensembl_cli.git@develop"
```

Then run for the first time using

```
$ ensembl_cli tui
```

The first start takes a while because, behind the scenes, cogent3 is transpiling various functions into C and compiling them. Eventually, you get a very neat terminal interface you can click around in. To exit, make sure "root" is selected in the left panel, then press `^+r`.

## Usage

The setup is (for now) controlled using a config file, defined in `ini` format. To get a starting template use the `exportrc` subcommand.

<!-- [[[cog
import cog
from ensembl_cli import cli
from click.testing import CliRunner
runner = CliRunner()
result = runner.invoke(cli.main, ["exportrc", "--help"])
help = result.output.replace("Usage: main", "Usage: ensembl_cli")
cog.out(
"```\n{}\n```".format(help)
)
]]] -->
```
Usage: ensembl_cli exportrc [OPTIONS]

  exports sample config and species table to the nominated path

  setting an environment variable ENSEMBLDBRC with this path will force its
  contents to override the default ensembl_cli settings

Options:
  -o, --outpath PATH  path to directory to export all rc contents
  --help              Show this message and exit.

```
<!-- [[[end]]] -->

<details>
<summary> Click to see a sample config file I've been using for development </summary>

Using this config, it takes approximately 16 minutes to download (over a ~200MB/s WiFi connection) and ~45 minutes to install on my M2 MacBook Pro (note the install is incomplete). Note this step uses up to 10 CPU cores.

```
[remote path]
host=ftp.ensembl.org
path=pub
[local path]
staging_path=~/Desktop/Outbox/ensembl_download
install_path=~/Desktop/Outbox/ensembl_install
[release]
release=110
[Mouse Lemur]
db=core
[Macaque]
db=core
[Gibbon]
db=core
[Orangutan]
db=core
[Bonobo]
db=core
[Human]
db=core
[Chimp]
db=core
[Gorilla]
db=core
[compara]
align_names=10_primates.epo
```
</details>
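
To illustrate how a config of this shape can be read, here is a minimal sketch using Python's standard `configparser`. It is not the `ensembl_cli` implementation. The `summarise_config` function, the `NON_SPECIES` set, and the idea that `align_names` may hold a comma-separated list are all assumptions made for this example.

```python
import configparser

# A shortened copy of the sample config above
SAMPLE = """
[remote path]
host=ftp.ensembl.org
path=pub
[local path]
staging_path=~/Desktop/Outbox/ensembl_download
install_path=~/Desktop/Outbox/ensembl_install
[release]
release=110
[Human]
db=core
[Chimp]
db=core
[compara]
align_names=10_primates.epo
"""

# Sections assumed to hold settings rather than species (based on the sample only)
NON_SPECIES = {"remote path", "local path", "release", "compara"}


def summarise_config(text: str) -> dict:
    """Return host, release, species names and alignment names from ini text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    species = [s for s in parser.sections() if s not in NON_SPECIES]
    # Assumption: multiple alignment names would be comma separated
    raw_names = parser.get("compara", "align_names", fallback="")
    align_names = [n.strip() for n in raw_names.split(",") if n.strip()]
    return {
        "host": parser.get("remote path", "host"),
        "release": parser.get("release", "release"),
        "species": species,
        "align_names": align_names,
    }


summary = summarise_config(SAMPLE)
print(summary)
```

Every section that is not a known settings section is treated as a species here, which matches the sample but may not hold for every config.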

### Download

Downloads the species indicated in the config file:

- genome sequences in fasta format
- annotations as gff3
- gene homologies for individual genomes in tsv format

Alignments indicated in the config file will be downloaded in `.maf` format.

Downloads are written to a local directory, specified in the config file. Downloads are done in parallel (using threads).
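
The thread-based approach can be sketched with the standard library as follows. This is an illustration only, not the project's download code: `fetch` is a hypothetical stand-in that writes a placeholder file instead of contacting the Ensembl FTP site, and the file names are made up.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fetch(remote_name: str, dest_dir: Path) -> Path:
    """Hypothetical stand-in for an FTP download: writes a small placeholder file."""
    out = dest_dir / remote_name
    out.write_text(f"contents of {remote_name}\n")
    return out


def download_all(remote_names, dest_dir: Path, max_workers: int = 4):
    """Fetch every file using a pool of threads and return the local paths."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map returns results in input order even though the work runs concurrently
        return list(pool.map(lambda name: fetch(name, dest_dir), remote_names))


with tempfile.TemporaryDirectory() as tmp:
    names = ["genome.fa.gz", "annotations.gff3.gz", "homologies.tsv.gz"]
    paths = download_all(names, Path(tmp) / "staging")
    downloaded = sorted(p.name for p in paths)
    all_exist = all(p.exists() for p in paths)
```

Threads suit this kind of work because downloads spend most of their time waiting on the network rather than using the CPU.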

### Install

"Installation" involves transforming downloaded files into local sqlite3 databases which are saved to the location specified in the config file.
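
As a rough picture of what turning a downloaded file into a sqlite3 database can look like, the sketch below loads tab-delimited rows into a table with Python's `sqlite3` module. The table name, column names, and sample rows are invented for illustration and do not reflect the schema `ensembl_cli` uses.

```python
import csv
import io
import sqlite3

# Made-up rows in the style of a homology tsv file; column names are illustrative
TSV = (
    "gene_a\tgene_b\trelationship\n"
    "g1\tg2\tortholog_one2one\n"
    "g1\tg3\tortholog_one2many\n"
)


def install_tsv(text: str, db_path: str = ":memory:") -> sqlite3.Connection:
    """Load tab-delimited rows into a sqlite3 table (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS homology "
        "(gene_a TEXT, gene_b TEXT, relationship TEXT)"
    )
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = [(r["gene_a"], r["gene_b"], r["relationship"]) for r in reader]
    conn.executemany("INSERT INTO homology VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


conn = install_tsv(TSV)
one_to_one = conn.execute(
    "SELECT gene_a, gene_b FROM homology WHERE relationship = ?",
    ("ortholog_one2one",),
).fetchall()
```

Passing a file path instead of `":memory:"` would write the database to disk, for example under the install location given in the config file.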

## Downloading
From the maf alignment files, the "ancestral" sequences are discarded and for every aligned sequence only the gap data is stored (i.e. gap position and length) along with the genomic coordinates. These alignments will be reconstructable by combining this information with the whole genome sequence. (This approach reduces storage requirements ~5-fold).
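
The gap-only storage idea can be sketched as below. The `encode_gaps` and `decode_gaps` functions are hypothetical names for this example, and the representation (gap position in the ungapped sequence plus gap length) is one plausible reading of the description, not the project's actual format. In the real tool the ungapped sequence would come from the genome via the stored coordinates; here it is returned directly to keep the example self-contained.

```python
def encode_gaps(aligned: str, gap_char: str = "-"):
    """Return the ungapped sequence and a list of (position, length) gap records.

    position is the index in the ungapped sequence before which the gap occurs.
    """
    gaps = []
    ungapped = []
    run = 0  # length of the gap run currently being read
    for char in aligned:
        if char == gap_char:
            run += 1
            continue
        if run:
            gaps.append((len(ungapped), run))
            run = 0
        ungapped.append(char)
    if run:  # the alignment ended with a gap
        gaps.append((len(ungapped), run))
    return "".join(ungapped), gaps


def decode_gaps(ungapped: str, gaps, gap_char: str = "-") -> str:
    """Rebuild the aligned sequence from the ungapped sequence and gap records."""
    parts = []
    prev = 0
    for pos, length in gaps:
        parts.append(ungapped[prev:pos])
        parts.append(gap_char * length)
        prev = pos
    parts.append(ungapped[prev:])
    return "".join(parts)


aligned = "AC--GT-A--"
seq, gaps = encode_gaps(aligned)
restored = decode_gaps(seq, gaps)
print(seq, gaps, restored)
```

Only the gap records need to be stored per aligned sequence, since the bases themselves are already present in the installed genome.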

Make the setting in
Installation is done in parallel on multiple CPUs (since the data need to be decompressed on the fly).
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -18,10 +18,10 @@ dependencies = ["click",
"scitrack",
"typing_extensions",
"cogent3 @ git+https://github.com/cogent3/cogent3.git@develop",
"pydantic",
"rich",
"numba",
"numpy",
"trogon",
"unsync",
"wakepy",
]
@@ -64,6 +64,7 @@ doc = ["click==8.1.3",
"sphinxcontrib-bibtex"]
dev = ["black==23.3.0",
"click==8.1.3",
"cogapp",
"flit",
"ipykernel",
"ipython",