Add support for the mouse (GRCm38) library #21

Open
wants to merge 2 commits into
base: master
1 change: 1 addition & 0 deletions docs/MANUAL.html
@@ -154,6 +154,7 @@ <h1 id="custom-databases"><a href="#custom-databases">Custom Databases</a></h1>
<li>plasmids: RefSeq plasmid sequences</li>
<li>viruses: RefSeq complete viral genomes</li>
<li>human: GRCh38 human genome</li>
<li>mouse: GRCm38 mouse genome</li>
</ul>
<p>To download and install any one of these, use the <code>--download-library</code> switch, e.g.:</p>
<pre><code>kraken-build --download-library bacteria --db $DBNAME</code></pre>
1 change: 1 addition & 0 deletions docs/MANUAL.markdown
@@ -365,6 +365,7 @@ To build a custom database:
- plasmids: RefSeq plasmid sequences
- viruses: RefSeq complete viral genomes
- human: GRCh38 human genome
- mouse: GRCm38 mouse genome

To download and install any one of these, use the `--download-library`
switch, e.g.:
32 changes: 30 additions & 2 deletions scripts/download_genomic_library.sh
@@ -23,6 +23,7 @@
# plasmids - NCBI RefSeq plasmid sequences
# viruses - NCBI RefSeq complete viral DNA and RNA genomes
# human - NCBI RefSeq GRCh38 human reference genome
# mouse - NCBI RefSeq GRCm38 mouse reference genome

set -u # Protect against uninitialized vars.
set -e # Stop on error
@@ -101,7 +102,7 @@ case "$1" in
do
wget --spider --no-remove-listing $FTP_SERVER/genomes/H_sapiens/$directory/
file=$(perl -nle '/^-/ and /\b(hs_ref_GRCh\S+\.fa\.gz)\s*$/ and print $1' .listing)
[ -z "$file" ] && exit 1
[ -z $file ] && exit 1
rm .listing
wget $FTP_SERVER/genomes/H_sapiens/$directory/$file
gunzip "$file"
@@ -112,8 +113,35 @@ case "$1" in
echo "Skipping download of human genome, already downloaded here."
fi
;;
"mouse")
mkdir -p $LIBRARY_DIR/Mouse
cd $LIBRARY_DIR/Mouse
if [ ! -e "lib.complete" ]
then
# get list of CHR_* directories
wget --spider --no-remove-listing $FTP_SERVER/genomes/M_musculus/
directories=$(perl -nle '/^d/ and /(CHR_\w+)\s*$/ and print $1' .listing)
rm .listing

# For each CHR_* directory, get the GRCm* FASTA gzip file name, download, and unzip
for directory in $directories
do
wget --spider --no-remove-listing $FTP_SERVER/genomes/M_musculus/$directory/
file=$(perl -nle '/^-/ and /\b(mm_ref_GRCm\S+\.fa\.gz)\s*$/ and print $1' .listing)
[ -z $file ] && exit 1
rm .listing
wget $FTP_SERVER/genomes/M_musculus/$directory/$file
gunzip "$file"
done

touch "lib.complete"
else
echo "Skipping download of mouse genome, already downloaded here."
fi
;;
*)
echo "Unsupported library. Valid options are: "
echo " bacteria plasmids virus human"
echo " bacteria plasmids viruses human mouse"
;;

esac
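
Review note on the mouse block above: the listing is parsed for a `mm_ref_GRCm*.fa.gz` name and the script exits if nothing matched. The sketch below shows one way to write those two steps so they can be exercised without network access. It is not the PR's code: the function names `list_mouse_fasta` and `require_name` are hypothetical, and the `sed` expression is an approximation of the Perl one-liner in the diff, not a drop-in replacement. The main point is the quoted `"$1"` in the emptiness test. An unquoted `[ -z $file ]` happens to succeed when `$file` is empty, but it turns into a malformed test (and so never triggers `exit 1`) if the listing yields more than one name.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of the PR.

# Print mouse FASTA names found in an FTP-style listing read from stdin.
# Only regular-file entries (lines starting with "-") are considered.
list_mouse_fasta() {
    sed -n '/^-/s/^.*[[:space:]]\(mm_ref_GRCm[^[:space:]]*\.fa\.gz\)[[:space:]]*$/\1/p'
}

# Return non-zero when no file name was found.
# Quoting "$1" keeps the test well-formed for empty or multi-word values.
require_name() {
    if [ -z "$1" ]; then
        echo "no matching FASTA file in listing" >&2
        return 1
    fi
}

# Intended use inside the per-directory loop (assumes wget wrote .listing):
#   file=$(list_mouse_fasta < .listing)
#   require_name "$file" || exit 1
```

Keeping the parse and the check in small functions also makes it easy to feed them a saved `.listing` file when the FTP layout changes.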
4 changes: 2 additions & 2 deletions scripts/kraken-build
@@ -40,7 +40,7 @@ my $DEF_MINIMIZER_LEN = 15;
my $DEF_KMER_LEN = 31;
my $DEF_THREAD_CT = 1;

my @VALID_LIBRARY_TYPES = qw/bacteria plasmids viruses human/;
my @VALID_LIBRARY_TYPES = qw/bacteria plasmids viruses human mouse/;

# Option/task option variables
my (
@@ -200,7 +200,7 @@ Task options (exactly one must be selected):
--download-taxonomy Download NCBI taxonomic information
--download-library TYPE Download partial library
(TYPE = one of "bacteria", "plasmids",
"viruses", "human")
"viruses", "human", "mouse")
--add-to-library FILE Add FILE to library
--build Create DB from library
(requires taxonomy d/l'ed and at least one file
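
Review note on the library-type list: `kraken-build` accepts exactly the names in `@VALID_LIBRARY_TYPES`, which this PR extends with `mouse`. A minimal shell sketch of the same membership check is given here for illustration. The function name `is_valid_library` is hypothetical and the real check lives in the Perl script, so treat this as a way to reason about which spellings pass rather than as the project's implementation.

```shell
#!/bin/sh
# Sketch only: mirrors the names listed in @VALID_LIBRARY_TYPES after this PR.
# is_valid_library is a hypothetical helper, not part of kraken-build.
is_valid_library() {
    case "$1" in
        bacteria|plasmids|viruses|human|mouse) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use:
#   is_valid_library "$type" || echo "Unsupported library: $type" >&2
```

Under this check `mouse` and `viruses` pass, while the singular `virus` does not, so any help text shown to users should spell the names the same way as the list.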