kraken2-build FTP connection error (timeout) #272
Comments
Hi,
I just wanted to add that I've been having similar issues. I have been able to download the taxonomy fine, as well as the non-redundant protein database, but I have had problems getting the standard database, and I have also tried getting bacteria only, with no luck. We had thought it might be a firewall issue on the server, but I've now also tried it on a different server, my own laptop, and someone else's desktop, and each gives the same error:
I also tried the amendments to the rsync_from_ncbi.pl script suggested here as well as here, and still haven't been able to get it to work. I assume that it's not actually an issue with the FTP path, as this works for the non-redundant database, and these amendments should be skipping any NAs.
If anyone has any suggestions, I would appreciate that too!
Thanks,
Hi again,
Just wanted to update in case this is helpful to anyone else (and maybe this will help @Nick243). I found that the scripts given in this repository work for downloading the databases (in my case slightly edited to download protein rather than genomic sequences), and that adding these to the library and building the database using the regular kraken2 scripts then worked fine.
Edited to say that this was working, and I built a database with over 1000 bacterial genomes, but at some point it stopped working. I don't know any Perl, so I wrote an almost equivalent Python script with options for all domains (including an extra option for human only) and for either DNA or protein sequences. It also checks whether you have already downloaded a sequence, so if it gets stopped for any reason you don't need to download it again, and at the end it writes a text file describing any problems it had while downloading. It's here in case anyone is interested.
Thanks,
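For readers who want to see the idea behind "skip what is already downloaded and log the failures", here is a minimal sketch. It is not the script linked above; the function name `download_missing`, the log file name, and the `.part` suffix are all hypothetical choices made for illustration.

```python
import os
import urllib.request


def download_missing(urls, out_dir, fetch=urllib.request.urlretrieve,
                     log_name="download_problems.txt"):
    """Download each URL into out_dir, skipping files that already exist.

    Returns a list of (url, error message) pairs for failed downloads and
    writes the same information to a text file inside out_dir.
    All names here are illustrative, not taken from the linked script.
    """
    os.makedirs(out_dir, exist_ok=True)
    problems = []
    for url in urls:
        target = os.path.join(out_dir, url.rstrip("/").split("/")[-1])
        # A non-empty file from an earlier run is treated as done.
        if os.path.exists(target) and os.path.getsize(target) > 0:
            continue
        partial = target + ".part"
        try:
            # Fetch to a temporary name so an interrupted transfer
            # is never mistaken for a finished file.
            fetch(url, partial)
            os.replace(partial, target)
        except Exception as exc:
            problems.append((url, str(exc)))
            if os.path.exists(partial):
                os.remove(partial)
    # Record every failure so the run can be inspected afterwards.
    with open(os.path.join(out_dir, log_name), "w") as handle:
        for url, message in problems:
            handle.write(f"{url}\t{message}\n")
    return problems
```

Re-running the function with the same URL list only retries the files that are still missing, which is the behaviour described in the comment above.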
Hi Robyn,
Thanks so much for sharing your script. This is extremely helpful! I was able to access and pull down the files. I am looking to download the complete RefSeq genomes (DNA sequences) for bacteria, archaea, fungi, viruses, and human, and I was hoping to confirm that I used your program correctly. I ran:
This looks to have worked beautifully. Before trying to build the Kraken2 database, however, I was hoping I could ask:
Thanks again for sharing the script, and thanks in advance for any thoughts!
Nick
Hi Nick,
No worries! Glad someone else can make use of it. And yes, that looks correct. I just had a quick look, and it looks like the issue is that the reference human genome is listed as 'Chromosome' rather than 'Complete'. I've added a bit to the script so that it ignores the Complete requirement when downloading the human genome, but just running with --complete False would have the same effect.
Robyn
@Nick243 Apologies for the late response. Issues with downloading are harder to debug because they vary from system to system, but essentially your server is having trouble connecting to NCBI. You can try downloading the files without the --use-ftp switch and see if that works any better.
@R-Wright-1 We have updated the code to fix the NA error. But yes, the default downloads only look for complete/chromosome-level assemblies. You can modify the rsync_from_ncbi.pl script to include other assembly levels (line 40: add in Contig/Scaffold). However, draft genomes are much more likely to have contamination that may skew the results.
As this issue is a few months old, I'm going to close it for now. If you continue to have problems with the newest code update, please open a new issue.
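To illustrate the assembly-level filter mentioned above, here is a rough Python sketch of the same kind of selection. It is not the code in rsync_from_ncbi.pl. It assumes an NCBI assembly_summary.txt style table that is tab-separated with the taxid, assembly level, and FTP path in columns 6, 12, and 20 (indices 5, 11, 19); check those positions against your own file. The function name `select_assemblies` is made up for this example.

```python
# Levels accepted by default, mirroring the "complete/chromosome" behaviour
# described in the thread.
DEFAULT_LEVELS = ("Complete Genome", "Chromosome")


def select_assemblies(summary_text, levels=DEFAULT_LEVELS):
    """Return (taxid, ftp_path) pairs for rows with an accepted assembly level.

    Assumption: tab-separated rows with taxid at index 5, assembly level at
    index 11, and FTP path at index 19. Rows whose path is "na" are skipped.
    """
    selected = []
    for line in summary_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 20:
            continue  # malformed or truncated row
        taxid, level, ftp_path = fields[5], fields[11], fields[19]
        if level not in levels:
            continue  # e.g. Contig or Scaffold unless explicitly allowed
        if ftp_path == "na":
            continue  # nothing to download for this assembly
        selected.append((taxid, ftp_path))
    return selected
```

Passing `levels=DEFAULT_LEVELS + ("Scaffold", "Contig")` would widen the selection to draft assemblies, with the contamination caveat noted above.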
Hello,
I am attempting to build a custom (standard plus fungi) kraken2 database and keep getting a timeout error. Has anyone seen this before, and does anyone have suggestions?
I appear to have installed the NCBI taxonomy successfully with:
module load kraken/2.0.8
kraken2-build --download-taxonomy --db /users/olljt2/kraken/db/. --threads 24 --use-ftp
However, when I try to install a database with:
kraken2-build --download-library bacteria --db /users/olljt2/kraken/db/. --threads 24 --use-ftp
I get the following error:
Step 1/2: Performing ftp file transfer of requested files
rsync_from_ncbi.pl: FTP connection error: Net::FTP: connect: timeout
The same thing happens when our cluster administrator attempts to install the files. I tried this on several occasions, hoping it was a temporary issue on the NCBI side, and with multiple different databases. Each time I get the same error. The files appear to start downloading but do not seem to finish.
The full code used was:
module load kraken/2.0.8
kraken2-build --download-taxonomy --db /users/olljt2/kraken/db/. --threads 24 --use-ftp
kraken2-build --download-library bacteria --db /users/olljt2/kraken/db/. --threads 24 --use-ftp
kraken2-build --download-library archaea --db /users/olljt2/kraken/db/. --threads 24 --use-ftp
kraken2-build --download-library viral --db /users/olljt2/kraken/db/. --threads 24 --use-ftp
kraken2-build --download-library fungi --db /users/olljt2/kraken/db/. --threads 24 --use-ftp
kraken2-build --download-library human --db /users/olljt2/kraken/db/. --threads 24 --use-ftp
kraken2-build --download-library UniVec_Core --db /users/olljt2/kraken/db/. --threads 24 --use-ftp
kraken2-build --build --db /users/olljt2/kraken/db/. --threads 24
The --build command returned:
Creating sequence ID to taxonomy ID map (step 1)...
Sequence ID to taxonomy ID map complete. [0.062s]
Estimating required capacity (step 2)...
Estimated hash table requirement: 981576 bytes
Capacity estimation complete. [0.095s]
Building database files (step 3)...
Taxonomy parsed and converted.
CHT created with 2 bits reserved for taxid.
Completed processing of 3137 sequences, 687518 bp
Writing data to disk... complete.
Database files completed. [7.798s]
Database construction complete. [Total: 7.994s]
That looks like just a few sequences.
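One way to catch this before running --build is to count the FASTA records that actually landed in the library. The sketch below assumes the downloaded sequences sit as .fna (or .faa) files somewhere under a library/ folder inside the database directory; that layout and the function names are assumptions for illustration, so adjust them to what you see on disk.

```python
import os


def count_fasta_records(path):
    """Count sequence headers (lines starting with '>') in one FASTA file."""
    with open(path, "r", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def library_sequence_counts(db_dir):
    """Map each FASTA file under db_dir/library to its number of sequences.

    Assumption: libraries are stored as .fna or .faa files below a
    'library' subdirectory of the database directory.
    """
    counts = {}
    library_dir = os.path.join(db_dir, "library")
    for root, _dirs, files in os.walk(library_dir):
        for name in files:
            if name.endswith((".fna", ".faa")):
                full = os.path.join(root, name)
                key = os.path.relpath(full, library_dir)
                counts[key] = count_fasta_records(full)
    return counts
```

An empty result, or a library that is missing from the result, suggests that the corresponding download did not complete and is worth repeating before building.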
Any thoughts or suggestions would be greatly appreciated!
Thanks in advance,
Nick