whippet index failed, link to download the hg38 GTF2.2 on this website does not work #149

Open
santataRU opened this issue Jul 3, 2024 · 1 comment

Comments

@santataRU

Dear All,

I encountered an error while running whippet-index.jl. The error appears to be due to my GTF file not being in the 2.2 format. Could you please let me know where to download the correct GTF format (GTF2.2) for hg38? The link provided on this website does not work. Is there an updated link available?

Thank you,

Xiao

PS: error messages:
(base) xiaolei@Xiaos-Laptop bin % julia ./whippet-index.jl --fasta /Users/xiaolei/Whippet.jl/anno/hg38.fa.gz --gtf /Users/xiaolei/Whippet.jl/anno/gencode.v45.annotation_sorted.gtf.gz
Whippet v1.6.2 loading...
Activating environment at ~/Whippet.jl/Project.toml
5.640663 seconds.
Loading GTF file: /Users/xiaolei/Whippet.jl/anno/gencode.v45.annotation_sorted.gtf.gz

┌ Warning: Using low quality Transcript Support Levels (TSL 3+) in your GTF file is not recommended!
│ For more information on TSL, see: http://www.ensembl.org/Help/Glossary?id=492
│
│ If you would like Whippet to ignore these when building its index, use --suppress-low-tsl option!
│
└ @ Whippet ~/Whippet.jl/src/refset.jl:159
ERROR: LoadError: ERROR: GTF file is not in valid GTF2.2 format!

ERROR: Annotation entries for 'transcript_id' ENST00000430923.7 has already been fully processed and closed.
HINT: All GTF lines with the same 'transcript_id' must be adjacent in the GTF file and referring to the same transcript and gene!
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] load_gtf(fh::BufferedStreams.BufferedInputStream{Libz.Source{:inflate, BufferedStreams.BufferedInputStream{IOStream}}}; txbool::Bool, suppress::Bool, usebam::Bool, bamreader::Nullable{XAM.BAM.Reader}, bamreads::Int64, bamoneknown::Bool)
@ Whippet ~/Whippet.jl/src/refset.jl:165
[3] macro expansion
@ ~/Whippet.jl/src/timer.jl:5 [inlined]
[4] main()
@ Main ~/Whippet.jl/bin/whippet-index.jl:91
[5] top-level scope
@ ~/Whippet.jl/src/timer.jl:5
in expression starting at /Users/xiaolei/Whippet.jl/bin/whippet-index.jl:108

@santataRU

santataRU commented Jul 3, 2024

I just found the cause of the problem: I used a sorted hg38 Ensembl GTF file prepared for the IGV browser instead of the original unsorted GTF file. In the sorted file, the entries for a given transcript are NOT in one contiguous block, which breaks index building.
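
For anyone hitting the same error, the adjacency requirement from the HINT can be checked before running whippet-index. The following is only a rough sketch, not Whippet's own validation: the helper names `open_text` and `find_split_transcripts` are made up for illustration, and it assumes that lines without a `transcript_id` attribute (such as gene lines) do not count as closing a transcript.

```python
import gzip
import re
from typing import Iterable, List, Tuple

# Matches the transcript_id attribute in the ninth GTF column.
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def open_text(path: str):
    """Open a plain or gzip-compressed GTF file for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def find_split_transcripts(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (transcript_id, line_number) for every place where a
    transcript_id reappears after a different transcript_id was seen."""
    closed = set()   # transcript_ids whose block has already ended
    current = None   # transcript_id of the block being read
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # header or comment line
        match = _TRANSCRIPT_ID.search(line)
        if match is None:
            continue  # assumption: lines without transcript_id are ignored
        tid = match.group(1)
        if tid == current:
            continue  # still inside the same block
        if tid in closed:
            problems.append((tid, lineno))
        if current is not None:
            closed.add(current)
        current = tid
    return problems
```

To run it on a real file, something like `with open_text("gencode.v45.annotation_sorted.gtf.gz") as fh: print(find_split_transcripts(fh)[:5])` should list the first few offending transcripts and the line where each one reappears.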

After I ran whippet-index with the original unsorted GTF, I got this warning message: "Using low quality Transcript Support Levels (TSL 3+) in your GTF file is not recommended!" and "If you would like Whippet to ignore these when building its index, use --suppress-low-tsl option!"

Is it better to use the --suppress-low-tsl option?
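
One way to get a feel for what the option would affect is to count how many transcripts sit at each support level. This is a rough sketch under stated assumptions, not a description of how Whippet filters: it assumes the GTF records the level in a `transcript_support_level` attribute on `transcript` lines (as GENCODE files do), and that only the first token of the value matters if a note follows the level. The function name `tally_tsl` is made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the transcript_support_level attribute in the ninth GTF column.
_TSL = re.compile(r'transcript_support_level "([^"]*)"')


def tally_tsl(lines: Iterable[str]) -> Counter:
    """Count transcript lines per transcript support level.

    Transcripts without the attribute are counted under "missing".
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # count each transcript once, via its transcript line
        match = _TSL.search(fields[8])
        tokens = match.group(1).split() if match else []
        # Assumption: keep only the first token, e.g. "1" from "1 (note)".
        counts[tokens[0] if tokens else "missing"] += 1
    return counts
```

Feeding it the lines of the unsorted annotation file gives a per-level count, which shows how much of the annotation is TSL 3 or worse before deciding on `--suppress-low-tsl`.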

Thanks,

Xiao

PS: warning message with unsorted hg38 GTF file.

(base) xiaolei@Xiaos-Laptop bin % julia ./whippet-index.jl --fasta /Users/xiaolei/Whippet.jl/anno/hg38.fa.gz --gtf /Users/xiaolei/Whippet.jl/anno/gencode.v45.annotation.gtf.gz
Whippet v1.6.2 loading...
Activating environment at ~/Whippet.jl/Project.toml
5.724591 seconds.
Loading GTF file: /Users/xiaolei/Whippet.jl/anno/gencode.v45.annotation.gtf.gz

┌ Warning: Using low quality Transcript Support Levels (TSL 3+) in your GTF file is not recommended!
│ For more information on TSL, see: http://www.ensembl.org/Help/Glossary?id=492
│
│ If you would like Whippet to ignore these when building its index, use --suppress-low-tsl option!
│
└ @ Whippet ~/Whippet.jl/src/refset.jl:159
Loaded 643514 annotated splice-sites from GTF file..
122.786011 seconds (209.44 M allocations: 20.170 GiB, 2.98% gc time)
Indexing transcriptome...
Decompressing and Indexing /Users/xiaolei/Whippet.jl/anno/hg38.fa.gz...
Building Splice Graphs for chr1..
8.388163 seconds (44.79 M allocations: 22.590 GiB, 5.43% gc time)
