readBed parsing failure #185
This could be due to changes in the dependencies we use to read the files
faster. If you send a reproducible example, I will take a look.
On Tue 4. Jun 2019 at 15:17, socanas ***@***.***> wrote:
I have been using the readBed feature of genomation to read Bed files into
R for use in Enriched Heatmaps. Recently I have been getting a parsing
failures error, e.g.:
Warning: 62474 parsing failures.
row   col  expected                actual  file
199   X4   no trailing characters  .3333   'EnrichedHM.final.sort.formatted.a1-bs_input_CpG.txt'
430   X4   no trailing characters  .6667   'EnrichedHM.final.sort.formatted.a1-bs_input_CpG.txt'
1046  X4   no trailing characters  .6667   'EnrichedHM.final.sort.formatted.a1-bs_input_CpG.txt'
Any number in the bed file that has a decimal value is changed to NA in
the GRanges object. I do not want to truncate or round these values. I am
sure that I am missing something simple. I have used the readBed function
for similar files and never have had an issue. Any suggestions?
Thanks!
|
Thank you for your quick reply! The file test1.txt gives the parsing error and test2.txt does not. Both files were formatted with the same scripts. The columns are: chr, start, stop, methylation %, coverage, strand (+/-).
test1 <- readBed("test1.txt", remove.unusual=TRUE)
|
Hi @socanas, test1.txt gives the parsing error and test2.txt does not because readBed uses the first 30 rows to detect the classes of the columns (character, integer, decimal numbers, etc.). In test1.txt the 4th column has no decimal number within the first 30 rows, whereas in test2.txt it does, so in test1.txt the column is treated as integer and later decimal values fail to parse. I would just add a decimal point to the values in the 4th column, e.g.:
chr1 3000827 3000827 100.0 1 +
Cheers, |
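To illustrate the point about the first 30 rows, here is a minimal base R sketch. It does not call readBed; the toy file and the variable names are made up for illustration, and base `read.table` stands in for whatever genomation does internally. It only shows how a class guessed from the first 30 rows can differ from the class guessed from the whole file.

```r
# Hypothetical toy file: column 4 holds whole numbers for the first 30 rows,
# and a fractional value only appears on row 31.
bed_file <- tempfile(fileext = ".txt")
pos <- 3000000L + 1:30
rows <- c(
  sprintf("chr1\t%d\t%d\t100\t1\t+", pos, pos),
  "chr1\t3000031\t3000031\t33.3333\t3\t-"
)
writeLines(rows, bed_file)

# Guess the column classes from the first 30 rows only.
head30 <- read.table(bed_file, sep = "\t", nrows = 30, stringsAsFactors = FALSE)
classes_head <- vapply(head30, class, character(1))

# Guess the column classes from every row.
full <- read.table(bed_file, sep = "\t", stringsAsFactors = FALSE)
classes_full <- vapply(full, class, character(1))

print(classes_head[["V4"]])  # "integer": no decimals seen in the sample
print(classes_full[["V4"]])  # "numeric": the fractional value on row 31 is seen
```

If a reader is then told to parse column 4 as integer, a value such as 33.3333 cannot be parsed, which is consistent with the "no trailing characters" failures reported above.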
hmm, I've also had such issue in the past, maybe rewriting a function that reads BED files https://github.com/BIMSBbioinfo/genomation/blob/master/R/readData.R#L68 from |
@katwre Thank you for the solution! That works great! |
Thank you @katwre!! I think we went with read_delim because it could read gzipped files at the time, but now data.table::fread can also read gzipped files without piping, afaik. If that's the case, we can use fread. |
@al2na ah, true! But it looks like fread reads gzipped files now too, so it could work.
|
Then we can change it to fread :)
On Wed, Jun 5, 2019 at 7:48 PM katwre ***@***.***> wrote:
@al2na <https://github.com/al2na> ah, true! But it looks like fread reads
gzipped files now too, so it could work.
> data.table::fread("test2.txt.gz")
V1 V2 V3 V4 V5 V6
1: chr1 3001630 3001630 100 2 -
2: chr1 3003227 3003227 100 1 -
3: chr1 3003340 3003340 100 2 -
4: chr1 3003380 3003380 0 1 -
5: chr1 3003582 3003582 100 1 +
---
1996: chr1 3670743 3670743 0 1 -
1997: chr1 3670752 3670752 0 1 -
1998: chr1 3670776 3670776 0 2 +
1999: chr1 3670821 3670821 0 1 +
2000: chr1 3670861 3670861 0 1 +
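Whichever reader ends up being used, one way to avoid depending on which rows get sampled is to declare the column classes up front. The sketch below is an assumption-laden illustration, not the change discussed for genomation: it uses base `read.table` with `gzfile()` rather than fread or read_delim, and the file contents and column names are hypothetical.

```r
# Hypothetical gzipped BED-like file with a fractional value in column 4.
gz_path <- tempfile(fileext = ".txt.gz")
con <- gzfile(gz_path, "w")
writeLines(c("chr1\t3001630\t3001630\t100\t2\t-",
             "chr1\t3003227\t3003227\t66.6667\t1\t+"), con)
close(con)

# Declare column 4 as numeric explicitly, so no class guessing is involved.
bed <- read.table(
  gzfile(gz_path), sep = "\t", stringsAsFactors = FALSE,
  colClasses = c("character", "integer", "integer",
                 "numeric", "integer", "character"),
  col.names = c("chr", "start", "end", "meth", "coverage", "strand")
)

str(bed)
```

With the classes fixed in advance, the fractional methylation value is kept as a number instead of becoming NA.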
|