ComBat-Seq correction creating very large count values--cannot feed into DESeq2 #20
Comments
I have a similar issue. For one gene, ComBat-seq adjusts raw counts from ~15 to ~3e+13, though not for all of the samples. Below is R code showing the raw and adjusted counts for this gene across all samples:
Any advice on how to avoid this would be welcome. Thanks!
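A quick way to see which genes are affected is to compare each gene's maximum raw and adjusted counts. This is only a diagnostic sketch: `counts` and `adjusted` are placeholder names for your own raw and ComBat-seq-corrected matrices (genes in rows, samples in columns), and the thresholds are arbitrary.

```r
# Flag genes whose adjusted counts explode relative to the raw counts.
# `counts` and `adjusted` are placeholders; thresholds are arbitrary.
raw_max <- apply(counts, 1, max)
adj_max <- apply(adjusted, 1, max)
inflated <- which(adj_max > 1e6 & adj_max > 100 * pmax(raw_max, 1))
adjusted[inflated, , drop = FALSE]  # inspect the problematic genes
```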
I'm also experiencing this issue. This is the first small section of compiled raw counts from STAR (my actual data is 45 samples by ~55k genes):
Gene_ID SRB000C1 SRB000C2 SRB0001 SRB0002 SRB0003
This is the same section after applying ComBat-seq. Also, I have noticed that the program seems to have a row/column offset issue: whenever I use it, my output always has a column name missing in row one. Even when I delete all row names and column names and just load the numerical data, the output loses its final row name and the first cell (A1) vanishes; see syntax and output below.
V1 | V2 | V3 | V4 ............. V43 | V44 | V45
Note that although the last headers shown are V43, V44, and V45, there are four values underneath them; this is because the row names all shift one column to the left. Unsure whether the two issues are related, but if others could check their results for the same bug, it might help untangle this.
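One common cause of exactly this "shifted header" symptom in R (offered here as a guess, not a confirmed diagnosis of ComBat-seq itself) is that `write.table()` writes row names without a header cell for them, so the header row ends up one field short and everything shifts on re-import. Writing with `col.names = NA` adds the blank corner cell, and reading back with `row.names = 1` keeps columns aligned; file names below are placeholders.

```r
# write.table() with col.names = NA emits a blank corner cell for the
# row-name column, keeping the header aligned with the data columns.
write.table(adjusted, "adjusted_counts.tsv", sep = "\t",
            quote = FALSE, col.names = NA)
counts_back <- read.table("adjusted_counts.tsv", sep = "\t",
                          header = TRUE, row.names = 1)
```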
So I've looked into this further: it seems to give reliable numbers if you use only the batch-effect argument, but adding covariates introduces the compounding factors. So you can use it to correct for batch effect alone reliably. Regarding covariates: I'm going to pick this apart; the investigation continues...
@J-Lye, I am also able to avoid the very-large-counts issue by using … Regarding the missing column, have you checked that … ?
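For reference, the batch-only call that posters above report as avoiding the inflated counts looks like this. It is a sketch under assumptions: `counts` (a genes-by-samples integer matrix) and `batch` (a vector of batch labels, one per sample) are placeholder names for your own objects.

```r
library(sva)
# Batch-only correction: no group, no covar_mod.
# `counts` and `batch` are placeholders for your own data.
adjusted <- ComBat_seq(counts = counts, batch = batch, group = NULL)
```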
In my case, unreasonably large numbers were introduced for genes that were expressed in only a few samples. Removing these genes fixed the issue. As an example, to remove genes that are expressed in fewer than a third of samples you can do:
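A minimal sketch of that filter, assuming `counts` is a genes-by-samples matrix and treating a count of zero as "not expressed" (both assumptions; adjust the threshold to taste):

```r
# Keep genes with a nonzero count in at least a third of samples.
# `counts` is a placeholder; "> 0" as the expression cutoff is an assumption.
keep <- rowSums(counts > 0) >= ncol(counts) / 3
counts_filtered <- counts[keep, ]
```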
Dear Benjamin Ostendorf, thank you for your great suggestion. Could I ask one more question? Is it okay to run ComBat-seq after filtering out all genes that are expressed in fewer than a certain proportion of samples? I am analyzing a bulk RNA-seq dataset of pediatric tumor samples, and I filtered genes based on the following criteria.
I found the author's comment about filtering low-expressed genes here: #3 "Filtering low expressed genes is recommended before using ComBat-Seq, although the latest version should identify genes with only 0s in any batch, and keep them unchanged." I am confused about the distinction between a low-expressed gene and a gene that is expressed in fewer than a certain proportion of samples (but expressed very highly in one or a few samples). Any comments would be of great help. Thank you in advance. Sincerely, Seunghoon
Can confirm the issue when using …
sva: v3.42.0
Has anyone figured this out? I have the same problem even when just using group and not covar_mod. I am getting values as large as 898183362094!
Are there any workarounds for the genes exhibiting very large counts before feeding the batch-corrected matrix to DESeq2? Has this issue been solved in any other way?
The same thing happened to me. How did you deal with it in the end?
Still seeing this issue, routinely, across multiple datasets. At this point I'm just removing the problematic genes and re-running ComBat-seq. Hoping to see some clarifying notes sometime?
Hey,
Thanks for making this software; it seems really helpful (and I'd love to use it if I can get it to work). The matrix generated by ComBat-Seq cannot be read into DESeq2. My current guess is that something about the batch correction creates very large values in the matrix: looking at the highest values, some very low raw counts (<10) are being corrected to very large numbers (>1e+17).
When DESeq2 tries to convert these to integers, it fails because the values are too large for it to handle. That is not really my main concern, though: I can't imagine a correction should ever change a count value that drastically, so I may be doing something incorrect in my implementation of ComBat-Seq.
A little information on the project: these are technical replicates where the same library was sequenced twice, the second batch at much higher read depth. Each batch was analyzed separately, and counts generated by HTSeq-count were used for batch correction. I used covar_mod to account for my two treatment variables.
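The setup described above can be sketched as follows. All names (`counts`, `batch`, `sample_info`, `treatment1`, `treatment2`) are placeholders for the poster's actual objects, and the final line is just a sanity check before handing the matrix to DESeq2.

```r
library(sva)
# Two sequencing batches of the same libraries, with two treatment
# variables passed via covar_mod. All names are placeholders.
covar <- model.matrix(~ treatment1 + treatment2, data = sample_info)
adjusted <- ComBat_seq(counts = counts, batch = batch, covar_mod = covar)
# DESeq2 stores counts as 32-bit integers, so values above
# .Machine$integer.max (~2.1e9) will fail conversion; check first.
any(adjusted > .Machine$integer.max)
```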
Please let me know what you think or what other information would be helpful to provide here.
Thanks in advance--I really appreciate your help.
Christine