I'm trying to use rMATS to get quick PSI estimates and read counts for exons using a few hundred single-replicate RNA-seq samples. I am grouping 8 control samples in b2.txt and ~300 single-replicate experimental samples in b1.txt and running the following script:
I usually align with STAR ahead of time and then use the unsorted BAM files for rmats, which usually works great. However, after 24 hours the job fails to complete and times out. I'm wondering if rmats is faster if the reads are sorted by coordinate? I've done an analysis on a similar scale with clinical samples in the past and it took only ~2 hours with sorted BAM files.
I tried to monitor the progress of the job via files being populated in the tmp folder, but the only file that appears is 2024-11-06-19:43:26_535182_0.rmats and it is empty. Any thoughts?
Using BAM files that are sorted by coordinate might be a little faster (maybe due to cache performance), but I wouldn't expect a big difference.
rMATS doesn't output much progress information. I don't think there will be any output until it has finished reading through all of the bam files. This post estimates 1 hour per 200 million alignments for the initial step: #323 (comment)
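To turn that rule of thumb into a rough number for a given run, here is a minimal Python sketch. It assumes the ~1 hour per 200 million alignments figure holds for your machine and ignores any threading; the function name and the sample counts are hypothetical, so substitute your own per-BAM alignment counts.

```python
# Rough estimate of the initial BAM-reading step, assuming the
# ~1 hour per 200 million alignments rule of thumb quoted above.
# This is a back-of-the-envelope figure, not a guarantee.
ALIGNMENTS_PER_HOUR = 200_000_000


def estimate_read_hours(alignment_counts):
    """Return the estimated hours needed to read every BAM file once."""
    return sum(alignment_counts) / ALIGNMENTS_PER_HOUR


# Hypothetical example: 308 samples with 50 million alignments each.
counts = [50_000_000] * 308
print(f"{estimate_read_hours(counts):.1f} hours")  # prints "77.0 hours"
```

If the estimate comes out well above the job's time limit, the timeout may just reflect the amount of data rather than a hang.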
Also, that .rmats file has a ":" in its name, which I think means you are using a version of rMATS older than v4.1.2. If that's the case, you may be hitting a performance issue that was fixed in v4.1.2: https://github.com/Xinglab/rmats-turbo/releases #104
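If it helps, here is a small Python sketch for inspecting the tmp folder. It lists each .rmats file with its size and flags names that contain ":". The function names are hypothetical, and treating a colon in the name as a sign of a pre-v4.1.2 install is only the heuristic suggested above, so check the installed version directly to be sure.

```python
from pathlib import Path


def has_colon_timestamp(file_name):
    """Return True if the .rmats file name contains ':' (heuristic only)."""
    return file_name.endswith(".rmats") and ":" in file_name


def summarize_rmats_tmp(tmp_dir):
    """List the .rmats files in tmp_dir with their size and the colon flag."""
    rows = []
    for path in sorted(Path(tmp_dir).glob("*.rmats")):
        rows.append({
            "name": path.name,
            "bytes": path.stat().st_size,  # 0 means nothing written yet
            "has_colon": has_colon_timestamp(path.name),
        })
    return rows


# Example with the file name from this thread:
print(has_colon_timestamp("2024-11-06-19:43:26_535182_0.rmats"))  # True
```

Calling `summarize_rmats_tmp` on the tmp directory every so often shows whether any file has grown from 0 bytes.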