Preliminary analysis fails #26

Open
nvpatin opened this issue Oct 12, 2023 · 6 comments

Comments

@nvpatin

nvpatin commented Oct 12, 2023

Hi team! I think I'm at a point where I can start posting issues here rather than individually emailing people, particularly because I think some problems will be widespread once others try to use the pipeline.

Although my "final_data" folder contains all the output files described in the wiki, the "analysis_output" folder is always empty. The wiki says for the preliminary analysis: "This is a quarto file that will take in the output files from DADA2 and create plots and statistics regarding read retention, read lengths, quality, and more." I think this is supposed to be an HTML output? In any case it would be great to get that final step working.

I think two issues might be preventing the preliminary analysis: 1) read file names that are different from the required format and 2) a metadata sheet that is different from the required format.

  1. Although I always try to rename the fastq files according to the formula, it's possible something is off. Here is an example of a read pair that I've renamed: MFU-FISH_001-d1-1_S1_L001_R1_001.fastq.gz and MFU-FISH_001-d1-1_S1_L001_R2_001.fastq.gz. Simplifying the requirements for raw read file names would be a huge help; I'm always nervous about renaming any raw data.

  2. The sample sheets we get from our sequencing center are quite different from the example sheet for the pipeline. I've uploaded an example raw file ("SampleSheet.csv") as well as a modified sheet that I made manually to try to match the example sheet ("SampleSheetUsed.csv"). In the long run, this is a big pain in the butt! Maybe we can simplify the metadata sheet requirements? I suspect something about the sample sheet is preventing the preliminary analysis but I'm not sure. In the last step of the pipeline I get an error that says "Run name not found."
    SampleSheet.csv
    SampleSheetUsed.csv
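
For reference, a minimal sketch of how renamed read files could be sanity-checked before a run. It assumes the standard Illumina naming pattern `<sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz`, which the example pair above follows; the pipeline's actual required format may well be stricter. The pattern and the function name `check_read_names` are hypothetical and not part of the pipeline.

```python
import re
from collections import defaultdict

# Standard Illumina-style name: <sample>_S<n>_L<lane>_R<read>_001.fastq.gz
# (assumption: the pipeline's own required format may differ or be stricter).
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<snum>\d+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)

def check_read_names(filenames):
    """Return (pairs, problems) for a list of fastq file names."""
    reads = defaultdict(set)
    problems = []
    for name in filenames:
        match = ILLUMINA_NAME.match(name)
        if match is None:
            problems.append(f"{name}: does not match the Illumina pattern")
            continue
        # Group files by sample, sample number, and lane.
        key = (match["sample"], match["snum"], match["lane"])
        reads[key].add(match["read"])
    pairs = []
    for key, found in sorted(reads.items()):
        if found == {"1", "2"}:
            pairs.append(key)
        else:
            missing = "R2" if found == {"1"} else "R1"
            problems.append(f"{key[0]}: missing {missing} file")
    return pairs, problems

files = [
    "MFU-FISH_001-d1-1_S1_L001_R1_001.fastq.gz",
    "MFU-FISH_001-d1-1_S1_L001_R2_001.fastq.gz",
]
pairs, problems = check_read_names(files)
print(pairs)
print(problems)
```

Under this assumed pattern the example pair parses cleanly as one complete R1/R2 pair, which would be consistent with the file names not being the cause of the failure.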

@avancise
Contributor

avancise commented Oct 12, 2023 via email

@nvpatin
Author

nvpatin commented Oct 16, 2023

Thanks Amy! I can definitely work on Option 2 to edit the .Rmd file. For what it's worth, I tried changing the file names again to match the formula 100%, but that didn't help so it must be about the metadata sheet. I also think there might be a missing R module in the Docker image; see below for the full error message from my most recent analysis.

In the long run, if this is a tool we want to disseminate to other labs or scientists, I think it will be important to incorporate more flexibility in the file names and formats. I may be able to help with some of that with my .Rmd edits. Will keep everyone posted.

pipeline error message:

[1] "Starting Taxonomy Assignment at 2023-10-12 21:21:23.264674"
Finished processing reference fasta.[1] "Finished Taxonomy Assignment at 2023-10-12 21:22:17.561918 ."
Warning messages:
1: In grSoftVersion() :
unable to load shared object '/usr/local/lib/R/modules//R_X11.so':
libXt.so.6: cannot open shared object file: No such file or directory
2: In min(which(window_values < primer.data$F_qual[i])) :
no non-missing arguments to min; returning Inf
3: In min(which(window_values < primer.data$F_qual[i])) :
no non-missing arguments to min; returning Inf
4: In min(which(window_values < primer.data$F_qual[i])) :
no non-missing arguments to min; returning Inf
5: In min(which(window_values < primer.data$F_qual[i])) :
no non-missing arguments to min; returning Inf
6: In min(which(window_values < primer.data$R_qual[i])) :
no non-missing arguments to min; returning Inf
7: Using all_of() outside of a selecting function was deprecated in tidyselect 1.2.0.
ℹ See details at https://tidyselect.r-lib.org/reference/faq-selection-context.html
finished step 1. 21:22:17
starting step 3: making the stats file... 21:22:20
finished step 3. 21:22:23
metabarcoding pipeline complete! 21:22:25
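
A note on warnings 2 through 6: in R, `which()` returns an empty vector when no element satisfies the condition (or when the comparison yields only `NA`, for example if the threshold itself is missing), and `min()` of an empty vector returns `Inf` with exactly this warning. So these warnings suggest that, for those rows of `primer.data`, nothing in `window_values` registered as below `F_qual` / `R_qual`. The Python sketch below only mimics that behaviour; the function name and the numbers are made up for illustration and are not taken from the pipeline.

```python
import math

def first_index_below(window_values, threshold):
    """Mimic R's min(which(window_values < threshold)) with 1-based indices.

    Returns math.inf when no value is below the threshold, which is the
    case R reports as "no non-missing arguments to min; returning Inf".
    """
    hits = [i for i, value in enumerate(window_values, start=1) if value < threshold]
    return min(hits, default=math.inf)

# Hypothetical windowed quality scores, for illustration only.
print(first_index_below([38, 36, 29, 24], threshold=30))  # 3
print(first_index_below([38, 37, 36, 35], threshold=30))  # inf
```

An `Inf` result like this is a warning rather than a failure, so on its own it would not necessarily stop a later step from running.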

@avancise
Contributor

avancise commented Oct 16, 2023 via email

@nvpatin
Author

nvpatin commented Oct 16, 2023

Right, sorry, it's not really an error message, but I was wondering whether the warnings might be linked to the failure to run the final step. It's most likely due to the metadata sheet, though, so I'll try solving that first.

Re: pipeline availability, got it, thanks for clarifying. I'll just see what I can do to get it working for our sequence data files.

@nvpatin
Author

nvpatin commented Oct 16, 2023

Just to confirm, is it the "Report_MURI_Module3.qmd" file that generates the preliminary analyses? I can't find a .Rmd file in the file system.

@avancise
Contributor

avancise commented Oct 17, 2023 via email
