New Evaluating Reference Data for Bulk RNA Deconvolution tutorial #5549

Open. Wants to merge 72 commits into base: main

hexhowells
Collaborator

New tutorial on evaluating reference data for bulk RNA deconvolution tools, evaluating both MuSiC and NNLS deconvolution tools within Galaxy.

hexhowells and others added 30 commits October 5, 2024 08:18
Member

@shiltemann shiltemann left a comment

Thanks @hexhowells! This looks great. A few minor comments below. I can't speak to the science, but perhaps @nomadscientist can have a look here as well?

title: Evaluating Reference Data for Bulk RNA Deconvolution
subtopic: deconvo
priority: 3
zenodo_link: ''

Suggested change
zenodo_link: ''
zenodo_link: 'https://zenodo.org/records/5719228'

> - *"Delimited by"*: `Tab`
> - *"How should the results be sorted?"*: `With the most common value first`
>
> 2. **Rename** {% icon galaxy-pencil %} output `Cell type counts`

Maybe add the FAQ for renaming a dataset here, the first time people are asked to do it?

> {% snippet faqs/galaxy/workflows_run.md %}
{: .hands_on}

<iframe title="Galaxy Workflow Embed" style="width: 100%; height: 700px; border: none;" src="https://usegalaxy.eu/published/workflow?id=76d3408d0d22ad05&embed=true&buttons=true&about=false&heading=false&minimap=true&zoom_controls=true&initialX=-20&initialY=-20&zoom=0.5"></iframe>

cool!


title: Evaluating Reference Data for Bulk RNA Deconvolution
subtopic: deconvo
priority: 3

Did you mean for this tutorial to be 3rd in the subsection? It is second now, because the other tutorials in the subsection have priorities 1 and 4 listed. Please double-check that the order is how you want it.
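
To illustrate the ordering, here is a minimal sketch, assuming tutorials within a subtopic are listed in ascending `priority` order. The two other tutorial titles are hypothetical placeholders; only the priority values 1, 3 and 4 come from this comment.

```python
# Hypothetical entries: only the priority values (1, 3, 4) come from the review comment.
tutorials = [
    {"title": "Existing tutorial X (placeholder)", "priority": 4},
    {"title": "Evaluating Reference Data for Bulk RNA Deconvolution", "priority": 3},
    {"title": "Existing tutorial Y (placeholder)", "priority": 1},
]


def display_order(entries):
    """Return titles sorted by ascending priority (the assumed ordering rule)."""
    return [entry["title"] for entry in sorted(entries, key=lambda e: e["priority"])]


order = display_order(tutorials)
position = order.index("Evaluating Reference Data for Bulk RNA Deconvolution") + 1
print(position)  # 2: priority 3 lands second when its neighbours are 1 and 4
```

Under that assumption, the new tutorial would need a priority above 4 to appear last in the subsection, or the neighbouring priorities would need adjusting.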


**Remember** since we have a collection of 20 inputs, the output of this workflow will be a collection of 20 elements, each corresponding to the input elements. Each output will have its own random selection of 200 cells.

> <comment-title>Inputting multiple datasets</comment-title>
Member

@shiltemann shiltemann Dec 5, 2024

This could probably be an FAQ. Or maybe you could use the existing "select multiple datasets" FAQ, and perhaps enhance it with your screenshot?

> - Copy the URL (e.g. via right-click) of [this workflow](https://usegalaxy.eu/u/hexhowells/w/deconv-eval-stage-1) or download it to your computer.
> - Import the workflow into Galaxy
>
> {% snippet faqs/galaxy/workflows_run_trs.md path="topics/transcriptomics/tutorials/rna-seq-reads-to-counts/workflows/qc_report.ga" title="QC Report" %}

this is a tad confusing, as it is a hands-on box within a hands-on box. Are they meant to run this QC report workflow at this point? Or did you mean to replace the workflow with the one mentioned in step 1 here?

> <hands-on-title>Run pseudo-bulk and actual proportions workflow</hands-on-title>
>
> 1. **Import the workflow** into Galaxy
> - Copy the URL (e.g. via right-click) of [this workflow](https://usegalaxy.eu/u/hexhowells/w/deconv-eval-stage-1) or download it to your computer.

please include the workflow here in the GTN as well and refer to that in the link.

> - {% icon param-file %} *"Input Dataset"*: `Transposed expression matrix`
> - *"Size of output collection"*: `20`
>
> 4. **Rename** {% icon galaxy-pencil %} output `Expression data`

I would recommend keeping the terms consistent: later on, when the first workflow is run, these inputs are described with different terminology.

> In order to upload the input collections into the workflow, you first need to set the input type to **Multiple datasets** in the input file selection.
>
> ![Multiple Datasets](../../images/bulk-deconvolution-evaluate/batch-mode.png "Multiple Datasets button in Galaxy")
{: .comment}

I experienced some difficulty loading these collections into the workflow. They would not appear in the pull down menu, and I had to drag and drop the collections without visual confirmation that they had been uploaded.

> - {% icon param-collection %} *"Expression Data"*: `Expression Data`
>
> {% snippet faqs/galaxy/workflows_run.md %}
> 3. Add a tag labelled `#A` to the first "Actual cell proportions" and "Pseudobulk" collections

I am not completely sure, but I feel the term "actual cell proportions" might be a little misleading. The cell proportions indicated by proportional representation in the single-cell data often differ from the true in vivo cell type proportions, because of systematic dropout biases during data collection. This might be worth mentioning, or a different term that doesn't use "actual" could be substituted.
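
To make the distinction concrete, here is a minimal sketch, assuming the "actual" proportions are simply the frequencies of cell-type labels among the 200 randomly sampled cells. The labels and the annotated pool are made up for illustration, and the function name is hypothetical; this is not the workflow's implementation.

```python
import random
from collections import Counter


def sampled_proportions(cell_labels, n_cells=200, seed=0):
    """Sample n_cells labels without replacement and return each label's frequency."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    sample = rng.sample(cell_labels, n_cells)
    counts = Counter(sample)
    return {label: count / n_cells for label, count in counts.items()}


# Made-up annotated single-cell pool: 600 alpha, 300 beta, 100 delta cells.
pool = ["alpha"] * 600 + ["beta"] * 300 + ["delta"] * 100
props = sampled_proportions(pool)
print(props)
```

These fractions describe the sampled single-cell data only. If a cell type is under-captured during data collection, it is under-represented in the pool and therefore in these proportions too, whatever its abundance in the tissue.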

> {% snippet faqs/galaxy/workflows_run_trs.md path="topics/transcriptomics/tutorials/rna-seq-reads-to-counts/workflows/qc_report.ga" title="QC Report" %}
>
> 2. Run **Workflow inferring cellular proportions** {% icon workflow %} using the following parameters:
> - {% icon param-collection %} *"Pseudobulk - A"*: `expression data - A`

These collections also did not appear in the drop down menus, and rather had to be dragged and dropped.

> >
> > ![Scatter plot comparison](../../images/bulk-deconvolution-evaluate/scatterplot-compare.png "Scatter plot comparison between Music and NNLS")
> >
> > 1. Comparing scatter plots, the MuSiC tool has the most accurate results since the points fall closer onto the x=y line

Imagine that the NNLS deconvolution more closely resembled the cell proportions in the real, biological context, while MuSiC more accurately recapitulated the proportions from the single-cell data. Which of these two methods is really more accurate, then?
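
One way to make "closer to the x=y line" measurable is the root-mean-square error between inferred and reference proportions. The sketch below uses made-up numbers, not results from the tutorial, and the function and variable names are hypothetical. It scores agreement with whatever reference is supplied, here the single-cell-derived proportions, so on its own it cannot say which tool is closer to the in vivo composition.

```python
import math


def rmse(reference, inferred):
    """Root-mean-square vertical deviation of (reference, inferred) points from x=y."""
    if len(reference) != len(inferred):
        raise ValueError("reference and inferred must have the same length")
    squared = [(inf - ref) ** 2 for ref, inf in zip(reference, inferred)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up proportions for three cell types (illustration only).
reference = [0.50, 0.30, 0.20]  # derived from the sampled single-cell labels
tool_a = [0.48, 0.33, 0.19]     # hypothetical output of one deconvolution tool
tool_b = [0.40, 0.35, 0.25]     # hypothetical output of another tool

print(rmse(reference, tool_a) < rmse(reference, tool_b))  # True
```

Swapping in a different reference, for example proportions measured by an independent method, could reverse the ranking, which is the point of the question above.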

4 participants