Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Seurat version of the PBMC clustering tutorial #5491

Draft
wants to merge 59 commits into
base: main
Choose a base branch
from

Conversation

MarisaJL
Copy link
Collaborator

This is the Seurat version of the Clustering3k PMBCs with Scanpy tutorial for single cell.

@MarisaJL MarisaJL requested a review from a team as a code owner October 31, 2024 13:30
@pavanvidem
Copy link
Member

@MarisaJL I will review it soon.

Copy link
Member

@pavanvidem pavanvidem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reviewed and tested until the preprocessing.

> - *"Include features detected in at least this many cells"*: `3`
> - *"Include cells where at least this many features are detected"*: `200`
> - *"Calculate percentage of mito genes in each cell"*: `No`
>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include selection of genes.tsv and barcodes.tsv in params

> 4. Check that the format is `RDS`
{: .hands_on}

We can't look at the RDS file directly as it is designed for computers to read, rather than humans, but the Seurat tools will now be able to interact with the data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an inspect step here. A question on the number of features and cells filtered out with our thresholds should give an idea of what happened in the last step.

> - *"Genes to calculate residual features for"*: `all genes`
> - *"How to set variable features"*: `set number of variable features`
> - *"Output list of most variable features"*: `Yes`
> - *"Variable(s) to regress out"*: `percent.mt`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it different from the normal scaling step? In the normal scaling step, there was no variable regressed out but in scTransform, percent.mt regressed out

@pavanvidem
Copy link
Member

@MarisaJL the tutorial is well made! Some complex topics were explained in easy language. One key improvement I would suggest for this tutorial is to establish a clearer separation between log normalization and SCTransform preprocessing. There is "choose your own tutorial" option but applied only in the initial step. In later hands-on steps, both preprocessing methods are run together with side-by-side comparisons, which could be confusing for beginners trying to learn the analysis from scratch. Tracking two different analyses and managing two separate SeuratObjects may feel overwhelming at this stage.

It would be nice if the "choose your own tutorial" option carried through the entire tutorial. This way, users could follow a single workflow from start to finish before diving into comparisons. I really like your explanations and comparisons. Moving the comparisons to a dedicated section toward the end might also be beneficial. This final section could include guidance on which preprocessing method to use in specific scenarios.

Do you think it makes sense? Sorry, it sounds like a lot of restructuring.

@MarisaJL
Copy link
Collaborator Author

MarisaJL commented Nov 7, 2024

Thanks @pavanvidem - that does make sense! Since this is a tutorial designed for beginners, it's probably best to keep the two options separate. I think it is possible to have the cyoa follow on between steps, so I'll try and restructure it.

Copy link
Member

@pavanvidem pavanvidem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested the tutorial and it works perfectly! The content looks great!!

> 1. {% tool [Seurat Visualize](toolshed.g2.bx.psu.edu/repos/iuc/seurat_plot/seurat_plot/5.0+galaxy0) %} with the following parameters:
> - {% icon param-file %} *"Input file with the Seurat object"*: `rds_out` (output of **Seurat Run Dimensional Reduction** {% icon tool %})
> - *"Method used"*: `Determine dimensionality with 'ElbowPlot'`
> - *"Number of dimensions to plot standard deviation for"*: If you ran the separate preprocessing steps then enter `30` here, if you used SCTransfrom then enter `50`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either change from 50 to 30 or upload a new Elbow plot for scTransform with 50 PCs.

> - *"Output list of top genes"*: `Yes`
>
{: .hands_on}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename this output because it is used in many following steps.

Creating a Seurat Object in R would require two steps - first, we would need to read in our data, in this case using the `Read10X` function, then secondly we would turn it into a Seurat Object using the `CreateSeuratObject` function. On Galaxy, we can perform both steps with a single tool. The `CreateSeuratObject` function also generates some QC metrics and performs basic filtering of the data.

><hands-on-title>Create a Seurat Object</hands-on-title>
> 1. {% tool [Seurat Create](toolshed.g2.bx.psu.edu/repos/iuc/seurat_create/seurat_create/5.0+galaxy0) %} with the following parameters:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The barcodes file is in txt format. Currently, the tools does not support the txt files as input.
I am trying to add it here: galaxyproject/tools-iuc#6539. So please use galaxy1 version for this step.

> - *"UMAP implementation to run"*: `uwot`
> - *"Run UMAP on dimensions, features, graph or KNN output"*: `dims`
> - *"Number of dimensions from reduction to use as input"*: If you ran separate preprocessing steps, leave this as `10`, if you used SCTransform then change it to `30`
>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename the output. This output is used later in for FindAllMarkers and then FindMarkers. Users might select wrong input if not renamed.

Although there is a lot of information here, all we need to know for now is that the markers listed for each cluster are the genes that were expressed more by these cells than any of the other clusters. We can search online for these genes to get an idea of what types of cells are in our clusters.

> <question-title></question-title>
> 1. Are the top genes associated with PCs 1-3 in our list of markers? Which clusters are they markers for?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a question on how to select markers for a particular cluster. Then answer with Filter tool on column 7.

>
> 1. Click on the {% icon galaxy-pencil %} pencil icon of the file we renamed as `DE Markers` (this was the CSV output from `FindAllMarkers`) then select {% icon galaxy-chart-select-data %} Datatypes in the central panel. Choose the second option, `Convert to Datatype` and make sure `tabular (using `Convert CSV to tabular`)` is selected in the drop down menu before pressing the `Create Dataset` button. This will create a new, tabular version of the dataset at the top of your history - make sure that this is the version you use in the next step.
>
> 2. {% tool [Table Compute](toolshed.g2.bx.psu.edu/repos/iuc/table_compute/table_compute/1.2.4+galaxy0) %} with the following parameters:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I tested, the empty header line did not affect the heatmap. I think a simple cut tool should be enough here.

@MarisaJL MarisaJL marked this pull request as draft December 6, 2024 16:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants