-
Notifications
You must be signed in to change notification settings - Fork 923
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding Seurat version of the PBMC clustering tutorial #5491
base: main
Are you sure you want to change the base?
Conversation
Removing old images to replace with newer ones
@MarisaJL I will review it soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reviewed and tested until the preprocessing.
> - *"Include features detected in at least this many cells"*: `3` | ||
> - *"Include cells where at least this many features are detected"*: `200` | ||
> - *"Calculate percentage of mito genes in each cell"*: `No` | ||
> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include selection of genes.tsv
and barcodes.tsv
in params
> 4. Check that the format is `RDS` | ||
{: .hands_on} | ||
|
||
We can't look at the RDS file directly as it is designed for computers to read, rather than humans, but the Seurat tools will now be able to interact with the data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add an inspect step here. A question on the number of features and cells filtered out with our thresholds should give an idea of what happened in the last step.
> - *"Genes to calculate residual features for"*: `all genes` | ||
> - *"How to set variable features"*: `set number of variable features` | ||
> - *"Output list of most variable features"*: `Yes` | ||
> - *"Variable(s) to regress out"*: `percent.mt` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it different from the normal scaling step? In the normal scaling step, there was no variable regressed out but in scTransform, percent.mt
regressed out
@MarisaJL the tutorial is well made! Some complex topics were explained in easy language. One key improvement I would suggest for this tutorial is to establish a clearer separation between log normalization and SCTransform preprocessing. There is "choose your own tutorial" option but applied only in the initial step. In later hands-on steps, both preprocessing methods are run together with side-by-side comparisons, which could be confusing for beginners trying to learn the analysis from scratch. Tracking two different analyses and managing two separate SeuratObjects may feel overwhelming at this stage. It would be nice if the "choose your own tutorial" option carried through the entire tutorial. This way, users could follow a single workflow from start to finish before diving into comparisons. I really like your explanations and comparisons. Moving the comparisons to a dedicated section toward the end might also be beneficial. This final section could include guidance on which preprocessing method to use in specific scenarios. Do you think it makes sense? Sorry, it sounds like a lot of restructuring. |
Thanks @pavanvidem - that does make sense! Since this is a tutorial designed for beginners, it's probably best to keep the two options separate. I think it is possible to have the cyoa follow on between steps, so I'll try and restructure it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested the tutorial and it works perfectly! The content looks great!!
> 1. {% tool [Seurat Visualize](toolshed.g2.bx.psu.edu/repos/iuc/seurat_plot/seurat_plot/5.0+galaxy0) %} with the following parameters: | ||
> - {% icon param-file %} *"Input file with the Seurat object"*: `rds_out` (output of **Seurat Run Dimensional Reduction** {% icon tool %}) | ||
> - *"Method used"*: `Determine dimensionality with 'ElbowPlot'` | ||
> - *"Number of dimensions to plot standard deviation for"*: If you ran the separate preprocessing steps then enter `30` here, if you used SCTransfrom then enter `50` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either change from 50 to 30 or upload a new Elbow plot for scTransform with 50 PCs.
> - *"Output list of top genes"*: `Yes` | ||
> | ||
{: .hands_on} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename this output because it is used in many following steps.
Creating a Seurat Object in R would require two steps - first, we would need to read in our data, in this case using the `Read10X` function, then secondly we would turn it into a Seurat Object using the `CreateSeuratObject` function. On Galaxy, we can perform both steps with a single tool. The `CreateSeuratObject` function also generates some QC metrics and performs basic filtering of the data. | ||
|
||
><hands-on-title>Create a Seurat Object</hands-on-title> | ||
> 1. {% tool [Seurat Create](toolshed.g2.bx.psu.edu/repos/iuc/seurat_create/seurat_create/5.0+galaxy0) %} with the following parameters: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The barcodes file is in txt
format. Currently, the tools does not support the txt files as input.
I am trying to add it here: galaxyproject/tools-iuc#6539. So please use galaxy1
version for this step.
> - *"UMAP implementation to run"*: `uwot` | ||
> - *"Run UMAP on dimensions, features, graph or KNN output"*: `dims` | ||
> - *"Number of dimensions from reduction to use as input"*: If you ran separate preprocessing steps, leave this as `10`, if you used SCTransform then change it to `30` | ||
> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename the output. This output is used later in for FindAllMarkers
and then FindMarkers
. Users might select wrong input if not renamed.
Although there is a lot of information here, all we need to know for now is that the markers listed for each cluster are the genes that were expressed more by these cells than any of the other clusters. We can search online for these genes to get an idea of what types of cells are in our clusters. | ||
|
||
> <question-title></question-title> | ||
> 1. Are the top genes associated with PCs 1-3 in our list of markers? Which clusters are they markers for? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a question on how to select markers for a particular cluster. Then answer with Filter
tool on column 7.
> | ||
> 1. Click on the {% icon galaxy-pencil %} pencil icon of the file we renamed as `DE Markers` (this was the CSV output from `FindAllMarkers`) then select {% icon galaxy-chart-select-data %} Datatypes in the central panel. Choose the second option, `Convert to Datatype` and make sure `tabular (using `Convert CSV to tabular`)` is selected in the drop down menu before pressing the `Create Dataset` button. This will create a new, tabular version of the dataset at the top of your history - make sure that this is the version you use in the next step. | ||
> | ||
> 2. {% tool [Table Compute](toolshed.g2.bx.psu.edu/repos/iuc/table_compute/table_compute/1.2.4+galaxy0) %} with the following parameters: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I tested, the empty header line did not affect the heatmap. I think a simple cut
tool should be enough here.
Co-authored-by: Pavankumar Videm <[email protected]>
This is the Seurat version of the Clustering3k PMBCs with Scanpy tutorial for single cell.