rest of plots

galaxyproject · Nov 2, 2023 · d1cf23a · d1cf23a
1 parent 293c737
commit d1cf23a

Showing 5 changed files with 31 additions and 22 deletions.
diff --git a/topics/single-cell/images/scCiteSeq-RStudio/Plot12.png b/topics/single-cell/images/scCiteSeq-RStudio/Plot12.png
diff --git a/topics/single-cell/images/scCiteSeq-RStudio/Plot13.png b/topics/single-cell/images/scCiteSeq-RStudio/Plot13.png
diff --git a/topics/single-cell/tutorials/scCiteSeq-RStudio/faqs/index.md b/topics/single-cell/tutorials/scCiteSeq-RStudio/faqs/index.md
@@ -0,0 +1,3 @@
+---
+layout: faq-page
+---
diff --git a/topics/single-cell/tutorials/scCiteSeq-RStudio/preamble.md b/topics/single-cell/tutorials/scCiteSeq-RStudio/preamble.md
@@ -18,12 +18,12 @@ Before we can start exploring, we'll process our transcriptomic and surface prot
 {: .comment}
 
 # Get Your Data
-For this tutorial, we'll use a publicly available dataset of 8,617 cord blood mononuclear cells (CBMCs) which have been sequenced for transcriptomic measurements as well as 11 surface proteins. 
+For this tutorial, we'll use a publicly available dataset of 8,617 cord blood mononuclear cells (CBMCs) which have been sequenced for transcriptomic measurements as well as 11 surface proteins ({% cite Satija&Smibert2017 %}). 
 
 ><comment-title></comment-title>
 >A quick note on nomenclature when working with Cite-Seq.
->ADT: (or antibody derived tag) represents the cell surface protein measurements
->RNA: represents the transcriptomic measurements
+>ADT (antibody-derived tag): refers to the cell surface protein abundance measurements
+>RNA: refers to the transcriptomic measurements
 {: .comment}
 
 First on the to-do list is importing our csv files. You can do this in a couple of ways: 
@@ -36,15 +36,17 @@ Then select the "Paste/Fetch Data" option:
 ![Paste/Fetch Data Button](../../images/scCiteSeq-RStudio/Plot2.png "Paste/Fetch Data")
 
 Copy the following links into the box:
+# **IS THERE A BETTER WAY TO FORMAT THIS?**
 
-1. ADT data: ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE100nnn/GSE100866/suppl/GSE100866_CBMC_8K_13AB_10X-ADT_umi.csv.gz
-2. RNA data: ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE100nnn/GSE100866/suppl/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz
+[] ADT data: ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE100nnn/GSE100866/suppl/GSE100866_CBMC_8K_13AB_10X-ADT_umi.csv.gz
+
+[] RNA data: ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE100nnn/GSE100866/suppl/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz
 
 Select "Start" and then close once both files indicate they are 100% ready. 
 
 The two csv data files should now begin importing into your Galaxy history!  
 
-> **Option 2.** Import A History
+> # **Option 2.** Import A History
 You can access [this history](https://usegalaxy.eu/u/camila-goclowski/h/cite-seq-tutorial-data) by clicking on the link provided.
 
 {% snippet faqs/galaxy/histories_import.md %}
@@ -83,7 +85,7 @@ Now we'll run those csv files through the updated Seurat tool with the following
 >  - *"Output list of cite-seq markers"*: `Yes`
 >  - *"Compare specific feature's effect on protein and rna expression?"*: `No`
 >  - *"Compare top RNA and protein features graphicaly against themselves and one another"*: `No`
->  - *"How many of the top featyre should be shown"*: `5`
+>  - *"How many of the top features should be shown"*: `5`
 {: .hands_on}
 
 ><comment-title></comment-title>
@@ -97,8 +99,10 @@ Now that we have some explorable data in our Galaxy history, let's move into RSt
 
 ><comment-title>Next Step</comment-title>
 > The interactive RStudio tool should begin to load now. Make your way over to your Active Interactive Tools page (User (in the top bar of the usegalaxy page) > Active Interactive Tools > RStudio)
+> ![Interactive Tools Button](../../images/scCiteSeq-RStudio/Plot12.png "Interactive Tools")
 >
->Alternatively, you may use the view (eye) icon in your Galaxy History to open the interactive RStudio environment.
+>Alternatively, you can use the view (eye) icon in your Galaxy History to open the interactive RStudio environment.
+> ![Eye Button](../../images/scCiteSeq-RStudio/Plot13.png "Eye Button")
 {: .comment}
 
-It may be useful to explore some of these output files that are now in your history. Take a look at some of the output previews and see if you can get a grasp of what's what. If not, no worries at all, we'll start looking more closely once we've made it into RStudio!
+It may be a good time to explore some of these output files that are now in your history. Take a look at some of the previews and see if you can get a grasp of what's what. If not, no worries at all, we'll start looking more closely once we've made it into RStudio!
diff --git a/topics/single-cell/tutorials/scCiteSeq-RStudio/tutorial.md b/topics/single-cell/tutorials/scCiteSeq-RStudio/tutorial.md
@@ -43,8 +43,6 @@ notebook:
 
 Before we can do any real biological investigation, we need to understand what each of the outputs from our Seurat tool are. Maybe you've already begun to dissect what's what, but just in case, let's run through each of the datasets together. 
 
-We'll begin to understand: 
-
 ## Datatypes We'll Review
 1. [RNA Matrix](#rnamatrix)
 2. [ADT Matrix](#adtmatrix)
@@ -54,7 +52,7 @@ We'll begin to understand:
 6. [Combined RNA & Protein Markers](#combinedmarkers)
 
 ><comment-title>gx_get</comment-title>
-> RStudio in galaxy comes with a gx_get() function. This  is critical to understand and be able to use in order to move datasets from your history into RStudio. The function outputs the file path with which you can access your data via RStudio.
+> RStudio in Galaxy comes with a gx_get() function, which you will need in order to move datasets from your history into RStudio. The function outputs the file path with which you can access your data via RStudio.
 > To use it, simply use the numbered position of the dataset you are looking to import. For example: 
 > If we want to find the first dataset we imported, simply run the following command: 
 > ```r
@@ -69,20 +67,22 @@ To take a look at the pre analysis RNA-seq matrix, use the following commands:
 gx_get(1)
 RNA<-read.csv('/import/1')
 ```
-Note that the dataset we are using also contains ~5% of mouse cells, which we can use as negative controls for the cell surface protein measurements. As such, the RNA expression matrix has "HUMAN_" or "MOUSE_" appended to each gene. 
+It's worth mentioning that the dataset we are using contains ~5% mouse cells, which we can use as negative controls for the cell surface protein measurements. As such, each gene name in the RNA expression matrix initially carries a "HUMAN_" or "MOUSE_" prefix. 
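
As a rough illustration of what those prefixes let you do, here is a minimal sketch on a made-up matrix. The gene names, barcodes, and counts below are hypothetical stand-ins, not values from the tutorial dataset, and the sketch assumes the prefix sits at the start of each row name.

```r
# Toy stand-in for the RNA matrix (all names and counts are invented)
toy_rna <- matrix(
  c(2, 0,
    0, 1,
    0, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("HUMAN_CD3E", "HUMAN_CD19", "MOUSE_Actb"),  # prefixed gene names
    c("AAAC", "AAAG")                             # cell barcodes
  )
)

# Everything before the first underscore is the species label
species <- sub("_.*$", "", rownames(toy_rna))
table(species)

# Dropping the prefix leaves the bare gene symbol
symbols <- sub("^(HUMAN|MOUSE)_", "", rownames(toy_rna))
symbols
```

Tabulating the species label this way is one quick way to see how many rows come from each genome before deciding how to handle the mouse genes.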
 
 Now let's take a look at what's in here. 
 ```r
 View(RNA)
 ```
 ![RNA Matrix](../../images/scCiteSeq-RStudio/Plot3.png "RNA Matrix")
 
-If you're familiar with scRNA-seq matrices, this may look familiar to you. That's because it is exactly that--an RNA-seq matrix! In these matrices we have genes as row names and cell barcodes as column names. The values within the matrix denote the number of transcripts from a given gene within a given cell.
+If you're familiar with scRNA-seq matrices, this may look familiar to you. That's because it is exactly that--an RNA-seq matrix. In these matrices we have genes as row names and cell barcodes as column names. The values within the matrix denote the number of transcripts from a given gene within a given cell.
+
+You may have noticed there are lots of zero values in this matrix. You may also be thinking, "Won't that create noise in the dataset??" The answer is yes, and removing these zeros is one of the first problems that the Seurat preprocessing tool will solve. 
 
-You may have noticed there are TONS of zero values in this matrix. You may also be thinking, "Won't that create noise in the dataset??" The answer is yes, and these zeros are one of the first things that the Seurat preprocessing tool will accomplish. This matrix that we've labelled as RNA is *not* what we will be analyzing further into this tutorial. We are simply taking a look to ground ourselves in what the data looked like *before* preprocessing. 
+This matrix, with the values shown, is *not* what we will be analyzing later on in this tutorial. We are simply taking a look to get an understanding of what the data looks like *before* preprocessing. 
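
If you want to put a number on how sparse a count matrix is, a minimal sketch like the following may help. It uses a tiny invented matrix rather than the real RNA data, and the object names are hypothetical; the same two lines should work on any numeric matrix of counts.

```r
# Tiny invented count matrix: genes as rows, cells as columns
toy_counts <- matrix(
  c(0, 0, 3,
    1, 0, 0,
    0, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"),
                  c("Cell1", "Cell2", "Cell3"))
)

# Fraction of entries that are exactly zero
zero_fraction <- mean(toy_counts == 0)

# Number of cells in which each gene was detected at least once
cells_per_gene <- rowSums(toy_counts > 0)

zero_fraction
cells_per_gene
```

Genes detected in no cells at all (GeneC in this toy example) are the kind of rows that typically get dropped during preprocessing.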
 
 ### ADT (Protein) Matrix <a name="adtmatrix"></a>
-We can do the same thing with the pre-analysis protein matrix. We'll call it the ADT matrix for now, since that is how Seurat recognizes it! 
+We can do the same thing with the pre-analysis protein matrix. We'll call it the ADT matrix for now, since that is how Seurat recognizes it.
 ```r
 gx_get(2)
 ADT<-read.csv('/import/2')
@@ -92,14 +92,16 @@ Again, let's take a look at what's in here:
 View(ADT)
 ```
 ![ADT Matrix](../../images/scCiteSeq-RStudio/Plot4.png "ADT Matrix")
-Looks shockingly similar, doesn't it?!
 
-In the ADT matrix, we have cell surface proteins (instead of gene names) as row names and the same cell barcodes as column names. 
+Looks shockingly similar, doesn't it?
 
+In the ADT matrix, we have cell surface proteins (instead of gene names) as row names and the same cell barcodes as column names. 
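
One quick sanity check, sketched here on invented data, is to confirm that the two matrices really do share the same barcodes in the same order. The matrices and barcode names below are hypothetical; with the real data you would compare the column names of your own RNA and ADT objects, assuming both were read in the same way.

```r
# Hypothetical barcodes shared by both toy matrices
barcodes <- c("AAAC", "AAAG", "AAAT")

# Toy RNA matrix: genes as rows
toy_rna <- matrix(0, nrow = 2, ncol = 3,
                  dimnames = list(c("HUMAN_CD3E", "HUMAN_CD19"), barcodes))

# Toy ADT matrix: surface proteins as rows
toy_adt <- matrix(0, nrow = 2, ncol = 3,
                  dimnames = list(c("CD3", "CD19"), barcodes))

# TRUE only if the barcodes match exactly, including their order
same_barcodes <- identical(colnames(toy_rna), colnames(toy_adt))
same_barcodes
```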
+# IMPORT HTML INTO RSTUDIO?
 If you ran the same parameters as I did, the next output (number 3 in our history) will be Seurat's run log. This is unfortunately not easy to import into RStudio since it comes in HTML format. It contains all of the run information from the background coding done by the tool. Any warnings, errors, or progress bars will be present in here and are often useful for troubleshooting in case something goes awry. Because of the HTML formatting, we will not look at this output together, but feel free to explore it on your own using the view (eye) icon in your history. 
+![Eye Button](../../images/scCiteSeq-RStudio/Plot13.png "Eye Button")
 
 ### Protein Markers <a name="proteinmarkers"></a>
-The next output in my galaxy history are protein markers! Let's take a look: 
+The next output in my Galaxy history is the protein markers. Let's take a look: 
 ```r
 gx_get(4)
 protein_markers<-read.table('/import/4', header = T)
@@ -111,12 +113,12 @@ There are tons of markers in this list and if you dig through them all, you'll l
 ```r
 protein_markers<-subset(protein_markers, p_val_adj < 0.045)
 ```
-Doesn't look like there were actually *any* insignifcant markers in that list! Although we got lucky this time, I have found that it is in your best interest to always attempt this filter, especially when working with bigger, messier datasets!
+Doesn't look like there were actually any insignificant markers in that list! Although we got lucky this time, I have found that it is in everyone's best interest to always attempt this filter, especially when working with bigger, messier datasets!
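
To see for yourself whether the filter removed anything, you can compare row counts before and after. The sketch below does this on a made-up marker table; the genes, clusters, and p-values are invented, and only the `p_val_adj` column name and the 0.045 cutoff are taken from the command above.

```r
# Invented marker table with the same p_val_adj column used in the tutorial
toy_markers <- data.frame(
  gene      = c("CD3", "CD19", "CD14"),
  cluster   = c(0, 1, 2),
  p_val_adj = c(1e-10, 0.03, 0.2),
  stringsAsFactors = FALSE
)

# Record the size before filtering
n_before <- nrow(toy_markers)

# Same filter as in the tutorial: keep adjusted p-values below 0.045
toy_markers <- subset(toy_markers, p_val_adj < 0.045)

# How many rows did the filter drop?
n_removed <- n_before - nrow(toy_markers)
n_removed
```

A result of zero on the real table would match what we saw above, where no markers were removed.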
 
-Now we have a statistically signficant list of protein markers per cluster! There are a number of statistics that are included here, if you're interested in better understanding them, take a look at [Seurat's documentation of FindAllMarkers] (https://satijalab.org/seurat/reference/findallmarkers) for more details and options. 
+Now we have a statistically significant list of protein markers per cluster! A number of statistics are included in these dataframes. If you're interested in better understanding them, take a look at [Seurat's documentation of FindAllMarkers](https://satijalab.org/seurat/reference/findallmarkers) for more details. 
 
 ### RNA Markers <a name="rnamarkers"></a>
-The next dataset in our history should be RNA markers. Let's import them, remove the statistically insignifcant ones, and take a look: 
+The next dataset in our history should be RNA markers. Let's import them, remove any statistically insignificant ones, and take a look: 
 ```r
 gx_get(5)
 rna_markers<-read.table('/import/5', header = T)