fix2-make_fragment_file
timonschlegel committed Jun 13, 2024
1 parent 4bebee4 commit fabc4d4
Showing 1 changed file with 33 additions and 27 deletions.
@@ -101,7 +101,7 @@ SnapATAC2 requires 3 input files for the standard pathway of processing:
> - This tutorial starts with a `fragment` file.
> - SnapATAC2 also accepts mapped reads in a `BAM` file.
> - To learn how to get a `fragment` file or `BAM` file from raw `.FASTQ`-reads, please check out the tutorial ["Pre-processing of 10X Single-Cell ATAC-seq Datasets"]( {% link topics/single-cell/tutorials/scatac-preprocessing-tenx/tutorial.md %} )
> - If you would like to start the analysis with a `BAM` file, you can expand the details section [**Creating a fragment file**]( {% link topics/single-cell/tutorials/scatac-standard-processing-snapatac2/tutorial.md %}#details-creating-a-fragment-file).
> - If you would like to start the analysis with a `BAM` file, you can expand the details section ["Details: Creating a fragment file"]( {% link topics/single-cell/tutorials/scatac-standard-processing-snapatac2/tutorial.md %}#creating-a-fragment-file).
{: .comment}


@@ -126,7 +126,8 @@ SnapATAC2 requires 3 input files for the standard pathway of processing:
> 3. Rename the datasets
> - {% icon galaxy-pencil %} **Rename** the file `atac_pbmc_5k_nextgem_fragments.tsv` to `fragments_file.tsv`
> - {% icon galaxy-pencil %} **Rename** the file `gencode.v46.annotation.gtf.gz` to `gene_annotation.gtf.gz`
> {% snippet faqs/galaxy/datasets_rename.md %}
>
> {% snippet faqs/galaxy/datasets_rename.md %}
>
> 4. Inspect `chrom_sizes` and `fragments_file`
{: .hands_on}
@@ -145,27 +146,30 @@ SnapATAC2 requires 3 input files for the standard pathway of processing:
>
{: .question}
## Creating a fragment file
> <details-title>Creating a fragment file</details-title>
> > <hands-on-title>Data upload</hands-on-title>
> > 1. Import the file `BAM_500-PBMC` from [Zenodo]({{ page.zenodo_link }}) or from the shared data library
> > ```
> > {{ page.zenodo_link }}/files/atac_pbmc_5k_nextgem_fragments.tsv
> > ```
> > - This dataset contains mapped reads in the `BAM` format.
> > - It was generated by following the tutorial ["Pre-processing of 10X Single-Cell ATAC-seq Datasets"]( {% link topics/single-cell/tutorials/scatac-preprocessing-tenx/tutorial.md %} ) until the output of {% tool [Map with BWA-MEM](toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.18) %}
> > 2. {% tool [SnapATAC2 Preprocessing](toolshed.g2.bx.psu.edu/repos/iuc/snapatac2_preprocessing/snapatac2_preprocessing/2.5.3+galaxy1) %} with the following parameters:
> - *"Method used for preprocessing"*: `Convert a BAM file to a fragment file, using 'pp.make_fragment_file'`
> - {% icon param-file %} *"File name of the BAM file"*: `BAM_500-PBMC` (Input dataset)
> - {% icon param-toggle %} *"Indicate whether the BAM file contain paired-end reads"*: `Yes`
> - *"How to extract barcodes from BAM records?"*: `From read names using regular expressions`
> - *"Extract barcodes from read names of BAM records using regular expressions"*: `(................):`
> > <comment-title></comment-title>
> > - Not every regular expression type is supported.
> > - This expression selects 16 characters if they are followed by a colon `:`. Only the cell barcodes of the `BAM` file will match.
> {: .comment}
> > 3. Rename the generated file to `Fragments 500 PBMC`
> > 4. Now you can continue with either the `fragments_file` from earlier, or the new file `Fragments 500 PBMC`.
> > - {% icon galaxy-info %} Please note that `Fragments 500 PBMC` only contains 500 {PBMC} and thus the clustering will produce different outputs compared to the outputs generated by `fragments_file` (with 5k PBMC).
> > <hands-on-title>fragment file</hands-on-title>
> > 1. Import the file `BAM_500-PBMC` from [Zenodo]({{ page.zenodo_link }}) or from the shared data library
> > ```
> > {{ page.zenodo_link }}/files/atac_pbmc_5k_nextgem_fragments.tsv
> > ```
> > - This dataset contains mapped reads in the `BAM` format.
> > - It was generated by following the tutorial ["Pre-processing of 10X Single-Cell ATAC-seq Datasets"]( {% link topics/single-cell/tutorials/scatac-preprocessing-tenx/tutorial.md %} ) until the output of {% tool [Map with BWA-MEM](toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.18) %}
> > 2. {% tool [SnapATAC2 Preprocessing](toolshed.g2.bx.psu.edu/repos/iuc/snapatac2_preprocessing/snapatac2_preprocessing/2.5.3+galaxy1) %} with the following parameters:
> > - *"Method used for preprocessing"*: `Convert a BAM file to a fragment file, using 'pp.make_fragment_file'`
> > - {% icon param-file %} *"File name of the BAM file"*: `BAM_500-PBMC` (Input dataset)
> > - {% icon param-toggle %} *"Indicate whether the BAM file contain paired-end reads"*: `Yes`
> > - *"How to extract barcodes from BAM records?"*: `From read names using regular expressions`
> > - *"Extract barcodes from read names of BAM records using regular expressions"*: `(................):`
> >
> > > <comment-title></comment-title>
> > > - Not every regular expression type is supported.
> > > - This expression selects 16 characters if they are followed by a colon. Only the cell barcodes of the `BAM` file will match.
> > {: .comment}
> >
> > 3. Rename the generated file to `Fragments 500 PBMC`
> > 4. Now you can continue with either the `fragments_file` from earlier, or the new file `Fragments 500 PBMC`.
> > - {% icon galaxy-info %} Please note that `Fragments 500 PBMC` only contains 500 {PBMC} and thus the clustering will produce different outputs compared to the outputs generated by `fragments_file` (with 5k PBMC).
> {: .hands_on}
{: .details}
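
The barcode expression used in the hunk above, `(................):`, can be tried out locally before running the tool. The sketch below uses Python's `re` module purely as an illustration; the tool's own regular expression engine may behave differently (the tutorial notes that not every expression type is supported). The read name and the helper `extract_barcode` are hypothetical and not taken from the dataset.

```python
import re

# Pattern from the tutorial: capture 16 characters that are directly
# followed by a colon.
BARCODE_PATTERN = re.compile(r"(................):")


def extract_barcode(read_name):
    """Return the first 16-character group followed by ':', or None."""
    match = BARCODE_PATTERN.search(read_name)
    return match.group(1) if match else None


# Hypothetical read name: a 16-base cell barcode prepended to a read ID.
example_read = "AAACGAAAGACGTCAG:A00519:218:HJYFLDSXX:2:1101:1000:1000"
print(extract_barcode(example_read))
```

Because `.` matches any character, including `:`, the pattern relies on the barcode being the first 16 characters of the read name; it is worth checking a few real read names from the `BAM` file before relying on it.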
Expand All @@ -190,11 +194,13 @@ The [`AnnData`](https://anndata.readthedocs.io/en/latest/) format was initially
> - {% icon param-file %} *"Fragment file, optionally compressed with gzip or zstd"*: `fragments_file.tsv` (Input dataset)
> - {% icon param-file %} *"A tabular file containing chromosome names and sizes"*: `chrom_sizes.txt` (Input dataset)
> - {% icon param-toggle %} *"Whether the fragment file has been sorted by cell barcodes"*: `No`
> > <details-title>Sorted by barcodes</details-title>
> > - This tool requires the fragment file to be sorted according to cell barcodes.
> > - If **pp.make_fragment_file** {% icon tool %} was used to generate the fragment file, this has automatically been done.
> > - Otherwise, the setting *"sorted by cell barcodes"* should remain `No`.
> {: .details}
>
> > <details-title>Sorted by barcodes</details-title>
> > - This tool requires the fragment file to be sorted according to cell barcodes.
> > - If **pp.make_fragment_file** {% icon tool %} was used to generate the fragment file, this has automatically been done.
> > - Otherwise, the setting *"sorted by cell barcodes"* should remain `No`.
> {: .details}
>
> 2. Rename the generated file to `Anndata 5k PBMC`
>
> 3. Check that the format is `h5ad`
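
For a fragment file of unknown origin, one rough way to decide on the *"sorted by cell barcodes"* setting is to check whether all records of each barcode sit next to each other. The sketch below assumes a tab-separated fragment file with the barcode in the fourth column (the usual 10x layout) and treats "sorted" as "grouped contiguously"; both are assumptions, and the helper `barcodes_are_grouped` is hypothetical rather than part of SnapATAC2.

```python
def barcodes_are_grouped(lines):
    """Return True if every barcode's records form one contiguous run.

    Assumes tab-separated lines with the cell barcode in column 4.
    Blank lines and '#' comment lines are skipped.
    """
    seen = set()
    previous = None
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        barcode = line.rstrip("\n").split("\t")[3]
        if barcode != previous:
            if barcode in seen:
                # This barcode already appeared in an earlier run.
                return False
            seen.add(barcode)
            previous = barcode
    return True


# Tiny in-memory examples with made-up coordinates and barcodes.
grouped = [
    "chr1\t100\t200\tAAAC\t1\n",
    "chr2\t300\t400\tAAAC\t1\n",
    "chr1\t150\t250\tCCCG\t2\n",
]
interleaved = [
    "chr1\t100\t200\tAAAC\t1\n",
    "chr1\t150\t250\tCCCG\t2\n",
    "chr2\t300\t400\tAAAC\t1\n",
]
print(barcodes_are_grouped(grouped), barcodes_are_grouped(interleaved))
```

If the check fails, or if there is any doubt, leaving the setting at `No` as the tutorial recommends is the safe choice.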
