Skip to content

Commit

Permalink
Added experimental species draft (#1038)
Browse files Browse the repository at this point in the history
  • Loading branch information
brianraymor authored Oct 21, 2024
1 parent cc0fd96 commit de92e0b
Showing 1 changed file with 351 additions and 0 deletions.
351 changes: 351 additions & 0 deletions schema/drafts/experimental_species_schema.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,351 @@
# Schema

Contact: [email protected]

Document Status: _Draft_

Version: 5.2.0+experimental

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED" "MAY", and "OPTIONAL" in this document are to be interpreted as described in [BCP 14](https://tools.ietf.org/html/bcp14), [RFC2119](https://www.rfc-editor.org/rfc/rfc2119.txt), and [RFC8174](https://www.rfc-editor.org/rfc/rfc8174.txt) when, and only when, they appear in all capitals, as shown here.

This draft is limited to **additions** or **modifications** to [schema 5.2.0](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/5.2.0/schema.md). If a 5.2.0 reference does not appear in this document, then no schema change is required. The following **temporary** constraints for *Danio rerio* and *Drosophila melanogaster* are specified:

* The `organism_ontology_term_id` MUST be the same for all observations.
* The `tissue_type` MUST be `'tissue'` for all observations.

## General Requirements

**Organisms**. Supported organisms MUST be documented in [Required Gene Annotations](#required-gene-annotations).


### Required Ontologies

The following ontology dependencies are *pinned* for this version of the schema.

| Ontology | OBO Prefix | Release | Download |
|:--|:--|:--|:--|
| [Cell Ontology] | CL | [2024-08-16] | [cl.owl]|
| [Drosophila Anatomy Ontology] | FBbt | [2024-08-08](https://github.com/FlyBase/drosophila-anatomy-developmental-ontology/releases/tag/v2024-08-08) | [fbbt.owl] |
| [Drosophila Development Ontology] | FBdv | [2024-08-07](https://github.com/FlyBase/drosophila-developmental-ontology/releases/tag/v2024-08-07) | [fbdv.owl] |
| [Experimental Factor Ontology] | EFO | [2024-08-15 EFO 3.69.0] | [efo.owl]
| [Human Ancestry Ontology] | HANCESTRO | [3.0] | [hancestro-base.owl] |
| [Human Developmental Stages] | HsapDv | [2024-05-28] | [hsapdv.owl] |
| [Mondo Disease Ontology] | MONDO | [2024-08-06] | [mondo.owl] |
| [Mouse Developmental Stages]| MmusDv | [2024-05-28] | [mmusdv.owl] |
| [NCBI organismal classification] | NCBITaxon | [2023-06-20] | [ncbitaxon.owl] |
| [Phenotype And Trait Ontology] | PATO | [2023-05-18] | [pato.owl] |
| [Uberon multi-species anatomy ontology] | UBERON | [2024-08-07] | [uberon.owl] |
| [Zebrafish Anatomy Ontology] | ZFA<br>ZFS | [2022-12-09] | [zfa.owl] |
| | | | |

[Cell Ontology]: http://obofoundry.org/ontology/cl.html
[2024-08-16]: https://github.com/obophenotype/cell-ontology/releases/tag/v2024-08-16
[cl.owl]: https://github.com/obophenotype/cell-ontology/releases/download/v2024-08-16/cl.owl

[Drosophila Anatomy Ontology]: https://obofoundry.org/ontology/fbbt.html
[2024-08-08]: https://github.com/FlyBase/drosophila-anatomy-developmental-ontology/releases/tag/v2024-08-08
[fbbt.owl]: https://github.com/FlyBase/drosophila-anatomy-developmental-ontology/releases/download/v2024-08-08/fbbt.owl

[Drosophila Development Ontology]: https://obofoundry.org/ontology/fbdv.html
[fbdv.owl]: https://github.com/FlyBase/drosophila-developmental-ontology/releases/download/v2024-08-07/fbdv.owl

[Experimental Factor Ontology]: http://www.ebi.ac.uk/efo
[2024-08-15 EFO 3.69.0]: https://github.com/EBISPOT/efo/releases/tag/v3.69.0
[efo.owl]: https://github.com/EBISPOT/efo/releases/download/v3.69.0/efo.owl

[Human Ancestry Ontology]: http://www.obofoundry.org/ontology/hancestro.html
[3.0]: https://github.com/EBISPOT/hancestro/releases/tag/3.0
[hancestro-base.owl]: https://github.com/EBISPOT/hancestro/blob/3.0/hancestro-base.owl

[Human Developmental Stages]: http://obofoundry.org/ontology/hsapdv.html
[2024-05-28]: https://github.com/obophenotype/developmental-stage-ontologies/releases/tag/v2024-05-28
[hsapdv.owl]: https://github.com/obophenotype/developmental-stage-ontologies/releases/download/v2024-05-28/hsapdv.owl

[Mondo Disease Ontology]: http://obofoundry.org/ontology/mondo.html
[2024-08-06]: https://github.com/monarch-initiative/mondo/releases/tag/v2024-08-06
[mondo.owl]: https://github.com/monarch-initiative/mondo/releases/download/v2024-08-06/mondo.owl

[Mouse Developmental Stages]: http://obofoundry.org/ontology/mmusdv.html
[mmusdv.owl]: https://github.com/obophenotype/developmental-stage-ontologies/releases/download/v2024-05-28/mmusdv.owl

[NCBI organismal classification]: http://obofoundry.org/ontology/ncbitaxon.html
[2023-06-20]: https://github.com/obophenotype/ncbitaxon/releases/tag/v2023-06-20
[ncbitaxon.owl]: https://github.com/obophenotype/ncbitaxon/releases/download/v2023-06-20/ncbitaxon.owl.gz

[Phenotype And Trait Ontology]: http://www.obofoundry.org/ontology/pato.html
[2023-05-18]: https://github.com/pato-ontology/pato/releases/tag/v2023-05-18
[pato.owl]: https://github.com/pato-ontology/pato/blob/v2023-05-18/pato.owl

[Uberon multi-species anatomy ontology]: http://www.obofoundry.org/ontology/uberon.html
[2024-08-07]: https://github.com/obophenotype/uberon/releases/tag/v2024-08-07
[uberon.owl]: https://github.com/obophenotype/uberon/releases/download/v2024-08-07/uberon.owl

[Zebrafish Anatomy Ontology]: https://obofoundry.org/ontology/zfa.html
[2022-12-09]: https://github.com/ZFIN/zebrafish-anatomical-ontology/releases/tag/v2022-12-09
[zfa.owl]: https://github.com/ZFIN/zebrafish-anatomical-ontology/blob/v2022-12-09/zfa.owl

---

### Required Gene Annotations

ENSEMBL identifiers are required for genes and [External RNA Controls Consortium (ERCC)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4978944/) identifiers for [RNA Spike-In Control Mixes] to ensure that all datasets measure the same features and can therefore be integrated.

The following gene annotation dependencies are *pinned* for this version of the schema. For multi-organism experiments, cells from any Metazoan organism are allowed as long as orthologs from the following organism annotations are used.

| Organism | Source | Required version | Download |
|:--|:--|:--|:--|
| <a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A9606"><code>NCBITaxon:9606</code></a><br>for <i>Homo sapiens</i> | [GENCODE (Human)] | Human reference GRCh38.p14<br>(GENCODE v44/Ensembl 110) | [gencode.v44.primary_assembly.annotation.gtf] |
| <a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A10090"><code>NCBITaxon:10090</code></a><br>for <i>Mus musculus</i> | [GENCODE (Mouse)] | Mouse reference GRCm39<br>(GENCODE vM33/Ensembl 110) | [gencode.vM33.primary_assembly.annotation.gtf] |
| <a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A2697049"><code>NCBITaxon:2697049</code></a><br>for <i>SARS-CoV-2</i> | [ENSEMBL (COVID-19)] | SARS-CoV-2 reference (ENSEMBL assembly: ASM985889v3) | [Sars\_cov\_2.ASM985889v3.101.gtf] |
| <a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A7955"><code>NCBITaxon:7955</code></a><br>for <i>Danio rerio</i> | [ENSEMBL (Zebrafish)] | GRCz11.112 (Ensembl 112) | [Danio_rerio.GRCz11.112.gtf] |
| <a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A7227"><code>"NCBITaxon:7227"</code></a><br>for <i>Drosophila melanogaster</i>| [ENSEMBL (Fruit fly)] | BDGP6.46 (Ensembl 112) | [Drosophila_melanogaster.BDGP6.46.112.gtf] |
| | [ThermoFisher ERCC Spike-Ins] | ThermoFisher ERCC RNA Spike-In Control Mixes (Cat # 4456740, 4456739) | [cms_095047.txt] |

[RNA Spike-In Control Mixes]: https://www.thermofisher.com/document-connect/document-connect.html?url=https%3A%2F%2Fassets.thermofisher.com%2FTFS-Assets%2FLSG%2Fmanuals%2Fcms_086340.pdf&title=VXNlciBHdWlkZTogRVJDQyBSTkEgU3Bpa2UtSW4gQ29udHJvbCBNaXhlcyAoRW5nbGlzaCAp

[GENCODE (Human)]: https://www.gencodegenes.org/human/
[gencode.v44.primary_assembly.annotation.gtf]: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_44/gencode.v44.primary_assembly.annotation.gtf.gz

[GENCODE (Mouse)]: https://www.gencodegenes.org/mouse/
[gencode.vM33.primary_assembly.annotation.gtf]: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M33/gencode.vM33.primary_assembly.annotation.gtf.gz

[ENSEMBL (COVID-19)]: https://covid-19.ensembl.org/index.html
[Sars\_cov\_2.ASM985889v3.101.gtf]: https://ftp.ensemblgenomes.org/pub/viruses/gtf/sars_cov_2/Sars_cov_2.ASM985889v3.101.gtf.gz

[ENSEMBL (Zebrafish)]: https://useast.ensembl.org/Danio_rerio/Info/Index
[Danio_rerio.GRCz11.112.gtf]: https://ftp.ensembl.org/pub/release-112/gtf/danio_rerio/Danio_rerio.GRCz11.112.gtf.gz

[ENSEMBL (Fruit fly)]: https://www.ensembl.org/Drosophila_melanogaster/Info/Index
[Drosophila_melanogaster.BDGP6.46.112.gtf]: https://ftp.ensembl.org/pub/release-112/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.46.112.gtf.gz

[ThermoFisher ERCC Spike-Ins]: https://www.thermofisher.com/order/catalog/product/4456740#/4456740
[cms_095047.txt]: https://assets.thermofisher.com/TFS-Assets/LSG/manuals/cms_095047.txt

---

## `obs` (Cell Metadata)

### development_stage_ontology_term_id

<table><tbody>
<tr>
<th>Key</th>
<td>development_stage_ontology_term_id</td>
</tr>
<tr>
<th>Annotator</th>
<td>Curator MUST annotate.</td>
</tr>
<tr>
<th>Value</th>
<td>categorical with <code>str</code> categories. If unavailable, this MUST be <code>"unknown"</code>.<br><br>
If <code>organism_ontolology_term_id</code> is <a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A7955"><code>"NCBITaxon:7955"</code></a> for <i>Danio rerio</i>, then this MUST be the most accurate descendant of <a href="https://www.ebi.ac.uk/ols4/ontologies/zfs/classes?obo_id=ZFS%3A0100000"><code>ZFS:0100000</code></a> for <i>zebrafish stage</i> and MUST NOT be <a href="https://www.ebi.ac.uk/ols4/ontologies/zfs/classes?obo_id=ZFS%3A0000000"><code>ZFS:0000000</code></a> for <i>Unknown</i>.<br /><br />If <code>organism_ontolology_term_id</code> is <a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A7227"><code>"NCBITaxon:7227"</code></a> for <i>Drosophila melanogaster</i>, then this MUST be the most accurate FBdv term.
<br><br> Otherwise, for all other organisms this MUST be the most accurate descendant of <a href="https://www.ebi.ac.uk/ols4/ontologies/uberon/classes?obo_id=UBERON%3A0000105"><code>UBERON:0000105</code></a> for <i>life cycle stage</i>, excluding <a href="https://www.ebi.ac.uk/ols4/ontologies/uberon/classes?obo_id=UBERON%3A0000071"><code>UBERON:0000071</code></a> for <i>death stage</i>.
</td>
</tr>
</tbody></table>
<br>

---

### organism_cell_type_ontology_term_id

<table><tbody>
<tr>
<th>Key</th>
<td>organism_cell_type_ontology_term_id</td>
</tr>
<tr>
<th>Annotator</th>
<td>Curator MUST annotate.</td>
</tr>
<tr>
<th>Value</th>
<td>categorical with <code>str</code> categories. This MUST be <code>"unknown"</code> when:
<ul>
<li> no appropriate term can be found (e.g. the cell type is unknown)</li>
<li><code>assay_ontology_term_id</code> is <a href="https://www.ebi.ac.uk/ols4/ontologies/efo/classes?obo_id=EFO%3A0010961"><code>"EFO:0010961"</code></a> for <i>Visium Spatial Gene Expression</i>, <code>uns['spatial']['is_single']</code> is <code>True</code>, and the corresponding value of <code>in_tissue</code> is <code>0</code></li>
<br>Otherwise:
<table>
<thead>
<tr>
<th>For <code>organism_ontolology_term_id</code></th>
<th>MUST be</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A7955"><code>"NCBITaxon:7955"</code></a><br>for <i>Danio rerio</i></td>
<td>the most accurate descendant of <a href="https://www.ebi.ac.uk/ols4/ontologies/zfa/classes?obo_id=ZFA%3A0009000"><code>ZFA:0009000</code></a> for <i>cell</i>
</tr>
<tr>
<td><a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A7227"><code>"NCBITaxon:7227"</code></a><br>for <i>Drosophila melanogaster</i></td>
<td>the most accurate descendant of <a href="https://www.ebi.ac.uk/ols4/ontologies/fbbt/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FFBbt_00007002?lang=en"><code>FBbt:0007002</code></a> for <i>cell</i>
</tr>
</tbody></table>
</td>
</tr>
</tbody></table>
<br>

---

### organism_ontology_term_id

<table><tbody>
<tr>
<th>Key</th>
<td>organism_ontology_term_id</td>
</tr>
<tr>
<th>Annotator</th>
<td>Curator MUST annotate.</td>
</tr>
<tr>
<th>Value</th>
<td>categorical with <code>str</code> categories. This MUST be a descendant of <a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A33208"<code>NCBITaxon:33208</code></a> for <i>Metazoa</i>.<br><br>If <code>organism_ontology_term_id</code> is <a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A7955"><code>"NCBITaxon:7955"</code></a> for <i>Danio rerio</i> or <a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A7227"><code>"NCBITaxon:7227"</code></a> for <i>Drosophila melanogaster</i>, then all observations MUST contain the same value.
</td>
</tr>
</tbody></table>
<br>

---

### organism_tissue_ontology_term_id

<table><tbody>
<tr>
<th>Key</th>
<td>organism_tissue_ontology_term_id</td>
</tr>
<tr>
<th>Annotator</th>
<td>Curator MUST annotate.</td>
</tr>
<tr>
<th>Value</th>
<td>categorical with <code>str</code> categories.<br>
<br>
<table>
<thead>
<tr>
<th>For <code>organism_ontolology_term_id</code></th>
<th>MUST be</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A7955"><code>"NCBITaxon:7955"</code></a><br>for <i>Danio rerio</i> </td>
<td>the most accurate descendant of <a href="https://www.ebi.ac.uk/ols4/ontologies/zfa/classes?obo_id=ZFA%3A0100000"><code>ZFA:0100000</code></a> for <i>zebrafish anatomical entity</i><br>and MUST NOT be <a href="https://www.ebi.ac.uk/ols4/ontologies/zfa/classes?obo_id=ZFA%3A0009000"><code>ZFA:0009000</code></a> for <i>cell</i> or any of its descendants
</tr>
<tr>
<td><a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A7227"><code>"NCBITaxon:7227"</code></a><br>for <i>Drosophila melanogaster</i></td>
<td>the most accurate descendant of <a href="https://www.ebi.ac.uk/ols4/ontologies/fbbt/classes?obo_id=FBBT%3A10000000"><code>FBbt:10000000</code></a> for <i>anatomical entity</i><br>and MUST NOT be <a href="https://www.ebi.ac.uk/ols4/ontologies/fbbt/classes?obo_id=FBbt%3A00007002"><code>FBbt:00007002</code></a> for <i>cell</i> or any of its descendants.</code>
</tr>
</tbody></table>
</td>
</tr>
</tbody></table>
<br>

---

### tissue_type

<table><tbody>
<tr>
<th>Key</th>
<td>tissue_type</td>
</tr>
<tr>
<th>Annotator</th>
<td>Curator MUST annotate.</td>
</tr>
<tr>
<th>Value</th>
<td>categorical with <code>str</code> categories.<br><br>If <code>organism_ontology_term_id</code> is <a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A7955"><code>"NCBITaxon:7955"</code></a> for <i>Danio rerio</i> or <a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A7227"><code>"NCBITaxon:7227"</code></a> for <i>Drosophila melanogaster</i>, then the value MUST be <code>"tissue"</code>.<br><br>Otherwise, the value MUST be <code>"tissue"</code>, <code>"organoid"</code>, or <code>"cell culture"</code>.
</tr>
</tbody></table>
<br>

---

## var and raw.var (Gene Metadata)

### feature_reference

<table><tbody>
<tr>
<th>Key</th>
<td>feature_reference</td>
</tr>
<tr>
<th>Annotator</th>
<td>CELLxGENE Discover MUST annotate.</td>
</tr>
<tr>
<th>Value</th>
<td><code>str</code>. This MUST be the reference organism for a feature:
<br><br>
<table>
<thead>
<tr>
<th>Reference Organism</th>
<th>MUST Use</th>
</tr>
</thead>
<tbody>
<tr>
<td><i>Homo sapiens</i></td>
<td><a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A9606"><code>"NCBITaxon:9606"</code></a></td>
</tr>
<tr>
<td><i>Mus musculus</i></td>
<td><a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A10090"><code>"NCBITaxon:10090"</code></a></td>
</tr>
<tr>
<td><i>SARS-CoV-2</i></td>
<td><a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A2697049"><code>"NCBITaxon:2697049"</code></a></td>
</tr>
<tr>
<td><i>Danio rerio</i></td>
<td><a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A7955"><code>"NCBITaxon:7955"</code></a></td>
</tr>
<tr>
<td><i>Drosophila melanogaster</i></td>
<td><a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A7227"><code>"NCBITaxon:7227"</code></a></td>
</tr>
<tr>
<td><i>ERCC Spike-Ins</i></td>
<td><a href="https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon/classes?obo_id=NCBITaxon%3A32630"><code>"NCBITaxon:32630"</code></a></td>
</tr>
</tbody></table>
</td>
</tr>
</tbody></table>
<br>

---

## Appendix A. Changelog

### schema v5.2.0+experimental

* General Requirements
* Updated requirements for supported organisms
* Required Ontologies
* Added Drosophila Anatomy Ontology (FBbt) release 2024-08-08
* Added Drosophila Development Ontology (FBdv) release 2024-08-07
* Added Zebrafish Anatomy Ontology (ZFA+ZFS) release 2022-12-09
* Required Gene Annotations
* Refactored table to include NCBI Taxon for supported organisms
* Added *Danio rerio* Reference GRCz11.112 (Ensembl 112)
* Added *Drosophila melanogaster* Reference BDGP6.46 (Ensembl 112)
* obs (Cell metadata)
* Updated `development_stage_ontology_term_id` to support *Danio rerio* and *Drosophila melanogaster*
* Added `organism_cell_type_ontology_term_id`
* Updated `organism_ontology_term_id` for *Danio rerio* and *Drosophila melanogaster* to require all observations to contain the same value
* Added `organism_tissue_ontology_term_id`
* Updated `tissue_type` to require `"tissue"` for *Danio rerio* and *Drosophila melanogaster*
* var and raw.var (Gene Metadata)
* Updated `feature_reference` to support *Danio rerio* and *Drosophila melanogaster*

0 comments on commit de92e0b

Please sign in to comment.