Tiago Jesu edited this page Oct 25, 2018 · 8 revisions

Models

Models allow the application to interact with the database. Each model interacts with a given database table. The pATLAS database has 8 tables, each described in its own section below.

All models are described here.

plasmid

This is the main table that stores all sequence metadata.

This table follows a simple structure of two columns:

  • plasmid_id: the accession number.

    • example: NZ_CP029440_1
  • json_entry: a JSON object stored in that column, from which the frontend fetches several fields.

    • Example:
    {
      "length": 60653, 
      "cluster": "4", 
      "significantLinks": [{"accession": "NC_014725_1", "size": 72832, "distance": "0.0556462", "percentage_hashes": 0.184}, ...], 
      "plasmid_name": "pCAV1947-61", 
      "taxa": ["Klebsiella", "Enterobacteriaceae", "Enterobacterales"], 
      "name": "Klebsiella_quasipneumoniae"
    }

    In the example above, the sequence is 60653 bp long, belongs to cluster 4 (calculated by MASHix.py), and is a plasmid named pCAV1947-61, described in the species Klebsiella quasipneumoniae. The taxa field stores, in this order:

    1. Genus
    2. Family
    3. Order

    The significantLinks field stores an array with all the links that a plasmid has with other plasmids in the database. Each element of the array is a dictionary that stores the accession, size, distance (mash distance) and percentage_hashes (a representation of the sequence shared between the two plasmids) for one pairwise comparison.
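The fields described above can be read with a few lines of Python. This is a minimal sketch, not code from pATLAS: `summarize_plasmid` is a hypothetical helper, and the entry is an abridged copy of the example with the `...` elision dropped so that the string is valid JSON.

```python
import json

# Abridged copy of the example entry above (only one significant link kept).
raw_entry = """
{
  "length": 60653,
  "cluster": "4",
  "significantLinks": [
    {"accession": "NC_014725_1", "size": 72832,
     "distance": "0.0556462", "percentage_hashes": 0.184}
  ],
  "plasmid_name": "pCAV1947-61",
  "taxa": ["Klebsiella", "Enterobacteriaceae", "Enterobacterales"],
  "name": "Klebsiella_quasipneumoniae"
}
"""


def summarize_plasmid(entry):
    """Flatten a plasmid json_entry into a small summary (hypothetical helper)."""
    # taxa is ordered as genus, family, order
    genus, family, order = entry["taxa"]
    links = [
        # distance is stored as a string in the example, so convert it
        (link["accession"], float(link["distance"]))
        for link in entry["significantLinks"]
    ]
    return {
        "species": entry["name"].replace("_", " "),
        "genus": genus,
        "family": family,
        "order": order,
        "length_bp": entry["length"],
        "links": links,
    }


summary = summarize_plasmid(json.loads(raw_entry))
print(summary["species"], summary["length_bp"], summary["links"])
```

Note that cluster and distance are strings in the example entry, so any numeric comparison needs an explicit conversion first.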

Corresponding model class: Plasmid

card

Although the name suggests that this database table stores only card annotations, it actually stores, per accession number, all entries from the antibiotic resistance databases (at the moment, the resfinder and card databases available in abricate).

This table follows a simple structure of two columns:

  • plasmid_id: the accession number.

    • example: NZ_CP012569_1
  • json_entry: a JSON object stored in that column, from which the frontend fetches several fields.

    • Example:
    {
      "accession": ["AF078527:4388-4841", "JQ808129:1599-2379", "FM207631", "JQ808129"], 
      "seq_range": [[12345, 12798], [15817, 16597], [12255, 12797], [15817, 16596]], 
      "database": ["card", "card", "resfinder", "resfinder"], 
      "identity": [100.0, 100.0, 99.82, 100.0], 
      "aro_accession": ["ARO:3002847", "ARO:3002666", null, "ARO:3002666"], 
      "coverage": [100.0, 100.0, 100.0, 100.0], 
      "gene": ["arr-2", "rmtF", "ARR-3_4", "rmtf_1"]
    }

    In this JSON each key stores an array, and the elements at the same position across all the arrays belong to the same annotation. For instance, the first annotation is:

    • accession: AF078527:4388-4841
    • seq_range: [12345, 12798] (the range in the plasmid of this annotation)
    • database: card (the database from which the annotation came)
    • identity: 100.0 (the identity reported by blast results)
    • aro_accession: ARO:3002847 (these annotations are exclusive to the card database)
    • coverage: 100.0 (the coverage of the annotated sequence in the plasmid reported by blast results)
    • gene: arr-2 (the actual name of the gene)
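The positional layout described above can be turned into one record per annotation by zipping the arrays together. The following is a minimal sketch under that assumption; `split_annotations` is a hypothetical helper and not part of pATLAS. The JSON `null` becomes `None` in Python.

```python
# Copy of the card example entry above, as a Python dict.
card_entry = {
    "accession": ["AF078527:4388-4841", "JQ808129:1599-2379", "FM207631", "JQ808129"],
    "seq_range": [[12345, 12798], [15817, 16597], [12255, 12797], [15817, 16596]],
    "database": ["card", "card", "resfinder", "resfinder"],
    "identity": [100.0, 100.0, 99.82, 100.0],
    "aro_accession": ["ARO:3002847", "ARO:3002666", None, "ARO:3002666"],
    "coverage": [100.0, 100.0, 100.0, 100.0],
    "gene": ["arr-2", "rmtF", "ARR-3_4", "rmtf_1"],
}


def split_annotations(entry):
    """Convert parallel arrays into a list with one dict per annotation."""
    lengths = {len(values) for values in entry.values()}
    if len(lengths) > 1:
        raise ValueError("all arrays in the entry must have the same length")
    keys = list(entry)
    # zip(*...) walks the arrays position by position
    return [dict(zip(keys, row)) for row in zip(*entry.values())]


annotations = split_annotations(card_entry)
for annotation in annotations:
    print(annotation["gene"], annotation["database"], annotation["seq_range"])
```

Keeping the length check makes a malformed entry fail loudly instead of silently dropping trailing annotations, which is what a bare `zip` would do.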

Corresponding model class: Card

positive

This database table stores the annotations for vfdb made with abricate.

This table follows a simple structure of two columns:

  • plasmid_id: the accession number.
  • json_entry: a JSON object stored in that column, from which the frontend fetches several fields.

This uses a structure similar to the one used in card. However, in this case the aro_accession entry is an array of false values.

Corresponding model class: Positive

database

This database table stores the annotations for the latest plasmidfinder database.

This table follows a simple structure of two columns:

  • plasmid_id: the accession number.
  • json_entry: a JSON object stored in that column, from which the frontend fetches several fields.

This uses a structure similar to the one used in card. However, in this case the aro_accession entry is an array of false values.

Corresponding model class: Database

sequence_db

This table stores all the sequences used to construct pATLAS, allowing users to download sequences from the application without using efetch from NCBI, which would often fail. This is the largest database table and accounts for almost the full size of the database, since it stores the raw sequences.

This table uses the following structure of two columns:

  • plasmid_id: the accession number.
  • sequence_entry: a string with the full plasmid sequence.

Corresponding model class: SequenceDB
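As an illustration of how the two columns map onto a downloadable sequence, the sketch below formats one row as a FASTA record. It is not the pATLAS download code: `to_fasta` is a hypothetical helper, and the header layout and the 80-character line width are assumptions.

```python
def to_fasta(plasmid_id, sequence_entry, width=80):
    """Format one sequence_db row as a FASTA record (hypothetical helper)."""
    # Split the raw sequence into fixed-width lines
    lines = [
        sequence_entry[start:start + width]
        for start in range(0, len(sequence_entry), width)
    ]
    return ">{}\n{}\n".format(plasmid_id, "\n".join(lines))


# Made-up short sequence, used only to show the output layout
record = to_fasta("NZ_CP029440_1", "ACGT" * 30, width=60)
print(record, end="")
```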

url_database

In this table pATLAS can store results for a given JSON sent to the database. The usage of this API is documented in the gitbook.

This table uses the following structure of three columns:

  • unique_id: a hash generated from the requested JSON file, which makes each query unique.
  • timestamp: the time at which the database entry was added. This is used by cron_delete.py to delete all database entries older than 1 day.
  • json_entry: the JSON object that is used to make selections in the pATLAS frontend network and that is associated with this unique hash and database entry.

Note that if two different users send identical JSON results, both requests produce the same hash. This is not a problem, because both users are querying the database for the same selection and therefore want to see the same result. If the entry is already in the database, the view that handles the request just returns the url, so the second user can see the desired results without the database entry being changed (which could lead to odd behavior).
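The behaviour described above can be sketched as follows. The actual hash function used by pATLAS is not stated here, so SHA-256 over a canonical JSON dump is an assumption, and `make_unique_id`, `store_once` and the in-memory `table` dict are hypothetical stand-ins for the real view and database table.

```python
import hashlib
import json
from datetime import datetime


def make_unique_id(payload):
    """Hash a JSON-serializable payload deterministically (assumed scheme)."""
    # Sorting keys makes equal payloads produce equal strings
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def store_once(table, payload, now):
    """Add the payload unless its hash is already stored; return the hash."""
    unique_id = make_unique_id(payload)
    # setdefault leaves an existing entry (and its timestamp) untouched
    table.setdefault(unique_id, {"timestamp": now, "json_entry": payload})
    return unique_id


table = {}  # in-memory stand-in for the url_database table
first_id = store_once(table, {"a": 1, "b": [2, 3]}, datetime(2018, 10, 25, 12, 0))
second_id = store_once(table, {"b": [2, 3], "a": 1}, datetime(2018, 10, 25, 13, 0))
print(first_id == second_id, len(table))
```

Because the second request finds the hash already present, the stored entry keeps its original timestamp, which mirrors the "return the url without changing the entry" behaviour.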

Corresponding model class: UrlDatabase

fasta_database

In this table pATLAS can temporarily store a list of accession numbers for which a given user requested a sequence download. The table is used to prepare the download before sending it to the user: it stores the requested accession numbers, the application shows a message to the user once the download is ready, and the user then has to confirm, which triggers the download itself.

This table uses the following structure of three columns:

  • unique_id: a hash generated from all the entries in the list of accession numbers.
  • timestamp: the time at which the database entry was added. In this case cron_delete.py deletes every entry older than 15 minutes.

Corresponding model class: FastaDatabase
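The timestamp-based clean-up used for this table (15 minutes) and for url_database (1 day) can be sketched as below. This is not the code of cron_delete.py: `purge_expired` is a hypothetical helper and the table is represented by an in-memory dict keyed by unique_id.

```python
from datetime import datetime, timedelta


def purge_expired(table, max_age, now):
    """Delete entries whose timestamp is older than max_age; return their ids."""
    cutoff = now - max_age
    expired = [uid for uid, row in table.items() if row["timestamp"] < cutoff]
    for uid in expired:
        del table[uid]
    return expired


now = datetime(2018, 10, 25, 12, 0)
fasta_table = {
    "old-hash": {"timestamp": now - timedelta(minutes=20)},
    "new-hash": {"timestamp": now - timedelta(minutes=5)},
}
removed = purge_expired(fasta_table, timedelta(minutes=15), now)
print(removed, sorted(fasta_table))
```

The same helper would cover the url_database case by passing `timedelta(days=1)` as `max_age`.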

metal_database

This database table stores the annotations for the latest bacmet database.

This table follows a simple structure of two columns:

  • plasmid_id: the accession number.
  • json_entry: a JSON object stored in that column, from which the frontend fetches several fields.

This uses a structure similar to the one used in card. However, in this case the aro_accession entry is an array of false values.

Corresponding model class: MetalDatabase
