Models
Models allow interaction with the database. Each model interacts with a given database table. The pATLAS database has 8 tables:
All models are described here.
This is the main table that stores all sequence metadata.
This table follows a simple two-column structure:
- plasmid_id: the accession number. Example: NZ_CP029440_1
- json_entry: a JSON object stored in that column, which the frontend uses to fetch several pieces of data. Example:
{ "length": 60653, "cluster": "4", "significantLinks": [{"accession": "NC_014725_1", "size": 72832, "distance": "0.0556462", "percentage_hashes": 0.184}, ...], "plasmid_name": "pCAV1947-61", "taxa": ["Klebsiella", "Enterobacteriaceae", "Enterobacterales"], "name": "Klebsiella_quasipneumoniae" }
In the above example we have a sequence with a length of 60653 bp, belonging to cluster 4 (calculated by MASHix.py), which is a plasmid named pCAV1947-61, described in the species Klebsiella quasipneumoniae. The taxa field stores, in this order:
- Genus
- Family
- Order
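A minimal sketch of how one row of this table could be read in Python, assuming json_entry is retrieved as a JSON string and that taxa always holds exactly three elements (genus, family, order). The function name summarize_plasmid is hypothetical, and the example accession and entry from the text are paired only for illustration.

```python
import json

# Example json_entry from the text, with significantLinks truncated to one element.
raw_entry = """{
  "length": 60653,
  "cluster": "4",
  "significantLinks": [
    {"accession": "NC_014725_1", "size": 72832,
     "distance": "0.0556462", "percentage_hashes": 0.184}
  ],
  "plasmid_name": "pCAV1947-61",
  "taxa": ["Klebsiella", "Enterobacteriaceae", "Enterobacterales"],
  "name": "Klebsiella_quasipneumoniae"
}"""


def summarize_plasmid(plasmid_id, json_entry):
    """Return a flat summary dict for one row (hypothetical helper)."""
    entry = json.loads(json_entry)
    # Assumes taxa is ordered as [genus, family, order].
    genus, family, order = entry["taxa"]
    return {
        "accession": plasmid_id,
        "plasmid_name": entry["plasmid_name"],
        # Species names are stored with underscores instead of spaces.
        "species": entry["name"].replace("_", " "),
        "length_bp": entry["length"],
        "cluster": entry["cluster"],
        "genus": genus,
        "family": family,
        "order": order,
    }


summary = summarize_plasmid("NZ_CP029440_1", raw_entry)
print(summary["species"], summary["family"])
```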
The field significantLinks stores an array with all the links that a plasmid has to other plasmids in the database. Each element in the array is a dictionary that stores the accession, size, distance (Mash distance) and percentage_hashes (a representation of the sequence shared between the two plasmids) for each pairwise comparison.
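The sketch below shows one way to walk significantLinks and keep only close neighbours. Note that in the example entry distance is stored as a string while percentage_hashes is a number, so the code converts distance explicitly. The max_distance cutoff and the second link are hypothetical, added only to make the filter visible; they are not pATLAS settings or data.

```python
import json

json_entry = json.dumps({
    "significantLinks": [
        {"accession": "NC_014725_1", "size": 72832,
         "distance": "0.0556462", "percentage_hashes": 0.184},
        # Hypothetical second link, used only to show the filter working.
        {"accession": "HYPOTHETICAL_1", "size": 50000,
         "distance": "0.09", "percentage_hashes": 0.05},
    ]
})


def close_links(json_entry, max_distance=0.06):
    """Return (accession, distance, percentage_hashes) for links at or below a cutoff."""
    links = json.loads(json_entry).get("significantLinks", [])
    result = []
    for link in links:
        # distance is a string in the example entry, so convert before comparing.
        distance = float(link["distance"])
        if distance <= max_distance:
            result.append((link["accession"], distance, link["percentage_hashes"]))
    # Closest plasmids first.
    return sorted(result, key=lambda item: item[1])


print(close_links(json_entry))
```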
Corresponding model class: Plasmid
Although the name suggests that this database table stores only card annotations, it actually stores the entries from all antibiotic resistance databases per accession number (currently the resfinder and card databases available in abricate).
This table follows a simple two-column structure:
- plasmid_id: the accession number. Example: NZ_CP012569_1
- json_entry: a JSON object stored in that column, which the frontend uses to fetch several pieces of data. Example:
{ "accession": ["AF078527:4388-4841", "JQ808129:1599-2379", "FM207631", "JQ808129"], "seq_range": [[12345, 12798], [15817, 16597], [12255, 12797], [15817, 16596]], "database": ["card", "card", "resfinder", "resfinder"], "identity": [100.0, 100.0, 99.82, 100.0], "aro_accession": ["ARO:3002847", "ARO:3002666", null, "ARO:3002666"], "coverage": [100.0, 100.0, 100.0, 100.0], "gene": ["arr-2", "rmtF", "ARR-3_4", "rmtf_1"] }
In this JSON each key stores an array, and the elements at the same position across all the arrays describe the same annotation. For instance, the first element of each array is:
- accession: AF078527:4388-4841
- seq_range: [12345, 12798] (the range of this annotation in the plasmid)
- database: card (the database the annotation came from)
- identity: 100.0 (the identity reported by the BLAST results)
- aro_accession: ARO:3002847 (these accessions are exclusive to the card database)
- coverage: 100.0 (the coverage of the annotated sequence in the plasmid, reported by the BLAST results)
- gene: arr-2 (the actual name of the gene)
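Because the arrays are parallel (index i of every array describes annotation i), they can be zipped back into one record per annotation. This is a minimal sketch using the example entry from the text; the helper name to_records is hypothetical, and the length check is a defensive assumption rather than documented behavior.

```python
import json

card_entry = json.loads("""{
  "accession": ["AF078527:4388-4841", "JQ808129:1599-2379", "FM207631", "JQ808129"],
  "seq_range": [[12345, 12798], [15817, 16597], [12255, 12797], [15817, 16596]],
  "database": ["card", "card", "resfinder", "resfinder"],
  "identity": [100.0, 100.0, 99.82, 100.0],
  "aro_accession": ["ARO:3002847", "ARO:3002666", null, "ARO:3002666"],
  "coverage": [100.0, 100.0, 100.0, 100.0],
  "gene": ["arr-2", "rmtF", "ARR-3_4", "rmtf_1"]
}""")


def to_records(entry):
    """Turn column-wise annotation arrays into one dict per annotation."""
    keys = list(entry)
    # Every array must describe the same number of annotations.
    if len({len(entry[k]) for k in keys}) > 1:
        raise ValueError("arrays in json_entry have different lengths")
    return [dict(zip(keys, values)) for values in zip(*(entry[k] for k in keys))]


records = to_records(card_entry)
for rec in records:
    print(rec["database"], rec["gene"], rec["seq_range"], rec["identity"])
```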
Corresponding model class: Card
This database table stores the annotations made with abricate against the vfdb database.
This table follows a simple two-column structure:
- plasmid_id: the accession number.
- json_entry: a JSON object stored in that column, which the frontend uses to fetch several pieces of data.
This uses a structure similar to the one used in card. However, in this case the aro_accession entry is an array of false values.
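Since aro_accession holds false values here but real accessions (or null) in card entries, code that reads several annotation tables can keep only genuine accession strings. A minimal sketch; the vfdb entry and its gene names are hypothetical placeholders, and aro_accessions is a hypothetical helper name.

```python
import json

# Hypothetical vfdb json_entry; gene names are placeholders.
vfdb_entry = json.loads('{"gene": ["gene_a", "gene_b"], "aro_accession": [false, false]}')
# card-style entry: real ARO accessions, or null where none applies.
card_like_entry = {"aro_accession": ["ARO:3002847", None]}


def aro_accessions(entry):
    """Return only real ARO accessions, skipping false and null placeholders."""
    return [aro for aro in entry.get("aro_accession", []) if isinstance(aro, str)]


print(aro_accessions(vfdb_entry))  # []
print(aro_accessions(card_like_entry))  # ['ARO:3002847']
```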
Corresponding model class: Positive
This database table stores the annotations for the latest plasmidfinder database.
This table follows a simple two-column structure:
- plasmid_id: the accession number.
- json_entry: a JSON object stored in that column, which the frontend uses to fetch several pieces of data.
This uses a structure similar to the one used in card. However, in this case the aro_accession entry is an array of false values.
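With the card-like layout, identity and coverage can be read alongside gene to keep only strong hits. This sketch assumes the plasmidfinder entry carries the same gene, identity and coverage arrays as card; the entry values, the helper name passing_genes and the thresholds are hypothetical and are not abricate or pATLAS defaults.

```python
import json

# Hypothetical plasmidfinder json_entry; gene names and scores are placeholders.
plasmidfinder_entry = json.loads("""{
  "gene": ["replicon_a", "replicon_b", "replicon_c"],
  "identity": [99.5, 90.0, 100.0],
  "coverage": [100.0, 100.0, 45.0],
  "aro_accession": [false, false, false]
}""")


def passing_genes(entry, min_identity=95.0, min_coverage=80.0):
    """Keep genes whose identity and coverage both reach the illustrative thresholds."""
    kept = []
    # gene, identity and coverage are parallel arrays.
    for gene, identity, coverage in zip(entry["gene"], entry["identity"], entry["coverage"]):
        if identity >= min_identity and coverage >= min_coverage:
            kept.append(gene)
    return kept


print(passing_genes(plasmidfinder_entry))  # ['replicon_a']
```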
Corresponding model class: Database
This table stores all the sequences used to construct pATLAS, allowing users to download sequences from the application without having to use efetch from NCBI, which would often fail. This is the largest table and accounts for almost the entire size of the database, since it stores the raw sequences.
This table uses the following two-column structure:
- plasmid_id: the accession number.
- sequence_entry: a string with the full plasmid sequence.
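Serving a download from this table amounts to selecting the requested rows and writing them as FASTA. The sketch below uses an in-memory SQLite database as a stand-in; the table name sequence_db, the helper names and the 60-character line width are assumptions, and the stored sequence is a placeholder rather than real data.

```python
import sqlite3


def to_fasta(rows, width=60):
    """Format (plasmid_id, sequence_entry) rows as FASTA text with fixed-width lines."""
    lines = []
    for plasmid_id, sequence in rows:
        lines.append(">" + plasmid_id)
        # Slice the sequence into chunks of at most `width` characters.
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


def fetch_fasta(conn, accessions):
    """Fetch the requested accessions from the SQLite stand-in table."""
    placeholders = ",".join("?" for _ in accessions)
    query = (
        "SELECT plasmid_id, sequence_entry FROM sequence_db "
        f"WHERE plasmid_id IN ({placeholders}) ORDER BY plasmid_id"
    )
    return to_fasta(conn.execute(query, list(accessions)).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequence_db (plasmid_id TEXT PRIMARY KEY, sequence_entry TEXT)")
# Placeholder sequence, not the real NZ_CP029440_1 sequence.
conn.execute("INSERT INTO sequence_db VALUES (?, ?)", ("NZ_CP029440_1", "ACGT" * 40))
fasta = fetch_fasta(conn, ["NZ_CP029440_1"])
print(fasta)
```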
Corresponding model class: SequenceDB
In this table pATLAS can store results for a given JSON sent to the database. The usage of this API is documented in the gitbook.
This table uses the following three-column structure:
- unique_id: a hash generated from the requested JSON file, which makes each query unique.
- timestamp: the time at which the database entry was added. cron_delete.py uses it to delete all entries older than 1 day.
- json_entry: the JSON object used to make selections in the pATLAS frontend network, associated with this unique hash and database entry.
Note that if two different users send identical JSON results, both requests produce the same hash. This is not a problem, because both users are querying the database for the same selection and therefore want to see the same result. If the entry is already in the database, the view that handles the request simply returns the URL, so that the second user can see the desired results without the database entry being changed (which could lead to odd behavior).
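For equal requests to map to the same unique_id, the JSON has to be serialized canonically before hashing. This is a sketch under assumptions: SHA-256 and sorted-key serialization are illustrative choices (the hashing scheme pATLAS uses is not stated here), a dict stands in for the table, and the request contents and helper names are hypothetical.

```python
import hashlib
import json


def query_hash(query):
    """Return a hex digest that is identical for equal request JSON objects.

    Keys are sorted and separators fixed, so key order and whitespace in the
    incoming request do not change the hash. SHA-256 is an assumption here.
    """
    canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Dictionary standing in for the table: unique_id -> json_entry.
store = {}


def save_query(query):
    """Store a request only if its hash is new; always return its unique_id."""
    unique_id = query_hash(query)
    # setdefault leaves an existing entry untouched, mirroring the view's behavior.
    store.setdefault(unique_id, query)
    return unique_id


# Two hypothetical requests with the same content but different key order.
first = {"NZ_CP029440_1": 0.95, "NZ_CP012569_1": 0.8}
second = {"NZ_CP012569_1": 0.8, "NZ_CP029440_1": 0.95}
print(save_query(first) == save_query(second))  # True
print(len(store))  # 1
```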
Corresponding model class: UrlDatabase
In this table pATLAS can temporarily store a list of accession numbers for which a given user requested a sequence download. This is used to prepare the download before sending it to the user: the requested accession numbers are stored, the user is notified when the download is ready, and the user then has to confirm, which triggers the download itself.
This table uses the following three-column structure:
- unique_id: a hash generated from all the entries in the list of accession numbers.
- timestamp: the time at which the database entry was added. In this case cron_delete.py deletes every entry older than 15 minutes.
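The two retention windows described for cron_delete.py (1 day for stored query results, 15 minutes for pending downloads) can be sketched as timestamp-based deletes. This is not cron_delete.py itself: the SQLite stand-in, the table names url_database and fasta_database, the ISO-8601 text timestamps and the helper name purge_expired are all assumptions, and only the unique_id and timestamp columns are modelled.

```python
import sqlite3
from datetime import datetime, timedelta

# Retention windows taken from the text; table names are hypothetical.
RETENTION = {
    "url_database": timedelta(days=1),
    "fasta_database": timedelta(minutes=15),
}


def purge_expired(conn, now):
    """Delete rows older than each table's retention window; return deleted counts."""
    deleted = {}
    for table, max_age in RETENTION.items():
        # ISO-8601 strings of the same format compare correctly as text.
        cutoff = (now - max_age).isoformat(timespec="seconds")
        cur = conn.execute(f"DELETE FROM {table} WHERE timestamp < ?", (cutoff,))
        deleted[table] = cur.rowcount
    conn.commit()
    return deleted


conn = sqlite3.connect(":memory:")
for table in RETENTION:
    conn.execute(f"CREATE TABLE {table} (unique_id TEXT PRIMARY KEY, timestamp TEXT)")

now = datetime(2024, 1, 2, 12, 0, 0)


def ago(delta):
    return (now - delta).isoformat(timespec="seconds")


conn.execute("INSERT INTO url_database VALUES (?, ?)", ("old", ago(timedelta(days=2))))
conn.execute("INSERT INTO url_database VALUES (?, ?)", ("new", ago(timedelta(hours=1))))
conn.execute("INSERT INTO fasta_database VALUES (?, ?)", ("old", ago(timedelta(minutes=30))))
conn.execute("INSERT INTO fasta_database VALUES (?, ?)", ("new", ago(timedelta(minutes=5))))
deleted = purge_expired(conn, now)
print(deleted)  # {'url_database': 1, 'fasta_database': 1}
```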
Corresponding model class: FastaDatabase
This database table stores the annotations for the latest bacmet database.
This table follows a simple two-column structure:
- plasmid_id: the accession number.
- json_entry: a JSON object stored in that column, which the frontend uses to fetch several pieces of data.
This uses a structure similar to the one used in card. However, in this case the aro_accession entry is an array of false values.
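Because the card, vfdb, plasmidfinder and bacmet tables share the same layout, one helper can report the annotations of a plasmid across all of them. A minimal sketch, assuming each entry has parallel gene and seq_range arrays as in the card example; the card values come from that example, while the bacmet entry, the accession EXAMPLE_ACCESSION, the table labels and the helper name annotation_report are hypothetical.

```python
# Annotation arrays copied from the card example (first two annotations only).
card_entry = {
    "gene": ["arr-2", "rmtF"],
    "seq_range": [[12345, 12798], [15817, 16597]],
}
# Hypothetical bacmet entry; the gene name and range are placeholders.
bacmet_entry = {"gene": ["metal_gene_a"], "seq_range": [[100, 900]]}


def annotation_report(plasmid_id, entries_by_table):
    """Return one tab-separated line per annotation, tagged with its source table."""
    lines = []
    for table, entry in entries_by_table.items():
        # gene and seq_range are parallel arrays, so zip pairs each gene with its range.
        for gene, (start, end) in zip(entry.get("gene", []), entry.get("seq_range", [])):
            lines.append(f"{plasmid_id}\t{table}\t{gene}\t{start}-{end}")
    return lines


report = annotation_report("EXAMPLE_ACCESSION", {"card": card_entry, "bacmet": bacmet_entry})
print("\n".join(report))
```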
Corresponding model class: MetalDatabase