qPCR4vir edited this page Nov 14, 2015 · 7 revisions

Columns with original data.

Ideally, these will be contributed by the original authors!

| Column | Description |
| --- | --- |
| MEGA name (A) | Must exactly match the sequence name in MEGA. Most helpful if it is just the GenBank accession ID. Repeated names are automatically colored (usually an error). |
| Tab-Pub (B) | A filter with the genotype or, for example: e --> some error (artificial, artifact, patents, etc.), exclude this row from analysis; c --> clones, cells; n --> new, empty row with the formats and formulas ready to use (provided as a place where you can add your new data); no --> the sequence has not (yet) been included in the alignment; o --> other HEV variants, possibly not zoonotic. |
| Strain name (F) | Contains the strain or isolate name. Take care to enter the same name consistently for all the sequences belonging to the same strain/isolate. In a few cases both the strain and the isolate name are known: in these cases the isolate name is written in the Isolate column. Repeated names are automatically colored (NOT an error). |
| Isolate (G) | Combined with the strain name in the Table column "Strain name / isolate". |
| Country (H) | The country of the institution where the sequences were obtained. |
| Country cod (I) | The country of origin of the strain, as a 3-letter country code following ISO 3166-1 alpha-3. |
| Continent (J) | The continent of origin of the strain. |
| Region (K, L) | Two columns with the region of origin of the strain. The first column is inserted in the sequence label and may be just the initials of the second; the second can give a more detailed location. |
| Host (M), Source (N) | Host and source of the isolation. |
| Y (O), M (P), D (Q) | Year, month and day of sampling. |
| Inst (R) | Institution responsible for the isolation. Currently only German institutions; contributions about others are appreciated. |
| First_nt (S) | Position of the first nt in "Burma" coordinates. Initially entered manually, but it can be calculated if the position in the alignment is known. |
| Length | |
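The First_nt rule ("can be calculated if the position in the alignment is known") and the automatic coloring of repeated names can also be reproduced outside Excel. The following is a minimal Python sketch, not the workbook's actual formulas. It assumes that the Burma sequence is available as a gapped row of the same alignment, that '-' and '.' are the only gap symbols, and that alignment columns are counted from 1. The helper names `repeated_names`, `burma_position` and `first_nt` are hypothetical.

```python
from collections import Counter
from typing import List

GAP_CHARS = set("-.")  # assumption: '-' and '.' are the gap symbols in the alignment


def repeated_names(names: List[str]) -> List[str]:
    """Names occurring more than once, i.e. the cells the sheet would color.

    For MEGA name this usually signals an error; for Strain name it does not.
    """
    counts = Counter(name.strip() for name in names)
    return sorted(name for name, n in counts.items() if n > 1)


def burma_position(aligned_reference: str, alignment_column: int) -> int:
    """Map a 1-based alignment column to a 1-based position in the ungapped
    reference ("Burma") sequence.

    If the column falls on a gap in the reference (an insertion relative to
    Burma), the position of the last reference nucleotide before it is
    returned (0 if the column precedes the first reference nucleotide).
    """
    if not 1 <= alignment_column <= len(aligned_reference):
        raise ValueError("alignment column out of range")
    prefix = aligned_reference[:alignment_column]
    # Count reference nucleotides (non-gap characters) up to and including the column.
    return sum(1 for ch in prefix if ch not in GAP_CHARS)


def first_nt(aligned_reference: str, aligned_sequence: str) -> int:
    """First_nt for one sequence: locate its first non-gap alignment column
    and convert that column to Burma coordinates."""
    if len(aligned_reference) != len(aligned_sequence):
        raise ValueError("aligned sequences must have the same length")
    for column, ch in enumerate(aligned_sequence, start=1):
        if ch not in GAP_CHARS:
            return burma_position(aligned_reference, column)
    raise ValueError("sequence contains only gaps")


# Toy alignment (made-up data, for illustration only).
reference = "ATG--CCGTA"
print(first_nt(reference, "-----CCG--"))  # 4
print(first_nt(reference, "----TCCG--"))  # 3 (starts inside an insertion)
print(repeated_names(["SEQ_A", "SEQ_B", "SEQ_A"]))  # ['SEQ_A']
```

Reporting a start inside a Burma insertion as the preceding Burma position is only one possible convention; the sheet may treat that case differently.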

Other columns: mostly formulas, results of phylogenetic analysis, or just filters.

| Column | Description |
| --- | --- |
| Lu & Li | Subtype as given in (Lu et al., 2006). Only for sequences cited there. Any value here causes a mark (*) in the Table and in the labels. |
| Next 6 columns: CG, ORF1, HVR, RdRp, ORF3 and ORF2 | Result of the genotyping using each of these (approximate) regions. |
| Next 6 columns: CG, ORF1, HVR, RdRp, ORF3 and ORF2 | Regions present in the sequence (approximately). |
| Beg, End | Beginning and end of the sequence in "alignment" coordinates. |
| Length | Formula. |
| Contain region | Formula, for use as a filter. It is True if the sequence fully spans (contains) the region selected in the sheet "Regions". |
| Next 3 columns: C1, C2 and C3 | Free columns, to be used for comments or filters. |
| Next 5 columns: F.ORF1, F.HVR, F.RdRp, F.ORF3 and F.ORF2 | Filters used to decide which sequences appear in the corresponding Fig. or tree. Possible usage: n --> will not appear; s --> will appear. They also modify the corresponding column in the Table: a '+' is added there when the filter value marks a "standard" sequence (a sequence proposed as a standard for classifying other sequences using these regions). |
| Next 2 columns: C4 and C5 | Free columns, to be used for comments or filters. |
| NCBI | Link formula. A web link to the NCBI site that shows the full sequence record. Very handy while updating the data of the sequence and for seeing information not in this file; especially useful to quickly find the original publication. |
| Next 3 columns: Etiq-Automat, Acc.Strain and Acc | Formulas. Different possible forms of the sequence label. |
| Clear, Span, Selected | Formula. After all the filters are applied and only the sequences you want to analyze are displayed, one of these is the column to select, copy and paste into a new, empty text document. Save the document to use it as a "group file" in MEGA. |
| Next 5 columns: Etiq-Strain, Etiq-Strain+Automat+CG, Etiq-Strain+Automat+CG+y, Etiq-Strain+Automat, Etiq-Automat | Formulas. Different label variants to be used in MEGA as a "group file". |
| Next 3 columns: g.subtype, g.grupe and g.genotype | Formulas. Different ways of grouping the sequences, to be used in MEGA as a "group file", especially for importing into tree windows. |
| f.subt | Formula. The best column for sorting and filtering the sequences by subtype. |
| Next 9 columns: Accession no., Strain name / isolate, Classification & country, CG, ORF1.250nt, HVR.247nt, RdRp.280nt, ORF3.225nt and ORF2.187nt | Formulas (Table). Together they generate a table like the one in Vina-Rodriguez, 2015. |
| Orig Gr Select f (Original-Group Selected filter) | When you finish classifying a new batch of sequences in MEGA, you can export a group file containing all this information, open it in a text editor, copy everything and paste it into this column. The next 5 columns help you parse this information back into this Excel file. |
| Next 5 columns: MEGA, MEGA name, MEGAselect, AccN and StrainName | Formulas. You can copy from here and paste by value (content only) into the corresponding initial columns of this Excel sheet. |
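Several of these formula columns reduce to simple operations: Length and Contain region are interval arithmetic on Beg/End, the NCBI column is a URL built from the accession, and the group-file columns write and read name/group pairs. The Python sketch below is illustrative only and is not taken from the workbook. `Row`, `length`, `contains_region`, `ncbi_link`, `to_group_file` and `parse_group_file` are hypothetical names; Beg/End and the region limits are assumed to be inclusive, 1-based alignment columns; and the group file is assumed to hold one `name=group` pair per line.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

NCBI_NUCCORE = "https://www.ncbi.nlm.nih.gov/nuccore/"


@dataclass
class Row:
    """Hypothetical subset of one spreadsheet row."""
    mega_name: str  # MEGA name (A), ideally the GenBank accession
    beg: int        # Beg: first alignment column covered by the sequence
    end: int        # End: last alignment column covered by the sequence
    group: str      # group label, e.g. a subtype as in g.subtype


def length(row: Row) -> int:
    """Length: number of alignment columns spanned (inclusive)."""
    return row.end - row.beg + 1


def contains_region(row: Row, region_beg: int, region_end: int) -> bool:
    """Contain region: True only if the sequence fully spans the region."""
    return row.beg <= region_beg and row.end >= region_end


def ncbi_link(accession: str) -> str:
    """NCBI: nucleotide record page for an accession."""
    return NCBI_NUCCORE + accession


def to_group_file(rows: Iterable[Row]) -> str:
    """Write one 'name=group' line per sequence (assumed group-file layout)."""
    return "".join(f"{r.mega_name}={r.group}\n" for r in rows)


def parse_group_file(text: str) -> Dict[str, str]:
    """Read an exported group file back into {name: group}."""
    groups: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        name, group = line.split("=", 1)  # split on the first '=' only
        groups[name.strip()] = group.strip()
    return groups


# Made-up rows and region limits, for illustration only.
rows = [
    Row("SEQ_A", beg=50, end=900, group="3c"),
    Row("SEQ_B", beg=300, end=1200, group="3f"),
]
selected = [r for r in rows if contains_region(r, 100, 400)]
text = to_group_file(selected)
print(text, end="")             # SEQ_A=3c
print(parse_group_file(text))   # {'SEQ_A': '3c'}
```

Splitting each line on the first '=' only keeps group names that themselves contain '=' intact. Before pasting parsed values back into the sheet, check them against the file MEGA actually exported, since the exact group-file layout is an assumption here.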

The rest of the columns are used to generate the file you need to submit the sequences to GenBank, or just to keep track of the primers used to generate each fragment.