Population Structure

PCAngsd was used to examine population structure. This program is specifically designed to work with low coverage data and calculates a sample covariance matrix (used for PCA plotting) and admixture proportions based on the optimal number of clusters.

As input data PCAngsd we used filtered SNPs in vcf format that were called using Freebayes. The script 01_import_vcf.sh uses ANGSD to convert these Freebayes SNP calls into Beagle format while also filtering for Hardy Weinberg equilibrium (p < 1e-6). The command takes the form

angsd -vcf-gl <input_vcf> -out <beagle_output> -fai <genome_index> -nind 148 -doMaf 1 -doGlf 2 -doMajorMinor 1 -SNP_pval 1e-6

PCAngsd was then run on these Beagle formatted files using the command;

python pcangsd.py -beagle <beagle_file> -threads 40 -admix -admix_save -admix_auto 10000 -o <output_file>

The covariance matrix can be used as the basis for a PCA. Plotting the first two principle components reveals the clear Magnetic Island - North distinction. It also reveals two clear outliers MI-2-9 and MI-1-16.

Admixture proportions are also calculated by PCAngsd (based on optimal K = 2). These can be plotted in the style of a STRUCTURE plot as follows;

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

01_population_structure.md

01_population_structure.md

Population Structure

Files

01_population_structure.md

Latest commit

History

01_population_structure.md

File metadata and controls

Population Structure