Stacks: genotypes


This program exports a Stacks data set either as the set of observed haplotypes at each locus in the population or with those haplotypes encoded as genotypes. The -r option restricts the export to loci present in at least a given number of individuals in the population. In a mapping context, raising or lowering this limit is an effective way to control the quality of the exported markers, since genuine markers will be found in a large number of progeny. When exporting observed haplotypes, the -m option can be used to require a minimum read depth before a locus is exported for a particular individual.
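To make the interaction of -r and -m concrete, here is a minimal Python sketch under stated assumptions: per-locus, per-individual read depths are held in a plain dictionary (a hypothetical layout, not a Stacks file format), and the -m depth minimum is applied first, so only individuals that pass it count toward the -r minimum. The real program's order of operations is not documented here, and the name filter_loci is illustrative only.

```python
from typing import Dict

# Hypothetical layout: locus ID -> {individual name: read depth at that locus}.
# An individual missing from the inner dict has no stack at that locus.
DepthTable = Dict[int, Dict[str, int]]


def filter_loci(depths: DepthTable, min_individuals: int, min_depth: int = 0) -> DepthTable:
    """Keep loci seen in at least `min_individuals` individuals (like -r), where an
    individual only counts if its stack depth is at least `min_depth` (like -m).
    Assumption: the depth filter is applied before the individual count."""
    kept: DepthTable = {}
    for locus, per_ind in depths.items():
        # Drop individual stacks that are too shallow to export.
        passing = {ind: d for ind, d in per_ind.items() if d >= min_depth}
        if len(passing) >= min_individuals:
            kept[locus] = passing
    return kept


if __name__ == "__main__":
    table = {
        1: {"p1": 12, "p2": 8, "p3": 6},
        2: {"p1": 3, "p2": 20},
        3: {"p1": 2, "p2": 4, "p3": 9},
    }
    print(sorted(filter_loci(table, min_individuals=3, min_depth=5)))  # [1]
```

With a depth minimum of 5, loci 2 and 3 each keep only one qualifying individual, so only locus 1 survives a minimum of three individuals.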

By default, when the pipeline is run (with either denovo_map.pl or ref_map.pl), genotypes is executed last: it identifies mappable markers in the population and exports both a set of observed haplotypes and a set of generic genotypes, using -r 1. If SQL interaction is enabled, these files are uploaded to the database, where Stacks stores the genotyping information in a neutral, map-agnostic format. From the web interface, you can make additional manual corrections and add marker annotations, and all of this data can be exported directly from the web after specifying a particular map type (if the data come from a genetic cross).

Making Corrections

If enabled with the -c option, the genotypes program makes automated corrections to the data. Because loci are matched across the population, the program knows which alleles exist at a given locus in the other individuals and can use that knowledge to correct false-negative heterozygote calls. For example, it will identify loci with SNPs that did not have high enough coverage in an individual to be detected by the SNP caller. It also checks that homozygous tags have a minimum depth of coverage, since a low-coverage polymorphic locus may appear homozygous simply because the other allele was not sequenced.
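A minimal sketch of the idea behind these corrections, under stated assumptions: each read at a locus has been reduced to its haplotype (the bases at the locus's SNP positions), and the set of alleles seen across the population is available. The helper name tally_known_alleles is hypothetical, and this is not the program's internal code. It counts how many of one individual's reads support each population allele, so a second allele with too little coverage for the SNP caller on its own still shows up in the counts that the thresholds under Correction Thresholds act on.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def tally_known_alleles(read_haplotypes: Iterable[str],
                        catalog_alleles: Set[str]) -> Dict[str, int]:
    """Count one individual's reads supporting each allele the population
    catalog knows about at a locus (hypothetical helper)."""
    # Reads whose haplotype matches no catalog allele (e.g. sequencing errors)
    # are ignored rather than treated as new alleles.
    counts = Counter(h for h in read_haplotypes if h in catalog_alleles)
    # Report every catalog allele, including those with zero supporting reads.
    return {allele: counts[allele] for allele in sorted(catalog_alleles)}


if __name__ == "__main__":
    # The population has seen alleles A and G at this single-SNP locus.
    reads = ["A"] * 7 + ["G"] * 2 + ["T"]
    print(tally_known_alleles(reads, {"A", "G"}))  # {'A': 7, 'G': 2}
```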

Correction Thresholds

The thresholds for automatic corrections can be changed from their defaults with the --min_hom_seqs, --min_het_seqs, and --max_het_seqs options to genotypes. min_hom_seqs is the minimum number of reads required to consider a stack homozygous (default 5). min_het_seqs and max_het_seqs are fractions: if the ratio of the depth of the smaller allele to the depth of the larger allele is greater than max_het_seqs (default 1/10), the stack is called a heterozygote; if the ratio is less than min_het_seqs (default 1/20), the stack is called homozygous. If the ratio falls between the two values, the genotype is unknown and no call is assigned.
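The threshold rules above can be sketched as a small classifier. This is a minimal illustration, not the genotypes source: it assumes a stack is summarized by its per-allele read depths, compares only the two deepest alleles, and applies min_hom_seqs to the total number of reads in the stack (the text does not say exactly which count is used). Ratios exactly equal to a threshold fall into the unknown band, because the rules are phrased as strictly greater than and strictly less than. The option descriptions further down phrase min_het_seqs as a minor allele frequency; this sketch follows the ratio wording of this paragraph.

```python
from typing import Sequence


def call_genotype(allele_depths: Sequence[int],
                  min_hom_seqs: int = 5,
                  min_het_seqs: float = 0.05,
                  max_het_seqs: float = 0.1) -> str:
    """Return 'hom', 'het', or 'unknown' for one stack, using the ratio of the
    smaller allele depth to the larger allele depth."""
    depths = sorted((d for d in allele_depths if d > 0), reverse=True)
    if not depths:
        return "unknown"
    # Only the two deepest alleles are compared in this sketch.
    ratio = depths[1] / depths[0] if len(depths) > 1 else 0.0
    if ratio > max_het_seqs:
        return "het"
    if ratio < min_het_seqs:
        # Assumption: a homozygous call needs enough total reads in the stack.
        return "hom" if sum(depths) >= min_hom_seqs else "unknown"
    # Ratio lies between the two thresholds: no call.
    return "unknown"


if __name__ == "__main__":
    print(call_genotype([7, 2]), call_genotype([40, 1]), call_genotype([30, 2]))
    # het hom unknown
```

A single allele seen in only 3 reads returns "unknown" rather than "hom", which is the low-coverage case described under Making Corrections.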

Automated corrections made by the program are shown in the output file in capital letters.
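A quick way to find automated corrections in an exported file is to look for capitalized calls. The sketch below assumes a row of genotype calls has already been split into strings (the exact row layout depends on the output type chosen with -o) and that uncorrected calls are written in lower case; missing-data symbols such as '--' contain no letters and are never reported. corrected_calls is a hypothetical helper name.

```python
from typing import List, Tuple


def corrected_calls(calls: List[str]) -> List[Tuple[int, str]]:
    """Return (column index, call) for every call written entirely in capital
    letters, i.e. assumed to have been changed by the automated corrections."""
    # str.isupper() is False for strings with no cased characters, such as "--".
    return [(i, c) for i, c in enumerate(calls) if c.isupper()]


if __name__ == "__main__":
    row = ["ab", "AA", "--", "bb", "AB"]  # hypothetical row of calls
    print(corrected_calls(row))  # [(1, 'AA'), (4, 'AB')]
```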

Making genotypes appear in the web interface

If the -s option is specified, a second file is written containing the genotypes in SQL format, which can be imported back into the database (into the catalog_genotypes table). These genotypes can then be viewed in the web interface, where additional manual corrections can be made. The manual corrections can then be included in the output by exporting the results directly from the web interface.

Program Options

genotypes -b batch_id -P path [-r min] [-m min] [-t map_type -o type] [-B blacklist] [-W whitelist] [-c] [-s] [-e renz] [-v] [-h]

b — batch ID to examine when exporting from the catalog.
r — minimum number of progeny required to print a marker.
c — make automated corrections to the data.
P — path to the Stacks output files.
t — map type to write. 'CP', 'DH', 'F2', 'BC1', and 'GEN' are the currently supported map types.
o — output file type to write; 'joinmap', 'onemap', 'rqtl', and 'genomic' are currently supported.
m — specify a minimum stack depth required before exporting a locus in a particular individual.
s — output a file to import results into an SQL database.
B — specify a file containing blacklisted markers to be excluded from the export.
W — specify a file containing whitelisted markers to include in the export.
e — restriction enzyme, required if generating 'genomic' output.
v — print program version.
h — display this help message.
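When genotypes is driven from a script, it can help to assemble and check the argument list before running it. The following is a minimal sketch that uses only the options listed above; the function name genotypes_command is hypothetical. Because the usage line groups -t and -o, the sketch requires both or neither, and it enforces that -e accompanies 'genomic' output. The documentation does not say whether map types are case-sensitive, so the sketch accepts either case for validation and passes the value through unchanged.

```python
from typing import List, Optional

# Values listed for -t and -o in the options above.
MAP_TYPES = {"CP", "DH", "F2", "BC1", "GEN"}
OUTPUT_TYPES = {"joinmap", "onemap", "rqtl", "genomic"}


def genotypes_command(batch_id: int, path: str, *,
                      min_progeny: Optional[int] = None,
                      min_depth: Optional[int] = None,
                      map_type: Optional[str] = None,
                      out_type: Optional[str] = None,
                      renz: Optional[str] = None,
                      corrections: bool = False,
                      sql: bool = False) -> List[str]:
    """Build an argument list for genotypes from the documented options."""
    args = ["genotypes", "-b", str(batch_id), "-P", path]
    if min_progeny is not None:
        args += ["-r", str(min_progeny)]
    if min_depth is not None:
        args += ["-m", str(min_depth)]
    # The usage line groups -t and -o, so this sketch requires both or neither.
    if (map_type is None) != (out_type is None):
        raise ValueError("-t and -o must be given together")
    if map_type is not None and out_type is not None:
        if map_type.upper() not in MAP_TYPES:
            raise ValueError(f"unsupported map type: {map_type}")
        if out_type not in OUTPUT_TYPES:
            raise ValueError(f"unsupported output type: {out_type}")
        if out_type == "genomic" and renz is None:
            raise ValueError("-e (restriction enzyme) is required for 'genomic' output")
        args += ["-t", map_type, "-o", out_type]
    if renz is not None:
        args += ["-e", renz]
    if corrections:
        args.append("-c")
    if sql:
        args.append("-s")
    return args


if __name__ == "__main__":
    # Same options as the haplotype export under Example Usage.
    print(" ".join(genotypes_command(1, "./stacks/", min_progeny=3, min_depth=5)))
    # genotypes -b 1 -P ./stacks/ -r 3 -m 5
```

The returned list can be passed to the standard library's subprocess.run(args, check=True) to execute the program.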

Filtering options:

--lnl_lim [num] — filter loci with log likelihood values below this threshold.
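As a sketch of what this filter does, assuming each locus carries a single log-likelihood value (how genotypes derives that value is not described here), loci below the limit are dropped and loci at or above it are kept. Log likelihoods are typically negative, so a useful limit is a negative number and loci closer to zero are the ones that survive. filter_by_lnl is a hypothetical name.

```python
from typing import Dict


def filter_by_lnl(locus_lnl: Dict[int, float], lnl_lim: float) -> Dict[int, float]:
    """Drop loci whose log likelihood is below `lnl_lim` (like --lnl_lim).
    A locus exactly at the limit is kept, since only values below it are filtered."""
    return {locus: lnl for locus, lnl in locus_lnl.items() if lnl >= lnl_lim}


if __name__ == "__main__":
    print(filter_by_lnl({1: -5.2, 2: -18.7, 3: -9.9}, -10.0))  # {1: -5.2, 3: -9.9}
```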

Automated corrections options:

--min_hom_seqs [num] — minimum number of reads required at a stack to call a homozygous genotype (default 5).
--min_het_seqs [num] — below this minor allele frequency a stack is called a homozygote; above it (but below --max_het_seqs) it is called unknown (default 0.05).
--max_het_seqs [num] — minimum frequency of minor allele to call a heterozygote (default 0.1).

Manual corrections options:

--cor_path [path] — path to a file containing manual genotype corrections from a Stacks SQL database to incorporate into the output.

Example Usage

Exporting a set of observed haplotypes, requiring a minimum stack depth of 5 reads and presence in at least three progeny:

~/% genotypes -P ./stacks/ -b 1 -m 5 -r 3

Exporting a set of generic, map-agnostic genotypes, requiring a marker to be present in at least three progeny:
