Stacks: Pipeline Components

Pipeline Core

ustacks - The unique stacks program will take as input a set of short-read sequences and align them into exactly-matching stacks. Comparing the stacks it will form a set of loci and detect SNPs at each locus using a maximum likelihood. [details]
cstacks - A catalog can be built from any set of samples processed by the ustacks program. It will create a set of consensus loci, merging alleles together. In the case of a genetic cross, a catalog would be constructed from the parents of the cross to create a set of all possible alleles expected in the progeny of the cross. [details]
sstacks - Sets of stacks constructed by the ustacks program can be searched against a catalog produced by the cstacks program. In the case of a genetic map, stacks from the progeny would be matched against the catalog to determine which progeny contain which parental alleles. [details]
gstacks - For de novo analyses, this program will pull in paired-end reads, if available, assemble the paired-end contig and merge it with the single-end locus, align reads to the locus, and call SNPs. For reference-aligned analyses, this program will build loci from the single and paired-end reads that have been aligned and sorted. [details]
populations - This program will be executed in place of the exisiting genotypes program when a population is being processed through the pipeline. The populations program will output site level SNP calls (in VCF format) and will compute summary statistics such as Pi, F_is, and F_st. [details]

Raw Reads

process_radtags - examines raw reads from an Illumina sequencing run and first, checks that the barcode and the RAD cutsite are intact (correcting minor errors). Second, it slides a window down the length of the read and checks the average quality score within the window. If the score drops below 90% probability of being correct, the read is discarded. [details]
process_shortreads - performs the same task as process_radtags for fast cleaning of randomly sheared genomic or transcriptomic data. [details]
clone_filter - take a set of paired-end reads and reduce them according to PCR clones (a PCR clone is a pair of reads that match exactly, while paried-end reads from two different DNA molecules will nearly always be slightly different lengths). [details]
kmer_filter - allows paired or single-end reads to be filtered according to the number or rare or abundant kmers they contain. Useful for both RAD datasets as well as randomly sheared genomic or transcriptomic data. [details]

Pipeline Execution

denovo_map.pl - executes each of the stacks components to create a genetic linkage map, or to identify the alleles in a free-standing population. It also handles uploading data to the database. [details]
ref_map.pl - takes reference-aligned input data and executes each of the stacks components, using the reference alignment to form stacks, and identifies alleles. Can be used in a genetic map of a free-standing population. It also handles uploading data to the database. [details]

Utility Programs

stacks-dist-extract - pull data distributions from the log and distribs files produced by the Stacks component programs. [details]
stacks-integrate-alignments - take loci produced by the de novo pipeline, align them against a reference genome, and inject the alignment coordinates back into the de novo-produced data. [details]
stacks-private-alleles - extract private allele data from the populations program outputs and output useful summaries and prepare it for plotting. [details]