Stacks

stacks-integrate-alignments

The stacks-integrate-alignments script takes loci produced by the de novo pipeline, along with a set of alignments of those loci against a reference genome, and injects the alignment coordinates back into the de novo-produced data. The program extracts the coordinates of the RAD loci from the given BAM file into a 'locus_coordinates.tsv' table, then rewrites the 'catalog.fa.gz' and 'catalog.calls' files so that they include the genomic coordinates given in the input BAM file.
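
As a rough illustration of the coordinate-extraction step, the following Python sketch reads SAM-formatted text (for example, the output of samtools view -h on the BAM file) and writes one row per aligned locus. It is not the script's actual implementation: the function names, the column names, and the choice to skip secondary and supplementary alignments are assumptions made for this example, and the real 'locus_coordinates.tsv' layout may differ.

```python
import csv

def sam_to_coordinates(sam_lines):
    """Yield (locus_id, chrom, pos, strand) for each primary, mapped SAM record.

    pos is the 1-based leftmost aligned position, exactly as stored in SAM.
    """
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        name, flag, chrom, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4:                    # read is unmapped
            continue
        if flag & 0x900:                  # secondary (0x100) or supplementary (0x800)
            continue
        strand = "-" if flag & 0x10 else "+"
        yield name, chrom, pos, strand

def write_table(rows, handle):
    """Write rows as a tab-separated table (hypothetical column names)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["locus_id", "chrom", "pos", "strand"])
    writer.writerows(rows)
```

The strand is taken from bit 0x10 of the SAM FLAG field. A tool that needs the restriction-site end of a reverse-strand locus would have to derive it from the position and the CIGAR string, which this sketch does not attempt.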

These data can be aligned to any reference genome the user is interested in. This may range from a closely related genome, such as that of the organism itself, to a more distantly related genome. Of course, the more distantly related the genome, the fewer loci will align successfully. The user can also filter alignments to exclude poorly mapped loci using several options: minimum mapping quality (provided by the alignment program and stored in the BAM file containing the alignments), and minimum alignment coverage and minimum percent identity, both of which are calculated from the CIGAR strings in the supplied alignments.
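
To make the CIGAR-based filters concrete, here is a minimal Python sketch of how coverage and identity could be derived from a CIGAR string. The definitions are assumptions chosen for illustration and may not match what stacks-integrate-alignments computes: coverage is taken as the aligned query bases divided by the full query length (including clipped bases), and identity as the matching columns divided by all aligned columns plus gaps. Because the M operation does not distinguish matches from mismatches, M columns are counted as matches, so the identity value is an upper bound. The function names and threshold parameters are hypothetical.

```python
import re

CIGAR_FULL = re.compile(r"(\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def cigar_metrics(cigar):
    """Return (coverage, identity) as fractions in [0, 1] for one CIGAR string."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

    aligned = sum(n for n, op in ops if op in "M=X")        # query bases placed on the reference
    query_len = sum(n for n, op in ops if op in "M=XISH")   # full query length, clips included
    gaps = sum(n for n, op in ops if op in "ID")            # inserted plus deleted bases
    mismatches = sum(n for n, op in ops if op == "X")       # only known when =/X are used

    if query_len == 0:
        raise ValueError(f"CIGAR string has no query bases: {cigar!r}")
    coverage = aligned / query_len
    columns = aligned + gaps
    identity = (aligned - mismatches) / columns if columns else 0.0
    return coverage, identity

def passes_filters(mapq, cigar, min_mapq, min_coverage, min_identity):
    """Apply the three filters; all thresholds are supplied by the caller."""
    coverage, identity = cigar_metrics(cigar)
    return mapq >= min_mapq and coverage >= min_coverage and identity >= min_identity
```

For example, the CIGAR string 10S80M2D10M describes a 100-base query with 90 aligned bases and a 2-base deletion, giving a coverage of 0.9 and an identity of 90/92 under these definitions.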

Once this integration is complete, we can run the populations program to export, for example, smoothed FST statistics along the reference genome.

Program Options

stacks-integrate-alignments --in-path path --bam-path path --out-path path [-q min] [-a min] [-p min]

Example Usage

  1. We start with the output from the de novo pipeline, most easily produced by denovo_map.pl. Using the set of consensus sequences that make up the catalog, contained in the 'catalog.fa.gz' file, we align those sequences against our reference genome of interest and convert the output to BAM:

    % bwa mem /path/to/reference/gaculeatus ./stacks_output/catalog.fa.gz | samtools view -h -b > ./stacks_output/integrate_alns/catalog.gacu.bam

    We next feed these alignments, along with the directory containing the Stacks de novo files and a separate output directory, to stacks-integrate-alignments:

    % stacks-integrate-alignments --in-path ./stacks_output --bam-path ./stacks_output/integrate_alns/catalog.gacu.bam --out-path ./stacks/integrate_alns/

    Finally, we could run populations to generate smoothed statistics along the chromosomes of our integrated reference genome:

    % populations --in-path ./stacks/integrate_alns --ordered-export --fstats --smooth -r 0.8
