Stacks: stacks-integrate-alignments

stacks-integrate-alignments

The stacks-integrate-alignments script will take loci produced by the de novo pipeline, along with a set of alignments of those loci against a reference genome, and inject the alignment coordinates back into the de novo-produced data. The program will extract the coordinates of the RAD loci from the given BAM file into a 'locus_coordinates.tsv' table, and it then rewrites the 'catalog.fa.gz' and 'catalog.calls' files so that they include the genomic coordinates given in the input BAM file.

These data can be aligned to any reference genome the user is interested in, from a closely related genome, such as that of the organism itself, to a more distantly related one. Of course, the more distantly related the genome, the fewer loci will align successfully. The user can also filter alignments to exclude poorly mapped loci using several options: minimum mapping quality (provided by the alignment program and stored in the BAM file containing the alignments), and minimum alignment coverage and minimum percent identity, both of which are calculated from the CIGAR strings in the supplied alignments.
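To make the alignment coverage filter more concrete, here is a minimal sketch of how such a fraction could be derived from a CIGAR string. It is not the program's own implementation. The function name cigar_alncov is hypothetical, and the sketch assumes that only M, = and X operations count as aligned, while insertions and soft- or hard-clipped bases count toward the locus length only.

```shell
# cigar_alncov CIGAR: print the fraction of query bases inside aligned blocks.
# Hypothetical helper for illustration; the real program may count differently.
cigar_alncov() {
    printf '%s\n' "$1" | awk '{
        aligned = 0; total = 0; cigar = $0
        # Consume one <length><op> pair at a time from the front of the string.
        while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
            len = substr(cigar, 1, RLENGTH - 1) + 0
            op  = substr(cigar, RLENGTH, 1)
            # Assumption: only match/mismatch operations count as aligned.
            if (op == "M" || op == "=" || op == "X") aligned += len
            # Operations that contribute bases of the query (the locus).
            if (op ~ /[MIS=XH]/) total += len
            cigar = substr(cigar, RLENGTH + 1)
        }
        if (total > 0) printf "%.2f\n", aligned / total
    }'
}

cigar_alncov 90M10S    # 100 bp locus with 10 clipped bases; prints 0.90
```

Under these assumptions, a locus whose alignment reports 90M10S would pass the default coverage threshold of 0.6, while one reporting 50M50S would not.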

Once this integration is complete, we can run the populations program to export, for example, smoothed F_ST statistics along the reference genome.

Program Options

stacks-integrate-alignments --in-path path --bam-path path --out-path path [-q min] [-a min] [-p min]

-P, --in-path path — Path to a directory containing Stacks output files.
-B, --bam-path path — Path to a SAM or BAM file containing alignments of de novo catalog loci to a reference genome.
-O, --out-path path — Path to write the integrated output files.
-q, --min_mapq min — Minimum mapping quality as listed in the BAM file (default 20).
-a, --min_alncov min — Minimum fraction of the de novo catalog locus that must participate in the alignment (default 0.6).
-p, --min_pctid min — Minimum BLAST-style percent identity of the largest alignment fragment for a de novo catalog locus (default 0.6).
--verbose — Provide verbose output.
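
As an illustration of the filtering options listed above, the following sketch runs the program with thresholds stricter than the defaults. The paths and threshold values are placeholders chosen for the example, not recommendations. The command is only executed if the program is found on the PATH.

```shell
# Placeholder paths; adjust them for your own data.
in_dir=./stacks_output
bam=./stacks_output/integrate_alns/catalog.gacu.bam
out_dir=./stacks/integrate_alns

# Illustrative thresholds, stricter than the defaults:
# MAPQ >= 30, 0.8 of the locus aligned, 0.9 identity.
filters="-q 30 -a 0.8 -p 0.9"

if command -v stacks-integrate-alignments >/dev/null 2>&1; then
    # $filters is left unquoted on purpose so it splits into separate arguments.
    stacks-integrate-alignments --in-path "$in_dir" --bam-path "$bam" \
        --out-path "$out_dir" $filters
else
    echo "stacks-integrate-alignments not found in PATH" >&2
fi
```

Stricter thresholds retain fewer loci but give more confidence in the coordinates that are kept, which may matter most when aligning to a distantly related genome.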

Example Usage

We start with the output from the de novo pipeline, most easily produced by denovo_map.pl. Using the set of consensus sequences that make up the catalog, contained in the 'catalog.fa.gz' file, we will align those sequences against our reference genome of interest and convert the output to BAM:

% bwa mem /path/to/reference/gaculeatus ./stacks_output/catalog.fa.gz | samtools view -h -b > ./stacks_output/integrate_alns/catalog.gacu.bam
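
The command above redirects its output into ./stacks_output/integrate_alns, so that directory has to exist before the alignment is run. The sketch below creates it and, once a BAM file is present, uses samtools to count how many alignment records would survive the default mapping quality threshold. It assumes samtools is installed and is skipped otherwise. The counts are per alignment record, so they may not equal the number of loci.

```shell
# The shell redirect in the bwa command fails if this directory is missing.
mkdir -p ./stacks_output/integrate_alns

bam=./stacks_output/integrate_alns/catalog.gacu.bam
if command -v samtools >/dev/null 2>&1 && [ -s "$bam" ]; then
    # -c counts records, -F 4 drops unmapped ones, -q 20 keeps MAPQ >= 20.
    echo "mapped records:  $(samtools view -c -F 4 "$bam")"
    echo "with MAPQ >= 20: $(samtools view -c -F 4 -q 20 "$bam")"
fi
```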

We next feed these alignments, along with the directory containing the Stacks de novo files and a separate output directory, to stacks-integrate-alignments:

% stacks-integrate-alignments --in-path ./stacks_output --bam-path ./stacks_output/integrate_alns/catalog.gacu.bam --out-path ./stacks/integrate_alns/

Finally, we could run populations to generate smoothed statistics along the chromosomes of our integrated reference genome:

% populations --in-path ./stacks/integrate_alns --ordered-export --fstats --smooth -r 0.8
