The gstacks program will examine a RAD data set one locus at a time, looking at all individuals in the metapopulation for that locus.
For de novo analyses, the gstacks program will start with the results of the core single-end pipeline (ustacks→cstacks→sstacks→tsv2bam). It will incorporate the paired-end reads (if available), as fetched by tsv2bam, assemble them into a contig, merge the contig with the single-end locus, and align reads from individual samples to the locus.
For reference-aligned analyses, the gstacks program is the first program executed and it will create loci by incorporating single- or paired-end reads that have been aligned to the reference genome and sorted, using a sliding window algorithm.
In either mode, gstacks will identify SNPs within the metapopulation for each locus and then genotype each individual at each identified SNP. Once SNPs have been identified and genotyped, gstacks will phase the SNPs at each locus, in each individual, into a set of haplotypes.
The gstacks program is able to remove PCR duplicates (pairs of reads with identical insert lengths) if requested.
The gstacks program will output two major files: catalog.fa.gz, which contains the consensus sequence for each assembled locus in the data, and catalog.calls, a custom file that contains genotyping data. These files are intended to be read by the populations program, which can apply appropriate filters and export the data.
The input BAM file(s) must be sorted by coordinate. With -B, records must be assigned to samples using BAM "read groups" (gstacks uses the ID/identifier and SM/sample name fields). Read groups must be consistent if repeated across different files. With -I, read groups are unneeded and ignored.
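The read group requirement for -B can be checked before running gstacks by looking at the @RG lines of each BAM header. The following is a minimal sketch, not part of Stacks: it assumes the header text arrives on standard input (for example from samtools view -H), and it assumes "consistent" means that a given ID always carries the same SM. The function names list_read_groups and conflicting_read_groups are hypothetical.

```shell
# list_read_groups: read SAM header text on stdin and print "ID<TAB>SM"
# for every @RG line. Illustrative helper, not part of Stacks.
list_read_groups() {
    awk -F'\t' '
        $1 == "@RG" {
            id = ""; sm = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^ID:/) id = substr($i, 4)
                if ($i ~ /^SM:/) sm = substr($i, 4)
            }
            print id "\t" sm
        }'
}

# conflicting_read_groups: read "ID<TAB>SM" pairs on stdin and print every
# ID that appears with more than one SM value (assumed meaning of
# "inconsistent"; this is an assumption, not taken from the Stacks source).
conflicting_read_groups() {
    awk -F'\t' '
        ($1 in sm) && sm[$1] != $2 { bad[$1] = 1 }
        !($1 in sm)                { sm[$1] = $2 }
        END { for (id in bad) print id }'
}

# Example with a hand-written header in place of real samtools output.
printf '@HD\tVN:1.6\tSO:coordinate\n@RG\tID:lane1\tSM:sample_1020\n' | list_read_groups
```

With real data, the header of each file could be piped through list_read_groups, the results concatenated, and the combined list passed to conflicting_read_groups; an empty result would suggest the read groups agree across files.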
Your Stacks directory should look similar to this, where the tags/snps/alleles/matches files were produced by the core pipeline (ustacks/cstacks/sstacks) and the matches.bam files were produced by tsv2bam:
% ls stacks/
sample_1020.alleles.tsv.gz  sample_1069.alleles.tsv.gz  sample_1086.alleles.tsv.gz  sample_1095.alleles.tsv.gz
sample_1020.matches.tsv.gz  sample_1069.matches.tsv.gz  sample_1086.matches.tsv.gz  sample_1095.matches.tsv.gz
sample_1020.matches.bam     sample_1069.matches.bam     sample_1086.matches.bam     sample_1095.matches.bam
sample_1020.snps.tsv.gz     sample_1069.snps.tsv.gz     sample_1086.snps.tsv.gz     sample_1095.snps.tsv.gz
sample_1020.tags.tsv.gz     sample_1069.tags.tsv.gz     sample_1086.tags.tsv.gz     sample_1095.tags.tsv.gz
% gstacks -P ./stacks -M ./popmap -t 8
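Before launching the de novo command above, it can be useful to confirm that tsv2bam produced a matches.bam file for every sample. The sketch below is a hypothetical helper, not a Stacks program. It assumes the population map is a tab-separated file whose first column is the sample name, and that the files follow the sample.matches.bam naming shown in the listing.

```shell
# missing_matches_bam: print each sample named in the population map that
# has no <sample>.matches.bam file in the Stacks directory.
# Usage: missing_matches_bam STACKS_DIR POPMAP   (hypothetical helper)
missing_matches_bam() {
    stacks_dir=$1
    popmap=$2
    # The first tab-separated column is assumed to hold the sample name.
    cut -f1 "$popmap" | while IFS= read -r sample; do
        # Skip blank lines.
        [ -n "$sample" ] || continue
        [ -e "$stacks_dir/$sample.matches.bam" ] || printf '%s\n' "$sample"
    done
}
```

Calling missing_matches_bam ./stacks ./popmap prints nothing when every sample has its BAM file; any name it does print points to a sample that may need tsv2bam to be rerun.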
Your Stacks directory should look similar to this, where sample reads have been aligned and sorted by a standard aligner, such as bwa:
% ls aligned/
sample_1020.bam  sample_1069.bam  sample_1086.bam  sample_1095.bam
% gstacks -I ./aligned -O ./stacks -M ./popmap -t 8
% gstacks -I ./aligned -O ./stacks -M ./popmap --rm-pcr-duplicates -t 8
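The commands above read sample names from the population map given with -M. As a rough sketch, a starting population map could be drafted from the BAM file names in the aligned directory. The helper name make_popmap is hypothetical, the tab-separated "sample, population" layout is assumed, and assigning every sample to one population label is only a placeholder to be edited by hand.

```shell
# make_popmap: print a "sample<TAB>population" line for every *.bam file
# in a directory, using the file name without ".bam" as the sample name.
# Usage: make_popmap BAM_DIR POPULATION_LABEL   (hypothetical helper)
make_popmap() {
    bam_dir=$1
    pop=$2
    for f in "$bam_dir"/*.bam; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        printf '%s\t%s\n' "$(basename "$f" .bam)" "$pop"
    done
}
```

For example, make_popmap ./aligned pop1 > ./popmap would write one line per BAM file, after which the population column can be corrected for each sample.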