Stacks: ustacks

Program Options

ustacks -f file_path -i id -o path [-m min_cov] [-M max_dist] [-p num_threads] [-d] [-t file_type]

f — input file path.
i — a unique integer ID to identify this sample.
o — output path to write results.
M — maximum distance (in nucleotides) allowed between stacks (default 2).
m — minimum depth of coverage required to create a stack (default 2).
N — maximum distance allowed to align secondary reads to primary stacks (default: M + 2).
p — enable parallel execution with num_threads threads.
t — input file type. Supported types: fasta, fastq, gzfasta, or gzfastq (default: guess).
--name [name] — a name for the sample (default: input file name minus the suffix).
R — retain unused reads.
H — disable calling haplotypes from secondary reads.
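
The default for N is derived from M rather than being a fixed number. A minimal sketch of that arithmetic, assuming a POSIX shell; the value 4 for M and the variable names are illustrative only:

```shell
#!/bin/sh
# Sketch: the documented default for -N is M + 2.
# M=4 is an example value, not a recommendation.
M=4                      # value that would be passed to -M
N_default=$((M + 2))     # what -N defaults to when it is not given
echo "M=$M default_N=$N_default"
```

Raising M therefore also raises the distance allowed for secondary reads unless N is set explicitly.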

Stack assembly options:

d, --deleverage — enable the Deleveraging algorithm, used for resolving over-merged tags.
--keep_high_cov — disable the algorithm that removes highly-repetitive stacks and nearby errors.
--max_locus_stacks [num] — maximum number of stacks at a single de novo locus (default 3).
--k_len [len] — specify k-mer size for matching between alleles and loci (automatically calculated by default).

Gapped assembly options:

--gapped — perform gapped alignments between stacks.
--max_gaps — number of gaps allowed between stacks before merging (default: 2).
--min_aln_len — minimum length of aligned sequence in a gapped alignment (default: 0.80).

Model options:

--model_type [type] — either 'snp' (default), 'bounded', or 'fixed'

For the SNP or Bounded SNP model:

--alpha [num] — chi-square significance level required to call a heterozygote or homozygote, either 0.1, 0.05 (default), 0.01, or 0.001.

For the Bounded SNP model:

--bound_low [num] — lower bound for epsilon, the error rate, between 0 and 1.0 (default 0).
--bound_high [num] — upper bound for epsilon, the error rate, between 0 and 1.0 (default 1).
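
Both bounds must lie between 0 and 1.0, so it can help to check them before launching a long run. The following is a sketch only, assuming a POSIX shell with awk available. The helper name check_bounds and the numeric values are hypothetical, and the requirement that the lower bound not exceed the upper bound is an assumption drawn from the option names rather than something stated above. The command is printed, not executed.

```shell
#!/bin/sh
# Sketch: check that the error-rate bounds lie in [0, 1] and that
# low <= high (an assumption), then print a ustacks command line
# that selects the bounded SNP model.
# check_bounds and the numeric values are illustrative, not part of Stacks.
check_bounds() {
    awk -v lo="$1" -v hi="$2" \
        'BEGIN { exit !(lo >= 0 && hi <= 1 && lo <= hi) }'
}

bound_low=0.0
bound_high=0.05

if check_bounds "$bound_low" "$bound_high"; then
    echo "ustacks -f ./samples/sample_39-1.fq.gz -o ./stacks -i 1" \
         "--model_type bounded --bound_low $bound_low" \
         "--bound_high $bound_high --alpha 0.05"
else
    echo "invalid bounds: $bound_low $bound_high" >&2
fi
```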

For the Fixed model:

--bc_err_freq [num] — specify the barcode error frequency, between 0 and 1.0.

Example Usage

Here we run ustacks against four samples from a genetic cross: two parents and two progeny. We assign each sample its own unique integer ID (-i), and we specify the parameters for creating putative alleles (-m) and merging alleles into putative loci (-M). We speed up the matching process by specifying 15 parallel threads (-p).

% ustacks -f ./samples/f0_male.fq.gz -o ./stacks -i 1 -m 3 -M 4 -p 15
% ustacks -f ./samples/f0_female.fq.gz -o ./stacks -i 2 -m 3 -M 4 -p 15
% ustacks -f ./samples/progeny_01.fq.gz -o ./stacks -i 3 -m 3 -M 4 -p 15
% ustacks -f ./samples/progeny_02.fq.gz -o ./stacks -i 4 -m 3 -M 4 -p 15
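
Because every sample needs its own integer ID, the same set of commands can be generated in a loop. This is a minimal sketch, assuming a POSIX shell. The function name run_ustacks_batch and the DRY_RUN variable are hypothetical names introduced for illustration; the file paths and parameter values are the ones from the example. With DRY_RUN=1 (the default in this sketch) the commands are only printed.

```shell
#!/bin/sh
# Sketch: run ustacks once per sample, giving each sample the next integer ID.
# DRY_RUN and run_ustacks_batch are illustrative names, not part of Stacks.
DRY_RUN="${DRY_RUN:-1}"

run_ustacks_batch() {
    id=1
    for sample in "$@"; do
        cmd="ustacks -f $sample -o ./stacks -i $id -m 3 -M 4 -p 15"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"          # print the command without running it
        else
            # Unquoted on purpose so the string splits into arguments;
            # this assumes the sample paths contain no spaces.
            $cmd || return 1     # stop at the first failing sample
        fi
        id=$((id + 1))
    done
}

run_ustacks_batch \
    ./samples/f0_male.fq.gz \
    ./samples/f0_female.fq.gz \
    ./samples/progeny_01.fq.gz \
    ./samples/progeny_02.fq.gz
```

Setting DRY_RUN=0 in the environment would execute the commands instead of printing them. IDs follow the order of the arguments, so keeping that order stable keeps each sample's ID stable across reruns.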

Here we run ustacks against three samples from a population, allowing gapped alignments (--gapped) between alleles when forming putative loci.

% ustacks -f ./samples/sample_39-1.fq.gz -o ./stacks -i 1 -M 6 --gapped -p 15
% ustacks -f ./samples/sample_40-2.fq.gz -o ./stacks -i 2 -M 6 --gapped -p 15
% ustacks -f ./samples/sample_41-1.fq.gz -o ./stacks -i 3 -M 6 --gapped -p 15
