Stacks

denovo_map.pl

The denovo_map.pl program will execute the Stacks pipeline by running each of the Stacks components individually. It is the simplest way to run Stacks and it handles many of the details, such as sample numbering and loading data to the MySQL database, if desired. The program performs several stages, including:

  1. Running ustacks on each of the samples specified, building loci and calling SNPs de novo in each sample.
  2. Executing cstacks to create a catalog of all loci across the population (or from just the parents if processing a genetic map). Loci are matched up across samples according to sequence similarity.
  3. Executing sstacks to match each sample against the catalog. In the case of a genetic map, both the parents and the progeny are matched against the catalog.
  4. Running the populations program, in the case of a population analysis, to generate population-level summary statistics. If you specified a population map (--popmap option), it is supplied to populations. Computation is then complete.

Specifying Samples

The raw data for each sample in the analysis has to be specified to Stacks, and all of your samples have to be specified together for a single run of the pipeline. You do this by giving denovo_map.pl a population map (--popmap) that lists your samples, together with the path to the directory containing all of the samples (--samples). denovo_map.pl will read the contents of the population map file and search for each specified sample in the --samples directory.
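Because denovo_map.pl looks up every sample named in the population map inside the --samples directory, it can help to check that pairing before starting a long run. The following is a minimal sketch in Python, not part of Stacks. It assumes the population map is a tab-separated file with one "sample, population" pair per line and that lines starting with "#" are comments; the list of file suffixes is a guess for illustration and not the exact set denovo_map.pl searches for. The function names and paths are hypothetical.

```python
from pathlib import Path

# Suffixes tried for each sample name. This list is an assumption for the
# sketch, not the exact set denovo_map.pl searches for.
ASSUMED_SUFFIXES = (".fq.gz", ".fq", ".fa.gz", ".fa", ".1.fq.gz")


def parse_popmap(text):
    """Return (sample, population) pairs from tab-separated popmap text."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        fields = line.split("\t")
        population = fields[1] if len(fields) > 1 else ""
        pairs.append((fields[0], population))
    return pairs


def missing_samples(pairs, filenames):
    """List samples with no file named <sample><suffix> in filenames."""
    available = set(filenames)
    return [
        sample
        for sample, _population in pairs
        if not any(sample + suffix in available for suffix in ASSUMED_SUFFIXES)
    ]


if __name__ == "__main__":
    # Hypothetical paths, borrowed from the usage examples on this page.
    popmap, samples_dir = Path("./treestudy_popmap"), Path("./samples")
    if popmap.is_file() and samples_dir.is_dir():
        pairs = parse_popmap(popmap.read_text())
        names = [p.name for p in samples_dir.iterdir()]
        print("missing:", missing_samples(pairs, names))
```

An empty "missing" list means every sample in the map has at least one candidate file under the assumed naming scheme.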

Using a population map

A population map is an optional file that contains assignments of each of your samples to a particular population. See the manual for more information on how population maps work. The denovo_map.pl program will not directly use the file, beyond reading the sample names out of it. It is the populations program that actually uses the population map for statistical calculations, and denovo_map.pl will provide the map to the populations program. You can run the populations program by hand, specifying other population maps as you like, after the pipeline completes its first execution.
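Re-running populations by hand with a different population map could look like the sketch below, written in Python so the command can be inspected before it is run. The -P (Stacks output directory) and -M (population map) flags are assumed from the Stacks 2 command line and should be confirmed with `populations --help` for your installed version. The helper name and the ./other_popmap path are hypothetical.

```python
import shlex


def populations_command(stacks_dir, popmap, extra_options=()):
    """Build an argv list for a standalone populations run."""
    # -P and -M are assumptions based on the Stacks 2 command line;
    # check `populations --help` for your version.
    return ["populations", "-P", stacks_dir, "-M", popmap, *extra_options]


if __name__ == "__main__":
    cmd = populations_command("./stacks", "./other_popmap", ["--fstats"])
    # Print the command for review. To execute it (requires Stacks on
    # your PATH), pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```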

Passing additional arguments to Stacks component programs

The Stacks component programs have many options, and it would be impractical to expose them all through the denovo_map.pl wrapper program. Instead, you can pass additional options to the internal programs that denovo_map.pl executes by using the -X option. To use it, specify (in quotes) the program the option goes to, followed by a colon (":"), followed by the option itself. For example, -X "populations:--fstats" will pass the --fstats option to the populations program. As another example, -X "populations:-r 0.8" will pass the -r option, with the argument 0.8, to the populations program. Each option should be specified separately with its own -X. See below for examples.
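When a run is launched from a script, the "one -X per option" rule is easy to get wrong by hand. The sketch below, in Python, builds the argument list from (program, option) pairs so that each pair becomes its own -X argument. It uses only the options shown on this page (--samples, --popmap, -o, -X); the helper name denovo_map_command is hypothetical and not part of Stacks.

```python
import shlex


def denovo_map_command(samples_dir, popmap, out_dir, passthrough=()):
    """Build a denovo_map.pl argv list.

    passthrough is a sequence of (program, option) pairs; each pair
    becomes its own -X "program:option" argument.
    """
    cmd = ["denovo_map.pl", "--samples", samples_dir,
           "--popmap", popmap, "-o", out_dir]
    for program, option in passthrough:
        cmd += ["-X", f"{program}:{option}"]
    return cmd


if __name__ == "__main__":
    cmd = denovo_map_command(
        "./samples", "./treestudy_popmap", "./stacks",
        passthrough=[("populations", "--fstats"), ("populations", "-r 0.8")],
    )
    # shlex.join adds shell quoting wherever an argument needs it,
    # for instance around "populations:-r 0.8", which contains a space.
    print(shlex.join(cmd))
```

Keeping the command as a list, rather than one long string, means an option with an embedded space stays a single argument when the list is handed to a process launcher.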

When not to use denovo_map.pl

There are a few reasons to run the pipeline manually instead of using the denovo_map.pl wrapper.

  1. If you have a very large number of samples, you may not want to put them all in the catalog. In a population analysis, all of the samples specified to denovo_map.pl will be loaded into the catalog. In a de novo analysis, each sample added to the catalog will also add a small fraction of error to the catalog. With a very large number of samples, the error can overwhelm true loci in the population. In this case you may only want to load a subset of each population in your analysis.
  2. Again, if you have a lot of samples, you may want to speed your analysis by splitting up your samples and running them on a number of nodes in a cluster. In this case, you would have to queue up ustacks to run on different nodes with different samples. This can't be done using denovo_map.pl.
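Both situations above can be sketched in Python under some loudly stated assumptions. The first helper keeps only a few samples per population, as one way to choose a catalog subset. The second builds one ustacks command per sample, which a cluster scheduler could then dispatch to different nodes. The ustacks flags used here (-f input file, -o output directory, -i unique integer sample ID, -M) are assumptions to verify against `ustacks --help`, the .fq.gz naming is a guess, and all function names and sample names are hypothetical.

```python
from collections import defaultdict


def subset_per_population(pairs, max_per_population):
    """Keep at most max_per_population samples from each population."""
    kept, counts = [], defaultdict(int)
    for sample, population in pairs:
        if counts[population] < max_per_population:
            counts[population] += 1
            kept.append((sample, population))
    return kept


def ustacks_commands(samples, samples_dir, out_dir, max_dist):
    """Build one ustacks argv list per sample, numbering samples from 1."""
    commands = []
    for sample_id, sample in enumerate(samples, start=1):
        commands.append([
            "ustacks",
            "-f", f"{samples_dir}/{sample}.fq.gz",  # assumed file naming
            "-o", out_dir,
            "-i", str(sample_id),  # assumed flag: unique integer ID per sample
            "-M", str(max_dist),
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical (sample, population) pairs.
    pairs = [("tree_01", "north"), ("tree_02", "north"),
             ("tree_03", "north"), ("tree_04", "south")]
    print(subset_per_population(pairs, 2))
    for cmd in ustacks_commands([s for s, _ in pairs], "./samples", "./stacks", 3):
        print(" ".join(cmd))  # one line per job to hand to a scheduler
```

Taking the first few samples per population is the simplest possible selection rule; choosing by read depth or data quality would be a reasonable alternative.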

Program Options

denovo_map.pl --samples dir --popmap path -o dir [--paired] (assembly options) [-X prog:"opts" ...]

Input/Output files:

General options:

Stack assembly options:

SNP model options:

Paired-end options:

Miscellaneous:

Example Usage

  1. In this example, I will supply a population map to denovo_map.pl containing the names of the samples I want to analyze and I will tell denovo_map.pl the directory the samples are located in.

    % denovo_map.pl -M 3 -n 2 -T 15 -o ./stacks --popmap ./treestudy_popmap --samples ./samples

  2. In this example, I will specifically tell the populations program to enable F statistics.

    % denovo_map.pl -M 3 -n 2 -o ./stacks --popmap ./treestudy_popmap --samples ./samples -X "populations:--fstats"
