The denovo_map.pl program executes the Stacks pipeline by running each of the Stacks components individually. It is the simplest way to run Stacks, and it handles many of the details. The stages it runs are ustacks (once per sample), cstacks, sstacks, tsv2bam, gstacks, and populations.
The raw data for each sample in the analysis has to be specified to Stacks. The denovo_map.pl program expects (but does not require) that your raw sequencing data were demultiplexed and cleaned by the process_radtags program.
All of your samples must be specified together for a single run of the pipeline. You do this by listing your samples in a population map (--popmap) and by giving the path to the directory that contains all of the sample files with the --samples option. denovo_map.pl reads the sample names from the population map and searches for each one in the --samples directory.
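denovo_map.pl looks up every sample from the population map in the --samples directory. A mismatch between the two is therefore easy to catch before a long run starts. The function below is a hypothetical pre-flight check and is not part of Stacks. It assumes gzipped FASTQ files named `<sample>.fq.gz` for single-end data, or `<sample>.1.fq.gz` and `<sample>.2.fq.gz` for paired-end data, as in the directory listings in this section. Adjust the suffixes if your files are named differently.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Stacks): verify that every sample
# named in column 1 of a population map has a matching file in the samples dir.
# Assumed file naming: <sample>.fq.gz (single-end) or
# <sample>.1.fq.gz + <sample>.2.fq.gz (paired-end; pass "yes" as 3rd argument).
check_samples() {
    local popmap=$1 samples_dir=$2 paired=${3:-no}
    local sample f missing=0
    # Read the first whitespace-separated field of each line; the extra test
    # also handles a last line without a trailing newline.
    while read -r sample _ || [[ -n $sample ]]; do
        [[ -z $sample ]] && continue   # skip blank lines
        if [[ $paired == yes ]]; then
            for f in "$samples_dir/$sample.1.fq.gz" "$samples_dir/$sample.2.fq.gz"; do
                [[ -e $f ]] || { echo "missing: $f" >&2; missing=1; }
            done
        else
            f="$samples_dir/$sample.fq.gz"
            [[ -e $f ]] || { echo "missing: $f" >&2; missing=1; }
        fi
    done < "$popmap"
    return "$missing"   # 0 if all files were found, 1 otherwise
}
```

For example, `check_samples ./treestudy_popmap ./samples && denovo_map.pl -M 5 -o ./stacks --popmap ./treestudy_popmap --samples ./samples` starts the pipeline only if every expected file is present.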
A population map assigns each of your samples to a particular population; see the manual for more information on how they work. denovo_map.pl does not use the file directly except to read the sample names for processing. It is the populations program that actually uses the population map for statistical calculations, and denovo_map.pl passes the map along to populations. After the pipeline completes its first execution, you can run populations by hand with other population maps as you like.
The Stacks component programs have many options, and it would be impractical to expose them all through the denovo_map.pl wrapper. Instead, you can pass additional options to the internal programs that denovo_map.pl executes by using the -X option. To use it, specify (in quotes) the program the option goes to, followed by a colon (":"), followed by the option itself. For example, -X "populations:--fstats" passes the --fstats option to the populations program. Each option must be given separately with its own -X. See below for examples.
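When you pass several options, a bash array keeps each `-X` pair as a separate argument and preserves the quoting. This is a minimal sketch. The `add_x` helper and the `extra` array are hypothetical names; only the `-X "program:option"` syntax comes from denovo_map.pl.

```bash
#!/usr/bin/env bash
# Collect pass-through options so that each becomes its own -X "program:option" pair.
extra=()
add_x() {
    # $1 = target program (e.g. populations), $2 = the option to pass to it
    extra+=(-X "$1:$2")
}
add_x populations --fstats
add_x populations --write-single-snp

# Print one argument per line to inspect what would be passed.
printf '%s\n' "${extra[@]}"
# Intended use (hypothetical run):
#   denovo_map.pl -M 5 -o ./stacks --popmap ./treestudy_popmap --samples ./samples "${extra[@]}"
```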
There are a few reasons to run the pipeline manually instead of using the denovo_map.pl wrapper.
Stacks is designed so that you run the core pipeline once (or several times to optimize parameters, followed by a final run). Once that run is complete, the core information (loci assembled, genotyped, and phased into haplotypes across the meta-population) does not need to be modified further. It is common to then run the populations program multiple times. populations reads the core pipeline outputs and can filter the data in many ways, export it in different formats, or apply different population maps so that the population genetics statistics are calculated according to those maps (e.g. by geography, phenotype, or environmental variable).
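Rerunning populations against one finished core run might look like the sketch below. It loops over several population maps and writes each result to its own directory. It assumes the populations options -P (directory holding the core pipeline output), -M (population map) and -O (output directory). The helper name and the population map file names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run populations once per population map on an existing
# Stacks output directory, writing each run to a separate subdirectory.
rerun_populations() {
    local stacks_dir=$1; shift
    local popmap name
    for popmap in "$@"; do
        name=$(basename "$popmap")
        # A separate output directory per map keeps runs from overwriting each other.
        mkdir -p "$stacks_dir/populations.$name"
        populations -P "$stacks_dir" -M "$popmap" -O "$stacks_dir/populations.$name" || return 1
    done
}
```

For example: `rerun_populations ./stacks ./popmap_geography ./popmap_phenotype`.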
denovo_map.pl --samples dir --popmap path -o dir [--paired [--rm-pcr-duplicates]] (assembly options) (filtering options) [-X prog:"opts" ...]
Your samples directory should look similar to this, after processing with process_radtags:
% ls samples/
sample_06.fq.gz  sample_07.01.fq.gz  sample_07.02.fq.gz  sample_09.fq.gz ...
The population map would look like this:
% cat ./treestudy_popmap
sample_07.01	redlake
sample_07.02	redlake
sample_06	blueriver
sample_09	blueriver
...
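A population map is a plain text file with one sample per line, with the sample name and its population separated by a tab. The sketch below writes the four example lines shown above from the shell. Only the listed samples are written; the `...` entries are not filled in.

```bash
# printf reuses its format for each pair of arguments, producing
# one "<sample><TAB><population>" line per pair.
printf '%s\t%s\n' \
    sample_07.01 redlake \
    sample_07.02 redlake \
    sample_06    blueriver \
    sample_09    blueriver > ./treestudy_popmap
```

To count how many samples are assigned to each population, you can run `cut -f 2 ./treestudy_popmap | sort | uniq -c`.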
And we supply both of these paths to denovo_map.pl, along with an output directory to store results and some parameter settings for building loci.
% denovo_map.pl -M 5 -T 8 -o ./stacks --popmap ./treestudy_popmap --samples ./samples
% denovo_map.pl -M 5 -o ./stacks --popmap ./treestudy_popmap --samples ./samples -X "populations:--write-single-snp"
% denovo_map.pl -M 4 -o ./stacks/M4 --popmap ./treestudy_popmap --samples ./samples --min-samples-per-pop 0.80
% denovo_map.pl -M 5 -o ./stacks/M5 --popmap ./treestudy_popmap --samples ./samples --min-samples-per-pop 0.80
% denovo_map.pl -M 6 -o ./stacks/M6 --popmap ./treestudy_popmap --samples ./samples --min-samples-per-pop 0.80
% denovo_map.pl -M 7 -o ./stacks/M7 --popmap ./treestudy_popmap --samples ./samples --min-samples-per-pop 0.80
% denovo_map.pl -M 8 -o ./stacks/M8 --popmap ./treestudy_popmap --samples ./samples --min-samples-per-pop 0.80
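These five runs differ only in the -M value and the output directory, so they can be written as a loop. The sketch below assumes bash, and the `sweep_M` helper is hypothetical. It creates each output directory before the run in case denovo_map.pl expects it to exist, and it stops at the first failed run.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run denovo_map.pl once per -M value given as an argument,
# each into its own ./stacks/M<value> directory.
sweep_M() {
    local m
    for m in "$@"; do
        mkdir -p "./stacks/M$m"
        denovo_map.pl -M "$m" -o "./stacks/M$m" --popmap ./treestudy_popmap \
            --samples ./samples --min-samples-per-pop 0.80 || return 1
    done
}
```

For example: `sweep_M 4 5 6 7 8`.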
Your samples directory should look similar to this, after processing with process_radtags:
% ls samples/
sample_06.1.fq.gz  sample_07.01.1.fq.gz  sample_07.02.1.fq.gz  sample_09.1.fq.gz
sample_06.2.fq.gz  sample_07.01.2.fq.gz  sample_07.02.2.fq.gz  sample_09.2.fq.gz
...
The population map would look like this:
% cat ./treestudy_popmap
sample_07.01	redlake
sample_07.02	redlake
sample_06	blueriver
sample_09	blueriver
...
And we supply both of these paths to denovo_map.pl, along with an output directory to store results and some parameter settings for building loci.
% denovo_map.pl -M 5 -T 8 -o ./stacks --popmap ./treestudy_popmap --samples ./samples --paired
The denovo_map.pl program will handle specifying the single-ends to the core pipeline (ustacks→cstacks→sstacks) and the paired-ends to tsv2bam.
% denovo_map.pl -M 7 -T 8 -o ./stacks --popmap ./treestudy_popmap --samples ./samples --rm-pcr-duplicates --paired
The main output of denovo_map.pl is its log file, denovo_map.log (each of the individual pipeline components also creates its own output). denovo_map.log captures the output of every component program: the ustacks run for each sample, followed by cstacks, sstacks, tsv2bam, gstacks, and populations.
The denovo_map.log log file will also provide a table listing the depth of sequencing coverage for each sample processed (as calculated by ustacks) for the single-end reads. It will look similar to this:
sample        loci assembled  depth of cov  max cov  number reads incorporated  % reads incorporated
sample_07.01  41228           19.77         291      303663                     81.6
sample_07.02  39101           18.62         212      249467                     74.0
sample_06     35506           17.87         231      199709                     86.0
sample_09     48445           12.24         295      233270                     83.6
...
A convenient way to extract this information from the large log file is to use the stacks-dist-extract utility, like this:
stacks-dist-extract --pretty denovo_map.log cov_per_sample
Per-sample depth of coverage is a key metric in any de novo map analysis.
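Samples with unusually low coverage can be picked out of this table with awk. This is a sketch with several assumptions. The input is taken to be tab-separated with the column order of the table above, which is assumed to match the plain (non-`--pretty`) stacks-dist-extract output. Lines whose third column is not a number, such as the header, are skipped. The helper name and the default threshold of 15 are hypothetical, not a Stacks recommendation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read the per-sample coverage table on stdin and print
# "<sample><TAB><depth of cov>" for samples whose depth is below a threshold.
# Assumed input columns (tab-separated): sample, loci assembled, depth of cov, ...
low_coverage_samples() {
    local min_depth=${1:-15}
    awk -F'\t' -v min="$min_depth" '
        # Only numeric third fields are compared, so header lines are ignored.
        $3 ~ /^[0-9]+(\.[0-9]+)?$/ && $3 + 0 < min + 0 { print $1 "\t" $3 }
    '
}
```

For example: `stacks-dist-extract denovo_map.log cov_per_sample | low_coverage_samples 10`.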