Stacks 2.68 - August 23, 2024
-----------------------------
Bugfix: updated process_radtags so that the poly-G detection only turns on
automatically if --clean is also specified on the command line.
Bugfix: updated process_radtags so kmer length is adjusted depending on
adapter length; increased default kmer size to 9 from 5.
Bugfix: kmer coordinates were not quite right in adapter filtering after
removing barcodes from adapter sequence search, preventing adapters from
being properly filtered in some cases.
Stacks 2.67 - July 18, 2024
---------------------------
Feature: Added the --min-gt-depth filter to populations. Requires a called SNP to be supported
by a minimum number of reads, otherwise it is marked as missing data.
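
The effect of a minimum genotype depth filter can be sketched as follows. This is
an illustration only, not the populations implementation: the function name, the
(call, depth) tuple layout, and the "./." missing marker are assumptions made for
the example.

```python
def apply_min_gt_depth(genotypes, min_depth):
    """Replace calls supported by fewer than min_depth reads with a missing marker.

    genotypes is a list of (call, depth) tuples; this layout is hypothetical.
    """
    MISSING = "./."  # assumed marker for missing data
    filtered = []
    for call, depth in genotypes:
        if call != MISSING and depth < min_depth:
            # Too few supporting reads: treat the genotype as missing.
            filtered.append((MISSING, depth))
        else:
            filtered.append((call, depth))
    return filtered
```
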
Feature: Added processing of UMI field to process_radtags. Raw Illumina data may include
an extra field in the FASTQ header which represents a unique molecular
identifier (UMI).
Feature: Added detection for runs of G nucleotides to process_radtags. In recent two-color
chemistries from Illumina, 'no signal' can be mistaken for high-quality runs of 'G'.
This feature detects high-quality runs of G nucleotides at the 3' end of the read
and drops reads containing this loss of signal (if there are 10 or more Gs present).
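
A minimal sketch of this kind of 3' poly-G check is shown below. It is not the
process_radtags code: the function name and the quality cutoff (min_qual=30) are
assumptions for illustration; only the threshold of 10 Gs comes from the entry
above.

```python
def has_poly_g_tail(seq, quals, min_run=10, min_qual=30):
    """Return True if the read ends in at least min_run high-quality Gs.

    seq is the read sequence; quals is a list of integer Phred scores of the
    same length. min_qual is an assumed cutoff, not a documented default.
    """
    run = 0
    # Walk backwards from the 3' end until the run of high-quality Gs breaks.
    for base, q in zip(reversed(seq), reversed(quals)):
        if base == "G" and q >= min_qual:
            run += 1
        else:
            break
    return run >= min_run
```

A read for which this returns True would be dropped rather than trimmed, matching
the behavior described above.
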
Feature: Added the --basename option to process_radtags so a user can specify an output
filename when processing individual input files (i.e. -f, or -1/-2). (Useful,
e.g., when processing individual samples from SRA.)
Bugfix: Corrected a bug in process_radtags when run with multiple threads: if more than one
barcode mapped to the same output file it could crash due to a lack of thread
synchronization. Code now assigns all barcodes pointing to the same output files to
the same output thread.
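
The idea behind this fix can be sketched as grouping barcodes by output file
before handing them to threads, so no two threads ever write to the same file.
The sketch below is illustrative only; the function name, the round-robin
assignment, and the example file names are assumptions, not the process_radtags
implementation.

```python
from collections import defaultdict

def assign_barcodes_to_threads(barcode_to_file, n_threads):
    """Map each barcode to a thread index so that all barcodes sharing an
    output file land on the same thread (hypothetical helper)."""
    by_file = defaultdict(list)
    for barcode, path in barcode_to_file.items():
        by_file[path].append(barcode)

    assignment = {}
    # Distribute output files (not barcodes) across threads, round-robin.
    for i, path in enumerate(sorted(by_file)):
        for barcode in by_file[path]:
            assignment[barcode] = i % n_threads
    return assignment
```
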
Bugfix: Corrected a bug in filtering adapter sequence in process_radtags: when a
barcode was part of the adapter sequence, but near the end of that adapter
sequence, a read could be inadvertently discarded.
Bugfix: Corrected a regression in process_radtags where FASTQ custom
headers (e.g. from the Sequence Read Archive) were not properly parsed.
Also added a check to ensure "/1" and/or "/2" are removed from the header
if present so as not to be duplicated after processing.
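
The header check described above can be sketched like this. It is a minimal
illustration, not the process_radtags parser; the function names and the example
read IDs are hypothetical.

```python
import re

# Matches a trailing "/1" or "/2" read-number suffix.
_READ_SUFFIX = re.compile(r"/[12]$")

def strip_read_suffix(read_id):
    """Remove a trailing '/1' or '/2' from a read ID, if present."""
    return _READ_SUFFIX.sub("", read_id)

def with_read_suffix(read_id, read_number):
    """Append the read number exactly once, even if the ID already had one."""
    return f"{strip_read_suffix(read_id)}/{read_number}"
```

Stripping before appending means an ID that already ends in "/1" is written out
with a single suffix rather than "/1/1".
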
Bugfix: Corrected stacks-private-alleles to work properly with denovo
data.
Bugfix: Updated tsv2bam to properly assign IDs to samples that were not
part of the catalog. There was an off-by-one error in assigning the proper
IDs to these samples.
Stacks 2.66 - December 5, 2023
------------------------------
Feature: Rewrote stacks-dist-extract in Python including new support for partial
section names, streaming capability, and other improvements.
Feature: Included new stacks-private-alleles script that will extract private allele
data from the populations program outputs and output useful summaries and
prepare it for plotting.
Bugfix: In clone_filter, users sometimes specified a single oligo sequence on
the paired read, but gave the length of that oligo with --oligo-len-2 instead
of --oligo-len-1. Added code to use oligo length from either parameter
when a single sequence is specified.
Bugfix: private allele summary count in populations.log could be
incorrect; values in populations.sumstats.tsv were not affected.
Bugfix: when running in parallel with paired-end reads and retaining discarded
reads, process_radtags could segfault. Corrected threads writing to discard files.
Updated naming of discard output file.
Bugfix: corrected two small memory access errors in process_radtags.
Stacks 2.65 - August 18, 2023
-----------------------------
Feature: Added a "properly paired" reads counter to process_radtags.
Feature: extended populations filtering parameters to apply to fixed nucleotide sites; this
is applied in exports such as --vcf-all.
Feature: denovo_map.pl now accepts a second population map if you have a large data set and
would like to only load a subset of samples into the catalog.
Feature: Updated ustacks, cstacks, sstacks, and tsv2bam to no longer require
external sample ID numbers (-i option to ustacks). The pipeline will now
internally generate IDs when necessary, but most parts of the pipeline do
not need them any longer.
Feature: If a VCF input file for populations contains contig definitions
for a reference genome, those contigs will now be properly exported in any
VCF exports.
Feature: Added HaeII restriction enzyme.
Stacks 2.64 - March 5, 2023
---------------------------
Bugfix: the VCF export from populations could contain illegal fields for samples that have a
genotype call but effectively have no reads (reads contain unknown alleles or Ns in that
position). This could throw errors if a user then tried to import the VCF to populations
using --in-vcf.
Bugfix: private alleles. In the case of having two populations, where a particular site was differentially
fixed between the populations, neither allele would be marked as private.
Now both alleles are marked as private.
Stacks 2.63 - October 23, 2022
------------------------------
Feature: added AslI restriction enzyme.
Bugfix: fixed error in stacks-integrate-alns script which could cause a
failure when processing single-end data ('contig' is not defined in
catalog.fa.gz for single-end data).
Stacks 2.62 - June 28, 2022
---------------------------
Feature: added a '--vcf-all' export to populations which will export fixed
and variable sites to a VCF file. If '--ordered' is also specified, it
will only export non-overlapping sites.
Feature: improved ustacks logging to include final number of assembled
loci; modified denovo_map.pl to include this in its logging output.
Improved logging in populations program.
Feature: Added variant of PstI cutsite to process_radtags: pstishi only
leaves "GCAG" as the cutsite remnant, published in:
Shirasawa, Hirakawa, and Isobe, 2016, DNA Research; doi: 10.1093/dnares/dsw004
Bugfix: fixed assert() failure in populations when no population map was
specified.
Bugfix: updated stacks-dist-extract --pretty print to better handle
printing comments.
Stacks 2.61 - April 19, 2022
----------------------------
Feature: parallelized process_radtags. Can now run on multiple cores (max of 24 cores), resulting
in a speedup of 2-3x, depending on physical I/O and number of cores used. Minor improvements
to output status messages.
Feature: added '--pretty' print option to stacks-dist-extract script.
Bugfix: corrected bug in parsing of bootstrap archive file, long lines were not properly handled.
Feature: Added HhaI restriction enzyme.
Stacks 2.60 - October 26, 2021
------------------------------
Feature: memory usage reduction in populations. Some examples of memory savings:
- De novo and Ref-aligned included f-statistic calculations; no filtering employed.
- Ref-aligned included smoothed values.
- 2 populations, 10 samples, 99,505 RAD loci, ~435bp, paired-end reads:
* Reference-aligned: 2.2Gb vs. 0.9Gb, 59% reduction
* De novo analysis: 1.5Gb vs. 0.4Gb, 73% reduction
- 4 populations, 78 samples, 190,912 RAD loci, 94bp, single-end reads:
* Reference-aligned: 1.8Gb vs. 0.7Gb, 62% reduction
* De novo analysis: 1.6Gb vs. 0.8Gb, 51% reduction
- 18 populations, 241 samples, 626,584 RAD loci, ~370bp, paired-end reads:
* Reference-aligned: 9.3Gb vs. 4.2Gb, 56% reduction
* De novo analysis: 10.6Gb vs. 6.1Gb, 42% reduction
Feature: re-implemented bootstrapping for smoothed population statistics values calculated in the
populations program. Bootstrapping is now a two stage process: 1) run populations with the
popmap of choice and specify --bootstrap-archive to generate values for resampling. 2) Re-run
populations with specific bootstrap flags (--bootstrap*) to generate p-values for specific
statistics. Populations will locate the bootstrap archive from the previous run to conduct resampling.
Feature: updated gstacks to output a list of chromosomes to 'catalog.chrs.tsv' when processing
reference-aligned data. These data will then be incorporated by populations into VCF exports allowing
easier interoperability with vcftools and bcftools.
Feature: simplified SNP-based Fst output files, discarded some outputs rarely used for memory savings. Reduced
significant digits of some outputs (log-odds and confidence intervals) to save internal memory.
Bugfix: There was a small regression in clone_filter causing it to mishandle --null-index style oligos.
Bugfix: loci could be presented out of order in populations.snps.vcf and populations.haps.vcf when they
originated from consecutive scaffolds with single loci on each. This prevented bcftools and other
programs from properly indexing the VCF files.
Bugfix: populations.phistats.tsv had last line truncated due to file not being closed properly.
Bugfix: instituted maximum thread count for component programs.
Stacks 2.59 - July 21, 2021
---------------------------
Feature: updated populations to output the number of missing sites and loci, per sample to the
populations.log.distribs file.
Feature: replaced stacks-integrate-alignments with a new Python program. This new program allows for greater
filtering of alignments and more error checking for alignments where a fragment alignment could
associate SNPs within loci that had non-existent coordinates (on the reference genome).
Feature: updated populations to look for and, if found, load the file 'catalog.chrs.tsv' in the Stacks output
directory. This is then exported as part of the VCF headers to supply contig names/lengths.
Feature: updated the process_radtags log file to have similar headers to the *.distribs files and the ability
to extract portions of the log with stacks-dist-extract utility.
Feature: updated denovo_map.pl to print more data to logfile for ustacks executions, including max depth,
number and percent of reads incorporated. Made output compatible with stacks-dist-extract utility.
Feature: updated denovo_map.pl and ref_map.pl to print more detailed message upon failure, including last
command executed.
Bugfix: In the populations program, the phylip-all export would throw an error (or misprint the sequences)
if the population names were of different lengths.
Stacks 2.58 - June 08, 2021
----------------------------
Bugfix: Fixed several memory errors in ustacks related to processing trimmed reads.
Stacks 2.57 - May 10, 2021
--------------------------
Feature: updated process_radtags so that if you specify the same sample name in the barcodes file for multiple
barcodes, the program will merge the output for those barcodes into the single, specified output file.
Feature: changed the default 'smoothed' and 'bootstrap' values in output files to contain a -1.0 if a particular locus
was not included in the smoothing/bootstrap algorithms (this occurs when RAD loci overlap the same genomic
region and only one of the loci can be included in the smoothing/bootstrapping).
Bugfix: Reverted earlier changes to ensure all mentions of column position (within a RAD locus) are zero-based, while
reference-based coordinates are one-based.
Bugfix: Updated populations VCF export so that the snp_cols variable (which tells you where the individual SNPs come from for a
set of haplotypes) is in the proper order when the locus is on the negative strand (we reversed the order here).
Stacks 2.56 - March 16, 2021
----------------------------
Bugfix: Corrected process_radtags when processing dual index barcodes but only the second, i7 barcode is an actual
barcode, referred to as --null_index in the process_radtags barcodes options. In these cases, the first, i5
index barcode is being used as a random oligo to remove PCR duplicates.
Stacks 2.55 - January 07, 2021
------------------------------
Feature: Added NgoMIV restriction enzyme to process_radtags.
Feature: Added GTF export to populations, for reference-aligned data.
Stacks 2.54 - September 03, 2020
--------------------------------
Feature: Added BtgI, PacI, PspXI, and HpyCH4IV restriction enzymes to process_radtags.
Bugfix: stacks-integrate-alignments: tab characters fed to grep were not being interpreted properly.
Stacks 2.53 - March 28, 2020
----------------------------
Bugfix: denovo_map.pl was broken for running cstacks on non-genetic map datasets.
Stacks 2.52 - March 5, 2020
---------------------------
Feature: denovo_map.pl now has a --resume option, which will restart the pipeline if a previous run failed
to complete.
Bugfix: Improved denovo_map.pl wrapper so that if a genetic map is specified in the population map, only samples
labeled 'parent' are loaded into the catalog during the cstacks stage.
Bugfix: corrected malfunctioning error message in populations when improper population names supplied for genetic map.
Bugfix: populations VCF export: changed the ID field (for denovo), partially reverting it back to v1 format. The first
three columns, 'chr basepair ID' are now represented in the de novo format as 'cloc col1 cloc:col1',
where cloc is the catalog locus number, col1 is the 1-based position of the SNP within the locus and the ID field is a
concatenation of the two (making each SNP have an ID that is a combination of locus ID and column).
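
As an illustration of the 'cloc col1 cloc:col1' layout described above, the sketch
below builds the first three VCF columns for a de novo SNP. It is not the
populations export code; the function name is hypothetical, and taking a
zero-based column as input is an assumption made for the example.

```python
def denovo_vcf_fields(catalog_locus, col0):
    """Return the (CHROM, POS, ID) strings for a de novo SNP.

    catalog_locus is the catalog locus number; col0 is the SNP's zero-based
    column within the locus (assumed input convention).
    """
    col1 = col0 + 1  # the VCF column is 1-based
    return (str(catalog_locus), str(col1), f"{catalog_locus}:{col1}")
```

For example, a SNP in the first column of catalog locus 42 would be written with
the fields '42', '1', and '42:1'.
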
Bugfix: Changed the Phylip-var-all export from populations to insert a tab after the sample name, instead of padding with space.
Stacks 2.5 - December 16, 2019
------------------------------
Feature: genotyping for mapping crosses has been (re)added to the populations program. (In Stacks v1, this was done by
the now deprecated genotypes program.) Mapping genotypes can be exported for JoinMap, r/QTL, or OneMap by
specifying the --map-type and --map-format options (with a parent/progeny population map) to populations.
Feature: gstacks: catalog.fa.gz files are now directly indexable.
Bugfix: denovo_map.pl: added code to properly handle '.1' suffix on input files without having to modify the population map.
Bugfix: gstacks: fixed target indexes being shifted in the BAM files produced by --write-alignments.
Bugfix: gstacks: fixed --write-alignments not respecting -O.
Bugfix: populations: fixed polyallelic SNPs causing an abort near PopSum.cc:96 (cf. marukihigh model & external VCFs).
Stacks 2.41 - July 8, 2019
--------------------------
Feature: populations: calculates haplotype-based Dxy (Nei, 1987) and provides for smoothing if a
reference genome is available.
Feature: populations: re-implemented full sequence export for phylip format, including partitioning information.
Stacks 2.4 - May 9, 2019
------------------------
Feature: populations: re-implemented HZAR export.
Feature: added reporting code to detect issues with inconsistent versions of libz on a host system
causing Stacks components to fail to open compressed files.
Feature: gstacks: improved PCR duplicate reporting to be per-sample.
Bugfix: populations: fixed an issue where the basepair position of a small number of loci was reported
incorrectly -- they were shifted by a small, fixed offset.
Stacks 2.3 - Jan 11, 2019
--------------------------
Feature: populations: Backwards-compatibly worked on filtering options; added long names for -r and -p and
added --min-samples-overall and --filter-haplotype-wise.
Feature: populations: Implemented --treemix.
Feature: gstacks: Improved RAD-loci reference sequences around the end of forward (restriction site-bound) reads.
Feature: gstacks: Improved the way 2-microsatellites are dealt with.
Feature: gstacks: Changed the default value for --var-alpha from 0.05 to 0.01 (--gt-alpha is unchanged at 0.05).
Feature: gstacks: Improved PCR duplicates-related log information (distribution of clone sizes).
Feature: gstacks: Added an option to save read alignments (--write-alignments).
Feature: Backwards-compatibly switched to hyphens in command line options (underscores remain accepted
where they previously were).
Feature: cstacks/sstacks now report an error when the disk becomes full.
2.3b - Jan 23, 2019
----------
Bugfix: Fixed some limit cases causing an abort at gstacks.cc:1752.
Bugfix: Fixed some limit cases causing an abort at debruijn.cc:60.
Bugfix: Fixed some limit cases apparently causing an infinite loop in the de Bruijn code.
Bugfix: Restored compilation with the oldest C++11 GCC versions (4.9 and 5.0).
2.3c - Feb 27, 2019
---------
Bugfix: Fixed assert failure at gstacks.cc:1171 (corrected with a return on gstacks.cc:1126)
Bugfix: BAM support was inadvertently compiled out of process_radtags due to the removal of the HAVE_BAM
config option, which occurred when we moved the BAM library internally to Stacks.
Bugfix: corrected infinite loop in populations when --write-single-snp and -r were enabled.
Bugfix: corrected missing comment marker in populations' FASTA exports and missing ']' character in FASTA comments.
2.3d - Feb 28, 2019
---------
Bugfix: the snps_per_loc_postfilters distribution in the populations.log.distrib file was slightly off due to counting
the number of SNPs at loci where despite SNP objects present at the locus, all sites were fixed in the
focal populations.
Bugfix: Some haplotypes could pass through the filter, even after particular SNPs were filtered from them.
Bugfix: Corrected the samples per locus and absent-samples per locus distributions from populations.
2.3e - Mar 20, 2019
---------
Bugfix: the --write-random-snp flag was causing an infinite loop in populations.
Stacks 2.2 - Aug 22, 2018
--------------------------
Feature: Added the --bestrad flag to process_radtags. When used it will look for reads that need to be transposed
before they are processed.
Feature: gstacks: New option --max-debruijn-reads to control the construction of the de Bruijn graph; replaces
--min-kmer-freq which is now deprecated.
Bugfix: Fixed a breaking circumstantial segmentation fault in populations
Bugfix: Added run number to output FASTQ headers in process_radtags to make sure read IDs are always unique.
Stacks 2.1 - June 25, 2018
--------------------------
Bugfix: Fixed a performance regression in sstacks. Recent changes in sstacks made it more likely to
invoke the gapped algorithm to match to the catalog. In some cases, matches to the catalog
would be marked as ambiguous alignments and dropped from the next stages of analysis due to
differences in CIGAR strings from the gapped alignment.
Feature: ustacks: Changed --high_cov_thres default value from 2.0 to 3.0.
Feature: gstacks: Changed --min-kmer-freq default value from 0.05 to 0.01.
Feature: Added further checks on zlib calls.
Stacks 2.0 - Apr 23, 2018
-------------------------
Feature: modified cstacks and ustacks gapped alignment algorithms to always align the two stacks/alleles
with the most k-mers in common, removed the previous minimum k-mer limit.
Feature: modified cstacks so that when a sample locus matches two or more catalog loci, those catalog
loci are combined, or rolled-up, reducing undermerged loci that generate excess homozygote calls.
Feature: modified tsv2bam so that when two loci from the same sample match the same catalog locus, those
loci are combined.
Bugfix: corrected BbvCI restriction enzyme to add the missing negative strand sequence.
Bugfix: corrected the catalog writing routines for unzipped output files to include missing column.
Bugfix: populations: Fixed the filtering of monomorphic loci.
Bugfix: populations: Now preserving sample ordering in all outputs.
Bugfix: Fixed ICPC compilation.
2.0b - May 1, 2018
Feature: gstacks: Removed the assertion that the first basepair of each locus should
be part of a cutsite (now a warning).
Feature: gstacks: The reported effective coverage is now a more realistic weighted mean.
Bugfix: populations: Fixed STRUCTURE output being corrupted for some unordered population maps.
Stacks 2.0 Beta 10 - Apr 10, 2018
---------------------------------
Feature: Improved gapped alignment for secondary reads in ustacks.
Feature: Improved populations performance.
Feature: Added enzymes Cac8I, MslI.
Feature: Made population maps more tolerant to spurious extra spaces and lines.
Feature: populations: VCF output: changed the format of the catalog locus field and made the column 1-based.
Feature: gstacks: increased haplotyping rates by adding a filtering of spurious SNPs step.
Bugfix: Fixed populations dramatic slow-down on datasets with more than several hundred samples.
Bugfix: Restored the NS/locus distribution in populations's distributions log.
Bugfix: Fixed the populations --radpainter export.
Bugfix: stacks-dist-extract: Fixed OSX compatibility.
Bugfix: Fixed breaking bug in populations --in-vcf mode filtering statistics.
Stacks 2.0 Beta 9 - Mar 12, 2018
--------------------------------
Feature: Cleaned up tags/snps/alleles/matches files. We removed the batch ID from ustacks and cstacks
output, and the deprecated log likelihood fields from ustacks and cstacks. We also removed
the chromosome/bp/strand fields as they are no longer used in these files.
Feature: Renamed gstacks output files that represent the new components of the catalog:
gstacks.fa.gz => catalog.fa.gz; gstacks.vcf.gz => catalog.calls
Feature: Removed read length restrictions from ustacks/cstacks/sstacks core, reads/loci can vary in
length throughout the pipeline.
Feature: Reimplemented PLINK export format for the populations program.
Bugfix: Updated to HTSLib 1.7; changed to a custom build system that will work with the Stacks build
system.
Bugfix: Made gapped alignments mandatory in ustacks, cstacks, and sstacks. Added check for frameshift
at 3' end of the read -- if found, a match is deferred to the gapped aligner.
Stacks 2.0 Beta 8 - Feb 03, 2018
--------------------------------
Feature: populations: Now calculates deviation from Hardy-Weinberg equilibrium at the SNP level
(using an exact test), and at the haplotype level (using Guo+Thompson's MCMC algorithm).
Feature: populations: Added an export type for FineRADStructure.
Feature: populations: Added the GQ/GL fields in the VCF SNPs output.
Feature: gstacks: Made the default behavior regarding paired-end reads more logical (in
reference-based mode --paired has been replaced with --unpaired).
Feature: gstacks: Added details about samples and coverages to the log outputs.
Feature: Added enzymes NspI, BbvCI, fixed BfuCI.
Bugfix: corrected a major performance bottleneck in populations when smoothing population statistics
across the genome.
Bugfix: populations: The VCF output now preserves the input sample order.
Bugfix: gstacks: Fixed the handling of a rare special case in the PCR duplicates code.
Bugfix: gstacks: Fixed 100% being added to all per-thread timings.
Stacks 2.0 Beta 7 - Dec 29, 2017
--------------------------------
Feature: gstacks: Added an option to remove PCR duplicates based on insert
size (--rm-pcr-duplicates, plus the related --rm-unpaired-reads).
Feature: populations: Added a haplotype Genepop export.
Feature: populations: improved the help; changed the output names for SNP
files to 'populations.snps.EXT'; added option --no_hap_exports.
Feature: gstacks and populations: Clarified the logs; moved distributions
to a separate '.xlog' file and added script stacks-xlog-extract.
Feature: gstacks: Tweaked the help/interface; especially, replaced --spacer
with --suffix (for BAM directory input).
Feature: Added enzymes BfuI and HinP1.
Feature: Added option --inline_null to clone_filter.
Bugfix: gstacks: Fixed a typo preventing the paired reads from being merged.
Bugfix: populations: Fixed a segfault that occurred with some large datasets.
Bugfix: Made VCF outputs more standard compliant.
Bugfix: populations: Repaired --fasta_samples and --fasta_samples_raw.
Bugfix: populations: Fixed population aborting at the end of the run
when an export option was specified multiple times.
Bugfix: gstacks: Adjusted progression report for catalog asymmetry.
Bugfix: Fixed installation of stacks-integrate-alignments on MacOS.
Stacks 2.0 Beta 6 - Dec 02, 2017
--------------------------------
Feature: Implemented the VCF haplotypes output.
Bugfix: Corrected assert failure in populations when exporting data for genepop or structure output.
Stacks 2.0 Beta 5 - Nov 27, 2017
--------------------------------
Feature: Reimplemented structure, phylip, and phylip_var exports.
Bugfix: Tightened up the overlap algorithm to require 80% of overlapping sequence to be
aligned and of the aligned sequence, 80% must be identities.
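
The two 80% thresholds above can be read as a simple acceptance test. The sketch
below illustrates that reading only; the function name and its arguments are
hypothetical, and how the overlap and alignment lengths are measured in the
actual algorithm is not stated here.

```python
def overlap_is_acceptable(overlap_len, aligned_len, identities, min_frac=0.8):
    """Return True if at least min_frac of the overlapping sequence is aligned
    and at least min_frac of the aligned positions are identities."""
    if overlap_len <= 0 or aligned_len <= 0:
        return False
    aligned_frac = aligned_len / overlap_len
    identity_frac = identities / aligned_len
    return aligned_frac >= min_frac and identity_frac >= min_frac
```
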
Bugfix: Fixed segfault in gstacks when compiled with CLANG on OS X.
Bugfix: gstacks: Fixed how misphasings are reported.
Stacks 2.0 Beta 4 - Nov 07, 2017
--------------------------------
Bugfix: Continued improving overlap algorithm to join SE and PE contigs.
Bugfix: Improved build system to handle new timing functions in gstacks.
Stacks 2.0 Beta 3 - Nov 01, 2017
--------------------------------
Feature: Added output to populations describing mean PE contig size and mean number of
genotyped sites per locus, which reflects the current filtering parameters.
Feature: Improved the output of gstacks and populations.
Feature: Added script `stacks-integrate-alignments`.
Bugfix: made further improvements to the single-end/paired-end locus overlapping algorithm.
Bugfix: fixed all depths being null in populations' VCF output.
Bugfix: Numerically tweaked the marukilow model to remove a limit case.
Stacks 2.0 Beta 2 - Oct 19, 2017
--------------------------------
Feature: gstacks: Made it possible to read from multiple BAM files at the same time; modified the
interface accordingly.
Feature: gstacks: Parallelized the reference-based mode.
Feature: gstacks: Added various statistics & improvements to the log output.
Feature: gstacks: Improved how the forward & paired-end reads are merged (in denovo mode; no more trimming).
Feature: populations: Added code to calculate the overlap between RAD loci when a reference is available.
Feature: populations: Added VCF output (--vcf).
Feature: Updated the denovo_map.pl and ref_map.pl wrappers, samples must now be specified using --samples and --popmap.
Bugfix: Fixed three memory leaks in populations; improved reference-aligned batch logic.
Bugfix: Improved overlapping code in gstacks to merge more single and paired-end contigs together.
Bugfix: Now compiles on Apple OS X.
Bugfix: Fixed a bug that skewed the fixed-site (no-SNP) likelihood in the marukilow model.
Stacks 2.0 Beta 1 - Oct 09, 2017
--------------------------------
Feature: Paired-end sequencing data can be utilized fully. In particular, when the shearing-based
protocol is used, the software will assemble a local contig from the paired reads across
the population, possibly overlap it with the forward-reads region, then align all reads to the
assembled contig. This new approach also fully supports double-digest protocols.
Feature: Haplotype calling and diploidy-violation detection now rely on a novel, more powerful algorithm.
Feature: SNP and genotype-calling now uses the diploid models of Maruki and Lynch (2017).
Feature: The rxstacks program has been replaced with the gstacks program, and there is no need to re-run
some of the earlier steps of the pipeline anymore.
Feature: The memory footprint of the populations program has been considerably reduced and can be scaled
for any size data set.
Feature: The reference-based pipeline has been simplified, and now only comprises two steps: gstacks and populations.
Feature: Added --null_inline mode to clone_filter (and process_radtags) for previously unseen type
of oligo combination.
Stacks 1.48 - Nov 20, 2017
---------------------------
Feature: Added HinP1I restriction enzyme.
Feature: Added --null_inline mode to clone_filter (and process_radtags) for previously unseen type
of oligo combination.
Stacks 1.47 - Sept 06, 2017
---------------------------
Feature: Improved populations's fasta output options (especially,
added an option to export locus consensus sequences).
Feature: denovo_map.pl and ref_map.pl now stop if a component
of the pipeline fails.
Feature: Improved the output of denovo_map.pl and ref_map.pl.
Bugfix: Added a format check in Fasta/GzFasta to avoid a potential
segfault when working on FastQ files.
Bugfix: Fixed a bug in count_fixed_catalog_snps.py that could cause
overwrites when working with uncompressed files.
Stacks 1.46 - Apr 17, 2017
--------------------------
Feature: Added HaeIII enzyme.
Bugfix: Corrected memory leaks in rxstacks.
Bugfix: Corrected non-functioning --min_mapq parameter for pstacks.
Bugfix: Corrected segfault when combining a VCF input file to populations,
with genomic output and masking a restriction enzyme.
Stacks 1.45 - Feb 24, 2017
--------------------------
Feature: Tweaked the interfaces of most programs:
* cstacks and sstacks now accept a population map as input.
* process_radtags will now reuse the input directory name in its log file name.
* Reworked pstacks output.
* Batch ID now defaults to 1 in cstacks, and sstacks and others will try to guess
it from the contents of the given directory/catalog path.
* pstacks/ustacks/process_radtags will now try to guess file formats.
* Default (fallback) format in process_radtags/process_shortread is now gzfastq.
* pstacks: Substituted --max_clipped to --min_aln_pct.
* ustacks -r has become the default; --keep-high-cov reverses it.
* cstacks now checks for sample ID uniqueness.
* Updated help messages.
Feature: populations now logs the 'number of SNPs per locus' distribution.
Feature: Added mapping quality filter in pstacks (--min_mapq).
Feature: Added enzyme ApaLI.
Bugfix: populations: Corrected a VCF-related segfault (current use of VCF's GL field was
improper and was removed).
Bugfix: rxstacks: Corrected a bug that affected locus likelihood medians.
Bugfix: pstacks/ustacks: Corrected a bug that affected coverage standard deviations.
Bugfix: populations: Fixed parsing of option --sigma.
Bugfix: Fixed process_radtags writing fasta (instead of fastq) discard files when input
files were gzfastq.
Bugfix: kernel smoothing was not working correctly for Fis values (values were too negative).
Bugfix: fixed a regression for gapped alignments in cstacks that was causing a buffer overflow.
Stacks 1.44 - Oct 11, 2016
--------------------------
Bugfix: corrected an error in pstacks where '=' and 'X' symbols were not recognized properly in SAM/BAM
CIGAR strings.
Bugfix: corrected some typos in pstacks/populations help output.
Stacks 1.43 - Oct 05, 2016
--------------------------
Feature: added alignment controls to pstacks, allowing the program to discard secondary alignments
and to discard alignments where a significant portion of the read was not aligned (soft-masked).
Bugfix: corrected a very small memory leak in the gapped alignment code, found by Valgrind.
Feature: updated configure test to check if compiler can handle c++11 standard.
Bugfix: rxstacks was not generating model files.
Bugfix: corrected an uncaught exception in cstacks when processing gapped alignments. In some cases when a
multiple alignment had to be recomputed the initial CIGAR string was not parsed properly leading to the
catalog and query sequences coming out of sync in their length (which could throw the exception).
Feature: reduced memory usage in ustacks and pstacks by not retaining all reads from a collapsed locus.
Bugfix: corrected -V option for populations, which was causing a crash (although --in_vcf worked).
Stacks 1.42 - Aug 05, 2016
--------------------------
Feature: Added Csp6I restriction enzyme.
Feature: populations program is now able to calculate populations statistics using arbitrary VCF files
as input.
Feature: Upgraded to the latest release of HTSLib (1.3.1) for reading BAM files. Embedded the library
in the Stacks distribution to remove previous libbam dependency.
Feature: Added an output directory option to 'populations' (--out_path).
Feature: Added restriction enzymes BsaHI, HpaII, NcoI; corrected NdeI.
Bugfix: Made the VCF output by 'populations' more standard-compliant.
Bugfix: Some output files included 0-based genomic coordinates, changed them to 1-based.
Bugfix: Replaced populations IDs with populations names in 'populations' output.
Bugfix: Corrected a bug affecting clone_filter when input was non-gzipped paired-end data.
Stacks 1.41 - June 22, 2016
---------------------------
Bugfix: the kernel-smoothing procedure in populations (used for Fst, Pi, heterozygosity etc. smoothing)
was not functioning at sizes larger than the default size. A bug was creating incorrect weights for the
smoothing operation when the sliding window size was set to a large value causing the smoothing
window to have a maximum size after which increasing the size did not change the smoothing.
Bugfix: cstacks was reporting gapped alignments even when --gapped was not enabled. This affected
a small number of (mostly) confounded catalog loci.
Feature: Added the Csp6I restriction enzyme.
Stacks 1.40 - May 04, 2016
--------------------------
Feature: Changed process_radtags and process_shortreads to print FASTQ/FASTA headers using
"/1" and "/2" to represent the read number, instead of "_1" and "_2".
Bugfix: fixed a regression where allele depths were not being loaded due to the use of the new
*.models.tsv file. This file lacks the raw reads and therefore we can't count the raw stack depth
when running sstacks.
Bugfix: cstacks was calling errant SNPs in loci with a sample containing one gapped locus and
one ungapped locus matching the same catalog locus.
Stacks 1.39 - April 23, 2016
----------------------------
Bugfix: rxstacks was not adjusting reads/SNPs to account for alignment gaps. There was also a
bug in reading the input files.
Bugfix: denovo_map.pl and ref_map.pl were not processing parents/progeny properly.
Stacks 1.38 - April 18, 2016
----------------------------
Feature: denovo_map.pl and ref_map.pl now print depth of coverage for each sample. The ustacks
program now prints depth of coverage after each algorithm stage to see how each stage improves
(or not) the depth of coverage.
Feature: complete refactoring of denovo_map.pl and ref_map.pl. Separated computation from
SQL loading. Added auto creation/deletion of database. Enabled samples to be read from population
map instead of specifying them on the command line.
Feature: added Needleman–Wunsch algorithm to ustacks, cstacks, sstacks to provide for gapped
alignments. Includes --max_gaps and --min_aln_len parameters to contain crazy
alignments. sstacks now includes a CIGAR string describing the alignment to the catalog.
Feature: optimized ustacks for a 33% decrease in run time.
Feature: added new file, sample_X.models.tsv.gz, produced by ustacks and pstacks. Contains a subset
of the information in the sample_X.tags.tsv.gz file, allows for data to be loaded much faster in the
later stages of the pipeline, greatly speeding up run times.
Bugfix: added code to prevent populations from improperly reading SNP positions past the length of
a particular locus (that is shorter than the catalog locus).
Bugfix: corrected bug in process_radtags when using inline barcodes on paired-end reads. The paired-
end reads were not being truncated uniformly.
Bugfix: corrected bug in populations where if enough empty files were fed into the program
it could place files in the wrong population or segfault.
Bugfix: corrected PHP files for exporting to include LnL filter.
Bugfix: corrected mappable markers filter in web interface.
Stacks 1.37 - Feb 24, 2016
--------------------------
Feature: converted PHP database code from MDB2 to MySQLi. MDB2 is no longer a
prerequisite for installing Stacks.
Stacks 1.36 - Feb 18, 2016
--------------------------
Feature: Added the BfaI, BspDI, AseI, and AciI restriction enzymes to process_radtags.
Feature: Changed the way denovo_map.pl and ref_map.pl run sstacks. It is now set to run
sstacks once for all samples, instead of one time per sample. Should provide a significant
speed-up.
Bugfix: corrected error in pstacks when handling long reads with complex SAM/BAM alignments.
Bugfix: fixed memory leak in sstacks when more than one sample file was specified.
Bugfix: corrected error in clone_filter causing it to fail when processing gzipped data
without a random oligo attached.
Bugfix: corrected error when reading gzipped FASTA files and the last sequence of the file
was improperly doubled in length.
Stacks 1.35 - Sept 09, 2015
---------------------------
Feature: Added --retain_header flag to process_radtags/process_shortreads which will keep
the unmodified FASTQ header in the output. This allows clone_filter/process_radtags/
process_shortreads to be run in different sequences and more than one time.
Feature: Added --treemix to the populations program, allowing SNPs to be output in
TreeMix format.
Feature: Added --phylip_var_all to the populations program. This option outputs the full
sequence from each variable locus, encoding polymorphisms using IUPAC notation.
This option will also output a file containing the coordinates of each RAD locus so they
can be input to phylogenetic software (such as RAxML) to partition each RAD locus out
and then build the phylogenetic tree independently for each partitioned locus.
Feature: Added the AgeI restriction enzyme.
Feature: refactored clone_filter to handle random oligo sequences used as inline/indexed
barcodes to identify and discard PCR duplicates.
Bugfix: added code to process_radtags/process_shortreads to handle cases when data writes
fail due to a filled disk or other error conditions.
Bugfix: kmer_filter was not handling gzipped FASTQ files properly when filtering rare kmers.
Stacks 1.34 - July 26, 2015
---------------------------
Bugfix: fixed phylip output to again include nucleotides from subsets of the full set
of populations.
Bugfix: private alleles were being associated to the incorrect population at a particular
locus (the counts and summary statistics of private alleles were not affected).
Stacks 1.33 - July 22, 2015
---------------------------
Bugfix: Corrected the second-stage filtering of the populations program to properly
respect the -p flag.
Bugfix: Corrected the display of individual samples in the web interface (tags.php file).
Stacks 1.32 - June 18, 2015
---------------------------
Bugfix: Updated the Phylip output to reflect the changed meaning of 'fixed' as
determined in the PopSum::tally() function.
Stacks 1.31 - June 17, 2015
---------------------------
Bugfix: site-level filtering in the populations program was not working correctly
when dealing with sites that were fixed within populations but variable among
populations. The code in the PopSum::tally() function was not correctly identifying
sites as not fixed in these cases causing them to be incorrectly filtered out.
Bugfix: --write_random_snp was causing a segfault in the populations program in
some cases.
Feature: changed the default setting for the -n option of cstacks (number of fixed
differences allowed between loci) to 1 (at the request of Josie Paris).
Bugfix: made some tweaks to improve layout in the web interface.
Bugfix: single-end reads, with paired barcodes (inline/index) were not being handled
properly, resulting in a segfault.
Bugfix: process_radtags was allowing a non-null barcode type to be specified without
specifying a barcode file, which caused a segfault.
Feature: exposed kmer length setting in ustacks and cstacks. This allows the kmer
length used for sequence matching to be set manually. While this can result in some
missed matches (there is a trade off between kmer length and sequence length when
searching for matches between the two) it also allows the algorithm to run at faster
speeds.
Feature: Changed default database engine type to be explicitly MyISAM. Previously
Stacks just used the default which at one time was MyISAM but has recently changed in
many systems to be INNODB. Using MyISAM should provide much faster imports of data
and ultimately use less disk space (as the space is reclaimed when databases are
deleted).
Stacks 1.30 - May 07, 2015
--------------------------
Feature: sstacks can now accept multiple sample files at a time, saving run
time by only processing the catalog once.
Feature: changed batch_X.sumstats.tsv file so the P/Q alleles are always
presented in the same order in each local population (according to the
overall frequency of the allele across all populations). This will sync results
with the VCF exports but will sometimes cause the frequency of p in the local
population to be less than 0.5 (up until now the frequency of p has always
represented the most frequent allele in the local population).
Feature: added a maximum observed heterozygosity filter to populations program.
Bugfix: Fis values in batch_X.sumstats_summary.tsv were incorrect (although raw
values in batch_X.sumstats.tsv were correct).
Bugfix: corrected the allele depth output in the VCF export to follow de facto
standards used by other programs.
Bugfix: in some cases loci were sneaking past the --write_single_snp directive in
populations (due to interactions with pruning out SNPs that fail the MAF filter).
Feature: Updated the Stacks web interface. The web app is now almost 100% dynamic
(parts of the page are drawn on demand instead of fetching new, full pages from
the server) using local javascript to draw the population view of genotypes, summary
statistics, and the view of raw stacks. The web app uses asynchronous AJAX queries
that trade data encoded in JSON to fetch the necessary data for dynamic display.
Feature: added DdeI, RsaI, AluI restriction enzymes to process_radtags.
Bugfix: sstacks could generate extra matching haplotypes in a very small number of
cases.
Stacks 1.29 - Mar 21, 2015
--------------------------
Feature: added the --ordered_export option to the populations program. For the VCF,
GenePop, and Structure exports, if this option is specified, only one copy of each
SNP is exported in the case where one nucleotide position is covered by more than one
RAD locus. Most useful for ddRAD data.
Feature: VCF export now includes individual allele depths for each SNP call.
Feature: improved the filtering logging code in populations, if the --verbose flag
is specified, a reason is provided for each pruned site, or each removed locus.
Bugfix: PHASE output was broken in the populations program. SNP pruning/filtering
code did not update the catalog copies of the alleles after pruning which are needed
by the PHASE output code.
Bugfix: adjusted the filtering code in populations to not exclude fixed loci.
Bugfix: removed extra tab from ID line for Structure export.
Bugfix: fixed issue in genepop output that may have overfiltered some loci.
Bugfix: fixed small problems with --write_single_snp/--write_random_snp in the
populations program. Some polymorphic loci were erroneously being omitted.
Stacks 1.28 - Mar 06, 2015
--------------------------
Feature: added a second barcode distance to process_radtags/process_shortreads.
This allows you to specify two distances for recovering barcodes if you are using
combinatorial barcodes (e.g. a 12bp barcode inline on the single-end read plus a
6bp index). I have changed the meaning of the parameter from "distance between
barcodes" to "number of allowed mismatches when correcting barcodes."
The --barcode_dist parameter is now --barcode_dist_1, and --barcode_dist_2
was added.
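Note: the sketch below shows one way "number of allowed mismatches when correcting barcodes" could be applied independently to each half of a combinatorial barcode. It is an illustration only, not the process_radtags implementation: the function names are hypothetical, and the rule that a correction is accepted only when exactly one known barcode is within range is an assumption.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def rescue_barcode(observed, known_barcodes, max_mismatches):
    """Return the single known barcode within max_mismatches, else None."""
    hits = [bc for bc in known_barcodes
            if len(bc) == len(observed)
            and hamming(bc, observed) <= max_mismatches]
    # Assumption: an ambiguous correction (several candidates) is rejected.
    return hits[0] if len(hits) == 1 else None


def rescue_pair(inline_obs, index_obs, inline_known, index_known,
                dist_1, dist_2):
    """Correct both barcodes, each with its own mismatch allowance."""
    first = rescue_barcode(inline_obs, inline_known, dist_1)
    second = rescue_barcode(index_obs, index_known, dist_2)
    return (first, second) if first and second else None
```

With a 12bp inline barcode and a 6bp index, a user might allow two mismatches on the longer barcode and one on the shorter, which is the situation the two separate parameters are meant to cover.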
Bugfix: the process_shortreads/process_radtags programs were trimming sequence
as if an inline barcode was specified, even when it was an index barcode and no
sequence should have been trimmed.
Bugfix: the process_shortreads program was outputting FASTA even when FASTQ was
requested due to not handling gzipped outputs properly.
Bugfix: fixed segfault in populations that could occur when using a whitelist that
contained loci that were being filtered out due to -p/-r constraints.
Stacks 1.27 - Feb 25, 2015
--------------------------
Bugfix: the minor allele frequency filter and the progeny limit filter were not working
properly in all cases with the other filters.
Bugfix: barcode length (href->inline_bc_len) was not being correctly set for single-end,
inline barcodes of variable length.
Stacks 1.26 - Feb 23, 2015
--------------------------
Bugfix: if you are running non-compressed data, then version 1.25 broke the parsing code.
If your data were zipped (or a BAM file) when it went through pstacks/ustacks, then
there was no bug.
Feature: refactored the filtering code in the populations program to add a second
filtering step. In previous versions the -r (sample limit) and -p (population limit)
were applied on the basis of the entire RAD locus. This could lead to situations
where a RAD locus remained in the data set while one or more of the individual SNPs
on that locus were missing data and were below the -r or -p limits. Now, the filters
are applied to individual SNPs after the filters are applied to the RAD loci. This
greatly affects the -r (sample limit) filter with more SNPs being pruned out, as well
as the -a (minor allele frequency filter) such that all SNPs below the MAF are
pruned fully from the data set and will not appear in any statistical results or
downstream exports.
Feature: added restriction enzyme kpnI.
Feature: added code to check for the existence of the loci and SNPs provided in a
whitelist.
Stacks 1.25 - Feb 17, 2015
--------------------------
Feature: added support for unaligned BAM files for process_radtags and
process_shortreads. The two programs can now read paired data that is interleaved in
a single file (which is required to support paired-end data in BAM format).
Feature: Haplotypes can now be output in VCF format from the populations program using
the --vcf_haplotypes option.
Feature: added --fasta_strict option to populations program. Will output full sequence for
each individual at each haplotype at each locus, but only for biologically plausible loci.
It won't output loci with more than two haplotypes and will output single haplotypes twice,
once per allele.
Feature: Changed the sumstats/hapstats files to output a one-based genome base pair position
so it matches other export formats.
Bugfix: fixed problem with gzipped files where last line of file was not read properly
causing the program to output an erroneous error message.
Bugfix: The FASTA output from the populations program was reporting the internal value
(zero-based index) for the basepair position of each read (the first nucleotide of the
cutsite) causing an off-by-one error for all reads and reads on the negative strand had
the coordinate for the cutsite end of the read (right-most end) reported instead of the
standard left-most end.
Bugfix: the log likelihood filter was not working properly in export_sql.pl, causing many
genotypes to be excluded during export.
Bugfix: process_radtags was not looking for the paired-end RAD cutsite in the proper location
when dealing with double-digest, inline/index barcoded reads.
Feature: added initial, internal support for merging and phasing loci that overlap at a
restriction enzyme cut site.
Feature: code now prints program version and generation date to all internal Stacks files.
Stacks 1.24 - Jan 07, 2015
--------------------------
Feature: added restriction enzyme ecoRV.
Bugfix: fixed segmentation fault in process_radtags/process_shortreads when resizing sequence
and phred internal buffer sizes.
Stacks 1.23 - Dec 12, 2014
--------------------------
Bugfix: Fixed a segfault bug in process_radtags where the process_barcode function returned
prematurely when one barcode was correct and one was incorrect in paired cases.
Bugfix: fixed compiler warnings when building with CLANG.
Stacks 1.22 - Dec 08, 2014
--------------------------
Feature: process_radtags and process_shortreads now support variable barcode lengths. In
process_radtags sequences will automatically be trimmed to keep stacks a uniform length
with the variable barcode lengths.
Feature: a filename can now be specified in the barcodes file for process_radtags and
process_shortreads. When a filename is specified, process_radtags will write data to
this filename instead of a filename made up of the barcode.
Feature: process_radtags and process_shortreads will now output gzipped files if
provided gzipped inputs or if requested using the '-y' option.
Feature: Added SacI and BgIII restriction enzymes.
Bugfix: Tightened up parsing of FASTQ ID field to prevent a buffer overrun (and subsequent
segfault) in FASTQ headers that look like the Illumina format but are malformed.
Bugfix: Fixed GenePop output of populations program as last locus on second line was missing
commas if more than one SNP was present at that locus.
Bugfix: -R option to retain unused reads was not being recognized by ustacks.
Bugfix: changed populations to record program run parameters and execution time to log file.
Bugfix: corrected Makefile.am to include Sparsehash compile flags for process_radtags.
Bugfix: corrected load_radtags.pl so as not to try and load the population ID as a
number to the samples table (and instead as a string).
Stacks 1.21 - Oct 02, 2014
--------------------------
Feature: Added the XbaI, BstYI, and XhoI restriction enzymes.
Feature: Added ability to specify column position in whitelist along with locus ID in
populations program. This allows for specific SNPs within specific loci to be processed.
Feature: In populations program, changed implementation of --write_single_snp to create
an internal whitelist from the first SNP in each catalog locus. Added a new command
line option, --write_random_snp to select a single, random SNP per RAD locus using the
same internal mechanism.
Feature: Added HZAR, Hybrid Zone Analysis in R output to populations program.
Bugfix: Added code in populations program to handle cases where a haplotype contains one
or more uncalled bases (Ns). These haplotypes are now excluded from haplotype-based
statistical calculations.
Bugfix: In Phi_st/ct/sc calculations of populations program, total population count was
not adjusted downward when one of the populations dropped out of the analysis at a
particular locus in the all-populations, haplotype-based AMOVA calculation (batch_X.phistats.tsv).
Bugfix: "All positions" Fis measure in batch_X.sumstats_summary.tsv file was too negative due
to internal logic error.
Bugfix: updated queries in index_radtags.pl to account for new 'type' variable in SNPs tables.
Stacks 1.20 - Jul 29, 2014
--------------------------
Synced corrections module branch with main Stacks branch.
***
The internal formats of the *.tags.tsv, *.snps.tsv, and *.matches.tsv files have changed
and therefore version 1.20 programs cannot be used on earlier generated data sets. However,
the convert_stacks.pl script is included in this release to convert an older data set into
the new formats.
***
Feature: Implemented new haplotype trimming algorithm for rxstacks.
Feature: new script, convert_stacks.pl, to convert previous Stacks files to new format.
Feature: Modified VCF output to include likelihood values from heterozygous and homozygous
SNP model calls.
Feature: added log likelihood filter to genotypes and populations programs and to web interface.
Feature: Added SpeI restriction enzyme to process_radtags.
Feature: Modified Beagle output formats in populations program to be population-specific and
not to include monomorphic nucleotide positions.
Stacks 1.19 - Apr 23, 2014
--------------------------
Feature: the populations program now calculates Fst' and D_est on haplotypes between all pairwise
populations. Our implementations are based on:
Bird, Karl, Smouse & Toonen. (2011) Detecting and measuring genetic differentiation.
D_est: Jost. (2008) Gst and its relatives do not measure differentiation.
Fst': based on modifying the AMOVA implementation from Excoffier, Smouse, & Quattro (1992).
Feature: we have refactored the populations program to use a common framework for kernel smoothing
and bootstrapping. This has allowed us to add smoothing and bootstrapping to all statistics calculated
by the populations program: pi, Fis, Fst, Fst', D_est, Phi_st, Phi_ct, Phi_sc, Haplotype diversity,
gene diversity.
Feature: we have implemented fine-grained control of bootstrapping by providing flags to turn on
bootstrapping for each group of population statistics, as well as providing a bootstrapping whitelist
allowing only certain loci to be included in the bootstrapping calculations.
Stacks 1.18 - Apr 04, 2014
--------------------------
Feature: we now use chi squared segregation ratios to detect missing alleles in parental mapping markers
in F1 crosses (CP map type). We can now map ab/a- and -a/ab as ab/--, and --/ab markers; we can map
ab/c- and -c/ab markers as ab/cd markers; we can map aa/b- and -a/bb markers as ab/-- and --/ab markers.
Feature: in F1 crosses we are now mapping ab/cc and cc/ab markers as ab/-- and --/ab markers.
Feature: reworked genetic map display of web interface. Included chisq p-value from segregation distortion
test as a filter.
Feature: implemented measure of segregation distortion in genotypes program based on chi square test of
genotype counts. Removed deprecated measure of F, inbreeding coefficient, replaced it with segregation
distortion.
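Note: the sketch below shows a chi-square goodness-of-fit test on genotype counts, the general technique named in the entries above. It is not the genotypes program's code; the function names, the example ratios, and the restriction of the p-value helper to 1 or 2 degrees of freedom are assumptions made to keep the example dependency-free.

```python
import math


def chi_square_stat(observed, expected_ratio):
    """Chi-square statistic for observed counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    stat = 0.0
    for obs, ratio in zip(observed, expected_ratio):
        expected = total * ratio / ratio_sum
        stat += (obs - expected) ** 2 / expected
    return stat


def p_value(stat, df):
    """Upper-tail chi-square probability; closed forms for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")
```

For a marker expected to segregate 1:1, counts of 48 and 52 give a small statistic and a large p-value, while counts of 80 and 20 give a large statistic that would flag the marker as distorted. Degrees of freedom are the number of genotype classes minus one.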
Bugfix: corrected calling of markers in genotypes program. When a whitelist with a small number of markers
is specified, some of the parental IDs could be missed, causing markers not to be called and hence dropped
from the analysis.
Bugfix: changed genotype mappings for generic map types to make certain non-biologically plausible genotype
combinations illegal.
Bugfix: fixed compilation issues when using Google's SparseHash (thanks to khuck@cs.uoregon.edu for the patch).
Stacks 1.17 - Mar 26, 2014
--------------------------
Bugfix: Added #ifdefs to deal with missing functions in older versions of zlib.
Stacks 1.16 - Mar 25, 2014
--------------------------
Feature: added haplotype counts for each population and locus to the batch_X.hapstats.tsv file.
Feature: haplotype F statistics are now calculated for the whole set of populations (one analysis
of variance calculation for all populations), and also as a set of pairwise calculations to mirror
the existing Fst calculations.
Bugfix: fixed small bug in calculation of MSD(Total) component of Phi_st (haplotype F statistics).
Bugfix: fixed bug in parsing of population maps when using strings for population identifiers.
Bugfix: kernel-smoothing not correct for haplotype/gene diversity.
Stacks 1.15 - Mar 15, 2014
--------------------------
Bugfix: fix various bugs related to gzip support.
Stacks 1.14 - Mar 14, 2014
--------------------------
Feature: Stacks files are now kept in gzipped format if FASTQ data is fed into pipeline gzipped or as a BAM.
Bugfix: fixed some compile bugs on OSX Mavericks.
Stacks 1.13 - Feb 24, 2014
--------------------------
Feature: We have implemented the first set of haplotype-level population genetics statistics. Specifically,
we are now calculating gene diversity and haplotype diversity (pi) for each locus, as well as F statistics
for haplotypes including, Phi_st, Phi_ct, and Phi_sc, which are calculated using Analysis of Molecular
Variance (AMOVA):
Excoffier, Smouse, & Quattro, (1992). Analysis of molecular variance inferred from metric distances
among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics.
Data can be analyzed as populations of individuals (the previous default) and now also as
groups of populations.
Feature: If a reference genome is available, haplotype F statistics can also be kernel-smoothed.
Feature: populations in the population map can now be specified as text strings or numbers. Groups
of populations can now be specified by adding a third column to the population map for each individual
and listing the group they belong to (again as a text string or number).
Bugfix: allow batch IDs of 0 in populations and genotypes.
Bugfix: in populations, changed VCF output to be ordered by basepair.
Bugfix: in populations, change value of expected homozygosity to be set to 1 - expected heterozygosity
instead of 1 - Pi. Pi (computed as 1 - [((p choose 2) + (q choose 2)) / (n choose 2)]) and expected
heterozygosity (2pq) can produce slightly different estimates resulting in exp het + exp hom != 1.
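Note: a small worked example of why the two quantities in the entry above differ. This is a sketch, not the populations source; the function names are hypothetical, and it assumes p and q are allele counts inside the "choose" terms and allele frequencies inside 2pq.

```python
from math import comb


def pi_from_counts(p_count, q_count):
    """Pi as 1 - [(C(p,2) + C(q,2)) / C(n,2)], using allele counts."""
    n = p_count + q_count
    return 1.0 - (comb(p_count, 2) + comb(q_count, 2)) / comb(n, 2)


def expected_het(p_count, q_count):
    """Expected heterozygosity 2pq, using allele frequencies."""
    n = p_count + q_count
    p = p_count / n
    return 2.0 * p * (1.0 - p)
```

With 7 copies of one allele and 3 of the other (n = 10), Pi is 21/45 (about 0.467) while 2pq is 0.42. Pi equals 2pq scaled by n / (n - 1), so the gap shrinks as the sample grows but never reaches zero, which is why 1 - Pi and 1 - 2pq are not interchangeable.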
Stacks 1.12 - Jan 21, 2014
--------------------------
Bugfix: accidentally broke gzipped FASTQ support through a typo in gzFastq.h.
Stacks 1.11 - Jan 09, 2014
--------------------------
Feature: changed build to work properly with g++ and clang, which is the native compiler on
Apple's OS X.
Feature: Added NheI restriction enzyme.
Bugfix: changed logging in denovo_map.pl/ref_map.pl to write outputs from Stacks programs continuously
instead of waiting until the program completed to write output to log file.
Bugfix: corrected parsing of population map for gzipped input files for denovo_map.pl.
Stacks 1.10 - Dec 10, 2013
--------------------------
Feature: Added phased output for PHASE and Beagle. The phased output writes multiple SNPs
in a single RAD locus as an already phased haplotype, leaving PHASE and Beagle to only phase
between these haplotypes, instead of having to re-phase SNPs from within the same RAD site.
Bugfix: corrected the SNP genotype output for Beagle.
Bugfix: Corrected PHP warnings; enabled scrolling in catalog.php for iframes.
Bugfix: allele percentages from ustacks were off since ustacks was changed to load/unload
read IDs from disk (Stacks 0.99995). Only the calculation of the percentages was affected,
not the underlying algorithms.
Stacks 1.09 - Oct 30, 2013
--------------------------
Feature: added export support for F2 and backcross map types for Onemap to genotypes.
Feature: added EaeI, ClaI, and TaqI restriction enzymes to process_radtags.
Feature: changed populations bootstrap to use AMOVA Fst.
Feature: added bootstrap whitelist to populations, so users can restrict the loci that
are bootstrapped to a particular set (e.g. on a single chromosome).
Bugfix: modified PHASE output so that SNPs are ordered properly. Previously, although
RAD loci are ordered properly, some individual SNPs between RAD loci could still be output
out of order.
Bugfix: corrected onemap CP output so that B3.7 markers are output as "ab", not "2ab".
Stacks 1.10.Beta1 - Sept 30, 2013
---------------------------------
Feature: completed implementation of rxstacks.
Bugfix: when merging a homozygous locus into the catalog, if homozygous allele conflicted
with existing catalog SNP alleles, new allele was not added to SNP object (but was added to
the allele list).
Bugfix: found small memory leak in cstacks - old SNP objects were not being freed when new
SNPs were merged into the catalog.
Bugfix: empty alleles were being output to the batch_X.catalog.alleles file by cstacks. Did
not affect the function of the program.
Stacks 1.08 - Sept 24, 2013
---------------------------
Feature: added a FASTA output to populations to output the full locus sequence
for each allele at each sample locus, applying any filters or whitelists supplied to
populations.
Stacks 1.07 - Sept 23, 2013
---------------------------
Bugfix: updated process_radtags to drop reads shorter than length
limit when read trimming turned on.
Bugfix: corrected build failures on Mac OS X due to Samtools' bam.h header conflicting
with Stacks' Bam.h header when building on OS X's case insensitive file system.
Feature: changed process_radtags to drop reads already shorter than limit if sequence
truncation turned on. You can also specify the read length limit to drop reads if your
data have already been trimmed.
Bugfix: Updated VCF output, missing genotypes now reported as "./." instead of "."
Bugfix: Updated VCF output, alleles reported on the negative strand are now complemented
so their positive strand counterparts are reported and will align against a reference genome.
Bugfix: Updated VCF output, "reference allele" is now always reported as most frequent allele.
Stacks 1.06 - August 28, 2013
-----------------------------
Bugfix: Illumina FASTQ header specifying read pair could override internal enumeration
of read pair if paired-end data was fed in as a single-end file.
Bugfix: corrected locus starting base in reference-aligned data.
Feature: refactored sort_read_pairs.pl to process input files one at a time, without retaining
them in memory. The program should now be able to handle an arbitrary number of samples.
Feature: sort_read_pairs.pl can now read gzipped files directly.
Stacks 1.05 - August 17, 2013
-----------------------------
Bugfix: adapter filtering code in process_radtags/process_shortreads bit rotted and was not
properly functioning. Switching from deprecated hash function to TR1 hash broke the expected
hashing behavior for char *.
Bugfix: modified process_radtags/process_shortreads to handle single adapters when processing
paired-end data (previously you had to specify two adapters for paired data).
Bugfix: corrected barcode-specific counters in process_radtags/process_shortreads. Overall counts
were correct but counts for barcodes were off due to shuffling of code that happened with support
of combinatorial barcodes.
Stacks 1.04 - July 25, 2013
---------------------------
Bugfix: process_radtags was not properly handling index_index and inline_inline barcode types.
Bugfix: the hindIII restriction enzyme sequence was incorrectly specified in renz.h.
Bugfix: ustacks wasn't properly removing file suffix when gzip files are processed.
Stacks 1.03 - June 28, 2013
---------------------------
Bugfix: non-barcoded data were not being handled properly by process_radtags/process_shortreads.
Stacks 1.02 - June 24, 2013
---------------------------
Bugfix: single-end barcode, double-digested data were not being handled properly by
process_radtags causing a crash.
Feature: added support for PLINK and Beagle output files from the populations program.
Feature: Modified the minor allele frequency (MAF) filter to remove polymorphic nucleotide
SNPs from Stacks output on a per-population basis. So, if a second allele is present
at a frequency below the MAF, that nucleotide site is not output (although other sites
at the same RAD locus could still be output).
Bugfix: Tri-allelic loci were being output into the STRUCTURE, GENEPOP and PHASE
output (but not in sumstats or Fst).
Stacks 1.01 - June 07, 2013
---------------------------
Bugfix: an off-by-one error was preventing haplotypes from being verified by sstacks
if a SNP occurred in the last position of the read. This could cause tags to fail
to match to the catalog if there is a SNP in the final position.
Stacks 1.0 - June 06, 2013
--------------------------
Feature: added XbaI and BamHI restriction enzymes to process_radtags.
Feature: added code to output genotypes in PHASE/fastPHASE format.
Feature: extended combinatorial barcodes support so one can process single-end data
that contains both an inline and indexed barcode.
Feature: added command line option and supporting code to cstacks to allow samples
to be added to an existing catalog.
Feature: refactored command line handling in denovo_map.pl and ref_map.pl to be much
more flexible. Arbitrary command line options can now be passed to particular pipeline
programs using the -X flag.
Feature: for genetic maps, the catalog may now be constructed out of multiple parents;
the genotypes program is smart enough to cross check the parents used to construct the catalog
against those submitted to genotypes for producing a map. Will allow for a single
catalog to be used across a series of crosses so all maps share the same catalog IDs.
Feature: added option to genotypes to import manual corrections exported from Stacks
SQL database.
Feature: added --log_fst_comp option to populations to log components of the Fst
calculations to a file for debugging / testing purposes.
Bugfix: corrected handling of files in kmer_filter. Adding support for gzipped files
broke file handling in some cases.
Stacks 0.999991 - May 14, 2013
------------------------------
Feature: changed populations to use AMOVA Fst for batch_1.fst_summary.tsv file. Previously
it used the Binomial Fst.
Bugfix: If --write_single_snp not specified, Structure output was not naming loci properly (it
was naming each SNP from the same RAD locus using the same ID, instead of differentiating each
SNP in each RAD locus).
Feature: Added Sau3AI and SexAI restriction enzymes. Fixed bug in specification of MseI, MspI
enzymes.
Bugfix: changed VCF and Fst code in populations to output SNPs from reads aligned to the
negative strand on a reference genome correctly.
Stacks 0.99999 - May 06, 2013
-----------------------------
Bugfix: process_shortreads/process_radtags not working with non-barcoded data.
Stacks 0.99998 - May 01, 2013
-----------------------------
Feature: Added option to sort_read_pairs.pl to output FASTQ if desired.
Bugfix: make sort_read_pairs.pl understand new file naming scheme.
Feature: added mseI, mspI restriction enzymes to process_radtags.
Bugfix: corrected sphI cutsite sequence in process_radtags.
Bugfix: stopped "uninitialized value" errors in export_sql.pl when marker type is
undefined for a particular map.
Stacks 0.99997 - April 01, 2013
-------------------------------
Bugfix: paired barcode could become uninitialized on second pair of files in
process_radtags/process_shortreads causing all barcodes to mismatch. Made Read
class explicitly initialize everything.
Stacks 0.99996 - March 24, 2013
-------------------------------
Feature: major overhaul of the process_radtags / process_shortreads programs to support
combinatorial barcodes and double-digested data. Programs now support a mixture of
barcodes from single-end inline or index barcodes, to mixtures of inline/index barcodes.
1) changed naming scheme for process_radtags/process_shortreads output files for
paired reads. Changed file suffix to properly be ".fq" or ".fa", with paired-reads named
sample_XXX.1.fq and sample_XXX.2.fq instead of the previous ".fq_1" and ".fq_2".
2) Paired-reads remain synced in output files, with singletons written to
sample_XXX.rem.1.fq and sample_XXX.rem.2.fq.
3) changed Phred+33 to be the default encoding scheme (previously was the now
deprecated Phred+64)
4) Combinatorial barcodes are specified as --inline_index or --inline_inline among a
number of other supported possibilities. Barcodes are listed in the barcode file as either
a single column or two, tab-separated columns.
5) Two restriction enzymes can now be specified via --renz_1 and --renz_2 to have the
program check (and correct) the restriction enzyme cut site on the first and second read
respectively.
6) programs now properly ignore files starting with "." which is required for Mac
OS X's ".DS_Store" files and for "." and ".." on Linux.
Bugfix: processing paired-end data with process_radtags could incorrectly alter the first
few nucleotides of the paired-read when correcting barcodes.
Bugfix: two regressions were fixed in process_shortreads causing all reads to be
improperly trimmed.
Bugfix: VCF output did not include sites fixed within and variable among populations.
Bugfix: changed the parsing code to accept a wider range of Illumina named, paired-end
files in process_radtags/shortreads.
Bugfix: gzipped files were not read properly in process_radtags/shortreads when a directory
was specified with -P.
Bugfix: setting secondary read distance to 0 in ustacks (-N) was ineffective.
Bugfix: changed the PHP code to remove 'Strict Standards' warnings and a few other warnings.
Thanks to Yue Yu for tracking down the proper changes to avoid the warnings.
Stacks 0.99995 - February 19, 2013
----------------------------------
Feature: added support for using Google's Sparsehash Object: http://code.google.com/p/sparsehash/
If enabled at compile time, this object will replace all the hash maps with Google's sparsehash
saving significant memory.
Feature: removed the -S command line option from cstacks and sstacks. These programs now read this ID
directly from the Stacks input files.
Feature: altered ustacks to no longer store FASTQ/FASTA IDs from input files in memory to lower
memory usage. Instead, an integer representing the read is stored and the IDs are read back in from
disk just before results are written.
Feature: added the '--write_single_snp' option to populations. When writing Genepop or Structure files
this option will cause populations to write just the first SNP per locus to the file, avoiding potential
problems with linked SNPs originating from the same locus.
Feature: compressed the Hval/Stack/Rem objects to remove convenience integer variables to save memory.
Feature: updated Stacks programs to use the newer TR1 unordered_map hash object instead of the
deprecated SGI hash_map object.
Bugfix: fixed a memory leak in cstacks in which not all of the Locus Class elements were being properly
freed (only the SNP objects were being freed).
Bugfix: Added code to denovo_map.pl/ref_map.pl to remove from the logfile the 'counter' lines that
printed when initially loading radtags data.
Stacks 0.99994 - February 12, 2013
----------------------------------
Bugfix: process_radtags/process_shortreads, when adding support for reads of different length, I
clobbered the sequence truncation option. Fixed this regression.
Bugfix: the kernel smoothing algorithms for calculating Fst, Pi, and Fis could sometimes segfault
as some RAD sites can overlap. Added code to find and describe overlapping RAD sites and report these
to the user.
Stacks 0.99993 - January 30, 2013
---------------------------------
Feature: process_radtags/process_shortreads/ustacks can now read gzipped Fasta/Fastq input files.
Feature: ref_map.pl/pstacks now supports the use of BAM alignment files. This feature is optional and must
be enabled during compilation. It requires the Samtools library to be installed.
Bugfix: When using reference-aligned data, soft-masked alignments (Ns) were getting improperly injected into
the SNP models, which would call them as Homozygous Ns, and this data would eventually be passed to the
summary statistics in populations, which would make errant Fst calculations.
Bugfix: In rare cases, sequences aligned to the negative strand had their base pair positions slightly off,
this could cause a segfault during populations' kernel-smoothed Fst calculations.
Bugfix: In populations, fixed a rare, infinite loop condition in Fisher's exact test for Fst calculations.
Could occur due to a floating point rounding error when calculating allele frequencies for Fst calculation.
Stacks 0.99992 - January 8, 2013
--------------------------------
Bugfix: floating point command line options were not being processed correctly and may have been
truncated.
Stacks 0.99991 - December 17, 2012
----------------------------------
Feature: process_shortreads and process_radtags can now filter for adapter sequence in raw data, trimming
(process_shortreads) or discarding (process_shortreads/process_radtags) it. Mismatches to the adapter
sequence are allowed to accommodate sequencing error.
Bugfix: added --merge flag to process_shortreads/process_radtags to handle regression where unbarcoded
data should be merged together into single output files.
Bugfix: code in cstacks to characterize differentially fixed SNPs was only running with -n > 0, but
should also run by default if -g is specified.
Feature: made automated correction thresholds for the genotypes program accessible from the command
line, including --min_hom_seqs, --min_het_seqs, and --max_het_seqs options.
Feature: refactored clone_filter to be more functional. Now can output sequences in FASTA or FASTQ
(FASTA will save memory). Keeps sequence headers intact, can capture discarded reads, and prints
a distribution of the number of cloned read pairs.
Bugfix: Remainder reads weren't being written properly as the file handles weren't properly closed.
Bugfix: Processing paired reads with process_radtags/process_shortreads was not functioning correctly,
barcode was not being transferred properly from P1 to P2 read. Regression introduced Aug 21, 2012.
Feature: added support for OneMap CP map export in genotypes.
Bugfix: Fixed some bugs in pstacks/ustacks command line processing involving --alpha and --model_type.
Bugfix: several bugs in the exact and approximate bootstrap algorithms were corrected. These algorithms
are now robust.
Bugfix: Added code to ensure command line IDs are in fact integers.
Bugfix: fixed nucleotide positions were not being tallied across populations properly resulting in an
incorrect value for number of sites and percent polymorphic sites in the sumstats_summary file.
Bugfix: pstacks could identify a locus that despite having SNPs would have no haplotypes generated.
This would later cause sstacks to segfault. Added code in pstacks to blacklist these loci and code
in sstacks to catch this case and not segfault, now will print a warning.
Stacks 0.9999 - October 03, 2012
--------------------------------
Feature: two bootstrapping procedures have been introduced into the populations program to
determine the statistical significance of kernel smoothed windows. These algorithms are controlled
by the --bootstrap and --bootstrap_reps command line options.
Feature: a summary of the summary statistics is now written for all populations, giving the mean, variance,
and standard error for each of the population-specific summary stats. In addition, private alleles
are identified and marked in the sumstats file, and summarized across populations. Number and
percent of polymorphic loci are also reported. The actual variable nucleotides at each site are now
reported in the sumstats file.
Feature: the populations program can now generate kernel-smoothed values for Fis and Pi, in addition
to the current support for Fst.
Feature: the populations program can now output SNP data for use in the program Structure.
Feature: various sections of the populations program have been parallelized.
Feature: the populations program can now output SNP data in the Phylip file format. If --phylip is
specified, the populations program will identify SNPs that are fixed within populations, but variable
between populations and output these in a Phylip file. This file can then be fed into any phylogenetics
program, such as PhyML. This feature is equivalent to the analysis done in Emerson, et al., 2010. In
addition, if the --phylip_var flag is specified as well, variable sites within populations are encoded
into the Phylip file using standard alternative nucleotide encodings.
Feature: for ustacks/pstacks, the alpha significance level can now be specified on the command line.
Specifying --alpha to ustacks or pstacks will set the chi square significance level to determine
whether a heterozygous or homozygous model call is statistically significant. Legal values of alpha are
0.1, 0.05 (the previous default), 0.01, or 0.001.
Feature: for ustacks/pstacks, a new bounded SNP calling model has been introduced, allowing limits to
be set on the error rate. This model allows the calling of SNPs to be affected by prior knowledge
as to how likely polymorphism is in the data set. This behavior is controlled by the --bound_low and
--bound_high parameters to ustacks and pstacks.
Feature: additional sections of ustacks have been parallelized. In addition, stack merging has been
changed to occur in a single step (instead of in rounds as done previously).
Feature: the deleveraging algorithm in ustacks has been replaced with a simple algorithm
based on a minimum spanning tree. A new parameter has been introduced, --max_locus_stacks,
which controls the number of stacks allowed to be merged together into a single locus. Loci that
contain more than --max_locus_stacks stacks are set aside and not added to the catalog later on.
Feature: export_sql.pl now has two depth parameters, allele and locus depth, allowing for the filtering
of loci based on either one.
Feature: added a 'dry run' flag (-d) to denovo_map.pl and ref_map.pl to allow the pipeline to be tested
to see what it would execute, before actually executing any programs.
Bugfix: problem with the FASTA parser fixed (it was introduced with fixes to handle Windows-style files).
Bugfix: sample counts were off in batch_*.haplotypes.tsv file generated by populations program.
Stacks 0.9996 - August 24, 2012
-------------------------------
Bugfix: fixed significant memory leak in Kmer hashing for both ustacks and cstacks. Results in an
approximately 3.4x reduction in memory use for cstacks, and an approximately 1.6x reduction in
ustacks.
Feature: process_radtags and process_shortreads can handle non-Illumina FASTQ headers (any generic FASTQ type).
Feature: process_radtags can process data without barcodes.
Feature: process_radtags and process_shortreads can handle Illumina barcodes, when the barcode is not
inline but is instead provided in the FASTQ header.
Bugfix: Corrected the behavior of the '-m' parameter to populations and genotypes. It is meant to apply
to the total depth of a stack at a locus, but was instead being applied to the depth of each allele at
each locus.
Feature: process_radtags and process_shortreads can now automatically discard reads marked as
'failed' by Illumina's chastity/purity filter.
Feature: added ecoT22I, mluCI, nlaIII, and sphI restriction enzymes to process_radtags
Bugfix: modified Stacks programs to handle Windows-style line endings ('\r\n') from FASTQ, FASTA, and
SAM files as well as population maps.
Bugfix: changed populations' genepop output to only include loci that are variable in the populations
specified. Previously, in some cases, additional fixed loci were included, which are not included in the
VCF output, causing the two files to have different loci present.
Bugfix: expected homozygosity and observed homozygosity were not being reported correctly in the sumstats
files. The other population statistics were not affected by the bug.
Feature: process_radtags and process_shortreads now print command and time executed to log file.
Stacks 0.9995 - July 05, 2012
-----------------------------
Bugfix: Fst summary matrix was being incorrectly written.
Stacks 0.9994 - July 01, 2012
-----------------------------
Feature: the populations program can now write a file in the GenePop format. GenePop files can be read
by the GenePop program and converted for other population genetics programs such as Arlequin. Caution: you
may not be able to include all loci from a Stacks run in the output as these programs aren't necessarily
capable of handling such a volume of data. However, you can use populations' whitelist feature to only
include certain loci in the output.
Feature: the populations program now writes an Fst summary file providing a matrix of mean Fst measures
for each pair of populations in the analysis.
Feature: added two filters to populations to require a locus to be present in a certain percentage of
individuals in a population, and requiring a locus to be present in a certain number of populations. If
the former criterion is not reached, the locus is zeroed out only in the specific population; if the latter
criterion is not met, the locus is discarded from the analysis.
Feature: three Fst corrections are now provided by the populations program: requiring a locus to have a significant
p-value (smaller than 0.05, although it's configurable), applying a Bonferroni correction according to the number
of data points in the sliding window, and applying a Bonferroni correction according to the number of data points
in the genome. Loci that fail to reach statistical significance in each case are considered not different from zero
and are set to zero.
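A minimal sketch of the correction logic described above, not the populations source code. The function name correct_fst and its parameters are hypothetical; n_tests stands for the number of data points in the sliding window or in the genome, depending on which Bonferroni correction is chosen, and n_tests=1 corresponds to the plain p-value cutoff.

```python
def correct_fst(fst, p_value, alpha=0.05, n_tests=1):
    """Return fst if its p-value passes the (Bonferroni-adjusted) cutoff, else 0.0.

    Hypothetical helper: alpha is the significance level and n_tests is the
    number of data points the Bonferroni correction divides by.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    threshold = alpha / n_tests  # Bonferroni-adjusted significance level
    # An estimate that is not significant is treated as not different from zero.
    return fst if p_value < threshold else 0.0
```

With alpha = 0.05 and ten data points, the adjusted cutoff is 0.005, so an Fst estimate with p = 0.01 would be set to zero while one with p = 0.001 would be kept.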
Feature: a filter can be specified to the populations program requiring a minimum allele frequency (MAF) at
a locus to consider the locus variable. If an allele at a locus is below the MAF, the locus is considered fixed.
Feature: when using a reference genome, Stacks can now work with samples of different sequence lengths.
This means one can combine samples generated from different Illumina runs of different length. Each
individual sample must be of the same length internally, however.
Feature: pstacks can now handle gapped alignments properly. It parses the CIGAR string in the SAM file
and inserts/removes Ns to accommodate indels and soft-masked alignment fragments. This prevents the SNP
model from mistakenly calling polymorphisms due to indel frameshifts.
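The idea in the entry above can be sketched as follows. This is an illustration under stated assumptions, not the pstacks implementation: the function name apply_cigar is hypothetical, and the choices to mask soft-clipped bases with Ns, drop inserted bases, and pad deletions with Ns are assumptions about what "inserts/removes Ns" means in practice.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def apply_cigar(seq, cigar):
    """Edit a read so that every remaining base lines up with a reference column."""
    out = []
    pos = 0  # current index into the read sequence
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":      # aligned bases: copy through unchanged
            out.append(seq[pos:pos + n])
            pos += n
        elif op == "S":      # soft-masked bases: replace with Ns (assumption)
            out.append("N" * n)
            pos += n
        elif op == "I":      # insertion relative to the reference: drop the bases
            pos += n
        elif op in "DN":     # deletion or skipped region: pad with Ns
            out.append("N" * n)
        # H and P consume no read bases and add nothing to the output
    return "".join(out)
```

For example, a read ACGT with CIGAR 2M2D2M would become ACNNGT, so the bases after the deletion stay in frame with the reference.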
Bugfix: Removed O(n^2) algorithm from Sliding window Fst calculation in populations program, significant
speedup achieved.
Bugfix: Updated load_radtags.pl to support population types and to import sumstats, fst, and genotypes
files.
Bugfix: fixed a small memory leak in DNANSeq.
Stacks 0.9993 - June 07, 2012
-----------------------------
Feature: Added Fisher's Exact Test statistics to Fst estimates. This provides a p-value, an odds ratio
along with a 95% confidence interval and a Log of Odds (LOD) score for each Fst estimate. These
statistics allow one to decide if a particular Fst measurement is significant.
Feature: denovo_map.pl and ref_map.pl now import population statistics files into the database (fst
and sumstats files).
Feature: Web interface now displays summary statistics and Fst values for every locus.
Feature: population names can now be directly added through the web interface and they will be stored
in the database and propagated.
Stacks 0.9992 - May 22, 2012
----------------------------
Bugfix: fixed massive memory leak in Fst calculations in populations program.
Bugfix: if using a population map to calculate Fst in the populations program, some individuals could
be inadvertently attributed to the wrong populations, due to a mismatch between the indices of the
population map (PopMap.h) and the indexes recorded for making the population summary (PopSum.h).
Feature: population map can now be specified to denovo_map.pl and ref_map.pl. This data is
populated into the database and samples are displayed according to their population in the web interface.
Feature: improved denovo_map.pl and ref_map.pl to check for existence of input files.
Bugfix: export_sql.pl wasn't properly using the new filters that use a lower and upper bound (snps, alle,
pare).
Feature: improved how values are generated for web-based filters, allowing for larger populations/maps.
Improved HTML rendering for extremely long haplotype strings.
Bugfix: corrected alleles to be output as "unphased" in VCF file; corrected homozygotes to be printed as
diploid values, e.g. '0/0' or '1/1' instead of just '0'.
Bugfix: changed reporting of SNPs on samples.php page to specify total number of SNPs and the number
of polymorphic loci (containing one or more SNPs).
Bugfix: an extra tab was being placed in the VCF output file.
Feature: added flag to process_radtags to disable checking the integrity of the RAD site in each raw
read. Added a flag to allow more nucleotide mismatches in the barcode when rescuing barcodes.
Stacks 0.9991 - April 17, 2012
------------------------------
Bugfix: replaced bit-rotted code causing all nucleotides to be masked as 'N' when fixed model engaged
on ustacks.
Stacks 0.999 - April 11, 2012
-----------------------------
Feature: Added support for the 1000 Genomes Project, Variant Call Format (VCF) in the populations
program. (http://www.1000genomes.org/node/101). This file output includes the genotype calls for
every individual for each locus, allele depth, and likelihood values for heterozygous SNP calls.
Feature: implemented a three-bit compression scheme so that uncalled bases ('N's) can be stored
in compressed format in pstacks. Other stacks programs currently use two-bit compression which is
more compact, but can only store plain nucleotides ('A', 'C', 'G', 'T'). This restores earlier behavior
that allowed Ns in pstacks prior to the implementation of the two-bit compression scheme.
Bugfix: the populations program was only outputting sites to the summary statistics file (*.sumstats.tsv)
if they were heterozygous in a population. This could give the impression that the same site may be
absent in other populations when in reality it was simply fixed in the other populations. Now, if a
site is heterozygous in any of the populations, it will be output for all populations.
Bugfix: added lots of error checking code to populations so it properly handles
poorly formatted population maps, missing files, and similar errors.
Bugfix: added uncalled bases ('n', 'N', and '.') to the reverse complement function (reads
aligned on the negative strand and processed by pstacks will be stored reverse complemented).
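As a hedged illustration of the behaviour described above (not the Stacks function itself), a reverse complement routine that passes uncalled bases through unchanged might look like this. The name reverse_complement and the lookup table are illustrative only.

```python
# Translation table: the four nucleotides map to their complements in both
# cases; the uncalled-base symbols 'N', 'n', and '.' map to themselves.
COMPLEMENT = str.maketrans("ACGTacgtNn.", "TGCAtgcaNn.")

def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]
```

Without the extra table entries, a lookup-based implementation has no defined complement for an uncalled base, which is the case the fix addresses.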
Bugfix: updated the PHP code as well as export_sql.pl to properly use the new filters for
chromosome, basepair, as well as lower and upper ranges to various filters.
Other: Removed the deprecated markers.pl, genotypes.pl, and process_radtags.pl programs from the distribution.
Stacks 0.998 - January 06, 2012
-------------------------------
Feature: Pipeline is now aware if samples are submitted as a 'population' or a 'mapping cross'.
A new command line option, -s, has been added to denovo_map.pl and ref_map.pl that will label
the dataset as a population. The -p/-r flags continue to keep the samples as a mapping cross.
Feature: The web interface has been updated to display more information specific to populations.
The filtering code has been changed to include lower and upper limits for filter fields such
as SNPs, alleles, and number of parents/samples.
Feature: A new program, populations, has been written to be executed in place of the existing
genotypes program when a population is being processed through the pipeline. A map specifying
which individuals belong to which population is submitted to the program and the program will then
calculate population genetics statistics, expected/observed heterozygosity, Pi, and Fis at each
nucleotide position.
Feature: the populations program will compare all populations pairwise to compute Fst. If a set
of data is reference aligned, then a kernel-smoothed Fst will also be calculated.
These statistics were originally designed by Paul Hohenlohe and Bill Cresko, and are
described in the paper: Population Genomics of Parallel Adaptation in Threespine Stickleback
using Sequenced RAD Tags,
http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1000862
They have been implemented independently in Stacks.
Feature: added the DpnII enzyme to the process_radtags program.
Feature: Added new 'model' line to *.tags.tsv files. This line records the output of the SNP
model at every position in the read as either Homozygous (O), Heterozygous (E), or unknown (U).
Previously only polymorphic loci were recorded in the SNPs file (and this remains unchanged). The
model output line is now also available in the web interface.
Bugfix: fixed crasher bug in cstacks when parallel processing was enabled for genomic-aligned data.
Bugfix: allele depths are now properly reported in reference-aligned data.
Stacks 0.997 - November 22, 2011
--------------------------------
Feature: new program, called clone_filter, that will take a set of paired-end reads and
reduce them according to PCR clones (a PCR clone is a pair of reads that match exactly,
while paired-end reads from two different DNA molecules will nearly always be slightly
different lengths).
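The definition of a PCR clone given above can be sketched as an exact-match filter on read pairs. This is a simplified illustration, not the clone_filter implementation; the function name filter_clones and the returned count distribution are assumptions made for the example.

```python
from collections import Counter

def filter_clones(pairs):
    """Collapse identical (read1, read2) pairs to a single representative.

    Returns the unique pairs in order of first appearance and a dict mapping
    "number of copies" to "how many distinct pairs had that many copies".
    """
    counts = Counter(pairs)          # identical pairs hash to the same key
    unique = list(counts)            # keys keep first-seen order
    distribution = dict(Counter(counts.values()))
    return unique, distribution
```

Only the sequences are compared here, so two pairs count as clones only if both the first and the second read match exactly.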
Feature: new program, called kmer_filter, that allows paired or single-end reads to be
filtered according to the number of rare or abundant kmers they contain. Useful for both
RAD datasets as well as randomly sheared genomic or transcriptomic data.
Feature: new program, called process_shortreads, performs the same task as process_radtags
for fast cleaning of randomly sheared genomic or transcriptomic data (a 'beta' version of
this program has actually been distributed in the last few Stacks releases).
Feature: the Stacks tags.tsv file format has a new column to record the DNA strand that a
particular read is aligned to, currently only used in datasets aligned to a reference genome.
Feature: pstacks now reverse complements all stacks aligned to the negative strand and
stores them in this orientation in the output files and database. (All aligners always present
these reads in the positive orientation.) This change allows one to align reads to a reference
genome using a gapped aligner, such as Tophat or GSNAP and have the RAD site still align with
genomic data. (One can then compare genomic RAD tags along with cDNA RAD tags.)
Feature: added the '-d' flag to export_sql.pl to export allele depths from the database.
Feature: altered process_radtags to store orphaned, paired-end reads in a remainder file,
keeping paired-reads in frame.
Bugfix: fixed the handling of the paired-end barcode in process_shortreads, added a check
to make sure the barcodes from both pairs of a read match.
Bugfix: genotypes was not capitalizing auto-corrected genotypes in the generic format (it
was in joinmap/rqtl specific formats).
Bugfix: corrected cut site sequence for ApeKI in process_radtags.
Bugfix: process_radtags inadvertently used newly initialized memory that had not been
cleared, causing rare parsing errors when uncleared memory resembled portions of a FASTQ record.
Bugfix: the default MySQL permissions were not being properly passed to index_radtags.pl.
Bugfix: changed load_radtags.pl to extract parental IDs directly from catalog files, instead of
relying on file names.
Feature: added a 'dry run' option to load_radtags.pl so it will print what it intends to do
without actually doing it.
Stacks 0.996 - October 5, 2011
------------------------------
Web interface updates:
* If the RAD tags are aligned to a reference genome, a filter is now available to view markers
from a particular genomic region.
* The individual RAD tag viewer now scrolls while keeping the scale view and consensus sequence
always visible.
* The RAD tag viewer now highlights columns for which the catalog locus shows a SNP, but the
RAD tag does not.
* In the genotype viewer, the map between the haplotype and genotype is now available.
* The depth of each RAD tag is now visible in the genotype viewer.
* The genotype viewer has now been integrated with the observed haplotype viewer. You can
make changes/corrections to genotypes directly now, no need to submit a form and wait for
the page to reload.
Bugfix: process_radtags wasn't properly parsing the names of v1 Illumina BUSTARD files.
Bugfix: process_radtags wasn't counting the total number of barcoded paired-end reads correctly.
Bugfix: sstacks' impute_haplotype() was causing spurious matching in some, error-based cases.
Bugfix: build system was not properly replacing the _PKGDATADIR_ variable in denovo/ref_map.pl
programs.
Stacks 0.995 - September 23, 2011
---------------------------------
Feature: sstacks can now handle samples and catalogs that have different length reads.
Each individual sample must be constructed from the same length reads (by ustacks and cstacks),
but between samples there can be different lengths, e.g. a catalog of length 50bp and samples
of length 100bp, or vice versa.
Feature: Added the ApeKI restriction enzyme to process_radtags
Feature: process_radtags can now capture discarded reads to a file.
Bugfix: a coding limitation was removed that required polymorphic sites in the catalog to
contain only two alleles. Now, all four alleles can be recorded at a single site in a locus in
the catalog.
Bugfix: Exporting results from the web interface was not including manual genotype corrections
when requested.
Stacks 0.994 - August 08, 2011
------------------------------
Feature: added catalog index structure to cstacks to speed construction of catalog
when using reference-aligned sequences.
Feature: added a new output type, 'genomic' to genotypes. Outputs SNPs individually,
encoded as a set of integers, for reference-aligned reads.
Bugfix: pstacks was not writing individual stack sequences properly.
Bugfix: process_radtags was still checking the quality of sequence that was
destined to be truncated off the read.
Bugfix: process_radtags segfault fixed, parsing stop position
mis-specified in parse_input_record().
Stacks 0.993 - August 05, 2011
------------------------------
Memory usage optimization: Individual sequence reads are now stored internally
using a 2-bit encoding of DNA nucleotides. Some simple benchmarking of
ustacks (previous version / new version):
Sample size     Elapsed Time         Used Memory
-------------   -----------------    ---------------
3.78m reads     3:16 / 3:23          4.64G / 1.86G
17.62m reads    1:31:21 / 1:43:54    55.55G / 45.42G
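A rough sketch of what a 2-bit nucleotide encoding looks like, to make the memory saving concrete. This is not the Stacks data structure; the names pack_2bit and unpack_2bit and the bit layout (four nucleotides per byte, lowest bits first) are assumptions chosen for the example.

```python
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"

def pack_2bit(seq):
    """Pack an A/C/G/T string into bytes, four nucleotides per byte."""
    packed = bytearray((len(seq) + 3) // 4)
    for i, nuc in enumerate(seq):
        # Each nucleotide occupies two bits within its byte.
        packed[i // 4] |= ENCODE[nuc] << (2 * (i % 4))
    return bytes(packed)

def unpack_2bit(packed, length):
    """Recover the original string; length is needed because of padding bits."""
    return "".join(
        DECODE[(packed[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)
    )
```

Two bits can represent only four symbols, so an uncalled base ('N') has no code in this scheme and raises a KeyError in the sketch.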
Feature: Added the programs sort_read_pairs.pl, exec_velvet.pl, load_sequences.pl
to facilitate the assembly of paired-end RAD-Tags into mini-contigs and allow them
to be uploaded into and viewed from the web interface.
Bugfix: made process_radtags emit an error when an unrecognized
restriction enzyme is specified.
Bugfix: made process_radtags accept barcodes with trailing whitespace,
such as would be seen in a DOS text file or if errant tabs are
specified.
Stacks 0.992 - July 04, 2011
----------------------------
Feature: process_radtags can now handle Phred+33 or Phred+64 encodings, Phred+33 is
the new default encoding in Illumina's CASAVA software (v1.8).
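For readers unfamiliar with the two encodings, the difference is only the ASCII offset subtracted from each quality character. The following is a minimal sketch, not process_radtags code; the function name decode_quality is hypothetical.

```python
def decode_quality(qual, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores.

    offset is 33 for Phred+33 or 64 for Phred+64.
    """
    if offset not in (33, 64):
        raise ValueError("offset must be 33 or 64")
    scores = [ord(ch) - offset for ch in qual]
    if min(scores, default=0) < 0:
        # A negative score usually means the wrong offset was assumed.
        raise ValueError("quality character below offset; wrong encoding?")
    return scores
```

Decoding Phred+33 data with an offset of 64 typically yields negative scores, which is why specifying the correct encoding matters when filtering on quality.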
Bugfix: Changed the sql input parser to handle variable length input
lines. Necessary if loading tens of individuals into a catalog.
Bugfix: Added command line options to ustacks to better control the use of secondary reads
in the stack-building procedure.
Stacks 0.991 - June 06, 2011
----------------------------
Bugfix: genotypes was failing to parse Stacks output files with
unanticipated names.
Bugfix: when using ref_map.pl, tags without SNPs were failing to match
against the catalog.
Stacks 0.99 - May 20, 2011
--------------------------
*A new C++ genotypes program has been added. This program works independently from the
database allowing the pipeline to fully function without installing the database. The
new program performs the tasks once completed by markers.pl and genotypes.pl.
- The pipeline has been modified to now automatically execute the genotypes program
as the last stage in an analysis. It will generate a file containing the observed
haplotypes and an additional file containing a map-agnostic set of genotype calls.
- If SQL interaction is enabled, the genotypes will be imported to the database and
serve as a base to export genotypes directly from the web interface for a particular
map and using the set of filters available online.
- If a population is being examined, the observed haplotypes file can be imported into
Microsoft Excel or another tab-separated file viewer to immediately see the results.
- By replacing the Perl version of genotypes.pl we also no longer need to install or
worry about the caching mechanism for auto-correcting stacks, the C++ version can do
this by directly reading the Stacks output files.
*markers.pl and genotypes.pl are now deprecated and will no longer be supported.
*Feature: When exporting observed haplotypes, you can now specify a
minimum stack depth to include a particular individual at a locus.
*Feature: map-specific genotypes can now be exported directly from the
database/web server.
*Bugfix: genotypes.pl: make script ignore parental genotypes based on
the sample type from the MySQL table, not based on the file name.
*Bugfix: genotypes.pl: some loci were sneaking in despite being under
the progeny limit.
*Bugfix: made process_radtags Bustard file parser check number of fields to prevent
attempting to parse FASTQ (and segfaulting). Thanks to
Maureen.Liu -at- nottingham.ac.uk for reporting it.
*Bugfix: in sstacks, when matching to the catalog using reads aligned
to a reference genome (-g), sstacks did not verify that haplotypes
matched exactly, causing some spurious matching, which later
translated into dropped genotypes.
*Bugfix: in markers.pl, the ratio of observed alleles in the progeny was
not being tallied correctly for ab/ac markers.
Stacks 0.984 - May 04, 2011
---------------------------
*Bugfix: renamed constants.php to constants.php.dist to avoid
overwriting an existing file on reinstallation.
*Feature: process_radtags has been converted to a C++ program
increasing its speed by approximately 25x. The parameters were
modified to be a little more intuitive and parameters were added to
control the size and score limit of the sliding window. The program
can process a GAII lane in about 5 minutes, a HiSeq lane in about 12
minutes, depending on the hardware used.
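For readers unfamiliar with the sliding-window check mentioned above, the
following is a minimal Python sketch of the general idea, not the
process_radtags C++ code. It assumes the window size is given as a fraction of
the read length and that a read is rejected when the mean quality inside any
window falls below the score limit. The function names and the default values
(0.15 and 10) are illustrative assumptions.

```python
def phred33(qual_string):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual_string]


def passes_sliding_window(quals, window_frac=0.15, score_limit=10):
    """Return True if no window's mean quality drops below score_limit.

    window_frac and score_limit are hypothetical stand-ins for the
    'size' and 'score limit' parameters named in the changelog entry.
    """
    if not quals:
        return False
    # Window length in bases, at least one base wide.
    win = max(1, int(len(quals) * window_frac))
    window_sum = sum(quals[:win])
    if window_sum / win < score_limit:
        return False
    # Slide the window one base at a time, updating the running sum.
    for i in range(win, len(quals)):
        window_sum += quals[i] - quals[i - win]
        if window_sum / win < score_limit:
            return False
    return True
```

A read whose quality collapses toward the 3' end fails the check, for example
passes_sliding_window([30] * 10 + [2] * 10), while a uniformly high-quality
read passes.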
Stacks 0.983 - Apr 30, 2011
---------------------------
*Bugfix: sstacks segfault when running parallelized. Improper
insertion into map object when it should have only been checking for
element presence/absence. Thanks to
for first reporting it.
*Feature: added code to impute the genotype of a missing, second
parent for some map types. This code adds up all the observed
haplotypes in the progeny and then compares their frequencies against
those that would be expected for the marker under Hardy-Weinberg
equilibrium, choosing the marker type that best fits the
Hardy-Weinberg expectation.
Stacks 0.982 - Mar 29, 2011
---------------------------
*Bugfix: process_radtags.pl was not properly parsing FASTQ-formatted,
paired-end file names.
*Bugfix: counts of matching parents/progeny were sometimes incorrect
due to a slightly promiscuous SQL query in index_radtags.pl.
Stacks 0.98 - Feb 25, 2011
---------------------------
Note: if you have pre-existing databases, you must rebuild the catalog
index (index_radtags.pl -D db -c) so that they are compatible with
the new elements of the web interface.
*Added option to pstacks to require a minimum depth of coverage for
a stack aligned to the reference genome before reporting it.
*Added double haploid (DH) and F2 export types to the genotypes.pl
script.
*Added option to output any map in R/QTL format in genotypes.pl.
*Added feature to filter by number of available genotypes in progeny.
*Added command line option to ustacks to capture and output unused
reads.
*Added display of chromosome/base pair to web interface for stacks
aligned to a reference genome.
*Bugfix: FASTA parser was missing records due to a bug introduced from
a FASTQ parser fix.
*Bugfix: process_radtags.pl was not properly checking the integrity of
the RAD site after adding restriction enzymes with alternate
nucleotides.
*Bugfix: when constructing the catalog, some tags being added to the
catalog did not have their genomic location transferred over to a new catalog
tag.
*Modified sstacks to include an option to match stacks against the
catalog based on the genomic location (assuming individuals were
processed with pstacks).
*Bugfix: Lots of clean-ups and command line option fixes, thanks
to .
Stacks 0.971 - Jan 30, 2011
---------------------------
*Illumina software version 1.3 produces Phred scores that can begin
with a '@' character, throwing off the FASTQ parser. Added code to
clear the read buffer in between records to solve the problem. Thanks
to Aarti for finding the bug.
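To illustrate why a quality string beginning with '@' is a hazard (in the
Phred+64 encoding, '@' is the character for a score of 0), here is a minimal
Python sketch of a FASTQ reader that consumes records in fixed groups of four
lines instead of searching for '@' to find the next header. This is a
different technique from the buffer-clearing fix described above and is not
the Stacks parser; the function name read_fastq is hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples, four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # skip blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        # Only the first line of a record is treated as a header, so a
        # quality line that starts with '@' is never misread as one.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:].rstrip("\n"), seq, qual


# The first record's quality line starts with '@'.
data = "@read1\nACGT\n+\n@IIH\n@read2\nTTGA\n+\nIIII\n"
records = list(read_fastq(io.StringIO(data)))
```

Here records holds two entries, and the quality string of the first is kept
intact as "@IIH".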
Stacks 0.97
---------------------------
*ustacks now detects when there are uncalled nucleotides in FASTA or
FASTQ input files and replaces those bases with 'A'.
*process_radtags.pl now detects barcode length automatically. Removed
spurious error messages when no data is processed.
Stacks 0.96 - Jan 7, 2011
---------------------------
*Fixed typo in README giving the wrong file path for the Apache
configuration file.
*Fixed several hard-coded paths in PHP files that referred to our local
system.