Stacks: process_shortreads

Program Options

process_shortreads [-f in_file | -p in_dir [-P] | -1 pair_1 -2 pair_2] -b barcode_file -o out_dir
[-i type] [-y type] [-c] [-q] [-r] [-E encoding] [-t len] [-D] [-w size] [-s lim] [-h]

f — path to the input file if processing single-end sequences.
i — input file type, either 'bustard' for the Illumina BUSTARD format, 'bam', 'fastq' (default), or 'gzfastq' for gzipped FASTQ.
y — output type, either 'fastq', 'gzfastq', 'fasta', or 'gzfasta' (default is to match the input file type).
p — path to a directory of single-end Illumina files.
P — specify that input is paired (for use with '-p').
I — specify that the paired-end reads are interleaved in single files.
1 — first input file in a set of paired-end sequences.
2 — second input file in a set of paired-end sequences.
o — path to output the processed files.
b — a list of barcodes for this run.
c — clean data, remove any read with an uncalled base.
q — discard reads with low quality scores.
r — rescue barcodes.
t — truncate final read length to this value.
E — specify how quality scores are encoded, 'phred33' (Illumina 1.8+, Sanger) or 'phred64' (Illumina 1.3 - 1.5, default).
D — capture discarded reads to a file.
w — set the size of the sliding window as a fraction of the read length, between 0 and 1 (default 0.15).
s — set the score limit. If the average score within the sliding window drops below this value, the read is discarded (default 10).
h — display this help message.
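
The -E, -w, and -s options work together: quality characters are decoded with the chosen offset, and the mean score is then checked window by window along the read. The following is a minimal shell sketch of that idea, not the program's actual code. The function name window_check, its argument order, and the rounding of the window length are assumptions made for illustration, and the sketch only reports keep or discard without modelling read trimming.

```shell
#!/bin/sh
# Illustrative sketch only; not taken from the Stacks source code.

# window_check QUALITY_STRING PHRED_OFFSET [WINDOW_FRACTION] [SCORE_LIMIT]
# Prints "keep" if the mean score of every window is at least SCORE_LIMIT,
# otherwise prints "discard". Defaults mirror -w 0.15 and -s 10.
window_check() {
    printf '%s\n' "$1" |
    awk -v off="$2" -v frac="${3:-0.15}" -v lim="${4:-10}" '
    BEGIN {
        # Map each printable ASCII character to its numeric code.
        for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
    }
    {
        n = length($0)
        # Assumed rounding: nearest whole base, at least one base wide.
        win = int(n * frac + 0.5)
        if (win < 1) win = 1

        # Decode each quality character using the chosen offset (33 or 64).
        for (i = 1; i <= n; i++) score[i] = ord[substr($0, i, 1)] - off

        # Sum the first window, then slide it one base at a time.
        sum = 0
        for (i = 1; i <= win; i++) sum += score[i]
        verdict = "keep"
        for (start = 1; start + win - 1 <= n; start++) {
            if (start > 1) sum += score[start + win - 1] - score[start - 1]
            if (sum / win < lim) { verdict = "discard"; break }
        }
        print verdict
    }'
}

window_check 'IIIIIIIIIIIIIIIIIIII' 33    # prints "keep"
window_check 'IIIIIIIIII##########' 33    # prints "discard"
```

The quality string is piped in rather than passed with awk -v because -v would interpret a backslash in the string as an escape sequence, and a backslash is a valid quality character.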

Barcode options:

--inline_null: barcode is inline with sequence, occurs only on single-end read (default).
--index_null: barcode is provided in FASTQ header, occurs only on single-end read.
--inline_inline: barcode is inline with sequence, occurs on single and paired-end read.
--index_index: barcode is provided in FASTQ header, occurs on single and paired-end read.
--inline_index: barcode is inline with sequence on single-end read, occurs in FASTQ header for paired-end read.
--index_inline: barcode occurs in FASTQ header for single-end read, is inline with sequence on paired-end read.

Adapter options:

--adapter_1 [sequence]: provide adapter sequence that may occur on the first read for filtering.
--adapter_2 [sequence]: provide adapter sequence that may occur on the paired-read for filtering.
--adapter_mm [mismatches]: number of mismatches allowed in the adapter sequence.

Output options:

--merge: if no barcodes are specified, merge all input files into a single output file (or single pair of files).

Advanced options:

--no_read_trimming: do not trim low quality reads, just discard them.
--len_limit [limit]: when trimming sequences, specify the minimum length a sequence must be to keep it (default 31bp).
--filter_illumina: discard reads that have been marked by Illumina's chastity/purity filter as failing.
--barcode_dist: provide the distance between barcodes to allow for barcode rescue (default 2).
--mate-pair: raw reads are circularized mate-pair data, first read will be reverse complemented.
--no_overhang: data does not contain an overhang nucleotide between barcode and sequence.
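
Barcode rescue (-r together with --barcode_dist) relies on counting mismatches between an observed barcode and the known barcodes. How process_shortreads applies the distance internally is not described here, so the following is only a sketch under a stated assumption: an observed barcode is rescued when exactly one known barcode of the same length lies within a given number of mismatches. The function name rescue_barcode and its arguments are hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only; the rescue rule below is an assumption,
# not the documented behaviour of process_shortreads.

# rescue_barcode OBSERVED MAX_MISMATCHES KNOWN_BARCODE...
# Prints the single known barcode within MAX_MISMATCHES of OBSERVED.
# Prints nothing when there is no candidate or more than one.
rescue_barcode() {
    observed=$1
    max=$2
    shift 2
    printf '%s\n' "$@" |
    awk -v obs="$observed" -v max="$max" '
    length($0) == length(obs) {
        # Count positions where the two barcodes differ.
        d = 0
        for (i = 1; i <= length(obs); i++)
            if (substr($0, i, 1) != substr(obs, i, 1)) d++
        if (d <= max) { hits++; match_bc = $0 }
    }
    END { if (hits == 1) print match_bc }'
}

rescue_barcode ACGA 1 ACGT TTTT GGCC    # prints "ACGT"
rescue_barcode ACGA 1 ACGT ACGC         # ambiguous, prints nothing
```

Requiring a unique candidate means an ambiguous barcode is left unassigned instead of being attributed to an arbitrary sample.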

Example Usage

If your data are paired-end, Illumina HiSeq data, in a directory called raw:

~/raw% ls
lane4_NoIndex_L004_R1_001.fastq  lane4_NoIndex_L004_R1_009.fastq  lane4_NoIndex_L004_R2_005.fastq
lane4_NoIndex_L004_R1_002.fastq  lane4_NoIndex_L004_R1_010.fastq  lane4_NoIndex_L004_R2_006.fastq
lane4_NoIndex_L004_R1_003.fastq  lane4_NoIndex_L004_R1_011.fastq  lane4_NoIndex_L004_R2_007.fastq
lane4_NoIndex_L004_R1_004.fastq  lane4_NoIndex_L004_R1_012.fastq  lane4_NoIndex_L004_R2_008.fastq
lane4_NoIndex_L004_R1_005.fastq  lane4_NoIndex_L004_R2_001.fastq  lane4_NoIndex_L004_R2_009.fastq
lane4_NoIndex_L004_R1_006.fastq  lane4_NoIndex_L004_R2_002.fastq  lane4_NoIndex_L004_R2_010.fastq
lane4_NoIndex_L004_R1_007.fastq  lane4_NoIndex_L004_R2_003.fastq  lane4_NoIndex_L004_R2_011.fastq
lane4_NoIndex_L004_R1_008.fastq  lane4_NoIndex_L004_R2_004.fastq  lane4_NoIndex_L004_R2_012.fastq

Then you run process_shortreads like this:

% process_shortreads -P -p ./raw/ -o ./samples/ -b ./barcodes/barcodes_lane4 -r -c -q
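
Before a paired run over a whole directory, it can help to confirm that every first-read file has a mate. The sketch below assumes the naming pattern from the listing above, where paired files differ only in _R1_ versus _R2_ and end in .fastq. The helper name check_pairs is hypothetical and is not part of Stacks.

```shell
#!/bin/sh
# Illustrative helper; assumes *_R1_*.fastq / *_R2_*.fastq file naming.

# check_pairs DIR
# Lists every *_R1_* FASTQ file in DIR that has no *_R2_* partner.
# Returns 0 when all files are paired, 1 otherwise.
check_pairs() {
    dir=$1
    missing=0
    for r1 in "$dir"/*_R1_*.fastq; do
        [ -e "$r1" ] || continue            # glob matched nothing
        name=${r1##*/}
        r2="$dir/$(printf '%s\n' "$name" | sed 's/_R1_/_R2_/')"
        if [ ! -e "$r2" ]; then
            printf 'no mate for %s\n' "$name"
            missing=1
        fi
    done
    return "$missing"
}

# Possible use, running the example command only if all files are paired:
# check_pairs ./raw && process_shortreads -P -p ./raw/ -o ./samples/ -b ./barcodes/barcodes_lane4 -r -c -q
```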
