RADinitio is a simulation pipeline for the assessment of RADseq experiments. Genetic data are forward simulated for
a population of individuals from a designated reference genome. The per-individual sequences are then subjected to an
in silico RADseq library preparation and sequencing process, allowing for the
exploration of parameters including restriction enzyme selection, library insert size, PCR duplicate distribution,
and sequencing coverage. RADinitio helps researchers check that their protocol selection and library preparation
are performed optimally, within the limitations of technical and experimental error.
Recent Changes [updated September 19, 2019]
What do I need to run RADinitio?
- Reference genome FASTA file
- RADinitio simulates and extracts reads from the sequences in this file. The file can be either compressed or uncompressed. It does not need to contain a chromosome-level genome assembly; RADinitio can also be run on scaffold-level sequences.
- If no reference genome is available for the species of interest, users can run RADinitio on the genome of a closely related species to model the number and distribution of expected RAD loci.
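The input requirement above can be illustrated with a minimal sketch of a FASTA reader that accepts both compressed and uncompressed files. This is not RADinitio's own loader: the function name `read_fasta` is hypothetical, and the sketch assumes that "compressed" means gzip.

```python
import gzip


def read_fasta(path):
    """Return {sequence_id: sequence} from a plain or gzip-compressed FASTA file.

    Hypothetical helper for illustration; not part of RADinitio.
    """
    # Detect gzip by its two magic bytes instead of trusting the file extension.
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open

    chunks = {}
    name = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                fields = line[1:].split()
                if not fields:
                    raise ValueError("FASTA header without a sequence ID")
                # Keep only the ID, dropping any free-text description.
                name = fields[0]
                chunks[name] = []
            elif name is not None:
                chunks[name].append(line.upper())
    # Join the per-line pieces of each chromosome or scaffold.
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}
```

Because every record is stored under its own ID, chromosome-level and scaffold-level assemblies are handled the same way.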
What does RADinitio do?
RADinitio simulates different stages of the RADseq library preparation and sequencing process, starting from the sampling of a study population. From the sequences of a reference genome file, RADinitio will:
- Generate and process genetic variants from a simulated study population.
- These variants are generated using the coalescent simulator msprime
(Kelleher et al. 2016). Users can specify the
demographic parameters of this model, including the number of populations and sampled individuals, the sizes of these populations, and the migration between them.
- Extract RAD alleles for each sample.
- The simulated genomes are digested in silico following either a single-digest or a double-digest RAD protocol, generating a series of RAD loci across the
genome. For each sample, we extract the sequences belonging to each locus and add the simulated individual-specific variants.
- Simulate library enrichment and sequencing.
- For each individual, we sample the pool of alleles to generate paired-end sequences of each allele at a desired sequencing coverage. In
addition, the pool of alleles can be amplified using an in silico PCR model to generate duplicate reads.
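The allele extraction step above can be sketched as follows. This is a simplified illustration, not RADinitio's implementation: it handles a single enzyme on the forward strand only, represents variants as SNPs, and all function names are hypothetical. The example site `CCTGCAGG` is the recognition sequence of SbfI.

```python
def find_cut_sites(sequence, recognition_site):
    """Return the 0-based start of every occurrence of the site (overlaps allowed)."""
    sites = []
    start = sequence.find(recognition_site)
    while start != -1:
        sites.append(start)
        start = sequence.find(recognition_site, start + 1)
    return sites


def extract_loci(sequence, recognition_site, locus_length):
    """Return (start, locus_sequence) for the stretch beginning at each site.

    Loci that would run past the end of the sequence are skipped.
    """
    loci = []
    for start in find_cut_sites(sequence, recognition_site):
        locus = sequence[start:start + locus_length]
        if len(locus) == locus_length:
            loci.append((start, locus))
    return loci


def apply_snps(locus_start, locus_sequence, snps):
    """Return the locus with SNPs applied.

    `snps` maps a 0-based genome position to its alternate base; positions
    outside the locus are ignored.
    """
    bases = list(locus_sequence)
    for position, alt in snps.items():
        offset = position - locus_start
        if 0 <= offset < len(bases):
            bases[offset] = alt
    return "".join(bases)


# Toy reference with two SbfI sites, at positions 3 and 23.
genome = "TTA" + "CCTGCAGG" + "ACGTACGTAATT" + "CCTGCAGG" + "TTGACA"
loci = extract_loci(genome, "CCTGCAGG", locus_length=12)
# One individual's allele at the first locus, carrying a SNP at position 12.
allele = apply_snps(loci[0][0], loci[0][1], {12: "T"})
```

A double-digest version would additionally require a second enzyme's site within a given distance of the first; that logic is left out here.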
The final output of the RADinitio pipeline is a set of paired-end RADseq reads for each individual in the simulated population.
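To make the idea of sampling paired-end reads at a target coverage concrete, here is a minimal sketch under strong simplifying assumptions: each locus is given as a list of allele sequences, loci and alleles are drawn uniformly at random, no sequencing errors are introduced, and PCR duplicates are not modeled. The function names and the data layout are hypothetical and do not reflect RADinitio's internals.

```python
import random

# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    """Return the reverse complement of an uppercase DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def paired_end_reads(template, read_length):
    """Return (forward, reverse) reads from the two ends of a template."""
    forward = template[:read_length]
    # The reverse read comes from the opposite strand, starting at the far end.
    reverse = reverse_complement(template[-read_length:])
    return forward, reverse


def sample_read_pairs(alleles, coverage, read_length, seed=None):
    """Draw coverage * number_of_loci read pairs for one individual.

    `alleles` maps a locus ID to a list of allele sequences (two for a diploid).
    Returns a list of (locus_id, forward_read, reverse_read) tuples.
    """
    rng = random.Random(seed)
    locus_ids = sorted(alleles)
    pairs = []
    for _ in range(coverage * len(locus_ids)):
        locus_id = rng.choice(locus_ids)
        template = rng.choice(alleles[locus_id])
        forward, reverse = paired_end_reads(template, read_length)
        pairs.append((locus_id, forward, reverse))
    return pairs
```

With uniform sampling, the per-locus depth varies around the requested coverage from locus to locus, while the total number of read pairs is fixed at `coverage * number_of_loci`.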
RADinitio is implemented in Python and is released under the
GNU GPL license.
RADinitio was developed by Angel Rivera-Colón
with contributions from Nicolas Rochette
and Julian Catchen.