RADinitio is a pipeline for the assessment of RADseq experiments via prospective and retrospective data simulation. Sequencing data are generated de novo from a population of individuals via a coalescent simulation under a user-defined demographic model using msprime. The genetic variants in each sample are simulated in a genomic context that can impact downstream generation of RAD loci and sequencing reads. The per-individual sequences are then subjected to an in silico RADseq library preparation. The components of the library are defined by the user, allowing for the exploration of parameters including restriction enzyme selection, insert size, sequencing coverage, and PCR duplicate distribution (generated using the decoratio software). RADinitio simulations can also be applied retrospectively by comparing and modelling sources of error in empirical datasets. RADinitio lets researchers explore the variables in their data generation process so that protocol selection and library preparation are performed optimally, within the limitations of technical and experimental error.

Download RADinitio
version 1.2.1

Recent Changes [updated June 26, 2023]

RADinitio Manual

What do I need to run RADinitio?

  1. Reference genome FASTA file
    • RADinitio simulates and extracts reads from the sequences in this file. The file can be either compressed or uncompressed. It does not need to contain a chromosome-level genome assembly; RADinitio can also be run over scaffold-level sequences.
    • If no reference genome is available for the species of interest, users can run RADinitio using the genome of a closely related species to model the number and distribution of expected RAD loci.
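
To illustrate what "compressed or uncompressed" input means in practice, here is a minimal sketch of loading a reference FASTA that may or may not be gzip-compressed. This is not RADinitio's own loader; the function names `open_maybe_gzip` and `read_fasta` are hypothetical and exist only for this example.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, transparently handling gzip compression.

    Hypothetical helper: detects gzip by its two magic bytes rather
    than by the file extension.
    """
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def read_fasta(path):
    """Return {sequence_id: sequence} for every record in a FASTA file.

    Works the same for chromosome-level and scaffold-level assemblies,
    since each record is simply stored under its identifier.
    """
    sequences = {}
    name = None
    chunks = []
    with open_maybe_gzip(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Store the previous record before starting a new one.
                if name is not None:
                    sequences[name] = "".join(chunks)
                header = line[1:].split()
                name = header[0] if header else ""
                chunks = []
            elif name is not None:
                chunks.append(line.upper())
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences
```

Calling `read_fasta` on a plain file and on a gzip-compressed copy of the same file should return identical dictionaries.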

What does RADinitio do?

RADinitio simulates different stages of the RADseq library preparation and sequencing process, starting from the sampling of a study population. From the sequences of a reference genome file, RADinitio will:

  1. Generate and process genetic variants from a simulated study population.
    • These variants are generated using the coalescent simulator msprime (Kelleher et al. 2016; Baumdicker et al. 2022). Users can specify the demographic parameters of this model, including the number of populations, the number of sampled individuals, the population sizes, and the migration between populations.
  2. Extract RAD alleles for each sample.
    • The simulated genomes are digested in silico following either a single- or double-digest RAD protocol, generating a series of RAD loci across the genome. For each sample, we extract the sequence of each locus and add the simulated individual-specific variants.
  3. Simulate library enrichment and sequencing.
    • For each individual, we sample the pool of alleles to generate paired-end sequences of each allele at a desired sequencing coverage. In addition, the pool of alleles can be amplified using an in silico PCR model (Rochette et al. 2023) to generate duplicate reads.
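
Step 2 can be sketched as follows for the single-digest case. This is a simplified illustration, not RADinitio's implementation: the function names are hypothetical, loci are anchored at the recognition site rather than at the exact cut position, and variants are limited to single-base substitutions. The example recognition sequence used in the usage note, CCTGCAGG, is that of the enzyme SbfI.

```python
import re


def find_cut_sites(sequence, recognition_site):
    """Return the 0-based start of every recognition site (overlaps allowed)."""
    pattern = re.compile("(?=" + re.escape(recognition_site) + ")")
    return [m.start() for m in pattern.finditer(sequence)]


def single_digest_loci(sequence, recognition_site, locus_len):
    """Return (start, end, strand) tuples for the loci flanking each site.

    Each site yields up to two loci: one extending downstream ('+') and
    one extending upstream ('-'). Loci that would run past either end of
    the sequence are dropped.
    """
    loci = []
    site_len = len(recognition_site)
    for pos in find_cut_sites(sequence, recognition_site):
        # Downstream locus: begins at the first base of the site.
        if pos + locus_len <= len(sequence):
            loci.append((pos, pos + locus_len, "+"))
        # Upstream locus: ends at the last base of the site.
        site_end = pos + site_len
        if site_end - locus_len >= 0:
            loci.append((site_end - locus_len, site_end, "-"))
    return loci


def apply_snps(sequence, snps):
    """Return a copy of `sequence` with {position: alt_base} substitutions."""
    bases = list(sequence)
    for pos, alt in snps.items():
        bases[pos] = alt
    return "".join(bases)
```

For example, `single_digest_loci(genome_seq, "CCTGCAGG", 150)` lists candidate 150 bp loci on one reference sequence, and `apply_snps` turns the reference sequence of a locus into one individual's allele.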

The final output of the RADinitio pipeline is a set of paired-end RADseq reads for each individual in the simulated population.
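
As a rough illustration of how paired-end reads could be drawn from a pool of alleles at a target coverage, consider the sketch below. It is an assumption-laden toy, not RADinitio's sequencing model: insert lengths are drawn from a normal distribution, PCR duplication is not modelled, no sequencing errors are added, and all names and parameters (`sample_paired_reads`, `insert_mean`, `insert_sd`) are hypothetical. Alleles are assumed to be at least `read_len` bases long.

```python
import random

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def sample_paired_reads(alleles, coverage, read_len, insert_mean, insert_sd, rng):
    """Sample `coverage` read pairs from the alleles of one locus.

    alleles: allele sequences for one individual, each starting at the
             restriction site.
    rng:     a random.Random instance, so that runs are reproducible.
    Returns a list of (forward_read, reverse_read) tuples.
    """
    pairs = []
    for _ in range(coverage):
        # Pick one of the individual's alleles at random.
        allele = rng.choice(alleles)
        # Draw an insert length and clamp it to [read_len, len(allele)].
        insert = int(round(rng.gauss(insert_mean, insert_sd)))
        insert = max(read_len, min(insert, len(allele)))
        fragment = allele[:insert]
        # The forward read starts at the restriction site; the reverse
        # read comes from the opposite end, on the other strand.
        forward = fragment[:read_len]
        reverse = reverse_complement(fragment[-read_len:])
        pairs.append((forward, reverse))
    return pairs
```

Passing a seeded generator, for instance `random.Random(42)`, keeps the sampled reads identical between runs, which is convenient when comparing library parameters.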

Implementation

RADinitio is implemented in Python 3 and is released under the GNU GPL license.

Citing RADinitio

If you use RADinitio in your work, please cite our Mol Ecol Resour manuscript:

Rivera-Colón AG, Rochette NC, Catchen JM. (2021) Simulation with RADinitio improves RADseq experimental design and sheds light on sources of missing data. Mol Ecol Resour 21: 363-378. DOI: 10.1111/1755-0998.13163

Authors

RADinitio was developed by Angel G. Rivera-Colón <arcolon14@gmail.com>, with contributions from Nicolas Rochette <nic.rochette@gmail.com> and Julian Catchen <jcatchen@illinois.edu>.