RADinitio is a pipeline for the assessment of RADseq experiments via prospective and retrospective data simulation. Sequencing data are generated de novo from a population of individuals via a coalescent simulation under a user-defined demographic model using msprime. The genetic variants in each sample are simulated in a genomic context that can impact the downstream generation of RAD loci and sequencing reads. The per-individual sequences are then subjected to an in silico RADseq library preparation. The components of the library are defined by the user, allowing for the exploration of parameters including restriction enzyme selection, insert size, sequencing coverage, and PCR duplicate distribution (generated using the decoratio software). RADinitio simulations can also be applied retrospectively by comparing and modelling sources of error in empirical datasets. The purpose of RADinitio is to let researchers fully explore the variables in their data generation process, so that protocol selection and library preparation are performed optimally within the limitations of technical and experimental error.
RADinitio simulates different stages of the RADseq library preparation and sequencing process, starting from the sampling of a study population. From the sequences of a reference genome file, RADinitio will:
The final output of the RADinitio pipeline is a set of paired-end RADseq reads for each individual in the simulated population.
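To make the in silico digestion idea more concrete, the following is a minimal sketch of how restriction cut sites might be located in a reference sequence, and how a locus could be taken on either side of each site. It is an illustration only and is not RADinitio's implementation: the function names `find_cut_sites` and `flanking_loci`, the toy sequence, and the fixed locus length are all hypothetical, and the SbfI recognition sequence (CCTGCAGG) is used purely as an example enzyme.

```python
import re

# Example recognition sequence (SbfI); chosen for illustration only.
SBFI_SITE = "CCTGCAGG"


def find_cut_sites(sequence, recognition_site):
    """Return the 0-based start position of every recognition site."""
    # A lookahead pattern also reports overlapping occurrences.
    pattern = re.compile("(?=" + re.escape(recognition_site.upper()) + ")")
    return [match.start() for match in pattern.finditer(sequence.upper())]


def flanking_loci(sequence, recognition_site, locus_length):
    """Return (start, end, strand) intervals on both sides of each site.

    Simplifying assumption: each cut site yields one locus extending
    downstream ("+") and one extending upstream ("-"), each including
    the recognition site and clipped at the ends of the sequence.
    """
    site_length = len(recognition_site)
    loci = []
    for position in find_cut_sites(sequence, recognition_site):
        # Locus starting at the site and extending downstream.
        forward_end = min(len(sequence), position + locus_length)
        loci.append((position, forward_end, "+"))
        # Locus ending at the end of the site and extending upstream.
        reverse_start = max(0, position + site_length - locus_length)
        loci.append((reverse_start, position + site_length, "-"))
    return loci


# Toy reference sequence (hypothetical) containing two SbfI sites.
toy_genome = "ACGT" * 5 + SBFI_SITE + "TTGACA" * 4 + SBFI_SITE + "GATTACA"

cut_sites = find_cut_sites(toy_genome, SBFI_SITE)
loci = flanking_loci(toy_genome, SBFI_SITE, locus_length=10)
print(cut_sites)  # [20, 52]
print(loci)
```

In this sketch, changing the recognition sequence changes the number and position of loci, which is the kind of effect that enzyme selection has on a simulated library. A real library simulation would also need to account for insert size, coverage, and PCR duplicates, which are not modelled here.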
RADinitio is implemented in Python 3 and is released under the GNU GPL license.
If you use RADinitio in your work, please cite our Mol Ecol Resour manuscript:
Rivera-Colón AG, Rochette NC, Catchen JM. (2021) Simulation with RADinitio improves RADseq experimental design and sheds light on sources of missing data. Mol Ecol Resour 21: 363-378. DOI: 10.1111/1755-0998.13163
RADinitio was developed by Angel G. Rivera-Colón <arcolon14@gmail.com>, with contributions from Nicolas Rochette <nic.rochette@gmail.com> and Julian Catchen <jcatchen@illinois.edu>.