RADinitio Manual

Angel Rivera-Colón1, Nicolas Rochette1, and Julian Catchen1

1Department of Animal Biology
University of Illinois at Urbana-Champaign
Urbana, Illinois, 61820

  1. Introduction
  2. Installation
  3. Pipeline Structure
    1. Variant generation and processing
    2. Extraction of RAD alleles
    3. Simulate library enrichment and sequencing
  4. Command line interface
    1. Usage & options
    2. Examples
    3. Outputs
  5. Tutorial
  6. Python Interface

Introduction [⇑top]

RADinitio is a pipeline for the assessment of RADseq experiments via prospective and retrospective data simulation. Sequencing data is generated de novo for multiple individuals via a coalescent simulation under a user-defined demographic model using msprime. The genetic variants in each sample are simulated in a genomic context that can impact downstream generation of RAD loci and sequencing reads. The per- individual sequences then treated to an in silico RADseq library preparation. The components of the library are defined by the user, allowing for the exploration of parameters including restriction enzyme selection, insert size, sequencing coverage, and PCR duplicate distribution. RADinitio simulations can also be applied retrospectively by comparing and modelling sources of error in empirical datasets. The purpose of RADinitio is for researchers to fully explore possible variables in their data generation process to ensure that their protocol selection and library preparation is performed optimally, within the limitations of technical and experimental error.

Installation [⇑top]

Build the software

RADinitio can be installed from the Python Package Index using:

% python3 -m pip install radinitio

More information regarding this installation can be obtained at the RADinitio (PyPI Project page)

Users can install RADinitio in their local Python directories using the --user flag:

% python3 -m pip install radinitio --user

Alternatively, the software can be downloaded from the (Catchen Lab) website and installed using the following commands:

% python3 -m pip install radinitio-version.tar.gz

Or for a local installation:

% python3 -m pip install radinitio-version.tar.gz --user

For more information regarding the pip installation, please visit the (pip user-guide).

Running the pip installation, from both PyPI or the the direct installation will take care of all dependencies, including msprime (Kelleher, et al. 2016)), scipy, numpy.

Pipline Structure [⇑top]

RADinitio is designed as a series of independent functions that simulate different aspects of the RADseq library preparation process. The pipeline can be broken down into three main steps:

Variant generation and processing

Variants are generated with msprime from an user-defined demographic. Independent simulations are run for different chromosomes present in an user-provided reference genome. The simulated variants are then projected against the reference genome to obtain the reference alleles, which are then converted into alternative alleles using an user-defined model.

Extraction of RAD alleles

The reference genome is in silico digested to obtain a series of refence RAD loci. This can be done using a single or double enzyme digestion. The positions of the refernce loci in the genome then used to filter the simulated variants for all samples to include only variants present in RAD loci, improving downstream performance. For each sample, we extract the RAD alleles-the reference RAD loci are modified to include the corresponding genetic variants for each sample. This process can alter a cutsite's sequence, resulting in a dropped allele for that sample.

Simulate library enrichment and sequencing

Once extracted, the alleles for each sample are randomly sampled to obtain paired-end sequences for each allele template. The alleles are sampled with replacement, proportional to the desired sequencing coverage of the library. Each iteration of the sampling is treated as an independent sequence template that is truncated to a random lenth sampled from a simulated insert size distribution. A PCR duplicate distribution, generated from an user defined model, can be applied to the sampling process, resulting in the duplicate sampling of unique template sequences. This process can also introduce random error to the sequence, simulating the generation of PCR errors. Finally, paired end sequence are generated from the each of each template, each with its own unique sequencing error.

The corresponding functions for each stage can be run independently. We do provide a wrapper script, radinitio, that calls the top-level radinitio.simulate() function which runs the pipeline from start to finish. Advanced users can run the pipeline through the Python API, which allows for the generation of more complex demographic models, define finer details of the library preparation process, and running componments of the pipeline independently.

Command Line Interface [⇑top]

The simplest way to run RADinitio is to execute the module via the radinitio command line wrapper, which then calls the top-level radinitio.simulate() function, which runs all the stages of the pipeline.

Different usages of the pipeline, such as simulating a population or counting the number of RAD loci, can also be performed using the different pipeline stages options.

Usage and Options

The program options are the following:

radinitio --genome path --chromosomes path --out-dir dir [pipeline-stage] [(demographic model options)] [(library options)]

Pipeline stages (these options are mutually exclusive)

  • --simulate-all— Run all the RADinitio stages (simulate a population, make a library, and sequence) (Default)
  • --make-population— Simulate and process variants. Produces genome-wide VCF.
  • --make-library-seq— Simulate and sequence a RAD library. Requires exising variants.
  • --tally-rad-loci— Calculate the number of kept rad loci in the genome.

Input/Output files

  • -g,--genome— Path to reference genome (fasta file, may be gzipped). Required.
  • -l,--chromosomes— File containing the list of chromosomes (one per line) to simulate. Required.
  • -o,--out-dir— Path to an output directory where all files will be written. Required.

Demographic model (simple island model)

  • -p,--n-pops [int]— Number of populations (demes) in the island model (default = 2)
  • -n,--pop-eff-size [float]— Effective population size of each simulated deme (default = 5000.0)
  • -s,--n-seq-indv [int]— Number of individuals sampled from each population (default = 10)

Library preparation/sequencing

  • -b,--library-type— Library type (sdRAD or ddRAD) (default = 'sdRAD')
  • -e,--enz— Restriction enzyme (SbfI, PstI, EcoRI, BamHI, etc.) (default = 'SbfI')
  • -d,--enz2— Second restriction enzyme for double digest (MspI, MseI, AluI, etc.). (default = 'MspI')
  • -m,--insert-mean [int]— Insert size mean in bp. (default = 350)
  • -t,--insert-stdev [int]— Insert size standard deviation in bp. (default = 37)
  • -c,--pcr-cycles [int]— Number of PCR cycles (default = 0)
  • -v,--coverage [int]— Sequencing coverage (default = 20)
  • -r,--read-length [int]— Sequence read length. (default = 150)

make-library-seq specific options

  • --make-pop-sim-dir— Directory containing a previous radinitio.make_population run. Cannot be the same as out-dir.

Additional options

  • -V,--version— Print program version.
  • -h,--help— Display this help message.


# Simulating a sdRAD library: radinitio --simulate-all \ --genome ./genome/reference.fa.gz \ --chromosomes ./genome/chrom.list \ --out-dir ./simulations_sdRAD/ \ --n-pops 4 --pop-eff-size 2500 --n-seq-indv 10 \ --library-type sdRAD --enz SbfI --insert-mean 350 --insert-stdev 35 \ --pcr-cycles 9 --coverage 20 --read-length 150 # Simulating a ddRAD library: radinitio --simulate-all \ --genome ./genome/reference.fa.gz \ --chromosomes ./genome/chrom.list \ --out-dir ./simulations_ddRAD/ \ --n-pops 4 --pop-eff-size 2500 --n-seq-indv 10 \ --library-type ddRAD --enz PstI --enz2 MspI \ --insert-mean 350 --insert-stdev 35 \ --pcr-cycles 9 --coverage 20 --read-length 150 # Make a tally of sdRAD loci radinitio --tally-rad-loci \ --genome ./genome/reference.fa.gz \ --chromosomes ./genome/chrom.list \ --out-dir ./count_rad_loci/ \ --library-type sdRAD --enz SbfI # Make a tally of ddRAD loci radinitio --tally-rad-loci \ --genome ./genome/reference.fa.gz \ --chromosomes ./genome/chrom.list \ --out-dir ./count_ddrad_loci/ \ --library-type ddRAD --enz NlaIII --enz2 MluCI \ --insert-mean 320 --insert-stdev 25 # Simulate a population only radinitio --make-population \ --genome ./genome/reference.fa.gz \ --chromosomes ./genome/chrom.list \ --out-dir ./simulated_population/ \ --n-pops 4 --pop-eff-size 2500 --n-seq-indv 10 # Simulate library and sequencing radinitio --make-library-seq \ --genome ./genome/reference.fa.gz \ --chromosomes ./genome/chrom.list \ --out-dir ./SbfI_library/ \ --make-pop-sim-dir ./simulated_population/ \ --library-type sdRAD --enz SbfI \ --insert-mean 350 --insert-stdev 35 \ --pcr-cycles 9 --coverage 20 --read-length 150

Explanation of options

Different pipeline stages of RADinitio can be run independently:

  • --simulate-all runs all the RADinitio stages. Start with a reference genome and specify chromosomes, simulate a population, prepare a library and sequence. This is the default in the wrapper.
  • --make-population just generates a genome-wide VCF with the variants for a simulated population. A single simulated population can be later sequencing using different library parameters.
  • --make-library-seq simulates library preparatation and sequencing, resulting in paired-end reads for the simulated inidivuals. It requires a population of individuals, likely the output of a previous make-population run.
  • --tally-rad-loci for a given library configuration - library type and enzyme(s) - calculate the number of RAD loci in the genome.

reference.fa.gz is a genome fasta file. RADinitio can simulate using both compressed and uncompressed fasta files.

The --out-dir is the output directory for the simulations. Inside it series of subdirectories and files will be generated (described bellow).

chrom.list is contains a list with all the chromosome ids to simulate. Only the chromosomes on the list will be used as input for the simulations. This is important when working with highly fragmented assemblies or those with many small unplaced scaffolds. The structure of chrom.list is the following:

chrom_1 chrom_2 chrom_3 ...

--n-pops, --n-seq-indv, and --pop-eff-size contains the parameters of a simple msprime island demographic model. In this example, we are simulating 4 populations with an effective population size of 2,500 individuals each, from which we sample 10 individuals each. More complex demographic parameters can be generated using additional RADinitio functions. For more information regarding the demographic model parameters, please check the [msprime documentation].

For the sdRAD example below (--library-type sdRAD), --enz is the main restriction enzyme used for generating a single-digest RADseq library, as described by the protocols by [Baird, et al. 2008] and [Etter, et al. 2011]. In this example, the restriction enzyme SbfI is used, but other enzymes such as EcoRI, PstI, BamHI, among others, are available. The simulated library will have an average insert size of 350bp (+-35bp), and 2x150bp paired end reads. Additional library parameters can be generated using additional RADinitio functions.

The ddRAD example (--library-type ddRAD) uses a double restriction enzyme combination, as described in the protocol by [Peterson, et al. 2012], where --enz is the rare (main) cutter, PstI in this case, and --enz2 is the common (or double) cutter enzyme (MspI). The simulated library will have an minimum insert size of 250bp, a maximum insert size of 500bp, and 2x150bp paired end reads. Additional library parameters can be generated using additional RADinitio functions.

--pcr-cycles defines a RAD library enriched using 9 cycles of PCR. The library has a 2:1 template molecules to sequenced reads ratio. More complex PCR parameters can be generated using additional RADinitio functions.

--coverage defines the per-locus sequencing depth of the library, in this case 20X.


Running the radinitio wrapper script on the simulations directory will generate a series of subdirectories and files containing the outputs of the pipeline. The structire of the output directory is the following:

simulations: popmap.tsv radinitio.log sequenced_clone_distrib.tsv simulations/msprime_vcfs: chrom_1.vcf.gz chrom_2.vcf.gz chrom_3.vcf.gz chrom_4.vcf.gz chrom_5.vcf.gz simulations/ref_loci_vars: reference_rad_loci.fa.gz reference_rad_loci.stats.gz ri_master.vcf.gz simulations/rad_alleles: msp_0.alleles.fa.gz msp_1.alleles.fa.gz msp_2.alleles.fa.gz msp_3.alleles.fa.gz msp_4.alleles.fa.gz simulations/rad_reads: msp_0.1.fa.gz msp_0.2.fa.gz msp_1.1.fa.gz msp_1.2.fa.gz msp_2.1.fa.gz msp_2.2.fa.gz msp_3.1.fa.gz msp_3.2.fa.gz msp_4.1.fa.gz msp_4.2.fa.gz

Top level directory

A popmap.tsv file is generated in the top level output directory. It will contain the ids for all samples and their population of origin in the msprime demographic model. Samples are named with the msp_ suffix (for msprime sample) and a id number.

msp_0 pop0 msp_1 pop0 msp_2 pop1 msp_3 pop1 msp_4 pop2

The sequenced_clone_distrib.tsv contains information regarding the sequenced clone distribution generated during the most recent run. This distribution is the product of RADinitio's PCR model and is representative of the PCR duplicates observed in the simulated reads, if any. By default, no PCR cycles are enabled, thus the model returns:

clone_size n_errors clone_prob 0 0 0 1 0 1.0 1 1 0

In this distribution, 100% of reads originate from a sequenced clone of size 1 (a clone containing a single molecule) without any errors. Since there was no PCR amplification, the original template molecule is sampled and "sequenced". However, if PCR cycles are defined by the user using the --pcr-cycles option more complex clones can be sampled, resulting in PCR duplicates.

clone_size n_errors clone_prob 0 0 0 1 0 0.722149 1 1 8.58233e-05 2 0 0.211999 2 1 3.77304e-05 2 2 1.25639e-05 3 0 0.0521028 3 1 1.10092e-05 3 2 5.04165e-06 3 3 1.72311e-06 ...

Here, the clone_size describes the number of molecules in a clone. A clone size of n means that n molecules where sampled from a single clone. n copies of the same sequence will be saved as reads, with n - 1 being PCR duplicates. Moreover, clones can also contain molecules with PCR error (n_errors). A clone of size n can contain up to n molecules with error. clone_prob is the probability if sampling a clone with given size and error properties, and describes the distribution of PCR duplicates and errors in the simulated data.

The radinitio.log file is generated on the top level outoput directory. It contains information on the RADinitio version, a time stamp on the current run, as well as information on the different stages of the program.

msprime_vcfs/ directory

The msprime_vcfs/ directory contains output VCFs product of the msprime simulations. Chromosomes are simulated independently, thus one VCF is generated per chromosome using msprime.write_vcf(). These reference and alternative alleles present in these files are placeholders, and are processed when the variants are merged.

ref_loci_vars/ directory

The ref_loci_vars/ directory contains the processed genome-wide variants, stored in the ri_master.vcf.gz file. Variants in this file have been merged from all the simulated VCFs and have been projected against the defined reference genome to obtain the correct reference alleles, while the alternative alleles are mutated accordingly. The new alleles might contains indels simulated using the parameters specified in ri.MutationModel(). By default, indels have a 1% frequency.

reference_rad_loci.fa.gz contains all the reference RAD loci obtained from the reference genome. By default, the base reference loci are 1Kb long for single-digest RAD libraries. In ddRAD, the length of the reference loci is dependent on the position of the second cutsite and the desired insert range. For each locus, the fasta headers contains the id of the locus and its positon in the reference. The ids are composes of t for "true locus", number representing the ordering of the cutsite in the genome, and either n or p to denote loci in the negative and positive strand, respectively.


reference_rad_loci.stats.gz contains information on the position and status of every loci found on the genome with the defined restricton enzyme(s). Only the RAD loci marked as kept are mantained for further processing in the pipeline. Loci can be removed due to a variety of reasons, including lacking a second restriction site (for ddRAD), being to close to the end of chromosomes, and overlapping with adjacent loci when the cutsites are too close to one another. The contents of this file can be used to generate a tally of the number of cutsites in the genome, and the distribution of inter-cutsite distances.

# chrom_id cut_pos tag_dir status chrom_1 11811 neg kept chrom_1 11811 pos kept chrom_1 33252 neg kept chrom_1 33252 pos kept chrom_1 49706 neg kept chrom_1 49706 pos kept chrom_1 65488 neg rm_too_close chrom_1 65488 pos rm_too_close chrom_1 65746 neg rm_too_close ...

rad_alleles/ directory

rad_alleles/ contains per-individual gzipped fasta files contining the RAD alleles for each simulated sample. For each allele, the fasta headers contains the following information:


Here, reference_locus_id referes to the original reference locus the allele was generated. sample_id is the msprime sample id, and allele_i refers to the allele number (a1 or a2 in diploid individuals).


rad_reads/ directory

rad_reads/ contains per-individual gzipped paired-end fasta files contining the RAD reads for each simulated sample. For each individual read, the fasta headers contains the following information:


The basename of the read, which contains >reference_locus_id:sample_id:allele_i, comes from the source locus of the reads. clone_id referes to the source template id, while duplicate_i is the duplicate number for each read in the clone. read_pair is the standard designation for paired-end reads, /1 for forward read and /2 for the corresponding paired read.

Notice bellow that for reads 3 and 4, they both have a clone_id of 3. Read 3 has a clone_i of 1, meaning is the first read in the clone. Read 4 has a clone_i of 2, meaning it is the second read in the clone and thus a duplicate product of PCR.


The contents of rad_reads/ contain the main output of RADinitio. These reads behave as empirical paired-end reads and are fully compatible with the majority of bioinformatic software.

Tutorial [⇑top]

Preparing input files

Once RADinitio has been installed, any of the simulation stages can be run using the radinitio wrapper script. For all the stages, we first need a reference genome to simulate over. If you don't have any genomes available, you can download one from a public database. For this example, we are downloading the three-spine stickleback (Gasterosteus aculeatus) from the [ENSEMBL database]. Let's create a new simulations directory and download the genome. Run the following commands:

# Make new simulations directory mkdir simulations cd simulations # Download reference genome from ENSEMBL wget ftp://ftp.ensembl.org/pub/release-95/fasta/gasterosteus_aculeatus/dna/Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa.gz

Let's now generate a chrom.list file. In this case, we want to run the simulations only on the main stickleback chromosomes (named as groups in this assembly). For time, we are simulating only the first 3 chromosomes in this example. Run the following command to create the list.

# Create a chrom.list comprising the first three linkage groups zcat Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa.gz | grep '^>group' | cut -f1 -d ' ' | tr -d '>' | head -n 3 > chrom.list

While the previous command work for this specific genome, we recommend generating the chrom.list using an indexing of the reference genome using samtools faidx command. This will provide the official chromosome IDs, in addition to the sequence lengths, which can be helpful for filtering smaller scaffolds in the assembly. samtools documentation is available on the [htslib] website.

# Generate fasta index (fai) from reference genome. # Unzip reference genome. gunzip Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa.gz # Run samtools faidx samtools faidx Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa # Compress genome gzip Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa # Generate chrom.list from fai cat Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa.fai | cut -f 1 | head -n 3 > ./chrom.list

Now that we have the reference genome and the list of chromosomes we can run radinitio. We will use the stickleback reference genome and our list of selected chromosomes across the different pipeline stages.

Running the complete RADinitio pipeline

The pipeline stage option simulate-all allows you to run all the stages of RADinitio, including simulating a population, RAD library preparation and sequencing. We will simulate 2 populations, samping 10 individuals each. The sdRAD library will be prepared using the SbfI restriction enzyme, no PCR enrichment, and a sequencing coverage of 10X. Note: Running this command will take around 15 minutes.

# Make output directory mkdir ./ri_tutorial # Run simulation radinitio --simulate-all --genome Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa.gz \ --chromosomes chrom.list --out-dir ri_tutorial --library-type sdRAD \ --enz SbfI --n-pops 2 --n-seq-indv 10 --coverage 10

Once completed, check the contents of the new directory you can move to the new output directory and check its contents by running:

ls ri_tutorial/*

The output should look like this:

popmap.tsv radinitio.log sequenced_clone_distrib.tsv msprime_vcfs: groupI.vcf.gz groupII.vcf.gz groupIII.vcf.gz rad_alleles: msp_00.alleles.fa.gz msp_10.alleles.fa.gz msp_01.alleles.fa.gz msp_11.alleles.fa.gz msp_02.alleles.fa.gz msp_12.alleles.fa.gz msp_03.alleles.fa.gz msp_13.alleles.fa.gz msp_04.alleles.fa.gz msp_14.alleles.fa.gz msp_05.alleles.fa.gz msp_15.alleles.fa.gz msp_06.alleles.fa.gz msp_16.alleles.fa.gz msp_07.alleles.fa.gz msp_17.alleles.fa.gz msp_08.alleles.fa.gz msp_18.alleles.fa.gz msp_09.alleles.fa.gz msp_19.alleles.fa.gz rad_reads: msp_00.1.fa.gz msp_04.1.fa.gz msp_08.1.fa.gz msp_12.1.fa.gz msp_16.1.fa.gz msp_00.2.fa.gz msp_04.2.fa.gz msp_08.2.fa.gz msp_12.2.fa.gz msp_16.2.fa.gz msp_01.1.fa.gz msp_05.1.fa.gz msp_09.1.fa.gz msp_13.1.fa.gz msp_17.1.fa.gz msp_01.2.fa.gz msp_05.2.fa.gz msp_09.2.fa.gz msp_13.2.fa.gz msp_17.2.fa.gz msp_02.1.fa.gz msp_06.1.fa.gz msp_10.1.fa.gz msp_14.1.fa.gz msp_18.1.fa.gz msp_02.2.fa.gz msp_06.2.fa.gz msp_10.2.fa.gz msp_14.2.fa.gz msp_18.2.fa.gz msp_03.1.fa.gz msp_07.1.fa.gz msp_11.1.fa.gz msp_15.1.fa.gz msp_19.1.fa.gz msp_03.2.fa.gz msp_07.2.fa.gz msp_11.2.fa.gz msp_15.2.fa.gz msp_19.2.fa.gz ref_loci_vars: reference_rad_loci_SbfI.fa.gz ri_master.vcf.gz reference_rad_loci_SbfI.stats.gz

The simulated reads are stored in rad_reads/ and can then be used for other bioinformatic applications applications.

Simulating a population

The pipeline stage option make-population will run msprime on the selected chromosomes using an user-defined demographic model and process variants into a single genomic VCF. The simulated population can then be used to generate multiple RAD libraries with different parameters.

Here, we simulate 4 populations each with an effective population size of 2500 individuals, sampling 15 individuals from each population.

# Make output directory mkdir ./sim_population_p4_s15_ne2500 # Run simulation radinitio --make-population --genome Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa.gz \ --chromosomes chrom.list --out-dir ./sim_population_p4_s15_ne2500 \ --n-pops 4 --n-seq-indv 15 --pop-eff-size 2500

Once completed, check the contents of the new directory you can move to the new output directory and check its contents by running:

ls sim_population_p4_s15_ne2500/*

The output should look like this:

popmap.tsv radinitio.log msprime_vcfs: groupI.vcf.gz groupII.vcf.gz groupIII.vcf.gz ref_loci_vars: ri_master.vcf.gz

Simulate library and sequencing

Tutorial under construction

Count the number of RAD loci

tally-rad-loci will extract the reference RAD loci sequences under user-defined library parameters, particularly the library type and enzyme combinations.

Here, we will tally the number of RAD loci obtained for two different library configurations. First a single-digest RAD library with SbfI, and with a double-digest library with EcoRI and MspI.

# Make output directory for sdRAD library mkdir ./loci_sdrad_sbfi # Run simulation radinitio --tally-rad-loci --genome Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa.gz \ --chromosomes chrom.list --out-dir ./loci_sdrad_sbfi \ --library-type sdRAD --enz SbfI

The output should look like this:

radinitio.log ref_loci_vars: reference_rad_loci_SbfI.fa.gz reference_rad_loci_SbfI.stats.gz

The number of total loci found can be obtained from the reference_rad_loci.stats.gz file, stored in the ref_loci_vars directory, by counting the number of kept loci.

zgrep -c 'kept' ./loci_sdrad_sbfi/ref_loci_vars/reference_rad_loci.stats.gz

Now, lets simulate the ddRAD library and count the number of loci.

# Make output directory for ddRAD library mkdir ./loci_ddrad_ecori-mspi # Run simulation radinitio --tally-rad-loci --genome Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa.gz \ --chromosomes chrom.list --out-dir ./loci_ddrad_ecori-mspi \ --library-type ddRAD --enz EcoRI --enz2 MspI # Count loci zcat ./loci_ddrad_ecori-mspi/ref_loci_vars/reference_rad_loci.stats.gz | grep -c 'kept'

Once completed, check the contents of the new directory you can move to the new output directory and check its contents by running:

ls sim_population_p4_s15_ne2500/*

The output should look like this:

Python Interface [⇑top]

Any custom arguments to msprime.simulate() may be supplied in the form of a dictionary, which must be passed to radinitio.simulate() using the msprime_simulate_args argument. However, the length of each chromosome is taken from the supplied reference genome, and the recombination rates must be supplied separately using radinitio.simulate()'s chrom_recomb_rates argument, which may be a single float or a dictionary specifying per-chromosome recombination rates.

The parameters for simulating the preparation of the RAD library and its sequencing can be controlled via dedicated classes. For instance, the library_opts parameter of radinitio.simulate() should be an instance of the LibraryOptions type.

For example, this the radinitio.main() function (as of v1.0.0):

import radinitio, msprime def main(): args = parse_args() # Msprime parameters # Create the population(s). pops = [ msprime.PopulationConfiguration( sample_size = 2 * args.n_seq_indv, initial_size = args.pop_eff_size, growth_rate = 0.0) for i in range(args.n_pops) ] msprime_simulate_args = { 'mutation_rate' : 7e-8, 'population_configurations' : pops } if args.n_pops > 1: # Create the (symmetric) migration matrix. # In msprime, each element `M[j,k]` of the migration matrix defines the fraction # of population `j` that consists of migrants from population `k` in each generation. # Additionally, there should be zeroes on the diagonal. pop_immigration_rate = 0.001 # Total per-population per-generation immigration rate. m = pop_immigration_rate / (args.n_pops - 1) migration_matrix = [ [ 0 if k == j else m for k in range(args.n_pops) ] for j in range(args.n_pops) ] msprime_simulate_args['migration_matrix'] = migration_matrix chromosomes = open(args.chromosomes).read().split() recomb_rate = 3e-8 # RADinito options muts_opts = radinitio.MutationModel() library_opts = radinitio.LibraryOptions( library_type = args.library_type, renz_1 = args.enz, renz_2 = args.enz2, coverage=args.coverage) pcr_opts = ri.PCRDups( pcr_c = args.pcr_cycles) # Call radinitio.simulate radinitio.simulate( out_dir = args.out_dir, genome_fa = args.genome, chromosomes = chromosomes, chrom_recomb_rates = recomb_rate, msprime_simulate_args = msprime_simulate_args, library_opts = library_opts, mutation_opts = muts_opts, pcr_opts = pcr_opts)