CatchenLab: Chromonomer Manual

Introduction

Chromonomer is a program designed to integrate a genome assembly with a genetic map. It identifies and removes markers that are out of order in the genetic map when compared against their local assembly order, and it identifies and splits scaffolds that, according to the genetic map, were assembled incorrectly.

Installation

Build the software

Chromonomer uses the standard autotools install:

% tar xfvz chromonomer_x.xx.tar.gz
% cd chromonomer_x.xx
% ./configure
% make
(become root)
# make install

You can change the root of the install location (/usr/local/ on most operating systems) by specifying the --prefix command line option to the configure script.

% ./configure --prefix=/home/smith/local

A successful build will place the chromonomer program in /usr/local/bin and create a directory (by default) in /usr/local/share/chromonomer to hold the web interface.

Enable the Chromonomer web interface in the Apache webserver.

Add the following lines to your Apache configuration to make the Chromonomer PHP files visible to the web server and to provide an easily readable URL to access them:

<Directory "/usr/local/share/chromonomer/php">
    Order deny,allow
    Deny from all
    Allow from all
    Require all granted
</Directory>
Alias /chromonomer "/usr/local/share/chromonomer/php"

A sensible way to do this is to create the file chromonomer.conf with the above lines.

If you are using Apache 2.4 or later:

Place the chromonomer.conf file in the /etc/apache2/conf-available directory, link it into /etc/apache2/conf-enabled, and then restart Apache, like so:

# vi /etc/apache2/conf-available/chromonomer.conf
# ln -s /etc/apache2/conf-available/chromonomer.conf /etc/apache2/conf-enabled/chromonomer.conf
# apachectl restart

Executing Chromonomer

Align markers against the assembled genome

First you must align your markers against the newly built reference genome. Be very careful about allowing too much promiscuity in your alignments, as this will lead to spurious marker alignments that will cause more scaffolds to be split. Restrict the use of gaps in the alignment as far as possible.

To do the alignments, one method is to create a FASTA file containing the sequence of each marker. The ID of each marker must match the ID of the markers that you provide to describe your genetic map. If you used Stacks to generate your markers, it is easy to export the consensus sequence for each marker from the Stacks Catalog using a few UNIX commands. Here is an example of a FASTA file containing markers, which can be fed to an alignment program (such as BWA or GSnap):

>10311
TGCAGGTCATCAAACCTGCCTCCACACTGGTGAGCTCAAGAAATTCCCACAAATGTTGTTGTCCCCAAAAAACTTCTTTTTTTTGTTTGGGAGTT
>1825
TGCAGGCGACTCACGCGGTCCTCACGGGCACCCTGGTGCCCGCGGGCATCGTGTTGGTGACCCCTGCTGTAGAAGGTACCTAAATGCACCACAGC
>19504
TGCAGGAAGTTCAGCGAGCGCGTTCAGCGAGCGCGTTCAGCGAGCGCGTATAGCGAGGTGTGACTCCAACGACGATATTAATGAGCTTTGGGATA
>11977
TGCAGGACGGAGGTGCTCATACAGACATGTATCGACTTCAAATCAGTACTCGTTTTTTTAATGCGCGGAAAAGCAAGTTGCGCGACATTTTACGC
>16603
TGCAGGATGTGTTTGGAGCACATTGTGAGATTCAAACTTTCAAAACAAAGAAACTAGCGTCTCCCACTACATGTACCTTTATGTACTCTATCCAG
>10785
TGCAGGAAATTGAGAAAGAGAACAAGAACTCCCAGCATAAACCAGGTGAGAATTGTCATCCTTGGGAATGTTCACGAGATTTACACAATCTCTGC
...

The result of the alignment should be a SAM or BAM file, which will contain the marker IDs and the alignment positions of each marker.
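As a rough illustration of the advice above about restricting gaps, the following Python sketch reads SAM text records and keeps only mapped alignments whose CIGAR string contains no insertions or deletions. It is not part of Chromonomer: the function name parse_sam_alignments is hypothetical, the column positions follow the SAM format (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR), and the last two records (markers 99999 and 88888) and the mapping qualities are invented for the example.

```python
import re

def parse_sam_alignments(lines):
    """Yield (marker_id, scaffold, position, cigar) for mapped, gapless alignments.

    Hypothetical helper for illustration; Chromonomer does its own SAM parsing.
    """
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        marker_id, flag, scaffold = fields[0], int(fields[1]), fields[2]
        position, cigar = int(fields[3]), fields[5]
        if flag & 0x4:
            continue  # FLAG bit 0x4 marks an unmapped read
        if re.search(r"[ID]", cigar):
            continue  # CIGAR contains an insertion or deletion (a gap)
        yield marker_id, scaffold, position, cigar

# Illustrative records; only the first two marker IDs appear in this manual.
sam_lines = [
    "@SQ\tSN:JH556806.1\tLN:1300000",
    "23951\t0\tJH556806.1\t56806\t60\t75M\t*\t0\t0\t*\t*",
    "32387\t0\tJH556806.1\t370115\t60\t3S72M\t*\t0\t0\t*\t*",
    "99999\t0\tJH556806.1\t1000\t60\t40M2D35M\t*\t0\t0\t*\t*",
    "88888\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
kept = list(parse_sam_alignments(sam_lines))
for marker_id, scaffold, position, cigar in kept:
    print(marker_id, scaffold, position, cigar)
```

Soft clipping (S) is tolerated here because it does not introduce a gap; whether to also filter on mapping quality is a judgment call that depends on your aligner.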

Describe the genetic map

You will need a tab-separated list describing your genetic map: the markers in the map along with their linkage group and centiMorgan position. Chromonomer was designed to handle genetic maps built from RAD data using Stacks (http://catchenlab.life.illinois.edu/stacks), although any genetic map where the markers can be aligned to the genome should work. The file should be formatted in the following way:

Linkage Group<tab>Locus (Marker ID)<tab>cM Position

Here is an example:

lg1	10311	0
lg1	1825	0
lg1	19504	0.958
lg1	11977	0.97
lg1	16603	0.97
lg1	10785	0.97
lg1	18192	0.97
lg1	16462	0.981
lg1	17763	0.981
...
lg1	585	64.061
lg1	11929	64.332
lg2	6504	0
lg2	2294	0
lg2	13138	0.785
lg2	11849	1.887
lg2	18900	1.887
...
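Before running Chromonomer it can help to confirm that the map file really has three tab-separated columns and numeric cM positions. The Python sketch below is one way to do that; read_genetic_map is a hypothetical helper, not something shipped with Chromonomer, and the input rows are taken from the example above.

```python
from collections import defaultdict

def read_genetic_map(lines):
    """Parse 'linkage group<tab>marker ID<tab>cM' rows into {group: [(marker, cM), ...]}.

    Hypothetical validation helper; raises ValueError on malformed rows.
    """
    groups = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {number}: expected 3 tab-separated fields, found {len(fields)}"
            )
        group, marker, cm = fields
        groups[group].append((marker, float(cm)))  # float() rejects non-numeric cM
    return dict(groups)

example = [
    "lg1\t10311\t0",
    "lg1\t1825\t0",
    "lg1\t19504\t0.958",
    "lg2\t6504\t0",
    "lg2\t13138\t0.785",
]
linkage_map = read_genetic_map(example)
for group, markers in linkage_map.items():
    print(group, len(markers), "markers")
```

Marker IDs are kept as strings so that they can be compared directly with the read names in the SAM/BAM file.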

Prepare to run Chromonomer

Create a directory to hold the Chromonomer output. For example, 20150603. Create a directory with the same name under the web interface:

% mkdir /usr/local/share/chromonomer/php/20150603

Execute Chromonomer

Run Chromonomer, specifying the proper input and output paths and providing the directory you created using the --data_version flag. For example:

% chromonomer -p ~/research/20150603_linkage_map.tsv \
    -o ~/research/20150603/ -s ~/research/markers.sam \
    -a ~/research/final.assembly.agp --data_version 20150603

For Chromonomer to work, it must be able to sync the markers in the genetic map with the markers in the SAM/BAM alignments, and it must be able to sync the scaffold names in the AGP file with the scaffold names in the SAM/BAM alignment files. Depending on where you got your AGP file and your genetic map description, you may need to edit these files to make sure all the IDs match (e.g. sometimes the scaffold IDs in the AGP file have additional verbiage not present in the FASTA file that contained your genome assembly).
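The ID matching described above can be checked ahead of time with simple set comparisons. The following Python sketch is an assumption-laden illustration rather than a Chromonomer feature: the helper names are hypothetical, the AGP object name is taken from the first column of each non-comment line, SAM marker and scaffold names are taken from the first and third columns, and the decorated scaffold name in the AGP example is invented to mimic the "additional verbiage" problem.

```python
def agp_object_names(lines):
    """Collect object (scaffold) names from column 1 of non-comment AGP lines."""
    return {
        line.split("\t")[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }

def sam_names(lines):
    """Return (marker IDs, scaffold names) from columns 1 and 3 of SAM records."""
    markers, scaffolds = set(), set()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        markers.add(fields[0])
        if fields[2] != "*":  # '*' means the read has no reference name
            scaffolds.add(fields[2])
    return markers, scaffolds

def report_unmatched(map_markers, sam_markers, sam_scaffolds, agp_scaffolds):
    """List the IDs that would fail to sync between the three inputs."""
    return {
        "markers_missing_from_sam": sorted(map_markers - sam_markers),
        "scaffolds_missing_from_agp": sorted(sam_scaffolds - agp_scaffolds),
    }

# Illustrative inputs; the '|size936331' suffix and contig_1 are hypothetical.
map_markers = {"15109", "18920", "4933"}
sam_lines = [
    "15109\t16\tscaffold_59\t893973\t60\t95M\t*\t0\t0\t*\t*",
    "18920\t16\tscaffold_59\t634663\t60\t95M\t*\t0\t0\t*\t*",
]
agp_lines = [
    "# hypothetical AGP excerpt",
    "scaffold_59|size936331\t1\t22004\t1\tW\tcontig_1\t1\t22004\t+",
]
sam_markers, sam_scaffolds = sam_names(sam_lines)
report = report_unmatched(map_markers, sam_markers, sam_scaffolds,
                          agp_object_names(agp_lines))
print(report)
```

An empty report for both keys suggests the IDs line up; anything listed points to names that need editing in one of the input files.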

Once Chromonomer is complete, you can copy the files from the output directory to the web interface:

% sudo cp ~/research/20150603/* /usr/local/share/chromonomer/php/20150603/.

You should now be ready to view the output in the web interface.

As you make corrections to your map, you will re-execute Chromonomer. It is easy to create another directory under the web interface and store the results there. This way you can keep versions of the assembly around for comparison as you make improvements.

Chromonomer output files

Chromonomer creates voluminous output, trying to document how and why the genome was (or was not) assembled. A number of files are precomputed for the web interface, but almost all of the same information is available in the text output.

chromonomer_summary.log	Records the date/time Chromonomer was run along with the exact command. The file contains a summary of the run, including number of scaffold splits and the total length of each chromosome.
problem_scaffolds.tsv	Contains a list of all scaffolds where a split occurred, where a split was attempted but failed, or where markers were pruned. The file is printed in a sorted order to help prioritize which scaffolds were the most problematic.
split_scaffold_map.tsv	Contains a list of all scaffolds that were split and the coordinates upon which they were split.
XXX_genome.agp	An AGP file that contains a definition of the final, chromonomed assembly. Each linkage group, with its complement of scaffolds and gaps, is defined. Scaffolds are defined in terms of their original names, with the basepair boundaries of each scaffold modified to represent any scaffold splits.
XXX_unplaced_scaffolds.agp	Scaffolds that could not be integrated into the genetic map are placed in this file, unmodified from their original input.
XXX_scaffolds.agp	An AGP file that remains defined at the scaffold level (no chromosomes are present) with the scaffold splits defined. The scaffolds generated by splitting are listed here, with their constituent contig/gap components, alongside the other scaffold components that were not split.
scaffold_XXX.log	A record of what happened to each scaffold that was in any way modified (with markers pruned or split). See the next section for details on how to interpret these files.
LGX_before.php LGX_after.php LGX_before.json LGX_after.json	These files are used by the web interface to report what a linkage group and set of scaffolds looked like before Chromonomer integrated them and after the integration is complete. The *.json files feed the web visualizations of each linkage group.

Chromonomer web interface

The web interface places all of the information from a Chromonomer integration into a single place and links pieces of the integration together, allowing for easy access. You can easily place successive runs of the software together and the web interface will allow you to compare them.

For a particular integration, the web interface will give you access to all the linkage group information along with the summary statistics from the integration (these are also available in the chromonomer_summary.log file).

The web interface will also give you fast access to all the problem scaffolds, in a usefully sorted order (these are also available in the problem_scaffolds.tsv file).

Finally, for each linkage group, the web interface will draw a before and after visualization, so you can see how the scaffolds go together and where the markers fall, and it provides links to the scaffold logs for each modified scaffold.

How do I interpret a scaffold report?

Chromonomer tries to document all of the sources of data available to it while it is integrating scaffolds with the genetic map. For each scaffold, a log file is kept to record any conflicting markers found, or any splits that were made to the scaffold. These scaffold logs can be used to identify markers with a large number of genotyping errors or to identify scaffolds that were likely assembled incorrectly.

Splitting a scaffold between linkage groups

Once data is read into memory, the first thing Chromonomer will do is check that each scaffold is anchored to only a single linkage group. Since it would require a large amount of error for markers to be assigned to the wrong linkage group by the linkage mapping program, Chromonomer assumes that the map assignment is correct and will split any scaffold assigned to two or more linkage groups.

Log JH556806.1

Scaffold JH556806.1 is mapped to multiple linkage groups: LG1, LG5
  001 23951 LG5 25.122cM <=> 56806bp => segment 1 [75M]
  002 20261 LG5 25.122cM <=> 67933bp => segment 1 [75M]
  003 20245 LG5 25.122cM <=> 160155bp => segment 1 [75M]
  004 13763 LG5 25.122cM <=> 169232bp => segment 1 [75M]
  005 30731 LG5 25.122cM <=> 174168bp => segment 1 [75M]
  006 16263 LG5 25.122cM <=> 174239bp => segment 1 [75M]
  007 4380 LG5 25.122cM <=> 253524bp => segment 1 [75M]
  008 32387 LG1 1.547cM <=> 370115bp => segment 2 [3S72M]
  009 42748 LG1 1.547cM <=> 621143bp => segment 2 [75M]
  010 24855 LG1 1.547cM <=> 634734bp => segment 2 [75M]
  011 39286 LG1 1.547cM <=> 650355bp => segment 2 [75M]
  012 33770 LG1 1.547cM <=> 715930bp => segment 2 [75M]
  013 49251 LG1 1.547cM <=> 771348bp => segment 2 [75M]
  014 29824 LG1 1.547cM <=> 886979bp => segment 2 [75M]
  015 47744 LG1 1.161cM <=> 981099bp => segment 2 [75M]
  016 48147 LG1 1.161cM <=> 981170bp => segment 2 [75M]
  017 18493 LG1 1.161cM <=> 1005349bp => segment 2 [75M]
  018 12653 LG1 1.161cM <=> 1005420bp => segment 2 [75M]
  019 51316 LG1 1.161cM <=> 1209947bp => segment 2 [75M]
  020 43261 LG1 1.161cM <=> 1219754bp => segment 2 [75M]
  021 49433 LG1 1.161cM <=> 1251527bp => segment 2 [75M]

Segment 1: 56,806bp - 253,524bp
Segment 2: 370,115bp - 1,251,527bp

Splitting scaffold JH556806.1, 147 components
Scaffold JH556806.1, 147 components.
  000 Contig start: 1, end: 17910, size: 17910
  001 Gap start: 17911, end: 18595, size: 685 Segment 1 begin
  002 Contig start: 18596, end: 63432, size: 44837
  003 Gap start: 63433, end: 64185, size: 753
  004 Contig start: 64186, end: 99886, size: 35701
  005 .................
  022 Contig start: 236582, end: 237119, size: 538
  023 Gap start: 237120, end: 238119, size: 1000
  024 Contig start: 238120, end: 271850, size: 33731
  025 Gap start: 271851, end: 272840, size: 990 Segment 1 end
  026 Contig start: 272841, end: 281639, size: 8799
  027 Gap start: 281640, end: 281659, size: 20
  028 Contig start: 281660, end: 291126, size: 9467
  029 .......
  036 Contig start: 320151, end: 346605, size: 26455
  037 Gap start: 346606, end: 346625, size: 20
  038 Contig start: 346626, end: 348159, size: 1534
  039 Gap start: 348160, end: 349159, size: 1000 Segment 2 begin
  040 Contig start: 349160, end: 373803, size: 24644
  041 Gap start: 373804, end: 373991, size: 188
  042 Contig start: 373992, end: 393167, size: 19176
  043 ............................................................ .........................
  128 Contig start: 1187265, end: 1240360, size: 53096
  129 Gap start: 1240361, end: 1242478, size: 2118
  130 Contig start: 1242479, end: 1253597, size: 11119
  131 Gap start: 1253598, end: 1253707, size: 110 Segment 2 end
  132 Contig start: 1253708, end: 1257732, size: 4025
  133 Gap start: 1257733, end: 1257753, size: 21
  134 Contig start: 1257754, end: 1261818, size: 4065
  135 ............

Interpolating between markers: Segment 1/end (025) -> Segment 2/begin (039)
Maximum gap at position 29 (14745bp)
Scaffold JH556806.1, 147 components.
  000 Contig start: 1, end: 17910, size: 17910
  001 Gap start: 17911, end: 18595, size: 685 Segment 1 begin
  002 Contig start: 18596, end: 63432, size: 44837
  003 Gap start: 63433, end: 64185, size: 753
  004 Contig start: 64186, end: 99886, size: 35701
  005 .....................
  026 Contig start: 272841, end: 281639, size: 8799
  027 Gap start: 281640, end: 281659, size: 20
  028 Contig start: 281660, end: 291126, size: 9467
  029 Gap start: 291127, end: 305871, size: 14745 Segment 1 end Segment 2 begin
  030 Contig start: 305872, end: 310813, size: 4942
  031 Gap start: 310814, end: 310833, size: 20
  032 Contig start: 310834, end: 311767, size: 934
  033 ............................................................ ...................................
  128 Contig start: 1187265, end: 1240360, size: 53096
  129 Gap start: 1240361, end: 1242478, size: 2118
  130 Contig start: 1242479, end: 1253597, size: 11119
  131 Gap start: 1253598, end: 1253707, size: 110 Segment 2 end
  132 Contig start: 1253708, end: 1257732, size: 4025
  133 Gap start: 1257733, end: 1257753, size: 21
  134 Contig start: 1257754, end: 1261818, size: 4065
  135 ............

Creating scaffold 'XMA00098.0' with 29 elements.
Creating scaffold 'XMA00099.0' with 117 elements.
Scaffold 'JH556806.1' successfully split into scaffolds XMA00098.0, XMA00099.0
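The first part of this log, where markers are grouped into segments by linkage group, can be sketched in a few lines of Python. This is only an illustration of the idea as it appears in the log, not Chromonomer's implementation: segments_by_linkage_group is a hypothetical name, and the input is a subset of the markers listed for JH556806.1 above.

```python
from itertools import groupby

def segments_by_linkage_group(markers):
    """Group markers, ordered by basepair position, into runs sharing a linkage group.

    markers: iterable of (marker_id, linkage_group, bp).
    Returns a list of (linkage_group, first_bp, last_bp), one entry per run.
    """
    ordered = sorted(markers, key=lambda m: m[2])
    segments = []
    for group, run in groupby(ordered, key=lambda m: m[1]):
        run = list(run)
        segments.append((group, run[0][2], run[-1][2]))
    return segments

# A subset of the markers reported for scaffold JH556806.1.
markers = [
    ("23951", "LG5", 56806),
    ("20261", "LG5", 67933),
    ("4380", "LG5", 253524),
    ("32387", "LG1", 370115),
    ("42748", "LG1", 621143),
    ("49433", "LG1", 1251527),
]
segments = segments_by_linkage_group(markers)
for number, (group, first_bp, last_bp) in enumerate(segments, start=1):
    print(f"Segment {number}: {group} {first_bp}bp - {last_bp}bp")
```

More than one segment means the scaffold is anchored to more than one linkage group and is therefore a candidate for splitting; the segment boundaries here match the "Segment 1" and "Segment 2" coordinates in the log.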

Splitting a scaffold within a linkage group

Chromonomer will check that each scaffold is mapped to a single, continuous part of the linkage group. If the scaffold spans more than one node in the linkage group, then its orientation can be determined.

If a scaffold exists on two subsets of map nodes that are not continuous, then there are other scaffolds in the assembly that fall between the parts of the current scaffold. This can happen either because of an assembly error, e.g. the scaffolding algorithm made an incorrect join, or because there is another scaffold that occurs inside a gap of the existing scaffold.
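To see why orientation needs more than one map node, consider the following Python sketch. It is a simplified stand-in and not Chromonomer's actual method: the function orientation is hypothetical, and it just compares the median basepair position of the markers at the lowest cM node with that at the highest cM node. With a single node every marker shares the same cM position, so there is no direction to compare. The example data are the nine markers in the final marker ordering of the scaffold_59 log in this section.

```python
from statistics import median

def orientation(markers):
    """Guess scaffold orientation from (cM, bp) pairs.

    Returns 'forward' if basepair position rises with cM, 'reverse' if it
    falls, and None when fewer than two map nodes are present.
    Simplified illustration only.
    """
    nodes = {}
    for cm, bp in markers:
        nodes.setdefault(cm, []).append(bp)
    if len(nodes) < 2:
        return None  # a single node carries no directional information
    first = median(nodes[min(nodes)])
    last = median(nodes[max(nodes)])
    if first == last:
        return None
    return "forward" if first < last else "reverse"

# Final marker ordering reported for scaffold_59 (cM, bp).
scaffold_59 = [
    (0.000, 893973), (0.000, 634663), (0.000, 634572), (0.000, 516764),
    (0.000, 494390), (0.000, 384467), (0.000, 235696),
    (1.953, 179609), (2.186, 15612),
]
print(orientation(scaffold_59))
print(orientation([(0.97, 1000), (0.97, 5000)]))
```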

Log scaffold_59

Orientation of scaffold_59 is reverse
Unmodified marker ordering:
  000 15109 LG2 0.000cM <=> 893973bp scaffold_59 [95M]
  001 18920 LG2 0.000cM <=> 634663bp scaffold_59 [95M]
  002 10869 LG2 0.000cM <=> 634572bp scaffold_59 [95M]
  003 10051 LG2 0.000cM <=> 516764bp scaffold_59 [95M]
  004 7897 LG2 0.000cM <=> 494390bp scaffold_59 [95M]
  005 1428 LG2 0.000cM <=> 384467bp scaffold_59 [95M]
  006 10164 LG2 0.000cM <=> 235696bp scaffold_59 [95M]
  007 7841 LG2 0.000cM <=> 8623bp scaffold_59 [95M]
  008 12169 LG2 0.935cM <=> 237068bp scaffold_59 [95M]
  009 13641 LG2 1.867cM <=> 246016bp scaffold_59 [95M]
  010 4933 LG2 1.953cM <=> 179609bp scaffold_59 [95M]
  011 19046 LG2 2.186cM <=> 15612bp scaffold_59 [95M]
  012 17376 LG2 2.332cM <=> 894064bp scaffold_59 [95M]
  013 10980 LG2 2.800cM <=> 556014bp scaffold_59 [95M]
  014 10702 LG2 2.800cM <=> 277231bp scaffold_59 [95M]
  015 12013 LG2 2.800cM <=> 236977bp scaffold_59 [95M]
  016 7347 LG2 3.043cM <=> 235787bp scaffold_59 [95M]
Removing out of order markers:
  007: removing marker 7841 [ 8623bp]
  008: removing marker 12169 [237068bp]
  009: removing marker 13641 [246016bp]
  012: removing marker 17376 [894064bp]
  013: removing marker 10980 [556014bp]
  014: removing marker 10702 [277231bp]
  015: removing marker 12013 [236977bp]
  016: removing marker 7347 [235787bp]
Scaffold 'scaffold_59': removed 8 markers.
Final marker ordering:
  000 15109 LG2 0.000cM <=> 893973bp scaffold_59 [95M]
  001 18920 LG2 0.000cM <=> 634663bp scaffold_59 [95M]
  002 10869 LG2 0.000cM <=> 634572bp scaffold_59 [95M]
  003 10051 LG2 0.000cM <=> 516764bp scaffold_59 [95M]
  004 7897 LG2 0.000cM <=> 494390bp scaffold_59 [95M]
  005 1428 LG2 0.000cM <=> 384467bp scaffold_59 [95M]
  006 10164 LG2 0.000cM <=> 235696bp scaffold_59 [95M]
  007 4933 LG2 1.953cM <=> 179609bp scaffold_59 [95M]
  008 19046 LG2 2.186cM <=> 15612bp scaffold_59 [95M]

Linkage group LG2; Scaffold 'scaffold_59'; Orientation: Reverse

Scaffold scaffold_59 is mapped to multiple linkage nodes: 0.000, 1.953|2.186
  001 15109 LG2 0.000cM <=> 893973bp => segment 1 [95M]
  002 18920 LG2 0.000cM <=> 634663bp => segment 1 [95M]
  003 10869 LG2 0.000cM <=> 634572bp => segment 1 [95M]
  004 10051 LG2 0.000cM <=> 516764bp => segment 1 [95M]
  005 7897 LG2 0.000cM <=> 494390bp => segment 1 [95M]
  006 1428 LG2 0.000cM <=> 384467bp => segment 1 [95M]
  007 10164 LG2 0.000cM <=> 235696bp => segment 1 [95M]
  001 4933 LG2 1.953cM <=> 179609bp => segment 2 [95M]
  002 19046 LG2 2.186cM <=> 15612bp => segment 2 [95M]
Scaffold 'scaffold_59' will be split into 2 pieces.
Segment 1: 893,973bp - 235,696bp
Segment 2: 179,609bp - 15,612bp

Splitting scaffold scaffold_59, 53 components
Scaffold scaffold_59, 53 components.
  000 Contig start: 1, end: 22004, size: 22004
  001 Gap start: 22005, end: 22113, size: 109 Segment 2 begin
  002 Contig start: 22114, end: 42905, size: 20792
  003 Gap start: 42906, end: 42925, size: 20
  004 Contig start: 42926, end: 56364, size: 13439
  005 ...
  008 Contig start: 62253, end: 171504, size: 109252
  009 Gap start: 171505, end: 171852, size: 348
  010 Contig start: 171853, end: 172670, size: 818
  011 Gap start: 172671, end: 173212, size: 542 Segment 2 end
  012 Contig start: 173213, end: 241408, size: 68196
  013 Gap start: 241409, end: 241614, size: 206 Segment 1 begin
  014 Contig start: 241615, end: 281985, size: 40371
  015 Gap start: 281986, end: 283184, size: 1199
  016 Contig start: 283185, end: 304784, size: 21600
  017 ...............................
  048 Contig start: 765545, end: 810015, size: 44471
  049 Gap start: 810016, end: 811790, size: 1775
  050 Contig start: 811791, end: 889088, size: 77298
  051 Gap start: 889089, end: 889128, size: 40 Segment 1 end
  052 Contig start: 889129, end: 936331, size: 47203

Interpolating between markers: Segment 2/end (011) -> Segment 1/begin (013)
Maximum gap at position 11 (542bp)
Scaffold scaffold_59, 53 components.
  000 Contig start: 1, end: 22004, size: 22004
  001 Gap start: 22005, end: 22113, size: 109 Segment 2 begin
  002 Contig start: 22114, end: 42905, size: 20792
  003 Gap start: 42906, end: 42925, size: 20
  004 Contig start: 42926, end: 56364, size: 13439
  005 ...
  008 Contig start: 62253, end: 171504, size: 109252
  009 Gap start: 171505, end: 171852, size: 348
  010 Contig start: 171853, end: 172670, size: 818
  011 Gap start: 172671, end: 173212, size: 542 Segment 2 end Segment 1 begin
  012 Contig start: 173213, end: 241408, size: 68196
  013 Gap start: 241409, end: 241614, size: 206
  014 Contig start: 241615, end: 281985, size: 40371
  015 .................................
  048 Contig start: 765545, end: 810015, size: 44471
  049 Gap start: 810016, end: 811790, size: 1775
  050 Contig start: 811791, end: 889088, size: 77298
  051 Gap start: 889089, end: 889128, size: 40 Segment 1 end
  052 Contig start: 889129, end: 936331, size: 47203

Creating scaffold 'SSC00017.0' with 11 elements.
Creating scaffold 'SSC00018.0' with 41 elements.
Scaffold 'scaffold_59' successfully split into scaffolds SSC00017.0, SSC00018.0
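The "Interpolating between markers" step in these logs reports the largest gap lying between the end of one segment and the beginning of the next, and the split is then placed at that gap. A minimal Python sketch of that selection, inferred from the log output rather than taken from the Chromonomer source, might look like the following; largest_gap is a hypothetical name, and the components are the three entries (011 to 013) from the scaffold_59 listing above.

```python
def largest_gap(components, first, last):
    """Return the largest Gap component with index in [first, last], or None.

    components: list of (index, kind, start, end, size) tuples, where kind
    is 'Contig' or 'Gap'. Sketch inferred from the scaffold log output.
    """
    gaps = [c for c in components if first <= c[0] <= last and c[1] == "Gap"]
    return max(gaps, key=lambda c: c[4], default=None)

# Components 011-013 of scaffold_59, between Segment 2/end and Segment 1/begin.
components = [
    (11, "Gap", 172671, 173212, 542),
    (12, "Contig", 173213, 241408, 68196),
    (13, "Gap", 241409, 241614, 206),
]
index, kind, start, end, size = largest_gap(components, 11, 13)
print(f"Maximum gap at position {index} ({size}bp)")
```

For these three components the sketch selects gap 011 (542bp), in agreement with the "Maximum gap at position 11 (542bp)" line in the log. How Chromonomer breaks ties between equally sized gaps is not documented here, so the sketch simply keeps the first one encountered.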