Initially developed to verify the antifreeze glycoproteins in the genomes of two icefishes (see Rivera-Colón, 2023), klumpy has since expanded into a suite of tools. It can assess whether a particular region of a genome assembly is misassembled by leveraging the original data (i.e., raw reads) used in the assembly process, or it can search for a gene of interest in a set of sequences.

Download Klumpy
version 1.1.0

Recent Changes [updated February 7, 2025]

Klumpy Manual

What do I need to run Klumpy?

Depending on which component of klumpy the user wants to run, klumpy will need:

  1. A pair of FAST[A][Q] files
    • If the goal is to locate genes or other genetic elements of interest, two fast[a][q] files are needed, with one file serving as the query (sequences to search for) and the other serving as the subject (sequences to search within, e.g., a reference genome).
  2. A SAM or BAM file of the initial raw reads aligned to a reference genome
  3. (Optionally) A GFF or GTF file describing the annotation of genes in the assembled genome

What does Klumpy do?

klumpy can either scan a reference genome and detect regions that may have been misassembled by examining how the raw reads align to the assembly, or it can detect genes of interest in the form of klumps. In both cases, klumpy can generate images for easier interpretation.

  1. Locate sequences of interest.

    • Using the provided query sequences, klumpy will generate a map of the k-merized query sequences onto the subject sequences, which can then be readily used to detect regions where the query sequence(s) are located.

  2. Illustrate sequences of interest.

    • After klumps have been generated, their locations can be illustrated in a Klump Plot.

  3. Locate misassembled regions in a genome assembly.

    • Given a set of alignments onto a genome assembly, klumpy will use a sliding window approach to locate regions where the raw reads are unable to properly tile across the region. If a GFF/GTF file is provided, only regions containing genes will be examined.

  4. Illustrate sequence alignments.

    • Provided a SAM or BAM file and a target region, a set of alignments can be filtered and viewed for examination. Additional features such as klumps, exons, and assembly gaps can be included for a more thorough image.
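The two search ideas described above (mapping k-merized query sequences onto a subject, and sliding a window across a region to find places the reads fail to tile) can be illustrated with a small, self-contained sketch. This is not klumpy's implementation: the function names `kmer_hits` and `untiled_windows`, the default parameters, and the rule that a window counts as "tiled" only if a single alignment spans it end to end are all assumptions made for illustration.

```python
from collections import defaultdict


def kmer_hits(query, subject, k=4):
    """Return (query_offset, subject_position) pairs for every shared k-mer.

    Hypothetical helper: indexes the subject by k-mer, then looks up
    each k-mer of the query in that index.
    """
    index = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)

    hits = []
    for j in range(len(query) - k + 1):
        for pos in index.get(query[j:j + k], []):
            hits.append((j, pos))
    return hits


def untiled_windows(alignments, region_length, window=100, step=50):
    """Return windows (start, end) that no single alignment fully spans.

    `alignments` holds 0-based, half-open (start, end) intervals on the
    reference. The spanning criterion is an assumption for this sketch.
    """
    flagged = []
    last_start = max(region_length - window, 0)
    for start in range(0, last_start + 1, step):
        end = min(start + window, region_length)
        spanned = any(a_start <= start and a_end >= end
                      for a_start, a_end in alignments)
        if not spanned:
            flagged.append((start, end))
    return flagged


if __name__ == "__main__":
    print(kmer_hits("ACGT", "TTACGTACGT", k=4))
    print(untiled_windows([(0, 180), (220, 400)], 400))
```

In the toy call at the bottom, the two alignments leave positions 180 to 220 uncovered, so only the windows that overlap that stretch are reported. A real analysis would read alignment coordinates from the SAM or BAM file rather than from a hand-written list.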

Test Cases

For a practical guide, users can view the workflow we used in our publication for all four test cases here.

Implementation

klumpy is implemented in Python3 and is released under the GNU GPL license.

Citing Klumpy

If you use klumpy in your work, please cite our Mol Ecol Resour manuscript:

Madrigal G, Minhas BF, Catchen JM. (2025) Klumpy: A tool to evaluate the integrity of long-read genome assemblies and illusive sequence motifs. Mol Ecol Resour 25: e13982. DOI: 10.1111/1755-0998.13982

Authors

klumpy was developed by Gio Madrigal <gm33@illinois.edu>, with contributions from Bushra Fazal Minhas <bfazal2@illinois.edu> and Julian Catchen <jcatchen@illinois.edu>.