Opfi: A Python package for identifying gene clusters in large genomics and metagenomics data sets

Alexis M. Hill*, James R. Rybarski*, Kuang Hu, Ilya J. Finkelstein, and Claus O. Wilke (* co-first authors), Journal of Open Source Software 6 (66) (2021).

Gene clusters are sets of co-localized, often contiguous genes that together perform specific functions, many of which are relevant to biotechnology. Software tools are needed to extract candidate gene clusters from the vast amounts of available genomic data. We therefore developed Opfi, a modular pipeline for identifying arbitrary gene clusters in assembled genomic or metagenomic sequences. Opfi provides functions for annotation, deduplication, and visualization of putative gene clusters, and uses a customizable rule-based filtering approach to select candidate systems that meet user-defined criteria. Opfi is implemented in Python and is available on the Python Package Index and on Bioconda (Grüning et al., 2018).
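
To make the idea of rule-based filtering concrete, the following is a minimal sketch in plain Python. It does not use Opfi's actual API: the `Gene` class, the rule helpers (`requires`, `max_gap`), the `filter_clusters` function, and the gene names and coordinates are all hypothetical and chosen only for illustration. Each rule is a predicate on a candidate cluster, and a cluster is kept only if it satisfies every rule.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotated gene: a label plus its coordinates on a contig."""
    name: str
    start: int
    end: int


Cluster = List[Gene]
Rule = Callable[[Cluster], bool]


def requires(name: str) -> Rule:
    """Rule: the cluster must contain a gene with the given name."""
    return lambda cluster: any(g.name == name for g in cluster)


def max_gap(limit: int) -> Rule:
    """Rule: neighbouring genes must lie within `limit` base pairs of each other."""
    def rule(cluster: Cluster) -> bool:
        ordered = sorted(cluster, key=lambda g: g.start)
        return all(b.start - a.end <= limit for a, b in zip(ordered, ordered[1:]))
    return rule


def filter_clusters(clusters: List[Cluster], rules: List[Rule]) -> List[Cluster]:
    """Keep only the clusters that satisfy every user-defined rule."""
    return [c for c in clusters if all(rule(c) for rule in rules)]


# Made-up candidate clusters (illustrative data, not real annotations).
candidates = [
    [Gene("geneA", 100, 1000), Gene("geneB", 1050, 1400)],  # both genes, close together
    [Gene("geneA", 100, 1000), Gene("geneB", 9000, 9400)],  # both genes, far apart
    [Gene("geneB", 200, 600)],                              # geneA missing
]

rules = [requires("geneA"), requires("geneB"), max_gap(500)]
selected = filter_clusters(candidates, rules)
```

Under these assumptions, only the first candidate passes all three rules. Expressing each criterion as an independent predicate is one way to keep the selection step customizable, since criteria can be added or removed without changing the filtering function.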