Genomic and Pangenomic Analysis of Kluyveromyces marxianus SHY2, a Novel Strain with High Proteolytic Potential
Abstract
Kluyveromyces marxianus is an industrially important yeast known for its thermotolerance, high growth rate, and broad substrate utilization. In this study, whole-genome sequencing and assembly were performed for a novel K. marxianus strain, SHY2, isolated from traditional fermented dairy products, followed by a pangenomic comparative analysis with 15 publicly available strains. The SHY2 genome contains a large number of protease-related and protease-regulatory genes (2,589 genes according to the MEROPS database, accounting for 48.7% of all protein-coding genes); however, this proportion includes not only active proteases but also peptidases, protease inhibitors, and non-catalytic homologs, and therefore likely overestimates the genuine proteolytic capacity. The genome also harbors two secondary metabolite biosynthesis gene clusters. Pangenomic analysis revealed that K. marxianus possesses a nearly closed pangenome: with 16 strains analyzed, the total number of gene families reached approximately 5,237, with a stable core of approximately 1,807 families. SHY2 ranks high among the strains in private gene family count, suggesting a genetically distinct background. Functional enrichment of SHY2-specific private genes indicated over-representation of transmembrane transport and carbohydrate metabolic processes. All functional interpretations based on gene annotation require experimental validation (e.g., enzyme activity assays) before any claim of high proteolytic potential can be substantiated. In conclusion, this study provides a genomic and pangenomic resource for K. marxianus SHY2, forming a foundation for future hypothesis-driven research.
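As an illustration of the core and private gene family counts mentioned above, the following minimal Python sketch partitions gene families from a per-strain presence table. It is not the pipeline used in this study: the function name `partition_pangenome`, the input format (a mapping from strain name to a set of gene family IDs), and the toy strains and family IDs are all hypothetical, and the strict definitions used here (core = present in every strain, private = present in exactly one strain) are assumptions that real pangenome tools may relax.

```python
from collections import Counter


def partition_pangenome(families_by_strain):
    """Split gene families into core, accessory, and per-strain private sets.

    families_by_strain: dict mapping strain name -> set of gene family IDs
    (hypothetical input format, assumed for this sketch).
    Assumes at least two strains; with a single strain every family would
    count as both core and private.
    """
    n_strains = len(families_by_strain)

    # Count in how many strains each gene family occurs.
    occurrence = Counter()
    for families in families_by_strain.values():
        occurrence.update(set(families))

    # Strict definitions (an assumption of this sketch).
    core = {f for f, n in occurrence.items() if n == n_strains}
    private = {f for f, n in occurrence.items() if n == 1}
    accessory = set(occurrence) - core - private

    # Assign each private family to the one strain that carries it.
    private_by_strain = {
        strain: set(families) & private
        for strain, families in families_by_strain.items()
    }
    return core, accessory, private_by_strain


# Toy data; strain names other than SHY2 and all family IDs are invented.
toy = {
    "SHY2": {"F1", "F2", "F3", "F6"},
    "strainA": {"F1", "F2", "F4"},
    "strainB": {"F1", "F3", "F5"},
}
core, accessory, private_by_strain = partition_pangenome(toy)
print(len(core), len(accessory), {s: len(p) for s, p in private_by_strain.items()})
```

Ranking the strains by the size of their entry in `private_by_strain` gives the kind of private-family comparison described for SHY2, under the assumptions stated above.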