Implementation of Fine-Tuned BERT for Enzyme Classification Based on Gene Ontology

EasyChair Preprint 14723 • 6 pages • Date: September 5, 2024

Abstract

Enzymes are biocatalysts with vital roles in biological functions and many industrial applications. Diverse enzymes are classified using the Enzyme Commission (EC) nomenclature, which makes differentiation challenging. Another source of biological information, the Gene Ontology (GO), describes the biological aspects of enzymes, covering the biological processes (BP) they are involved in, their molecular functions (MF), and their locations within cells (cellular components, CC). This study proposes a novel classification of enzymes into EC classes and subclasses within each ontology subclass, based on their GO semantics, using Bidirectional Encoder Representations from Transformers (BERT). The BERT model is first fine-tuned on the preprocessed GO term names and definitions, with the enzymes in each ontology class (BP, MF, or CC) further divided by how their GO terms were assigned: through manual annotation (NONIEA) or electronic inference (IEA). During fine-tuning, BERT obtained F1 scores of 0.93, 0.60, 0.99, 0.90, 0.40, and 0.35 for BP IEA, BP NONIEA, MF IEA, MF NONIEA, CC IEA, and CC NONIEA, respectively. On the test set, the fine-tuned BERT significantly outperformed GOntoSim in EC class classification across all metrics, with lower inference time in all ontology subclasses. Extended further to the EC subclass level, BERT can also classify enzymes in the BP IEA and MF IEA ontology subclasses, although more fine-tuning epochs are needed. These results show that the names and definitions of GO terms are distinguishable features for classifying enzymes and an alternative to the information content approach.

Keyphrases: BERT, GOntoSim, Gene Ontology, enzyme classification, fine-tuning
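As a rough illustration of the fine-tuning setup summarized above, the sketch below fine-tunes a BERT sequence-classification head on GO term names and definitions using the Hugging Face transformers library. The base checkpoint (bert-base-uncased), the GoTermDataset wrapper, the example text, the label mapping, and all hyperparameters are assumptions made for illustration and are not taken from the paper.

```python
# Minimal sketch (assumed setup): fine-tune BERT to predict an enzyme's EC class
# from the concatenated name and definition of its GO term.
import torch
from torch.utils.data import Dataset
from transformers import (
    BertTokenizerFast,
    BertForSequenceClassification,
    Trainer,
    TrainingArguments,
)

NUM_EC_CLASSES = 7  # top-level EC classes 1-7 (oxidoreductases ... translocases)

class GoTermDataset(Dataset):
    """Wraps (GO name + definition) texts and their EC-class labels."""
    def __init__(self, texts, labels, tokenizer, max_len=256):
        self.enc = tokenizer(texts, truncation=True,
                             padding="max_length", max_length=max_len)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_EC_CLASSES
)

# Placeholder data: one GO term name + definition and its EC-class index.
train_texts = ["oxidoreductase activity. Catalysis of an oxidation-reduction reaction."]
train_labels = [0]  # EC class 1 mapped to index 0 (assumed mapping)

train_ds = GoTermDataset(train_texts, train_labels, tokenizer)

args = TrainingArguments(
    output_dir="bert-ec",
    num_train_epochs=3,              # the abstract notes EC-subclass runs need more epochs
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

Trainer(model=model, args=args, train_dataset=train_ds).train()
```

In this setup, each ontology subclass (e.g., BP IEA, MF NONIEA) would be trained and evaluated as a separate classification problem, mirroring the per-subclass F1 scores reported in the abstract.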