Deep Active Learning for De Novo Peptide Sequencing from Data-Independent-Acquisition Mass Spectrometry
EasyChair Preprint 8630
7 pages • Date: August 10, 2022
Abstract
De novo peptide sequencing from mass spectrometry
data has proven to be a promising
method for the accurate identification of novel
peptides. Recently, deep learning has been applied
to de novo peptide sequencing using mass
spectrometry data. Although numerous mass spectrometry
datasets are publicly available, annotating
a large amount of data is expensive and
time-consuming. Therefore, we need a way to
acquire a small number of high-quality MS/MS spectra
rather than a large number of them. By integrating
active learning with deep learning, we can
reduce the cost of annotation. In this work, we
mainly focused on one of the state-of-the-art models,
DeepNovo-DIA, which uses convolutional
neural networks to extract features from MS/MS spectra and
long short-term memory networks to learn the language
model of peptides. Instead of selecting spectra
randomly to train the DeepNovo-DIA model, we
utilized an active learning algorithm to acquire
the most informative spectra. We used random
selection as the baseline and compared it with
two other acquisition strategies. The experiments
showed that integrating active learning with de
novo sequencing achieves better performance
than the DeepNovo-DIA model when only a small
number of annotated spectra is available.
Keyphrases: Decoder/Encoder, active learning, data-independent acquisition (DIA), de novo peptide sequencing
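
The acquisition step summarized in the abstract (choosing which unlabeled spectra to annotate next, with random selection as the baseline) can be illustrated with a minimal sketch. The abstract does not name the two non-random strategies, so the least-confidence and entropy scores below are assumptions standing in for them. The `Prediction` type, the function names, and the toy `pool` are likewise hypothetical and are not part of DeepNovo-DIA.

```python
import math
import random
from typing import Callable, Dict, List

# Hypothetical representation: for one unlabeled spectrum, the current model's
# per-position probability distributions over amino acids.
Prediction = List[List[float]]


def least_confidence_score(pred: Prediction) -> float:
    """Assumed strategy: 1 minus the mean top-1 probability over positions."""
    return 1.0 - sum(max(p) for p in pred) / len(pred)


def entropy_score(pred: Prediction) -> float:
    """Assumed strategy: mean Shannon entropy (in nats) over positions."""
    total = 0.0
    for p in pred:
        total -= sum(q * math.log(q) for q in p if q > 0.0)
    return total / len(pred)


def acquire(pool: Dict[str, Prediction],
            score: Callable[[Prediction], float],
            budget: int) -> List[str]:
    """Return the IDs of the `budget` highest-scoring (most uncertain) spectra."""
    ranked = sorted(pool, key=lambda sid: score(pool[sid]), reverse=True)
    return ranked[:budget]


def random_acquire(pool: Dict[str, Prediction], budget: int, seed: int = 0) -> List[str]:
    """Baseline: pick `budget` spectra uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(sorted(pool), budget)


# Toy pool of three spectra with two predicted positions each (made-up numbers).
pool = {
    "spec_a": [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]],  # confident prediction
    "spec_b": [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]],  # highly uncertain
    "spec_c": [[0.60, 0.30, 0.10], [0.70, 0.20, 0.10]],  # in between
}

by_entropy = acquire(pool, entropy_score, budget=2)                    # ["spec_b", "spec_c"]
by_least_confidence = acquire(pool, least_confidence_score, budget=2)  # ["spec_b", "spec_c"]
by_random = random_acquire(pool, budget=2)
```

In a full active learning loop, the selected spectra would be annotated, added to the training set, and the model retrained before the next round of scoring. That loop is omitted here because its details are not given in the abstract.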