Robust Audio Fingerprinting Techniques Using Deep Learning for Noise-Robust and Distortion-Invariant Matching

EasyChair Preprint 14667 • 10 pages • Date: September 3, 2024

Abstract

Audio fingerprinting is a critical technology for identifying and matching audio content in various applications, including copyright protection, content-based retrieval, and real-time media monitoring. However, traditional audio fingerprinting techniques often struggle with robustness in noisy environments or when the audio signal is subject to distortions such as compression, reverberation, or pitch shifts. This abstract presents an exploration of deep learning-based approaches to developing robust audio fingerprinting techniques that maintain high accuracy and reliability under challenging conditions.
The research begins by identifying the limitations of conventional audio fingerprinting methods, particularly their sensitivity to noise and distortions. It underscores the need for more sophisticated techniques that can adapt to a wide range of audio modifications while preserving the unique characteristics of the original signal for accurate identification.
The study proposes the use of deep learning models, specifically convolutional neural networks (CNNs) and recurrent neural networks (RNNs), to extract robust and distinctive features from audio signals. These models are designed to capture both local and global patterns in the audio waveform, enabling the generation of fingerprints that are invariant to common distortions. Additionally, the research explores the application of data augmentation techniques during training, such as adding synthetic noise or applying various transformations to the audio signals, to improve the model's robustness (a minimal illustrative sketch of such a pipeline follows the keyphrase list below).

Keyphrases: Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), audio fingerprinting, audio signal processing, content-based retrieval, data augmentation, deep learning, distortion invariance, noise robustness, real-time audio identification
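To make the described approach concrete, the following is a minimal sketch, not taken from the paper, of a CNN-based fingerprint encoder together with a simple additive-noise augmentation, written in PyTorch. All names (FingerprintCNN, add_noise), the layer sizes, the 128-dimensional embedding, and the 10 dB SNR default are illustrative assumptions rather than details from the authors' method.

```python
# Illustrative sketch only: a small CNN that maps a log-mel spectrogram patch
# to an L2-normalized fingerprint embedding, plus additive-noise augmentation.
# Architecture and hyperparameters are assumptions, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FingerprintCNN(nn.Module):
    """Encode a (batch, 1, n_mels, frames) log-mel patch into a fixed-length fingerprint."""

    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),           # global pooling -> (batch, 128, 1, 1)
        )
        self.proj = nn.Linear(128, embedding_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)        # (batch, 128)
        z = self.proj(h)
        return F.normalize(z, dim=1)           # unit-norm fingerprint for cosine matching


def add_noise(waveform: torch.Tensor, snr_db: float = 10.0) -> torch.Tensor:
    """Data augmentation: mix white noise into the waveform at a target SNR (in dB)."""
    signal_power = waveform.pow(2).mean()
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = torch.randn_like(waveform) * noise_power.sqrt()
    return waveform + noise


if __name__ == "__main__":
    model = FingerprintCNN()
    dummy_patch = torch.randn(4, 1, 64, 96)    # 4 log-mel patches: 64 mel bands x 96 frames
    prints = model(dummy_patch)
    print(prints.shape)                        # torch.Size([4, 128])
```

Under these assumptions, matching would amount to computing cosine similarity between the fingerprint of a query clip and those stored in a reference database; an RNN-based variant, as mentioned in the abstract, would instead summarize the frame sequence with a recurrent layer before projection.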