Comparison of Different Neural Network Architectures for Spoken Language Identification

EasyChair Preprint 10680, version 3

5 pages • Date: September 11, 2023

Tala Bazazo, Mohammad Zeineldeen, Christian Plahl, Ralf Schlüter and Hermann Ney

Abstract

This paper compares different neural-network-based architectures on the spoken language identification task. To the best of our knowledge, such a comparison of different models on the same dataset and the same set of languages does not yet exist. We incorporate 7 different models, which include the latest architectures: a spectral-image-based ResNet model, a Convolutional Neural Network, a Bi-directional Long Short-Term Memory, a Convolutional Recurrent Neural Network, Wav2Vec 2.0, a transformer, and a conformer. We also tackle audio with background noise and music by training on data with similar acoustics. Finally, we show that our models generalize well to third-party data.

Keyphrases: Conformer, Language Identification, Wav2vec 2.0, neural networks, transformer

Links:

https://easychair.org/publications/preprint/sfXq

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:10680,
  author    = {Tala Bazazo and Mohammad Zeineldeen and Christian Plahl and Ralf Schlüter and Hermann Ney},
  title     = {Comparison of Different Neural Network Architectures for Spoken Language Identification},
  howpublished = {EasyChair Preprint 10680},
  year      = {EasyChair, 2023}}
