
dc.contributor.author: Vafeiadis, Anastasios
dc.contributor.author: Fanioudakis, Eleftherios
dc.contributor.author: Potamitis, Ilyas
dc.contributor.author: Votis, Konstantinos
dc.contributor.author: Giakoumis, Dimitrios
dc.contributor.author: Tzovaras, Dimitrios
dc.contributor.author: Chen, Liming
dc.contributor.author: Hamzaoui, Raouf
dc.date.accessioned: 2019-07-03T14:12:56Z
dc.date.available: 2019-07-03T14:12:56Z
dc.date.issued: 2019-09
dc.identifier.citation: Vafeiadis, A., Fanioudakis, E., Potamitis, I., Votis, K., Giakoumis, D., Tzovaras, D., Chen, L., Hamzaoui, R. (2019) Two-dimensional convolutional recurrent neural networks for speech activity detection. 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019), Graz, Austria, Sep. 2019. [en]
dc.identifier.uri: https://www.dora.dmu.ac.uk/handle/2086/18174
dc.description.abstract: Speech Activity Detection (SAD) plays an important role in mobile communications and automatic speech recognition (ASR). Developing efficient SAD systems for real-world applications is a challenging task due to the presence of noise. We propose a new approach to SAD where we treat it as a two-dimensional multilabel image classification problem. To classify the audio segments, we compute their Short-time Fourier Transform spectrograms and classify them with a Convolutional Recurrent Neural Network (CRNN), traditionally used in image recognition. Our CRNN uses a sigmoid activation function, max-pooling in the frequency domain, and a convolutional operation as a moving average filter to remove misclassified spikes. On the development set of Task 1 of the 2019 Fearless Steps Challenge, our system achieved a decision cost function (DCF) of 2.89%, a 66.4% improvement over the baseline. Moreover, it achieved a DCF score of 3.318% on the evaluation dataset of the challenge, ranking first among all submissions. [en]
dc.language.iso: en_US [en]
dc.publisher: International Speech Communication Association [en]
dc.subject: Speech activity detection [en]
dc.subject: Voice activity detection [en]
dc.subject: Convolutional recurrent neural networks [en]
dc.title: Two-Dimensional Convolutional Recurrent Neural Networks for Speech Activity Detection [en]
dc.type: Conference [en]
dc.peerreviewed: Yes [en]
dc.funder: European Union (EU) Horizon 2020 [en]
dc.projectid: Marie Skłodowska-Curie grant agreement No. 676157, project ACROSSING [en]
dc.cclicence: CC-BY-NC [en]
dc.date.acceptance: 2019-06-17
dc.researchinstitute: Cyber Technology Institute (CTI) [en]
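
The abstract mentions a convolution used as a moving average filter to remove misclassified spikes. The record does not give the filter length, the decision threshold, or where in the network the filter sits, so the following is only a minimal sketch of the general idea applied to frame-level speech probabilities. The function name `smooth_predictions`, the window of 5 frames, and the 0.5 threshold are assumptions made for illustration, not values from the paper.

```python
import numpy as np


def smooth_predictions(probs, window=5, threshold=0.5):
    """Smooth frame-level speech probabilities with a moving average.

    `window` and `threshold` are illustrative assumptions, not values
    taken from the paper.
    """
    # A uniform kernel turns convolution into a moving average.
    kernel = np.ones(window) / window
    # mode="same" keeps one output value per input frame.
    smoothed = np.convolve(probs, kernel, mode="same")
    # Threshold the smoothed scores into speech (1) / non-speech (0).
    return (smoothed >= threshold).astype(int)


# Hypothetical frame decisions: an isolated false "speech" spike at
# index 3 and an isolated false "non-speech" dip at index 12.
frames = np.array([0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1], dtype=float)
labels = smooth_predictions(frames, window=5)
```

With these made-up inputs, the isolated spike is averaged away and the short dip inside the speech region is filled in. A longer window removes longer spurious runs but also blurs the true segment boundaries.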

