Utilizing Domain Knowledge in End-to-End Audio Processing

Tycho Tax, Jose Luis Diez Antich, Hendrik Purwins, Lars Maaløe

Publication: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

Abstract

End-to-end neural network based approaches to audio modelling are generally outperformed by models trained on high-level data representations. In this paper we present preliminary work that shows the feasibility of training the first layers of a deep convolutional neural network (CNN) model to learn the commonly-used log-scaled mel-spectrogram transformation. Secondly, we demonstrate that upon initializing the first layers of an end-to-end CNN classifier with the learned transformation, convergence and performance on the ESC-50 environmental sound classification dataset are similar to a CNN-based model trained on the highly pre-processed log-scaled mel-spectrogram features.
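For orientation, the log-scaled mel-spectrogram transformation that the first CNN layers are trained to reproduce can be sketched as below. This is a minimal NumPy illustration, not the authors' implementation: the function names, the frame length, hop size, number of mel bands, the Hann window, and the `eps` offset are all assumptions chosen for the example.

```python
import numpy as np


def hz_to_mel(f):
    # Common (HTK-style) mel-scale formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    # n_mels + 2 points equally spaced on the mel scale give the filter edges.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_spectrogram(signal, sample_rate, n_fft=1024, hop=512, n_mels=40, eps=1e-10):
    """Return an array of shape (n_frames, n_mels) of log mel-band energies."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[t * hop:t * hop + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # power spectrum per frame
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel_energy + eps)  # eps avoids log(0) in empty bands
```

As a usage example, `log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * np.arange(8000) / 8000), 8000)` turns one second of a 440 Hz tone sampled at 8 kHz into a frames-by-bands matrix. In the setting described in the abstract, a matrix of this kind is what the pre-processed baseline consumes directly and what the first layers of the end-to-end model are trained to approximate from the raw waveform.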
Original language: English
Title: Workshop Machine Learning for Audio Signal Processing at NIPS 2017 (ML4Audio@NIPS17)
Publication date: Dec. 2017
Status: Published - Dec. 2017
Event: Conference and Workshop on Neural Information Processing Systems (NIPS): Machine Learning for Audio Signal Processing - Long Beach Convention & Entertainment Center, Long Beach, USA
Duration: 8 Dec. 2017 to 8 Dec. 2017
https://nips.cc/Conferences/2017/Schedule?showEvent=8790

Conference

Conference: Conference and Workshop on Neural Information Processing Systems (NIPS)
Location: Long Beach Convention & Entertainment Center
Country: USA
City: Long Beach
Period: 08/12/2017 to 08/12/2017

Fingerprint

Neural networks
Processing
Classifiers
Acoustic waves

Keywords

Deep Learning
convolutional neural networks
audio signal processing
end-to-end learning

    Cite this

    Tax, T., Antich, J. L. D., Purwins, H., & Maaløe, L. (2017). Utilizing Domain Knowledge in End-to-End Audio Processing. In Workshop Machine Learning for Audio Signal Processing at NIPS 2017 (ML4Audio@NIPS17).
    Tax, Tycho ; Antich, Jose Luis Diez ; Purwins, Hendrik ; Maaløe, Lars. / Utilizing Domain Knowledge in End-to-End Audio Processing. Workshop Machine Learning for Audio Signal Processing at NIPS 2017 (ML4Audio@NIPS17). 2017.
    @inproceedings{2bc469c134694c24928b7b502701b7f4,
    title = "Utilizing Domain Knowledge in End-to-End Audio Processing",
    abstract = "End-to-end neural network based approaches to audio modelling are generally outperformed by models trained on high-level data representations. In this paper we present preliminary work that shows the feasibility of training the first layers of a deep convolutional neural network (CNN) model to learn the commonly-used log-scaled mel-spectrogram transformation. Secondly, we demonstrate that upon initializing the first layers of an end-to-end CNN classifier with the learned transformation, convergence and performance on the ESC-50 environmental sound classification dataset are similar to a CNN-based model trained on the highly pre-processed log-scaled mel-spectrogram features.",
    keywords = "Deep Learning, convolutional neural networks, audio signal processing, end-to-end learning",
    author = "Tycho Tax and Antich, {Jose Luis Diez} and Hendrik Purwins and Lars Maal{\o}e",
    year = "2017",
    month = "12",
    language = "English",
    booktitle = "Workshop Machine Learning for Audio Signal Processing at NIPS 2017 (ML4Audio@NIPS17)",
    }

    Tax, T, Antich, JLD, Purwins, H & Maaløe, L 2017, Utilizing Domain Knowledge in End-to-End Audio Processing. in Workshop Machine Learning for Audio Signal Processing at NIPS 2017 (ML4Audio@NIPS17), Long Beach, USA, 08/12/2017.

    Utilizing Domain Knowledge in End-to-End Audio Processing. / Tax, Tycho ; Antich, Jose Luis Diez; Purwins, Hendrik; Maaløe, Lars.

    Workshop Machine Learning for Audio Signal Processing at NIPS 2017 (ML4Audio@NIPS17). 2017.


    TY - GEN
    T1 - Utilizing Domain Knowledge in End-to-End Audio Processing
    AU - Tax, Tycho
    AU - Antich, Jose Luis Diez
    AU - Purwins, Hendrik
    AU - Maaløe, Lars
    PY - 2017/12
    Y1 - 2017/12
    AB - End-to-end neural network based approaches to audio modelling are generally outperformed by models trained on high-level data representations. In this paper we present preliminary work that shows the feasibility of training the first layers of a deep convolutional neural network (CNN) model to learn the commonly-used log-scaled mel-spectrogram transformation. Secondly, we demonstrate that upon initializing the first layers of an end-to-end CNN classifier with the learned transformation, convergence and performance on the ESC-50 environmental sound classification dataset are similar to a CNN-based model trained on the highly pre-processed log-scaled mel-spectrogram features.
    KW - Deep Learning
    KW - convolutional neural networks
    KW - audio signal processing
    KW - end-to-end learning
    M3 - Article in proceeding
    BT - Workshop Machine Learning for Audio Signal Processing at NIPS 2017 (ML4Audio@NIPS17)
    ER -

    Tax T, Antich JLD, Purwins H, Maaløe L. Utilizing Domain Knowledge in End-to-End Audio Processing. In Workshop Machine Learning for Audio Signal Processing at NIPS 2017 (ML4Audio@NIPS17). 2017.