Deep machine learning for audio applications.

Keywords: machine learning, deep learning, neural network, audio signal, audio application, recognition, chord, music.

Abstract

This paper presents the principles of applying deep learning in neural networks to the recognition of audio signals. The study is limited to audio signals and does not cover the broader area of sound representation. It describes how a signal is split into its constituent elements and how those elements are extracted from an audio recording. A diagram of how the distribution of an audio signal is formed is given, together with a general approach to the problem of recognizing audio signals. The approach is conventionally divided into three stages: processing the audio recording and transforming it into the time-frequency domain; constructing a spectrogram and transforming it into audio format; and outputting a sequence of features in the form of vectors. The overlap ratio and the weighted average overlap ratio were determined. A set of values obtained from the experiment showed that the characteristics and parameters of audio applications built with a deep neural network depend on the data preparation method. Adding layers and using a wider range of units improves the result at the cost of a multiplied training time, and the same applies to recurrent connections.
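As an illustration of the three stages named in the abstract, the following is a minimal sketch and not the author's implementation. It assumes a Hann window, a frame length of 64 samples, a hop of 32 samples, and log-magnitude features; these choices, the function names, and the synthetic test tone are all hypothetical and were chosen only to keep the example self-contained.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a signal into overlapping frames of frame_len samples."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def hann(n):
    """Hann window of length n (periodic form)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def log_spectrogram(samples, frame_len=64, hop=32):
    """Frame, window, and transform the signal; return one feature vector per frame."""
    window = hann(frame_len)
    features = []
    for frame in frame_signal(samples, frame_len, hop):
        windowed = [x * w for x, w in zip(frame, window)]
        # Small offset avoids log(0) on silent bins (an assumption of this sketch).
        features.append([math.log(m + 1e-10) for m in magnitude_spectrum(windowed)])
    return features


# Hypothetical input: 0.1 s of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(800)]
vectors = log_spectrogram(tone)

# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(vectors[0])), key=lambda k: vectors[0][k])
print(len(vectors), len(vectors[0]), peak_bin * sample_rate / 64)
```

Each row of `vectors` is one time step of the spectrogram and could serve as an input vector for a neural network. The naive DFT keeps the sketch free of dependencies; a practical system would use an FFT routine instead.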

Published
2021-03-26
How to Cite
Lohvin, A. (2021). Deep machine learning for audio applications. COMPUTER-INTEGRATED TECHNOLOGIES: EDUCATION, SCIENCE, PRODUCTION, (42), 72-78. https://doi.org/10.36910/6775-2524-0560-2021-42-11