IRAWNET: A Method for Transcribing Indonesian Classical Music Notes Directly from Multichannel Raw Audio
Abstract
A challenging task in developing real-time Automatic Music Transcription (AMT) methods is directly leveraging inputs from multichannel raw audio, without any handcrafted signal-transformation or feature-extraction steps. The crucial problems are that raw audio contains only an amplitude value at each timestamp, and that the signals of the left and right channels differ in amplitude intensity and onset time. This study addresses these issues by proposing IRawNet, a method with fused feature layers that merge the differing amplitudes of multichannel raw audio. IRawNet aims to transcribe Indonesian classical music notes and was validated on a Gamelan music dataset. The Synthetic Minority Oversampling Technique (SMOTE) was used to overcome the class imbalance of the dataset. Under various experimental scenarios, the effects of oversampled data, hyperparameter tuning, and fused feature layers on performance are analyzed. Furthermore, the proposed method is compared with the Temporal Convolutional Network (TCN), Deep WaveNet, and a monochannel IRawNet. The results show that the proposed method achieves superior results on nearly all performance metrics, with an accuracy of 0.871, AUC of 0.988, precision of 0.927, recall of 0.896, and F1 score of 0.896.
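To make the fused-feature-layer idea concrete, the following is a minimal PyTorch sketch of the core concept described in the abstract: one 1-D convolutional front end per audio channel (so each channel's amplitude scale and onset timing are encoded independently), a fused feature layer that merges the two feature streams, and a note classifier on top. The class name FusedStereoNet, the layer sizes, kernel widths, number of note classes, and 16 kHz input length are illustrative assumptions, not the published IRawNet configuration.

```python
# A minimal sketch of per-channel front ends merged by a fused feature
# layer. All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

class FusedStereoNet(nn.Module):
    def __init__(self, n_classes: int = 12, hidden: int = 64):
        super().__init__()
        # One convolutional front end per audio channel.
        def branch():
            return nn.Sequential(
                nn.Conv1d(1, hidden, kernel_size=9, stride=4, padding=4),
                nn.BatchNorm1d(hidden),
                nn.ReLU(),
                nn.Conv1d(hidden, hidden, kernel_size=9, stride=4, padding=4),
                nn.BatchNorm1d(hidden),
                nn.ReLU(),
            )
        self.left = branch()
        self.right = branch()
        # Fused feature layer: concatenate channel features, then mix
        # them with a 1x1 convolution.
        self.fuse = nn.Sequential(
            nn.Conv1d(2 * hidden, hidden, kernel_size=1),
            nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, stereo: torch.Tensor) -> torch.Tensor:
        # stereo: (batch, 2, samples) raw audio, no handcrafted features.
        feats = torch.cat(
            [self.left(stereo[:, :1]), self.right(stereo[:, 1:])], dim=1
        )
        return self.head(self.fuse(feats))

model = FusedStereoNet()
logits = model(torch.randn(8, 2, 16000))  # one second of 16 kHz stereo audio
print(logits.shape)                       # torch.Size([8, 12])
```

The SMOTE step named in the abstract has a standard implementation in the imbalanced-learn package; the sketch below shows its typical use on flattened training examples. The feature matrix X and labels y are placeholders, not the Gamelan dataset itself.

```python
# SMOTE oversampling of minority note classes with imbalanced-learn.
from collections import Counter
import numpy as np
from imblearn.over_sampling import SMOTE

X = np.random.randn(200, 1024)       # placeholder training windows
y = np.array([0] * 150 + [1] * 50)   # imbalanced note labels
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print(Counter(y), Counter(y_res))    # minority class oversampled to parity
```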
References
E. Benetos, S. Dixon, Z. Duan, and S. Ewert, Automatic music transcription: An overview, IEEE Signal Process. Mag., vol. 36, no. 1, pp. 20–30, 2018. DOI: https://doi.org/10.1109/MSP.2018.2869928
A. van den Oord et al., WaveNet: A generative model for raw audio, arXiv preprint arXiv:1609.03499, 2016. [Online]. Available: http://arxiv.org/abs/1609.03499
S. Bai, J. Z. Kolter, and V. Koltun, An empirical evaluation of generic convolutional and recurrent networks for sequence modeling, arXiv preprint arXiv:1803.01271, 2018.
M. E. P. Davies and S. Böck, Temporal convolutional networks for musical audio beat tracking, in 2019 27th European Signal Processing Conference (EUSIPCO), pp. 1–5, 2019. DOI: https://doi.org/10.23919/EUSIPCO.2019.8902578
L. S. Martak, M. Sajgalik, and W. Benesova, Polyphonic note transcription of time-domain audio signal with deep wavenet architecture, in 2018 25th International Conference on Systems, Signals and Image Processing (IWSSIP), pp. 1–5, 2018. DOI: https://doi.org/10.1109/IWSSIP.2018.8439708
S. L. Oh et al., Classification of heart sound signals using a novel deep WaveNet model, Comput. Methods Programs Biomed., vol. 196, pp. 105604, 2020. DOI: https://doi.org/10.1016/j.cmpb.2020.105604
L. Chen, M. Yu, D. Su, and D. Yu, Multi-band pit and model integration for improved multi-channel speech separation, in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 705–709, 2019. DOI: https://doi.org/10.1109/ICASSP.2019.8682470
S. Gul, M. S. Khan, and S. W. Shah, Integration of deep learning with expectation maximization for spatial cue-based speech separation in reverberant conditions, Appl. Acoust., vol. 179, pp. 108048, 2021. DOI: https://doi.org/10.1016/j.apacoust.2021.108048
W. Zhang, Y. Zhang, Y. She, and J. Shao, Stereo feature enhancement and temporal information extraction network for automatic music transcription, IEEE Signal Process. Lett., vol. 28, pp. 1500–1504, 2021. DOI: https://doi.org/10.1109/LSP.2021.3099073
S. Kristiawan, The gamelan and its impact on Debussy's Pagodes, Mahasara J. Interdiscip. Music Stud., vol. 1, no. 1, pp. 24–32, 2021.
J. Becker and A. H. Feinstein, Karawitan: Source Readings in Javanese Gamelan and Vocal Music, Volume 1. University of Michigan Press, 2020. DOI: https://doi.org/10.3998/mpub.17577
K. Tanaka, T. Nakatsuka, R. Nishikimi, K. Yoshii, and S. Morishima, Multi-instrument music transcription based on deep spherical clustering of spectrograms and pitchgrams, in Proc. ISMIR, pp. 327–334, 2020.
A. Huaysrijan and S. Pongpinigpinyo, Deep Convolution Neural Network for Thai Classical Music Instruments Sound Recognition, in 2021 25th International Computer Science and Engineering Conference (ICSEC), pp. 283–288, 2021. DOI: https://doi.org/10.1109/ICSEC53205.2021.9684611
V. S. Pendyala, N. Yadav, C. Kulkarni, and L. Vadlamudi, Towards building a Deep Learning based Automated Indian Classical Music Tutor for the Masses, Syst. Soft Comput., vol. 4, pp. 200042, 2022. DOI: https://doi.org/10.1016/j.sasc.2022.200042
C. Hawthorne, I. Simon, R. Swavely, E. Manilow, and J. Engel, Sequence-to-sequence piano transcription with Transformers, arXiv preprint arXiv:2107.09142, 2021.
B. Bahmei, E. Birmingham, and S. Arzanpour, CNN-RNN and Data Augmentation Using Deep Convolutional Generative Adversarial Network for Environmental Sound Classification, IEEE Signal Process. Lett., vol. 29, pp. 682–686, 2022. DOI: https://doi.org/10.1109/LSP.2022.3150258
H.-S. Choi, J. Lee, and K. Lee, Spec2Spec: Towards the general framework of music processing using generative adversarial networks, Acoust. Sci. Technol., vol. 41, no. 1, pp. 160–165, 2020. DOI: https://doi.org/10.1250/ast.41.160
A. Ycart and E. Benetos, Learning and Evaluation Methodologies for Polyphonic Music Sequence Prediction With LSTMs, IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 28, pp. 1328–1341, 2020. DOI: https://doi.org/10.1109/TASLP.2020.2987130
Q. Wang, R. Zhou, and Y. Yan, Polyphonic piano transcription with a note-based music language model, Appl. Sci., vol. 8, no. 3, pp. 470, 2018. DOI: https://doi.org/10.3390/app8030470
A. K. Sharma et al., Classification of Indian classical music with time-series matching deep learning approach, IEEE Access, vol. 9, pp. 102041–102052, 2021. DOI: https://doi.org/10.1109/ACCESS.2021.3093911
A. Sadekar and S. P. Mahajan, Polyphonic Piano Music Transcription using Long Short-Term Memory, in 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–7, 2019. DOI: https://doi.org/10.1109/ICCCNT45670.2019.8944400
S. Shahriar and U. Tariq, Classifying maqams of Qur’anic recitations using deep learning, IEEE Access, vol. 9, pp. 117271–117281, 2021. DOI: https://doi.org/10.1109/ACCESS.2021.3098415
M. A. Román, A. Pertusa, and J. Calvo-Zaragoza, Data representations for audio-to-score monophonic music transcription, Expert Syst. Appl., vol. 162, pp. 113769, 2020. DOI: https://doi.org/10.1016/j.eswa.2020.113769
D. Nurdiyah, Y. K. Suprapto, and E. M. Yuniarno, Gamelan Orchestra Transcription Using Neural Network, in 2020 International Conference on Computer Engineering, Network, and Intelligent Multimedia (CENIM), pp. 371–376, Nov. 2020. DOI: https://doi.org/10.1109/CENIM51130.2020.9297988
D. Nurdiyah et al., Gamelan Music Dataset, Zenodo, Sep. 12, 2023. DOI: https://doi.org/10.5281/zenodo.8333916
A. Arafa, N. El-Fishawy, M. Badawy, and M. Radad, RN-SMOTE: Reduced noise SMOTE based on DBSCAN for enhancing imbalanced data classification, J. King Saud Univ. Comput. Inf. Sci., vol. 34, no. 8, pp. 5059–5074, 2022. DOI: https://doi.org/10.1016/j.jksuci.2022.06.005
H. Gao, H. Yuan, Z. Wang, and S. Ji, Pixel Transposed Convolutional Networks, IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 5, pp. 1218–1227, 2020.
L. Cheng, R. Khalitov, T. Yu, J. Zhang, and Z. Yang, Classification of long sequential data using circular dilated convolutional neural networks, Neurocomputing, vol. 518, pp. 50–59, 2023. DOI: https://doi.org/10.1016/j.neucom.2022.10.054
M. Segu, A. Tonioni, and F. Tombari, Batch normalization embeddings for deep domain generalization, Pattern Recognit., vol. 135, pp. 109115, 2023. DOI: https://doi.org/10.1016/j.patcog.2022.109115
Y.-N. Hung and A. Lerch, Multitask learning for instrument activation aware music source separation, arXiv preprint arXiv:2008.00616, 2020.
Copyright (c) 2023 EMITTER International Journal of Engineering Technology
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The copyright to this article is transferred to Politeknik Elektronika Negeri Surabaya (PENS) if and when the article is accepted for publication. The undersigned hereby transfers any and all rights in and to the paper, including without limitation all copyrights, to PENS. The undersigned hereby represents and warrants that the paper is original and that he/she is the author of the paper, except for material that is clearly identified as to its original source, with permission notices from the copyright owners where required. The undersigned represents that he/she has the power and authority to make and execute this assignment. The copyright transfer form can be downloaded here.
The corresponding author signs for and accepts responsibility for releasing this material on behalf of any and all co-authors. This agreement is to be signed by at least one of the authors, who has obtained the assent of the co-author(s) where applicable. After submission of this agreement signed by the corresponding author, changes of authorship or in the order of the authors listed will not be accepted.
Retained Rights/Terms and Conditions
- Authors retain all proprietary rights in any process, procedure, or article of manufacture described in the Work.
- Authors may reproduce, or authorize others to reproduce, the work or derivative works for the author's personal use or company use, provided that the source and the copyright notice of the publisher, Politeknik Elektronika Negeri Surabaya (PENS), are indicated.
- Authors may use and reuse their articles under the same CC BY-NC-SA license that applies to third parties.
- Third parties may share and adapt the published work for any non-commercial purpose; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.
Plagiarism Check
To prevent plagiarism, each manuscript is checked twice by the Editorial Board of the EMITTER International Journal of Engineering Technology (EMITTER Journal) using the iThenticate Plagiarism Checker and the CrossCheck plagiarism screening service. The similarity score of a manuscript should be less than 25%. A manuscript that plagiarizes another author's work, or the author's own, will be rejected by EMITTER Journal.
Authors are expected to comply with EMITTER Journal's plagiarism rules by downloading and signing the plagiarism declaration form here and resubmitting it, along with the copyright transfer form, via online submission.
Funding data
- Lembaga Pengelola Dana Pendidikan, grant number 201908210115089