A Combination of Lexicon-based and Distributional Representations for Classification of Indonesian Vaccine Acceptance Rates
When the COVID-19 pandemic hit, vaccination was promoted worldwide as the way to end it. However, the success of vaccination programs depended on how society and individuals felt about the vaccine, and people's acceptance of vaccines can change with conditions and events. Social media platforms such as Twitter can serve as a source of information on the community's conditions and attitudes toward the vaccination program. By applying machine learning techniques to a COVID-19 vaccine dataset, we aim to improve text classification results. This study proposes three machine learning models for classifying texts about COVID-19 vaccination: first, a lexicon-based model that uses feature extraction; second, a model that uses word embeddings to exploit distributional representations; and third, a model that combines distributional representations with lexicon-based feature extraction. Our evaluation found that the combination of lexicon-based and distributional representations gave the best results for classifying the level of COVID-19 vaccine acceptance in Indonesia, with an accuracy of 71.44% and an F1-score of 71.43%.
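The idea of combining the two representations can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the two-dimensional word vectors, the labels "accept" and "reject", and the nearest-centroid classifier are all hypothetical stand-ins chosen so the example runs with the Python standard library alone. It only shows how lexicon-derived features and averaged word vectors might be concatenated into one feature vector before classification.

```python
from math import sqrt

# Hypothetical toy lexicon: word -> polarity score (not taken from the paper).
LEXICON = {"aman": 1.0, "efektif": 1.0, "bahaya": -1.0, "takut": -1.0}

# Hypothetical 2-dimensional vectors standing in for trained word embeddings.
EMBEDDINGS = {
    "vaksin": [0.5, 0.5],
    "aman": [0.9, 0.1],
    "efektif": [0.8, 0.2],
    "bahaya": [0.1, 0.9],
    "takut": [0.2, 0.8],
}
EMBED_DIM = 2


def lexicon_features(tokens):
    """Return [positive word count, negative word count, summed polarity]."""
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    positives = sum(1 for s in scores if s > 0)
    negatives = sum(1 for s in scores if s < 0)
    return [float(positives), float(negatives), sum(scores)]


def embedding_features(tokens):
    """Average the vectors of known tokens; return zeros if none are known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * EMBED_DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def combined_features(text):
    """Concatenate lexicon-based and distributional features for one text."""
    tokens = text.lower().split()
    return lexicon_features(tokens) + embedding_features(tokens)


def fit_centroids(texts, labels):
    """Compute one mean feature vector per label (stand-in classifier)."""
    grouped = {}
    for text, label in zip(texts, labels):
        grouped.setdefault(label, []).append(combined_features(text))
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, text):
    """Assign the label whose centroid is closest in Euclidean distance."""
    x = combined_features(text)

    def distance(centroid):
        return sqrt(sum((a - b) ** 2 for a, b in zip(x, centroid)))

    return min(centroids, key=lambda label: distance(centroids[label]))


# Hypothetical training texts and labels, for illustration only.
train_texts = ["vaksin aman", "vaksin efektif", "vaksin bahaya", "takut vaksin"]
train_labels = ["accept", "accept", "reject", "reject"]
centroids = fit_centroids(train_texts, train_labels)
```

Calling `predict(centroids, "vaksin aman efektif")` would then classify a new text using both feature groups at once. In practice the lexicon, the embedding model, and the classifier would each be replaced by trained or curated resources; concatenation is only one simple way to combine the two representations.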
Copyright (c) 2023 EMITTER International Journal of Engineering Technology
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.