A Combination of Lexicon-based and Distributional Representations for Classification of Indonesian Vaccine Acceptance Rates

  • Katon Suwida, Institut Teknologi Sepuluh Nopember
  • Muhammad Yusuf Kardawi, Institut Teknologi Sepuluh Nopember
  • Diana Purwitasari, Institut Teknologi Sepuluh Nopember
  • Fahril Mabahist, Institut Teknologi Sepuluh Nopember
Keywords: vaccination, text classification, lexicon-based, distributional representations


When the COVID-19 pandemic hit, vaccines were promoted worldwide as the way to end it. However, vaccination uptake depended on the sentiments of society and individuals toward the vaccine, and people's acceptance of vaccines can change with conditions and events. Social media platforms such as Twitter can serve as a source of information on the community's conditions and attitudes toward the vaccination program. By applying machine learning techniques to a COVID-19 vaccine dataset, we aim to improve the results of text classification. This study proposes three machine learning models for classifying texts about COVID-19 vaccination: first, a lexicon-based model using feature extraction; second, a model using word embeddings as distributional representations; and third, a model combining distributional representations with lexicon-based feature extraction. Our evaluation found that the combination of lexicon-based and distributional representations gave the best results for classifying the level of COVID-19 vaccine acceptance in Indonesia, with an accuracy of 71.44% and an F1-score of 71.43%.
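
As a rough illustration of the third (combined) model, the sketch below concatenates lexicon-based features with an averaged word-embedding vector for one text. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the lexicon, the embedding model, or the classifier, so the tiny `LEXICON` and `EMBEDDINGS` tables and all function names here are hypothetical stand-ins.

```python
# Toy illustration only: the lexicon, embeddings, and example text below
# are hypothetical stand-ins, not the resources used in the paper.

# Hypothetical polarity lexicon (word -> score in [-1, 1]).
LEXICON = {"aman": 1.0, "sehat": 0.8, "takut": -0.9, "bahaya": -1.0}

# Hypothetical 3-dimensional word vectors.
EMBEDDINGS = {
    "vaksin": [0.2, 0.1, 0.0],
    "aman": [0.9, 0.1, 0.3],
    "takut": [-0.8, 0.4, 0.1],
    "bahaya": [-0.9, 0.5, 0.0],
}
EMBEDDING_DIM = 3


def lexicon_features(tokens):
    """Return [positive word count, negative word count, summed polarity]."""
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    positives = sum(1 for s in scores if s > 0)
    negatives = sum(1 for s in scores if s < 0)
    return [float(positives), float(negatives), float(sum(scores))]


def embedding_features(tokens):
    """Average the vectors of known tokens; all zeros if none are known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * EMBEDDING_DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def combined_features(text):
    """Concatenate lexicon features and embedding features for one text."""
    tokens = text.lower().split()
    return lexicon_features(tokens) + embedding_features(tokens)


# "vaksin aman" ("the vaccine is safe"): 3 lexicon values + 3 embedding values.
features = combined_features("vaksin aman")
print(len(features), features)
```

The resulting fixed-length vector could then be passed to any standard classifier. Concatenation is only one way to combine the two representations, and is assumed here for simplicity.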



How to Cite
Suwida, K., Kardawi, M. Y., Purwitasari, D., & Mabahist, F. (2023). A Combination of Lexicon-based and Distributional Representations for Classification of Indonesian Vaccine Acceptance Rates. EMITTER International Journal of Engineering Technology, 11(1), 89-99. https://doi.org/10.24003/emitter.v11i1.768