Sentiment Analysis Design and Development for Low Resource Languages in the Case of Telugu
Abstract
Sentiment analysis has become widespread because of the need to filter and analyze information on the internet. Its applications include social media monitoring, market research, and opinion mining. This progress, however, is restricted to a few languages with sufficient resources. Telugu lags behind in this field, even though it is the fourth most spoken language in India and generates a vast quantity of data every day. In this paper, we build a reliable annotated resource for sentiment analysis in Telugu. The resource is drawn from Telugu movie reviews: we extracted 1844 sentences from 100 film reviews and had two annotators label them. To measure inter-rater reliability, we calculated the kappa coefficient and obtained a value of 0.90 over the 1844 sentences, indicating nearly perfect agreement. After resolving the annotators' disagreements and discrepancies, we retained 1807 sentences. For feature extraction, we used two vectorization methods, TF-IDF and count vectorization, and trained SVM and logistic regression classifiers on each. We tested every vectorizer and classifier combination at different split ratios, including 80-20%, 70-30%, and 60-40%, and compared the outcomes. On our dataset, combining TF-IDF with SVM at the 70-30% ratio yields the highest accuracy among the combinations tested.
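The inter-rater reliability step can be illustrated with a short sketch. The abstract does not say which kappa variant was used, so this assumes Cohen's kappa, the usual choice for two annotators: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The function name `cohen_kappa` and the toy labels are hypothetical and are not taken from the paper's dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each annotator's own label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (assumption: a two-class positive/negative scheme).
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
annotator_2 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "pos", "pos", "neg"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # prints 0.8
```

In this toy case the annotators agree on 9 of 10 items (p_o = 0.9) and chance agreement is 0.5, giving a kappa of 0.8. Under the commonly used Landis and Koch scale, values above 0.80 are read as almost perfect agreement, which is consistent with how the abstract describes its value of 0.90.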
Copyright (c) 2025 EMITTER International Journal of Engineering Technology

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
