Sentiment Analysis Design and Development for Low Resource Languages in the Case of Telugu

Srinivasu Badugu; Suneetha Chittineni; G.L. Anand Babu; G. Sekhar Reddy; S. Vijaykumar; N. Nagalakshmi

doi:10.24003/emitter.v13i2.861

Srinivasu Badugu, Department of CSE, Stanley College of Engineering and Technology for Women, Abids, Hyderabad, India
Suneetha Chittineni, Department of Computer Applications, R.V.R. & J.C. College of Engineering, Chowdavaram, India
G.L. Anand Babu, Dept. of IT, Anurag University, Hyderabad, Telangana, India
G. Sekhar Reddy, Dept. of IT, Anurag University, Hyderabad, Telangana, India
S. Vijaykumar, Dept. of IT, Anurag University, Hyderabad, Telangana, India
N. Nagalakshmi, Dept. of IT, Anurag University, Hyderabad, Telangana, India

Keywords: Intrusion detection system, cloud computing, security, deep learning model, TF-IDF

Abstract

Sentiment analysis has become more widespread because of the need to filter and analyze information on the internet. It has a wide range of applications, including social media monitoring, market research, and opinion mining. Still, this development is restricted to a few languages with sufficient resources. Telugu lags behind in this field of study, even though it is the fourth most spoken language in India and generates a vast quantity of data every day. In this paper, we develop a reliable resource for sentiment analysis in Telugu. The data consists of Telugu movie reviews annotated for sentiment. We extracted 1844 sentences from 100 film reviews. Two annotators labeled the data, and we calculated the kappa coefficient to measure inter-rater reliability. We obtained a kappa value of 0.90 for the 1844 sentences, indicating nearly perfect agreement. After the annotators' disagreements and discrepancies were resolved, 1807 sentences were retained. For feature extraction, we used two vectorization methods: TF-IDF and count vectorization. With each vectorization method, we trained SVM and logistic regression classifiers and tested train-test split ratios of 80-20%, 70-30%, and 60-40%. We compared the outcomes of the resulting combinations and found that TF-IDF with SVM at a 70-30% split yields the highest accuracy among the combinations tested on our dataset.
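The inter-rater agreement step described in the abstract can be illustrated with a minimal sketch. It assumes the reported statistic is Cohen's kappa, the usual choice for two annotators; the abstract says only "kappa coefficient". The function name `cohen_kappa`, the label names, and the toy annotations are hypothetical and are not taken from the paper's data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy annotations (assumed label set: pos / neg / neu).
annotator_1 = ["pos", "pos", "neg", "neg", "neu", "pos", "neg", "neu", "pos", "neg"]
annotator_2 = ["pos", "pos", "neg", "neg", "neu", "pos", "neg", "pos", "pos", "neg"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the annotators agree on 9 of 10 items, but kappa comes out lower than 0.9 because part of that agreement is expected by chance. Values above roughly 0.8 are conventionally read as almost perfect agreement, which is consistent with how the abstract interprets its reported 0.90.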

