An Exploring the Power of Feature Representations: An Empirical Study on Product Reviews for Sentiment Analysis

  • Thian Lian Ben, Department of Computer Engineering, Marwadi University, Rajkot, India
  • Ravikumar R N, Department of Computer Engineering, Marwadi University, Rajkot, India
  • Sushil Kumar Singh, Department of Computer Engineering, Marwadi University, Rajkot, India
  • Pratikkumar Chauhan, Department of Computer Engineering, Marwadi University, Rajkot, India
  • Sivakumar N, School of Computer Science & IT, Jain (Deemed-to-be University), Bangalore, India
  • Manoj Praveen V, Department of AI & DS, Velalar College of Engineering and Technology, Erode, India
Keywords: Sentiment Analysis, Natural Language Processing, Yelp Review Dataset, Feature Extraction, TF-IDF

Abstract

With the rise of e-commerce and online shopping, customer reviews have become a crucial factor in judging the quality and reputation of a product. Because online shoppers cannot physically examine a product before buying, they rely heavily on customer reviews to make informed purchasing decisions. Companies, in turn, invest in sentiment analysis to understand and respond to customer feedback and to improve the quality of their products and services. Sentiment analysis uses natural language processing (NLP) and machine learning techniques to analyse review text for its overall tone, emotion, and opinion, and to classify each review as positive, negative, or neutral. In this work, we study sentiment analysis of customer reviews using machine learning algorithms combined with different vectorization techniques. The proposed approach consists of three phases. First, the text is preprocessed to remove irrelevant information and retain the useful terms. Second, features are extracted using several vectorization techniques: Bag-of-Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and N-grams. Third, machine learning classifiers are trained on the extracted features to classify and predict sentiment. We evaluated the proposed models on the Yelp review dataset, reporting precision, recall, and F1-score under K-fold cross-validation.
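The abstract names the three feature representations but gives no implementation details. The following is a minimal sketch of the first two phases (preprocessing and feature extraction) in plain Python. It assumes a simple regex tokenizer, a small hypothetical stopword list, and raw-count term frequencies. The TF-IDF weighting uses the smoothed IDF ln((1 + N) / (1 + df)) + 1 with per-document L2 normalisation, which mirrors the defaults of scikit-learn's `TfidfVectorizer`; the paper may use a different variant. The helper names (`preprocess`, `ngrams`, `bag_of_words`, `tf_idf`) are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "is", "was", "it", "this", "to", "of"}

def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def ngrams(tokens, n_range=(1, 2)):
    """Return every n-gram (space-joined) for n_min <= n <= n_max."""
    n_min, n_max = n_range
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams

def bag_of_words(docs, n_range=(1, 1)):
    """Sorted vocabulary plus raw-count vectors (BoW; n-gram BoW if n_max > 1)."""
    counts = [Counter(ngrams(preprocess(d), n_range)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]

def tf_idf(docs, n_range=(1, 1)):
    """Weight raw counts by smoothed IDF, then L2-normalise each document vector."""
    vocab, counts = bag_of_words(docs, n_range)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    vectors = []
    for row in counts:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        vectors.append([x / norm for x in weighted])
    return vocab, vectors
```

For example, `bag_of_words(["The food was great and the service was great", "Terrible service, cold food"])` returns the vocabulary `cold, food, great, service, terrible` with count vectors `[0, 1, 2, 1, 0]` and `[1, 1, 0, 1, 1]`. Passing `n_range=(1, 2)` adds bigrams such as `"food great"` and `"cold food"`, so a classifier can see short phrases rather than isolated words. Under TF-IDF, `great` outweighs `food` in the first review because `food` appears in both reviews and receives a lower IDF.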
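The abstract does not say which classifiers were used or how the folds were built. The sketch below shows how the third phase and the evaluation fit together. It assumes, purely for illustration, a multinomial Naive Bayes classifier (a common baseline for count features), binary `"pos"`/`"neg"` labels, already-tokenised reviews, and contiguous unshuffled folds. All function names are hypothetical. Precision, recall, and F1-score are computed for the positive class on each held-out fold and then averaged over the K folds.

```python
import math
from collections import Counter

def train_nb(token_lists, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with additive (Laplace) smoothing."""
    vocab = {t for toks in token_lists for t in toks}
    classes = sorted(set(labels))
    # Log prior: fraction of training documents in each class.
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for toks, y in zip(token_lists, labels):
        counts[y].update(toks)
    log_lik = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((counts[c][t] + alpha) / denom) for t in vocab}
    return classes, log_prior, log_lik

def predict_nb(model, toks):
    """Return the class with the highest log posterior; unseen tokens are skipped."""
    classes, log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][t] for t in toks if t in log_lik[c])
        for c in classes
    }
    return max(classes, key=scores.get)

def precision_recall_f1(y_true, y_pred, positive="pos"):
    """Precision, recall and F1-score for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = [j for j in range(n) if j < start or j >= start + size]
        yield train, test
        start += size

def cross_validate(token_lists, labels, k=5):
    """Average precision, recall and F1-score over the k held-out folds."""
    per_fold = []
    for train, test in k_fold_indices(len(labels), k):
        model = train_nb([token_lists[j] for j in train], [labels[j] for j in train])
        preds = [predict_nb(model, token_lists[j]) for j in test]
        per_fold.append(precision_recall_f1([labels[j] for j in test], preds))
    return tuple(sum(fold[m] for fold in per_fold) / k for m in range(3))
```

In practice the reviews would usually be shuffled, or stratified by label, before splitting so that every training fold contains all classes. With a neutral class as well, per-class scores would typically be macro-averaged rather than reported for a single positive class.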



Published
2025-06-16
How to Cite
Lian Ben, T., R N, R., Kumar Singh, S., Bharatbhai Chauhan, P., N, S., & V, M. P. (2025). An Exploring the Power of Feature Representations: An Empirical Study on Product Reviews for Sentiment Analysis. EMITTER International Journal of Engineering Technology, 13(1), 1-21. https://doi.org/10.24003/emitter.v13i1.821
Section
Articles