Exploring the Power of Feature Representations: An Empirical Study on Product Reviews for Sentiment Analysis
Abstract
With the rise of e-commerce and online shopping, customer reviews have become a crucial factor in determining the quality and reputation of a product. Online shoppers rely heavily on customer reviews to make informed purchasing decisions because they cannot physically examine the product before buying. As a result, companies are investing in sentiment analysis to understand and respond to customer feedback and to improve the quality of their products and services. Using natural language processing (NLP) and machine learning techniques, sentiment analysis classifies the tone of a customer review as positive, negative, or neutral by analysing the text to determine the overall tone, emotion, and opinion it expresses. In this work, we study sentiment analysis of customer reviews using machine learning algorithms combined with different vectorization techniques. The proposed approach consists of three distinct phases. First, the reviews are pre-processed to remove irrelevant information and retain the useful terms. Second, features are extracted with several vectorization techniques: Bag-of-Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and N-grams. Finally, machine learning classifiers are trained on the extracted features and used to predict the sentiment of each review. We evaluated the proposed models on the Yelp reviews dataset using precision, recall, and F1-score, together with K-fold cross-validation.
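The abstract does not detail the pre-processing phase. A minimal Python sketch of what such a step might look like, assuming lowercasing, a regular-expression tokenizer, and a small hypothetical stopword list (STOPWORDS, preprocess, and ngrams are illustrative names, not the authors' code), is:

```python
import re

# Hypothetical stopword list for illustration; the paper does not publish its own.
# Negations such as "not" are deliberately kept because they flip sentiment.
STOPWORDS = {"a", "an", "the", "and", "or", "is", "was", "it", "this", "to", "of", "i"}


def preprocess(review: str) -> list[str]:
    """Lowercase the review, keep alphabetic tokens (and apostrophes), drop stopwords."""
    tokens = re.findall(r"[a-z']+", review.lower())
    return [token for token in tokens if token not in STOPWORDS]


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return the contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = preprocess("The food was NOT good, and the service was slow!")
print(tokens)             # ['food', 'not', 'good', 'service', 'slow']
print(ngrams(tokens, 2))  # ['food not', 'not good', 'good service', 'service slow']
```

The bigram "not good" illustrates why N-gram features help in sentiment analysis: it carries a negative signal that the single words "not" and "good" do not capture on their own.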
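The BoW and TF-IDF representations named in the abstract can be illustrated with a short standard-library sketch. It assumes already-tokenized documents and uses one common, unsmoothed TF-IDF variant, raw count * log(N / df(t)); the function names are hypothetical and this is not necessarily the weighting used in the paper.

```python
import math
from collections import Counter


def build_vocabulary(docs: list[list[str]]) -> list[str]:
    """Sorted list of every distinct term in the tokenized corpus."""
    return sorted({term for doc in docs for term in doc})


def bag_of_words(docs: list[list[str]], vocab: list[str]) -> list[list[int]]:
    """Raw term counts: one row per document, one column per vocabulary term."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return rows


def tf_idf(docs: list[list[str]], vocab: list[str]) -> list[list[float]]:
    """Unsmoothed TF-IDF: raw count * log(N / df).

    Assumes vocab was built from these same docs, so every df(t) >= 1.
    A term that occurs in every document gets weight 0.
    """
    n_docs = len(docs)
    doc_sets = [set(doc) for doc in docs]
    df = {term: sum(term in s for s in doc_sets) for term in vocab}
    return [
        [count * math.log(n_docs / df[term]) for count, term in zip(row, vocab)]
        for row in bag_of_words(docs, vocab)
    ]


docs = [["good", "food"], ["bad", "food"], ["good", "service", "good"]]
vocab = build_vocabulary(docs)    # ['bad', 'food', 'good', 'service']
print(bag_of_words(docs, vocab))  # [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 2, 1]]
print(tf_idf(docs, vocab)[2])
```

The same functions yield N-gram features if each document is passed as a list of n-gram strings (for example "not good") instead of single words. Library implementations such as scikit-learn's TfidfVectorizer apply a smoothed IDF and L2 normalization by default, so their values will differ from this sketch.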
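The abstract does not name the classifiers used in the final phase. As one representative choice only, a multinomial Naive Bayes classifier with add-alpha (Laplace) smoothing over word counts might be sketched as follows; train_multinomial_nb and predict_multinomial_nb are hypothetical names, and the toy training data is invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs: list[list[str]], labels: list[str], alpha: float = 1.0):
    """Return (log priors, add-alpha smoothed log likelihoods) estimated from word counts."""
    vocab = {term for doc in docs for term in doc}
    class_sizes = Counter(labels)
    term_counts = defaultdict(Counter)  # class label -> Counter of term frequencies
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)

    log_prior = {c: math.log(n / len(labels)) for c, n in class_sizes.items()}
    log_lik = {}
    for c in class_sizes:
        denominator = sum(term_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((term_counts[c][t] + alpha) / denominator) for t in vocab}
    return log_prior, log_lik


def predict_multinomial_nb(model, doc: list[str]) -> str:
    """Pick the class with the highest log posterior; terms unseen in training are ignored."""
    log_prior, log_lik = model

    def log_posterior(c: str) -> float:
        return log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])

    return max(log_prior, key=log_posterior)


train_docs = [["good", "food"], ["great", "service"], ["bad", "food"], ["awful", "service"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = train_multinomial_nb(train_docs, train_labels)
print(predict_multinomial_nb(model, ["good", "service"]))  # positive
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing term alpha keeps a word that never appeared with a class from forcing that class's probability to zero.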
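The evaluation protocol (precision, recall, and F1-score under K-fold cross-validation) can be sketched without any third-party library. This version assumes binary labels with "positive" as the class of interest and splits the folds contiguously without shuffling; in practice the data should be shuffled, and ideally stratified, first. The fit_predict argument is a hypothetical hook for whichever classifier is being evaluated.

```python
def k_fold_indices(n_samples: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Split indices 0..n_samples-1 into k contiguous folds; each fold is the test set once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, test))
        start += size
    return splits


def precision_recall_f1(y_true, y_pred, positive="positive"):
    """Precision, recall and F1 for the chosen positive class (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cross_validate(X, y, k, fit_predict, positive="positive"):
    """Mean (precision, recall, F1) over k folds.

    fit_predict(X_train, y_train, X_test) -> predicted labels is a hypothetical hook
    that wraps whichever classifier is being evaluated.
    """
    per_fold = []
    for train, test in k_fold_indices(len(y), k):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        per_fold.append(precision_recall_f1([y[i] for i in test], preds, positive))
    return tuple(sum(scores[j] for scores in per_fold) / k for j in range(3))


def always_positive(X_train, y_train, X_test):
    """Trivial baseline that ignores the training data and predicts 'positive'."""
    return ["positive"] * len(X_test)


labels = ["positive", "positive", "negative", "negative", "positive"]
print(cross_validate(list(range(5)), labels, 2, always_positive))
```

The trivial baseline reaches a recall of 1.0 but a lower precision, which is why precision, recall, and F1-score are reported together rather than any one of them alone.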
Copyright (c) 2025 EMITTER International Journal of Engineering Technology

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.