Text Mining for Employee Candidates Automatic Profiling Based on Application Documents

  • Adhi Dharma Wibawa Institut Teknologi Sepuluh Nopember, Indonesia
  • Arni Muarifah Amri Institut Teknologi Sepuluh Nopember, Indonesia
  • Arbintoro Mas Institut Teknologi Sepuluh Nopember, Indonesia
  • Syahrul Iman Institut Teknologi Sepuluh Nopember, Indonesia
Keywords: Document Vector, Euclidean Distance, N-Gram, Text Mining, Automatic Candidate Profiling


Job vacancies advertised on the Internet attract many applications in a short time. Filtering resumes manually is time-consuming and costly. Manual screening is also prone to errors caused by reviewer fatigue, and it may fail to identify the right candidate for the job. This paper proposes a method for automatically identifying the most suitable candidates from their application documents. The experiment used 126 application documents from a private company: 41 for Human Resource and Development (HRD) staff, 42 for IT (Data Developer), and 43 for the Marketing position. Text processing is applied to extract relevant information, such as skills, education, and experience, from the unstructured resumes and to summarize each application. A specific dictionary for each vacancy is generated from terms used in the corresponding profession. Two methods, Document Vector and N-gram analysis, are implemented and compared for matching and scoring the application documents. The higher the score a document obtains, the higher the likelihood that the application will be accepted. The results of both methods are then validated against the company's actual selection process. The highest accuracy was achieved by the N-gram method for the IT vacancy, at 87.5%, while Document Vector reached 75%. For the Marketing staff vacancy, both methods achieved the same accuracy of 78%. For the HRD staff vacancy, the N-gram method reached 68%, while Document Vector reached 74%. Overall, the N-gram method showed slightly better accuracy than the Document Vector method.
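The two scoring ideas named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact scoring formulas, so the functions `ngram_score`, `document_vector`, and `euclidean_distance`, the sample dictionary, the vocabulary, the reference profile, and the two toy resumes are all assumptions made for illustration. The sketch assumes that the N-gram score counts dictionary terms found among the unigrams and bigrams of a resume, and that the Document Vector approach compares a term-frequency vector with a reference profile using Euclidean distance (smaller distance meaning a closer match).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_score(text, dictionary, max_n=2):
    """Count how many 1..max_n-grams of the text appear in the vacancy dictionary."""
    tokens = tokenize(text)
    score = 0
    for n in range(1, max_n + 1):
        score += sum(1 for gram in ngrams(tokens, n) if gram in dictionary)
    return score


def document_vector(text, vocabulary):
    """Build a term-frequency vector over a fixed, ordered list of single words."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical dictionary and vocabulary for an IT (Data Developer) vacancy.
it_dictionary = {"python", "sql", "etl", "data warehouse"}
it_vocabulary = ["python", "sql", "etl", "warehouse"]
# Hypothetical reference profile: each vocabulary term mentioned once.
reference_profile = [1, 1, 1, 1]

# Toy resumes, invented for this example only.
resumes = {
    "A": "Built ETL pipelines in Python and SQL for a data warehouse.",
    "B": "Managed a retail sales team and customer promotions.",
}

for name, text in resumes.items():
    score = ngram_score(text, it_dictionary)
    distance = euclidean_distance(
        document_vector(text, it_vocabulary), reference_profile
    )
    print(f"{name}: n-gram score = {score}, distance to profile = {distance:.2f}")
```

In this toy setting, resume A matches every dictionary term and sits at distance zero from the reference profile, while resume B matches none. Ranking candidates by descending N-gram score, or by ascending distance, gives the kind of ordering that the study validates against the company's actual selection.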




How to Cite
Wibawa, A. D., Amri, A. M., Mas, A., & Iman, S. (2022). Text Mining for Employee Candidates Automatic Profiling Based on Application Documents. EMITTER International Journal of Engineering Technology, 10(1), 47-62. https://doi.org/10.24003/emitter.v10i1.679