[Podcast] Applying Machine Learning to Human Resource Management Data Analysis

28 November, 2024

Keywords: Human resource management; Applied machine learning model; Employee turnover prediction

Human resource management is an important factor in the sustainable development of businesses. However, some businesses struggle with high employee turnover, which hurts work efficiency and business performance. To address this problem, a study co-authored at the University of Economics Ho Chi Minh City (UEH) analyzed IBM’s human resource data and applied six machine learning models (Logistic Regression, K-Nearest Neighbors, Decision Tree, Support Vector Machine, Neural Network and Random Forest) to predict employee turnover. The results of the study can help organizations build more effective HRM strategies.

Research context

Human resources play a decisive role in the competitiveness and results of a business; they are its most important asset and are fundamental to its development and survival. Every employee, whatever their position, contributes to the overall success of the business. Human Resource Management (HRM) plays an important role in an organization because it is responsible for overseeing the most valuable resource of the business: the workforce. HRM therefore has a close relationship with, and a great influence on, the success of an organization, and organizations are well aware of the importance of employees in achieving and maintaining competitive advantage.

In today’s fast-paced and volatile business environment, effective human resource management is more essential than ever. The success of an organization depends heavily on its ability to attract, retain and develop a talented and committed workforce. When an employee resigns, the company loses not only that employee but possibly customers as well, which affects its production and development. Most businesses do not want employees to resign, whether long-serving or newly hired, because recruiting and training a replacement costs considerable money and time.

Employee resignation is a normal process, and each employee has their own reasons for resigning, such as income, working environment, promotion prospects, and family. Predicting whether an employee is likely to resign therefore plays an important role in human resource management: turnover affects not only human resource management and development but also business operations when many employees resign within a short period. A good prediction model helps businesses limit this phenomenon. It also helps the human resource department and managers understand the common characteristics of employees who resign, so that they can improve benefits and the working environment and strengthen employee loyalty and attachment to the company.

The UEH research proposes a predictive model that combines data imbalance treatment and feature selection with machine learning algorithms to predict employee turnover. By leveraging the results of predictive analytics, organizations can develop proactive HR strategies to anticipate future workforce needs and mitigate potential risks. The proposed model aims to extract comprehensive information from HR data, which organizations can use to develop effective HR strategies and improve overall organizational performance. The model can be customized to meet the specific needs of each organization and can be applied to a wide range of HR challenges, such as recruitment, retention, employee engagement, and performance management.

Research sample and methods 

This study focuses on predicting the likelihood of employee turnover in organizations. The main subject of the study is the IBM workforce, based on a dataset shared on the Kaggle website. This dataset includes employee information such as age, gender, education level, industry, seniority, number of promotions, salary, performance reviews, and whether the employee left the company.

This study combined qualitative and empirical research methods. The qualitative component surveyed secondary studies and published works on the application of machine learning and data analysis in human resource management, in order to identify research gaps, improve performance and build appropriate empirical models. The empirical component collected, analyzed and described the data and built models using machine learning. The empirical results were then evaluated to identify appropriate forecasting models.

After the experiments, the employee turnover prediction models provided important information about the likelihood of employees leaving the company. Using six machine learning models (Logistic Regression, K-Nearest Neighbors, Decision Tree, Support Vector Machine, Neural Network and Random Forest), the study built an accurate prediction model. Table 1, Table 2 and Table 3 present the experimental results on the original data, after feature selection with the RFE method, and after applying a combination of RFE and SMOTE, respectively.

The raw data was used without any preprocessing to evaluate the initial performance of the models. The table below presents the key metrics of each model on the raw dataset.

Table 1. Experimental results from original data

(Source: Authors)

| No. | Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| 1 | Logistic Regression | 0.897 | 0.833 | 0.431 | 0.568 |
| 2 | K-Nearest Neighbors | 0.848 | 0.55 | 0.19 | 0.282 |
| 3 | Decision Tree | 0.802 | 0.358 | 0.328 | 0.342 |
| 4 | SVM (Linear Kernel) | 0.883 | 0.759 | 0.379 | 0.506 |
| 5 | Neural Network | 0.856 | 0.571 | 0.345 | 0.43 |
| 6 | Random Forest | 0.872 | 0.867 | 0.224 | 0.356 |

Based on these results, the Logistic Regression and SVM (Linear Kernel) models achieved the highest accuracy, at 0.897 and 0.883 respectively. However, Recall was low for every model (0.431 at best), which also held the F1-Score down, indicating that the models were not effective at identifying positive cases.
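The F1-Score column can be recomputed from the Precision and Recall columns, since F1 is their harmonic mean. The snippet below is a minimal Python sketch of that check using the values printed in Table 1; the function name `f1_score` and the dictionary layout are illustrative choices, not part of the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Precision, Recall) pairs copied from Table 1 (raw data).
table1 = {
    "Logistic Regression": (0.833, 0.431),
    "K-Nearest Neighbors": (0.55, 0.19),
    "Decision Tree": (0.358, 0.328),
    "SVM (Linear Kernel)": (0.759, 0.379),
    "Neural Network": (0.571, 0.345),
    "Random Forest": (0.867, 0.224),
}

# Recompute F1 for each model, rounded to three decimals as in the table.
f1_by_model = {name: round(f1_score(p, r), 3) for name, (p, r) in table1.items()}

for name, f1 in f1_by_model.items():
    print(f"{name}: F1 = {f1}")
```

Because the harmonic mean is pulled towards the smaller of the two values, a model such as Random Forest (Precision 0.867, Recall 0.224) ends up with a low F1-Score despite its high Precision.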

Recursive Feature Elimination (RFE) was then applied to select the most important features, with the aim of improving model performance. The results obtained after applying RFE are as follows:

Table 2. Experimental results after applying RFE

(Source: Authors)

| No. | Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| 1 | Logistic Regression | 0.889 | 0.815 | 0.379 | 0.518 |
| 2 | K-Nearest Neighbors | 0.859 | 0.65 | 0.224 | 0.333 |
| 3 | Decision Tree | 0.777 | 0.3 | 0.31 | 0.305 |
| 4 | SVM (Linear Kernel) | 0.886 | 0.833 | 0.345 | 0.488 |
| 5 | Neural Network | 0.872 | 0.657 | 0.397 | 0.495 |
| 6 | Random Forest | 0.872 | 0.762 | 0.276 | 0.405 |

From this table, it can be seen that applying RFE changes the metrics only marginally: some improve slightly (for example, the Neural Network F1-Score rises from 0.43 to 0.495) while others decline (the Logistic Regression F1-Score falls from 0.568 to 0.518). The Logistic Regression and SVM (Linear Kernel) models retain the highest accuracy; nevertheless, Recall remains low, meaning that many truly positive data points are not correctly identified.
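As a rough illustration of how RFE works, the following Python sketch repeatedly fits a model, ranks the features by the absolute size of their coefficients, and drops the weakest one until the requested number remains. It is a simplified sketch under stated assumptions: the data are synthetic, the tiny gradient-descent logistic regression (`fit_logistic`) and the helper `rfe` are hypothetical stand-ins, and the article does not say which estimator or library the authors used. In practice a library implementation such as scikit-learn's `RFE` class would normally be used instead.

```python
import math
import random


def fit_logistic(X, y, epochs=200, lr=0.5):
    """Fit a small logistic regression by batch gradient descent; return the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w


def rfe(X, y, n_features_to_select):
    """Recursive feature elimination: refit, drop the weakest feature, repeat."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_features_to_select:
        X_sub = [[row[j] for j in remaining] for row in X]
        weights = fit_logistic(X_sub, y)
        # The feature with the smallest absolute coefficient is eliminated.
        weakest = min(range(len(remaining)), key=lambda k: abs(weights[k]))
        del remaining[weakest]
    return remaining


# Synthetic data (an assumption for illustration): five features on the same
# scale, of which only columns 0 and 3 determine the label.
random.seed(0)
X, y = [], []
for _ in range(200):
    row = [random.gauss(0, 1) for _ in range(5)]
    X.append(row)
    y.append(1 if 2.0 * row[0] - 2.0 * row[3] > 0 else 0)

selected = rfe(X, y, n_features_to_select=2)
print("Selected feature indices:", selected)
```

Ranking by coefficient size only makes sense when the features share a comparable scale, which is why the synthetic columns here are all drawn from the same distribution.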

To deal with the data imbalance problem, Synthetic Minority Over-sampling Technique (SMOTE) is applied after performing RFE. The results are as follows:

Table 3. Experimental results after applying RFE + SMOTE

(Source: Authors)

| No. | Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| 1 | Logistic Regression | 0.912 | 0.934 | 0.893 | 0.913 |
| 2 | K-Nearest Neighbors | 0.896 | 0.902 | 0.896 | 0.899 |
| 3 | Decision Tree | 0.838 | 0.852 | 0.83 | 0.841 |
| 4 | SVM (Linear Kernel) | 0.917 | 0.953 | 0.884 | 0.917 |
| 5 | Neural Network | 0.911 | 0.931 | 0.893 | 0.912 |
| 6 | Random Forest | 0.922 | 0.969 | 0.877 | 0.921 |

This table shows a significant improvement over the previous two cases. Accuracy exceeds 0.83 for every model, and four of the six models score above 0.90. The F1-Score increases significantly, especially for the Logistic Regression, SVM (Linear Kernel), Neural Network and Random Forest models, with F1-Score values of 0.913, 0.917, 0.912 and 0.921 respectively. This indicates that these models not only recognize positive cases well but also strike a good balance between Precision and Recall.
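The core idea behind SMOTE can be sketched in a few lines of Python: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority neighbours. The code below is a simplified illustration, not the authors' implementation; the function `smote`, its parameters, and the two-dimensional synthetic points are assumptions made for the example. A maintained implementation, such as the `SMOTE` class in the imbalanced-learn library, would normally be preferred.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical imbalanced data: 50 majority points and 10 minority points.
rng = random.Random(42)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]

# Generate just enough synthetic points to match the majority class size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority))
balanced_minority = minority + new_points

print(len(majority), len(balanced_minority))
```

Oversampling of this kind is usually applied to the training split only, so that evaluation scores are measured on data with the original class balance.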

Policy implications for businesses

Employee turnover is a pressing issue for most businesses because it strongly affects human resource development and the overall development of the business. Being able to predict which employees are likely to leave therefore benefits businesses. This study tested six machine learning algorithms (LR, KNN, DT, SVM, NN and RF) for predicting which employees are likely to leave, using IBM data. From these forecast results, the management board can assess and analyze the characteristics of employees who leave and of those who stay. The common characteristics of employees who leave, such as personal information and work experience, can help the management board judge whether a given employee is likely to stay with the business long-term. For the common characteristics of loyal employees, management can maintain existing good policies or improve them further, so that employees have the best working conditions, environment and benefits in line with the development strategy and budget of the enterprise.

Before adopting any predictive analytics model, it is important to analyze and describe the state of the data set. Understanding the statistical characteristics, distributions, and imbalances between data groups helps in choosing an appropriate approach and preprocessing, and in tuning the model parameters to improve forecasting performance. This in turn supports HR management decisions and the development of proactive HR strategies.
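A quick way to describe class imbalance is to count the labels and compare them with a trivial baseline. The Python sketch below does this for a hypothetical label list; the 84/16 split is invented for illustration and is not the class distribution of the study's data.

```python
from collections import Counter

# Hypothetical attrition labels, for illustration only (not the study's data).
labels = ["No"] * 84 + ["Yes"] * 16

counts = Counter(labels)
total = sum(counts.values())

# Share of each class in the data set.
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a model that always predicts the majority class.
majority_label, majority_count = counts.most_common(1)[0]
baseline_accuracy = majority_count / total

# How many majority samples exist per minority sample.
imbalance_ratio = majority_count / min(counts.values())

print(shares)
print(f"Always predicting '{majority_label}' gives accuracy {baseline_accuracy:.2f}")
print(f"Imbalance ratio: {imbalance_ratio:.2f}")
```

With such a split, a model that always predicts the majority class already reaches 0.84 accuracy while identifying no leavers at all, which is why Recall and F1-Score are worth reporting alongside accuracy on imbalanced data.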

Predictive models should be customized to meet the specific needs of each organization. The model can be applied to a wide range of HR challenges, such as recruitment, employee retention, employee engagement, and performance management. In this way, organizations can develop effective HR strategies and improve overall organizational performance.

Ultimately, predictive analytics can be leveraged to develop proactive HR strategies, anticipate future workforce needs, and mitigate potential risks. By providing valuable insights into workforce trends, employee behaviors, and factors that influence employee engagement, satisfaction, and productivity, HR data analytics helps organizations make smart, relevant decisions and improve overall organizational performance.

The full-text article on Applying Machine Learning in Human Resource Management Data Analysis can be accessed HERE.

Authors: Dr. Thai Kim Phung – University of Economics Ho Chi Minh City; Mr. Nguyen Phat Dat – VIB Bank; Mr. Nguyen Van Ho – University of Economics and Law, VNU-HCM.

This article is part of the series spreading research and applied knowledge from UEH with the message “Research Contribution For All”. UEH cordially invites readers to look out for the next UEH Research Insights newsletter.

News and photos:  The Authors, UEH Department of Communications and Partnership