Logistic Regression Model on Why HR Leaving | Predicting employee attrition using Machine Learning

Question

asked Jun 8 in Python Programming by Sharda Chaudhary Goeduhub's Expert (2.2k points)

Predict whether an employee will leave the company or stay with it. An organization is only as good as its employees, and these people are the true source of its competitive advantage. The dataset is from Kaggle: https://www.kaggle.com/giripujar/hr-analytics

First explore and visualize the data, then build a logistic regression model in Python to predict employee attrition.
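A minimal sketch of what the exploration step could look like. The rows below are made up so the snippet runs without the Kaggle file; only the column names (satisfaction_level, average_montly_hours, salary, left) are taken from the dataset as used in the answers. With the real data you would replace the sample with pd.read_csv on the downloaded CSV.

```python
import pandas as pd

# Tiny made-up sample standing in for HR_comma_sep.csv (values are illustrative only).
df = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11, 0.72, 0.37, 0.41, 0.92, 0.65],
    "average_montly_hours": [157, 262, 272, 223, 159, 153, 140, 180],
    "salary": ["low", "medium", "medium", "low", "low", "low", "high", "high"],
    "left": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Mean of each numeric feature, split by the value of 'left'.
by_left = df.groupby("left")[["satisfaction_level", "average_montly_hours"]].mean()
print(by_left)

# Share of each 'left' value within every salary band (each row sums to 1).
salary_vs_left = pd.crosstab(df["salary"], df["left"], normalize="index")
print(salary_vs_left)
```

Both tables can be turned into bar charts with their .plot(kind="bar") method, which covers the visualization part of the task.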

4 Answers

Milind741 · Answer 1 · 2021-06-08T12:46:59+0000

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("/content/HR_comma_sep.csv")
#df.head()

# Logistic regression model
df1 = df[['salary', 'satisfaction_level', 'average_montly_hours',
          'promotion_last_5years', 'left']]

# One-hot encode salary; 'medium' is dropped and acts as the reference level
dummies = pd.get_dummies(df1.salary)
df1 = pd.concat([df1, dummies], axis='columns')
df1 = df1.drop(['salary', 'medium'], axis='columns')

# test_size=2/3 holds out two thirds of the rows, so only one third is used for training
X_train, X_test, y_train, y_test = train_test_split(
    df1[['satisfaction_level', 'average_montly_hours', 'promotion_last_5years', 'high', 'low']],
    df1.left, test_size=2/3, random_state=1)

model = LogisticRegression()
model.fit(X_train, y_train)
model.score(X_test, y_test)  # accuracy on the held-out rows

***************************** O U T P U T *****************************

0.7828
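An accuracy figure like this is easier to judge next to a trivial baseline. If most employees in the data stayed, a model that always predicts the majority class already scores well. The sketch below compares the two on synthetic data; the feature, the made-up relationship between satisfaction and leaving, and the 30% test split are assumptions for illustration, not properties of the Kaggle file.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the HR data: one feature, imbalanced labels.
rng = np.random.default_rng(0)
satisfaction = rng.uniform(0.1, 1.0, size=1000)
# Lower satisfaction -> higher chance of leaving (made-up relationship).
p_leave = 1 / (1 + np.exp(6 * (satisfaction - 0.4)))
left = (rng.uniform(size=1000) < p_leave).astype(int)

X = satisfaction.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, left, test_size=0.3, random_state=1, stratify=left)

# Baseline: always predict the most frequent class seen in training.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression().fit(X_train, y_train)

baseline_acc = baseline.score(X_test, y_test)
model_acc = model.score(X_test, y_test)
print("baseline accuracy:", baseline_acc)
print("model accuracy:   ", model_acc)
```

On the real data the same two lines (fit a DummyClassifier, call score on X_test) can be added after model.fit to see how far the logistic regression is above the baseline.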

Naresh Kumar · Answer 2 · 2021-06-09T10:42:51+0000

#GO_STP_379
# In this task we predict whether an employee leaves the company (the 'left' column).
# This is a binary classification problem, so we use logistic regression.
import pickle

import joblib
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

data = pd.read_csv('HR_comma_sep.csv')

# exploration of data
print("-------exploration of data------------")
data.info()  # info() prints its summary itself and returns None
print(data.head())

# label-encode the two text columns (a fresh encoder per column)
for col in ['Department', 'salary']:
    data[col] = LabelEncoder().fit_transform(data[col])
print("after the label encoder : \n", data)

# logistic regression on three features
ft = data[['Department', 'satisfaction_level', 'salary']]
label = data['left']
xtrain, xtest, ytrain, ytest = train_test_split(ft, label)
my_model = LogisticRegression()
my_model.fit(xtrain, ytrain)
y_pred = my_model.predict(xtest)  # predictions for the test set
cm = confusion_matrix(ytest, y_pred)
print("confusion matrix: ", cm)
print("accuracy score (test): ", accuracy_score(ytest, y_pred))
print("score (train): ", my_model.score(xtrain, ytrain))

# visualization of data
plt.subplot(2, 2, 1)
plt.scatter(ytest, y_pred, marker='+', label='actual vs predicted')
plt.xlabel('ytest')
plt.ylabel('y prediction')
plt.legend()
plt.title('Actual vs predicted')

# one scatter plot of 'left' against each of three features
for position, column in enumerate(['salary', 'satisfaction_level', 'time_spend_company'], start=2):
    plt.subplot(2, 2, position)
    plt.scatter(x=data[column], y=data['left'], label=column + ' and left')
    plt.xlabel(column)
    plt.ylabel('left')
    plt.legend()
    plt.title(column + ' and left')
plt.tight_layout()
plt.show()

# save and reload the fitted model with pickle
with open('model_save', 'wb') as file:
    pickle.dump(my_model, file)
with open('model_save', 'rb') as file:
    newmodel = pickle.load(file)
# newmodel.coef_

# save and reload the same model with joblib
# (sklearn.externals.joblib was removed from scikit-learn; import joblib directly)
joblib.dump(my_model, 'model_joblib')
mymodel = joblib.load('model_joblib')
print("my model: ", mymodel)
print("new model: ", newmodel)
print("file is :", file.name)
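The answer mentions a pipeline but never builds one, and LabelEncoder turns Department and salary into arbitrary integers that logistic regression then treats as ordered numbers. One possible alternative, sketched here under assumptions, is a scikit-learn Pipeline that one-hot encodes the text columns and is saved as a single object. The rows are made up so the snippet runs on its own; the step names and the file name attrition_pipeline.joblib are hypothetical.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows with the same column names as the answer above.
data = pd.DataFrame({
    "Department": ["sales", "hr", "technical", "sales", "hr", "technical", "sales", "hr"],
    "salary": ["low", "medium", "high", "low", "low", "medium", "high", "medium"],
    "satisfaction_level": [0.20, 0.85, 0.90, 0.15, 0.30, 0.75, 0.95, 0.25],
    "left": [1, 0, 0, 1, 1, 0, 0, 1],
})
X, y = data.drop(columns="left"), data["left"]

# One-hot encode the text columns and scale the numeric one.
preprocess = ColumnTransformer([
    ("categories", OneHotEncoder(handle_unknown="ignore"), ["Department", "salary"]),
    ("numbers", StandardScaler(), ["satisfaction_level"]),
])
pipe = Pipeline([("preprocess", preprocess), ("clf", LogisticRegression())])
pipe.fit(X, y)

# Persist the whole pipeline so the encoding travels with the model.
path = os.path.join(tempfile.mkdtemp(), "attrition_pipeline.joblib")
joblib.dump(pipe, path)
restored = joblib.load(path)

# The restored pipeline accepts raw, unencoded rows.
new_employee = pd.DataFrame(
    {"Department": ["sales"], "salary": ["low"], "satisfaction_level": [0.2]})
print(restored.predict(new_employee))
```

Saving the pipeline rather than the bare model means whoever loads the file does not have to repeat the encoding by hand before calling predict.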
