FMicallef

Francesca Micallef

Recently Published

Binary Logistic Regression- Analyzing Employees Evaluation for Promotion
Executive Summary- An HR team hired us to develop a model that predicts which employees deserve a promotion. The goal is to make the selection process quicker and easier, since it tends to be delayed by the large amount of detail recorded for each employee. We removed the employee ID because it is irrelevant to promotion decisions. Additionally, there were 9,093 missing values in the data that we had to remove, because the binary logistic regression model cannot be fit to observations with missing values. After removing the missing values, we graphed each variable against our target variable, is_promoted, to visualize which variables have a greater influence on employees being recommended for promotion. The graphs showed that department, region, education, number of trainings, age, previous year rating, awards won, and average training score had the greatest effects on whether an employee would be recommended for promotion. Specifically, employees who had won awards, received a five on the previous year rating, completed at least one training within the last year, and worked in region 4 all had higher chances of being promoted. The method used to model this data was binary logistic regression. This model is suited to predicting whether employees should be promoted because the outcome is binary. In other words, the model fits this dataset because it analyzes the ten possible factors (the independent variables) that can influence the outcome of employee promotion (the dependent variable). On the test set, the model correctly predicted that 8,416 employees were not recommended for promotion. On the other hand, it incorrectly predicted promotion for ten employees who were not actually recommended.
Additionally, the model correctly predicted 242 employees as recommended for promotion and incorrectly predicted 570 employees as not recommended when they actually were. The model was 93.72% accurate and the p-value was below 2.2e-16, which indicates that the model is statistically significant. Because almost 94% of the classifications were accurate, the model is a good fit for this data, and the HR team can adopt it to make this process easier and quicker.

Problem Statement- The HR team of this company is having a hard time comparing all the variables that feed into promotion decisions for its 54,000 employees. The company needs a model that predicts which employees will be eligible for a promotion.

Research Question- Is a binary logistic regression model a good fit for our data in predicting whether employees are recommended for promotion based on the variables? And which variables have the greatest influence on whether an employee is recommended for promotion?

Methods- After cleaning the data and gaining a better understanding of all the variables, we created a binary logistic regression model. Logistic regression is a statistical technique used to model the relationship between predictors (the independent variables) and a predicted variable (the dependent variable) where the dependent variable is binary. Binary logistic regression is useful for analyzing multiple factors that influence a negative/positive outcome, or any other classification with only two possible outcomes. We chose this model because our output variable, ‘is_promoted’, is a binary variable that takes the value 0 or 1. A value of 0 means the employee was not recommended for promotion and 1 means they were recommended. Therefore, a binary logistic regression model is useful for our dataset because we are analyzing multiple factors that influence whether an employee is recommended for promotion.
This technique helps to identify the important factors affecting the target variable and the nature of the relationship between each factor and the dependent variable. All predictor variables are tested in one block to assess their predictive ability while controlling for the effects of the other predictors in the model. The goal is to see whether the model is a good fit for our data in predicting whether employees are recommended for promotion, and to see which variables influence the recommendation. The set.seed() function was used to fix the starting point of the random number generator, which ensures that we get the same result each time we run the same process with the same seed. The model was created using the glm() function for binary logistic regression, with ‘is_promoted’ as the dependent variable and the cleaned data as the dataset. The data was split into training and test sets to check for overfitting and to measure accuracy. A confusion table was created for both the training and test sets to compare the predicted results with the actual results. The accuracy was also calculated for both sets.

Results- The summary of the binary logistic regression model shows that all the departments, region 4, education of master's and above, number of trainings, age, previous year rating, awards won, and average training score are statistically significant. These are the factors that have the greatest effect on whether an employee is recommended for promotion. For the training set, 33,783 employees were predicted correctly as not recommended for promotion. 62 were predicted incorrectly as recommended for promotion when they actually were not. 1,001 employees were predicted correctly as recommended for promotion. 2,258 were predicted incorrectly as not recommended for promotion when they actually were recommended. The model is 93.75% accurate on the training set.
For the test set, 8,416 employees were predicted correctly as not recommended for promotion. 10 were predicted incorrectly as recommended for promotion when they actually were not. 242 employees were predicted correctly as recommended for promotion. 570 were predicted incorrectly as not recommended for promotion when they actually were recommended. The model is 93.72% accurate on the test set, so it is only slightly more accurate on the training set. Because the test and training accuracies are nearly identical, there is no sign of overfitting. In general, kappa values above 0.75 are excellent and values of 0.40 to 0.75 are fair to good. For this classification model, the kappa is 0.4372, which is fair. Even though this is not excellent, kappa is not the only factor that determines whether a model is good to use. The test p-value is very small, < 2.2e-16, which means the model is significant.

Conclusion- We can conclude that the binary logistic regression model is an acceptable model for predicting whether employees are recommended for promotion, since it is about 94% accurate and is significant. The model could be used in the future to decide whether employees should be recommended for promotion based on the factors that affect promotion. The model also told us which factors have the biggest effect on an employee being recommended for promotion. They include the departments, region 4, education of master's and above, number of trainings, age, previous year rating, awards won, and average training score. Employees working in region 4 have a better chance of being recommended for promotion than employees in the other regions. All departments are statistically significant, so no single department stands out as more likely to be recommended for promotion than another. We can also see that having a master’s degree or higher gives an employee a better chance of being recommended for promotion.
The length of service is not statistically significant, so how long an employee has worked at the company does not contribute to being recommended. The way an employee was recruited also does not influence the recommendation. Gender does not affect whether an employee is recommended for promotion, which is good to hear because discriminating on the basis of gender is illegal. In conclusion, the model is a good one for predicting whether employees are recommended for promotion and for seeing which variables affect the recommendation.
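The accuracy figures reported above can be reproduced directly from the four confusion-table counts given for each set. The following is a minimal sketch in Python rather than the R used in the original analysis; the function name `classification_metrics` and the argument names are illustrative assumptions, not part of the original code.

```python
def classification_metrics(tn, fp, fn, tp):
    """Summary rates from confusion-table counts (hypothetical helper).

    tn: correctly predicted as not recommended
    fp: predicted recommended, actually not recommended
    fn: predicted not recommended, actually recommended
    tp: correctly predicted as recommended
    """
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "sensitivity": tp / (tp + fn),   # share of actual promotions caught
        "specificity": tn / (tn + fp),   # share of non-promotions caught
        # accuracy of always guessing the larger class
        "no_information_rate": max(tn + fp, tp + fn) / total,
    }

# Counts as reported in the Results section.
train = classification_metrics(tn=33783, fp=62, fn=2258, tp=1001)
test = classification_metrics(tn=8416, fp=10, fn=570, tp=242)

print(f"train accuracy: {train['accuracy']:.4f}")  # 0.9375
print(f"test accuracy:  {test['accuracy']:.4f}")   # 0.9372
```

The same counts also give a test-set sensitivity of about 0.30 and a no-information rate of about 0.91, which may be worth reading alongside the headline accuracy when judging how well the model identifies employees who were actually recommended.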
World Happiness
The World Happiness Report is a landmark survey of the state of global happiness. The report continues to gain global recognition as governments, organizations, and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields – economics, psychology, survey analysis, national statistics, health, public policy and more – describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness. The first three columns each include a subset of the variables, and the last includes all variables, to check whether multicollinearity distorts the results. The results suggest that there isn't severe multicollinearity, because the signs are consistent across the columns except for generosity. For the significant variables, the signs agree between columns 1 and 4, between 2 and 4, and between 3 and 4.
New Haven Economic Performance Index
The New Haven Region Economic Performance Index (NHREP Index), updated in October 2021, measures the performance and prosperity of the economy in the New Haven Region over the past six months. The NHREP Index combines multiple variables to gauge the economy's performance as a whole. These variables include Education & Health Services for employees in New Haven, New Haven building permits, unemployment benefits claims, average weekly hours of work, and the average weekly earnings of all New Haven employees. As with the previous NHREP Index, the range of restrictions put in place due to the Coronavirus Pandemic is factored into the region's performance. Figure 1 shows that the Coronavirus Pandemic caused an abrupt decrease in the New Haven Region's Economic Performance Index at the beginning of 2020. In the middle of 2020, the NHREP Index finally began to increase as COVID relief started to boost the economy. This year, with vaccines being widely distributed and states loosening regulations and reopening their economies, the region's performance has started to trend upward, with a steady increase in economic performance. More businesses have been able to open, and construction has been able to resume in the region; this easing of restrictions affected weekly earnings, weekly hours, unemployment, and building permits. The government's handling of the coronavirus has allowed the region to keep trending upward in the last portion of 2021 as restrictions continue to ease. Figure 1 also shows the forecast for the New Haven region's economic performance; the forecast is a statistical model used to predict outcomes. Based on the forecast in Figure 1, the index is predicted to remain fairly constant at the start of 2022, with point forecasts of 179 and 181.
The blue line on the graph is the predicted forecast and appears to be steadily increasing, but because many external factors play a role in the index, we also report confidence intervals that show the range of plausible values. At the beginning of 2022, we are 80% confident that the index will be between 140 and 217, and 95% confident that it will be between 123 and 241. Both intervals are very wide, and the region's economic performance is expected to fall within these ranges. It must be noted that the forecast is built from a number of lagged observations of the time series, so it may not be entirely accurate when unpredictable external factors intervene. Still, the next couple of months will be critical in determining how the region performs post-pandemic and how quickly it bounces back. If the government decides it is best to shut the area down, performance will be affected and will likely trend downward once again. Another factor that might hinder the region's growth is the federal government's vaccine orders for companies with over 100 employees; resistance to getting vaccinated might keep the region from reaching pre-pandemic performance in the coming months. Workers who give up their jobs to avoid vaccination might affect the region's weekly earnings, weekly hours, and unemployment if they do not find jobs that do not require vaccination. On the other hand, the booster rollout might persuade those who have stayed out of the workforce for fear of contracting COVID-19 that it is safe enough to return to work. With more individuals receiving the booster shot, we may expect to see the NHREP Index continue upward. Table 2 shows that, prior to the pandemic, the unemployment rate in the New Haven region was 4.1%, and that at the peak of the pandemic, in July 2020, it reached its highest level ever at 11.4%.
Favorably, as of September 2021, unemployment was down to 5.7%; this is still not where it was pre-pandemic, but it shows the region is on the path back to pre-pandemic rates. Ideally, with the current worker shortage across the country, we will see the unemployment rate continue to decrease in the coming months. Filling these job openings will not only reduce the unemployment rate but also raise weekly earnings and weekly hours, ultimately increasing the region's performance.
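The 80% and 95% forecast ranges quoted above can be sanity-checked against each other. The sketch below assumes the intervals are symmetric and based on normally distributed forecast errors, which the text does not state; under that assumption both intervals should imply roughly the same forecast standard error. The function name `implied_sigma` is hypothetical.

```python
from statistics import NormalDist

def implied_sigma(lower, upper, level):
    """Standard error implied by a symmetric normal interval.

    Assumes the interval is point forecast +/- z * sigma, where z is the
    standard normal quantile for the given coverage level.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z)

# Interval bounds as quoted for the start of 2022.
sigma_80 = implied_sigma(140, 217, 0.80)
sigma_95 = implied_sigma(123, 241, 0.95)

print(round(sigma_80, 1), round(sigma_95, 1))  # 30.0 30.1
```

Both intervals imply a standard error of about 30 index points, so the two quoted ranges are consistent with each other under the normality assumption.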
COVID-19's Effect on City/Suburban Migration
The Coronavirus Pandemic affected the housing market worldwide, and in particular it drove migration from the city to the suburbs in Connecticut. According to the Hartford Business Journal, over 27,000 people moved from New York City to Connecticut in 2020. Many people who lived and worked in New York City started working remotely due to COVID-19. Being able to work remotely gave them the opportunity to move to the suburbs, specifically in Connecticut. With demand for Connecticut homes increasing during the pandemic, the question arises: are there enough homes for sale to meet buyer demand? An increase in demand causes prices to increase. Buying a house in Connecticut during the pandemic has been more expensive than before COVID-19, which benefits sellers. According to Redfin, home prices statewide were up 3.4% year-over-year in October. At the same time, the number of homes sold fell 19.2% and the number of homes for sale fell 32.7%. As seen in Figure 1, the median sale price for Connecticut homes has generally increased over the past three years. This corresponds with the rising percentage of homes sold above list price, shown in Figure 2. The demand for houses has increased sale prices, which in turn has caused houses to sell above list price. The issue that arises is the supply of homes for sale. As seen in Figure 3, the number of homes for sale in Connecticut has generally decreased. Having fewer houses available for sale drives up the prices of the houses on the market, since demand for them is greater. Figure 1 also shows that the median sale price is significantly higher now than before the pandemic hit, with the highest median sale price, $340,000, occurring in June 2021. This graph shows that the pandemic had a significant impact on the median sale price of Connecticut homes.
The direction and pace at which home prices change indicate the strength of the housing market and whether homes are becoming more or less affordable. The median price of a home in the United States is currently $305,000. In Figure 2, from January 2021 to June 2021, there was a large spike in the percentage of homes sold above list price. June 2021 had the highest percentage of homes sold above list price, at 62%. The highest median sale price also occurred in June 2021. Homes that sold above list price likely received multiple offers. A high or growing percentage of homes selling above list price indicates that the housing market is competitive and that bidding wars are becoming more common. A low or shrinking percentage suggests that the market is becoming less competitive. According to Figure 3, the number of homes for sale has declined over time. June 2021 was a peak for the number of homes for sale, at 12,000 homes. The supply of homes for sale has decreased because demand has been high: more homes were being bought than were being listed. The direction and pace at which housing supply changes indicate whether the options for buyers are increasing or decreasing. They can also indicate whether homes are lingering on the market or being sold faster than sellers are listing them. There are currently 10,806 residential homes for sale in the United States. Income is the main reason many people moved out of the city and into the suburbs. The average rent for a New York 1-bedroom apartment is $3,805, which is very expensive for most people, especially those who lost their jobs. Unemployment increased sharply during the pandemic, which contributed to people moving out of the city. Another factor is that people wanted more space because they were working remotely. Working in a one-bedroom apartment with a spouse makes it difficult to find a quiet space.
Also, being quarantined in a tiny place isn’t ideal. The need for more private space caused many people to move into houses. Over the past three years, the median sale price of homes in Connecticut has increased along with the percentage of homes sold above list price. The number of homes for sale has decreased over the same period because more people are moving into Connecticut than are moving out. We can expect these trends to continue. Connecticut houses are becoming more expensive as more people want to live here.