Recently Published
Predicting Gratuity Amount for Taxi Drivers
We began with univariate and multivariate linear regression models to see how well they fit our data. Linear regression is a conventional, widely used approach that may explain the association with tip well, so we tested it first. We then applied lasso, ridge, and principal component analysis (PCA) to strengthen the linear model. We also fit decision trees and random forests as regressors. One advantage of decision trees is their capacity to capture non-linear relationships. According to our EDA (https://rpubs.com/chiraglakhanpal/975131), trip duration, distance, and fare are linearly related over short distances, but this relationship weakens over longer distances as other factors come into play. Consequently, the relationship with tip may be non-linear as well.
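To illustrate why a tree-based regressor can help when a relationship stops being linear, here is a minimal pure-Python sketch. It is not the authors' code and does not use the TLC data: the distances, the tip formula (a tip that rises with distance up to 5 miles and then stays flat), and the helper names `fit_linear` and `fit_tree` are all assumptions made for the example. It compares the in-sample error of a straight-line fit with that of a small regression tree.

```python
# Synthetic, noise-free illustration (NOT the TLC data): the hypothetical tip
# rises with distance up to 5 miles and then stays flat.
distances = [0.5 * k for k in range(1, 41)]      # 0.5 .. 20.0 miles
tips = [0.6 * min(d, 5.0) for d in distances]    # assumed tip in dollars


def mean(values):
    return sum(values) / len(values)


def fit_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(xs, ys, depth):
    """Greedy one-feature regression tree with piecewise-constant leaves."""
    if depth == 0 or len(set(xs)) < 2:
        leaf = mean(ys)
        return lambda x: leaf
    pairs = sorted(zip(xs, ys))
    best = None
    # Try every split between two distinct x values; keep the lowest total SSE.
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best is None or cost < best[0]:
            best = (cost, i)
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    left = fit_tree([x for x, _ in pairs[:i]], [y for _, y in pairs[:i]], depth - 1)
    right = fit_tree([x for x, _ in pairs[i:]], [y for _, y in pairs[i:]], depth - 1)
    return lambda x: left(x) if x < threshold else right(x)


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


linear_mse = mse(fit_linear(distances, tips), distances, tips)
tree_mse = mse(fit_tree(distances, tips, depth=3), distances, tips)
print(f"linear MSE: {linear_mse:.4f}")
print(f"tree MSE:   {tree_mse:.4f}")
```

On this synthetic curve the tree's training error is lower than the straight line's, because the line cannot follow the flattening beyond 5 miles. On real, noisy data the comparison should be made on held-out trips, and an established library implementation would normally replace these hand-written helpers.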
Demystifying Components of Riding Hailing
One of the most famous images of New York is the wave of yellow taxis flooding the streets. So, where better to research taxi cab data than New York City? This is exactly what we intended to do. Since 2009, the NYC Taxi and Limousine Commission (TLC) has gathered massive amounts of data on every taxi trip in New York City. We set out to get our hands dirty and put the sophisticated analysis we learnt over the semester to work. We wanted to see how parameters like pick-up location, distance, number of passengers, and drop-off location affect the tips that NYC taxi drivers receive.