kaku

Kamal Dobriyal

Recently Published

Junk Detection
In daily life we receive mail and messages from many people. Most are sent for ordinary purposes, but some are junk that the recipient has no use for. This project classifies mail and messages as ham (not junk) or spam (junk). We used a few libraries and applied Singular Value Decomposition to data weighted with Term Frequency - Inverse Document Frequency. Text length and n-grams also affected the classification. We used a Random Forest machine learning model for classification.
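As a rough illustration of the Term Frequency - Inverse Document Frequency weighting mentioned above, here is a minimal sketch in plain Python. It assumes one common variant of the formula (term count divided by message length, times the natural log of the number of messages over the number of messages containing the term), which may differ from the variant used in the project. The function name `tf_idf` and the toy messages are hypothetical, and the SVD and Random Forest steps are not shown.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document (assumed TF-IDF variant)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Count in how many documents each term appears at least once.
    document_frequency = Counter()
    for tokens in tokenized:
        document_frequency.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # tf = count / document length, idf = ln(N / df)
            term: (count / len(tokens)) * math.log(n_docs / document_frequency[term])
            for term, count in counts.items()
        })
    return weights

# Toy messages invented for illustration only; not the project's data.
toy_messages = ["win a free prize now", "are you free for lunch", "lunch at noon"]
weights = tf_idf(toy_messages)
```

In this toy example "win" appears in only one message while "free" appears in two, so "win" receives the larger weight in the first message.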
Smart Keyboard
Smart Keyboard is a web app built on NLP n-gram models. Its interface is very simple: a sidebar holds all the instructions a new user needs. The user just enters text in the given text field, and as soon as a space is entered, a call goes to the backend with the entered text as input. The user then sees at most three words, shown in green, that the app predicts.
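The prediction step described above can be sketched with a simple bigram model in plain Python. This is only an illustration under stated assumptions: the app's actual models, language, and backend are not described here, and the names `build_bigram_model` and `predict_next_words` as well as the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict

def build_bigram_model(corpus):
    """Count which words follow each word in the corpus."""
    followers = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for current_word, next_word in zip(tokens, tokens[1:]):
            followers[current_word][next_word] += 1
    return followers

def predict_next_words(model, text, max_suggestions=3):
    """Suggest up to max_suggestions words that most often follow the last word."""
    tokens = text.lower().split()
    if not tokens:
        return []
    candidates = model.get(tokens[-1])
    if not candidates:
        return []  # the last word was never seen with a follower
    return [word for word, _ in candidates.most_common(max_suggestions)]

# Toy corpus invented for illustration only.
toy_corpus = [
    "i am going home",
    "i am going out",
    "i am happy",
    "i am going home now",
]
model = build_bigram_model(toy_corpus)
suggestions = predict_next_words(model, "today i am ")
```

A fuller model would typically back off from longer n-grams to shorter ones when the longer context has not been seen; this sketch uses only the last word.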
Milestone Report (Next-Word-Prediction)
The main goal of this report is to present an exploratory analysis of the datasets that will be considered for the Next-Word-Prediction project and to build n-gram models for the text prediction algorithm. It also covers the steps to be taken in the future to develop the Shiny app.
1985 Ward's Automotive Yearbook-Automobile
When a new automobile launches in the market, its price depends on various features and factors, mainly brand, fuel type, body style, engine size, horsepower, etc. Every automobile is built with a budget in mind, so a larger budget allows more and better features, and all of these features and factors contribute to the price. We therefore choose a model that can precisely and significantly predict the price of any automobile from its features and factors.
Fitness Prediction
In this project, we use data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants to predict the manner in which they did the exercise. This is the “classe” variable in the training set. We train 4 models (Decision Tree, Random Forest, Gradient Boosted Trees, and Support Vector Machine) using k-fold cross-validation on the training set. We then predict on a validation set randomly selected from the training CSV data to obtain the accuracy and out-of-sample error rate. Based on those numbers, we decide on the best model and use it to predict 20 cases from the test CSV set.
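The evaluation idea above (k-fold splits, then accuracy and out-of-sample error on held-out data) can be sketched in plain Python. This is a minimal illustration, not the project's code: the helper names `k_fold_splits` and `accuracy` are hypothetical, the labels are toy values, and no model is trained here.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (training_indices, held_out_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, held_out

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

splits = list(k_fold_splits(n_samples=10, k=5))

# Toy labels invented for illustration only.
toy_predicted = ["A", "B", "A", "C"]
toy_actual = ["A", "B", "B", "C"]
acc = accuracy(toy_predicted, toy_actual)
out_of_sample_error = 1 - acc  # error rate on data not used for training
```

Each sample lands in exactly one held-out fold, so every observation is used for validation once across the k rounds.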
MTCars-Data Analysis
In this project we analyze the relationship between MPG (miles per gallon) and other characteristics of a car using the mtcars dataset, in which each row is a car and each column is a characteristic. The analysis focuses on two questions: is an automatic or manual transmission better for MPG, and how large is the MPG difference between automatic and manual transmissions?
NOAA Storm Data Analysis
This data analysis involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.