Recently Published
Apple Stock price
This project explores Apple's stock price and trading history from 2008-01-01 to 2019-12-31 and highlights many interesting facts along the way. It was a fun and interesting project for me, so please take a look and enjoy.
Salary Prediction System
Mini Project
SENTIMENT ANALYSIS ON PM NARENDRA MODI'S SPEECH
title: "Modi Ji Speech Sentiment Analysis Score"
output: html_document
author: Mano R
In this project, we aim to determine the sentiment of the comments posted by the public on YouTube in response to the speech delivered by our Honorable Prime Minister Narendra Modi on May 12, 2020.
Datasaras
Little dino
Hierarchical Clustering
Problem Statement
You own a mall and want to understand which customers can be converted easily [Target Customers], so that this insight can be passed to the marketing team and the strategy planned accordingly.
Inspiration
By the end of this case study, you will be able to answer the questions below.
1- How to achieve customer segmentation using a machine learning algorithm (Hierarchical Clustering) in R in the simplest way.
2- Who your target customers are, i.e. those with whom you can start a marketing strategy [easy to convert].
3- How the marketing strategy works in the real world.
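As a rough illustration of the hierarchical clustering idea named above, the following Python sketch repeatedly merges the two closest clusters (single linkage) until the requested number of clusters remains. The project itself is done in R; the function name and the customer values (annual income, spending score) are invented for illustration and are not the project's data.

```python
from math import dist

def agglomerative(points, n_clusters):
    """Naive single-linkage agglomerative clustering.

    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index of cluster a, index of cluster b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge b into a
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical customers: (annual income, spending score).
customers = [(15, 39), (16, 35), (78, 88), (80, 90), (75, 10), (77, 12)]
groups = agglomerative(customers, 3)
print(groups)
```

A marketing team could then profile each group, for example by looking for the cluster with high income and high spending score.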
Bar plot animation
This data set contains GDP values for most countries over several years (mainly 2000 to 2017).
Kmeans
K Means Clustering Project
Usually, when dealing with an unsupervised learning problem, it's difficult to get a good measure of how well the model performed. For this project, we will use data from the UCI archive based on red and white wines (a very commonly used data set in ML).
We will then add a label to the combined data set and bring this label back later to see how well we can cluster the wine into groups.
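The idea of clustering without the label and then bringing the label back can be sketched in Python as follows. This is a minimal version of Lloyd's k-means algorithm with fixed starting centroids; the two-feature "wine" points and their labels are made up for illustration and are not the UCI data.

```python
from collections import Counter
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm; returns (assignments, centroids)."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return assignments, centroids

# Hypothetical measurements; the labels are held back during clustering.
points = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels = ["red", "red", "red", "white", "white", "white"]

assignments, centroids = kmeans(points, [points[0], points[3]])

# Bring the label back: count how often each label lands in each cluster.
table = Counter(zip(labels, assignments))
print(table)
```

If the clusters track the wine colour well, each label should fall almost entirely into a single cluster.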
Artificial neural network
This data set contains details of a bank's customers. The target variable is binary and indicates whether the customer left the bank (closed their account) or remains a customer.
Neuralnet
Neural Net Project
We’ll use the Banknote Authentication Data Set from the UCI repository.
Apriori
In this project we are going to use the Apriori algorithm to perform a Market Basket Analysis using a UCI dataset.
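To make the Apriori idea concrete, here is a small Python sketch that finds frequent itemsets level by level: it counts support, joins frequent itemsets into larger candidates, and prunes any candidate with an infrequent subset. The baskets and the `min_support` value are invented for illustration; they are not taken from the project's dataset.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset (frozenset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```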
Cardiotocograph Analysis & Prediction
2126 fetal cardiotocograms (CTGs) were automatically processed and the respective diagnostic features measured. The CTGs were also classified by three expert obstetricians and a consensus classification label assigned to each of them. The classification was both with respect to a morphologic pattern (A, B, C, ...) and to a fetal state (N, S, P). Therefore the dataset can be used either for 10-class or 3-class experiments.
My first dashboard using Rstudio
This is an interactive dashboard created using plotly and RStudio.
Natural Language Processing
The data set used for sentiment analysis is the Hotel data set from the UCI repository. This project is based on word-level predictions from a model I trained in the MonkeyLearn engine. The results are displayed below.
Kaggle Bikeshare demand updated
You are provided hourly rental data spanning two years. For this competition, the training set is comprised of the first 19 days of each month, while the test set is the 20th to the end of the month. You must predict the total count of bikes rented during each hour covered by the test set, using only information available prior to the rental period.
Data Fields
datetime - hourly date + timestamp
season - 1 = spring, 2 = summer, 3 = fall, 4 = winter
holiday - whether the day is considered a holiday
workingday - whether the day is neither a weekend nor holiday
weather -
1: Clear, Few clouds, Partly cloudy
2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
temp - temperature in Celsius
atemp - "feels like" temperature in Celsius
humidity - relative humidity
windspeed - wind speed
casual - number of non-registered user rentals initiated
registered - number of registered user rentals initiated
count - number of total rentals
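The train/test rule described above (the first 19 days of each month for training, the 20th onward for testing) can be sketched in Python like this. The timestamp format, the function name `split_by_day`, and the sample rows are assumptions made for illustration; they are not taken from the competition files.

```python
from datetime import datetime

def split_by_day(rows, cutoff_day=19, fmt="%Y-%m-%d %H:%M:%S"):
    """Split rows into (train, test) by the day of month of 'datetime'."""
    train, test = [], []
    for row in rows:
        day = datetime.strptime(row["datetime"], fmt).day
        # Days 1..cutoff_day go to training, the rest to testing.
        (train if day <= cutoff_day else test).append(row)
    return train, test

# Hypothetical hourly records; test rows have no known count.
rows = [
    {"datetime": "2011-01-01 00:00:00", "count": 16},
    {"datetime": "2011-01-19 23:00:00", "count": 40},
    {"datetime": "2011-01-20 00:00:00", "count": None},
]
train, test = split_by_day(rows)
print(len(train), len(test))
```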
Neuralnet
Data Set Information:
Data were extracted from images that were taken from genuine and forged banknote-like specimens. For digitization, an industrial camera usually used for print inspection was used. The final images have 400 x 400 pixels. Due to the object lens and distance to the investigated object, gray-scale pictures with a resolution of about 660 dpi were obtained. A wavelet transform tool was used to extract features from the images.
Attribute Information:
1. variance of Wavelet Transformed image (continuous)
2. skewness of Wavelet Transformed image (continuous)
3. kurtosis of Wavelet Transformed image (continuous)
4. entropy of image (continuous)
5. class (integer)
Kmeans Cluster
Usually, when dealing with an unsupervised learning problem, it's difficult to get a good measure of how well the model performed. For this project, we will use data from the UCI archive based on red and white wines (a very commonly used data set in ML).
We will then add a label to the combined data set and bring this label back later to see how well we can cluster the wine into groups.
Support vector machine
Support Vector Machines Project
For this project we will be exploring publicly available data from LendingClub.com. Lending Club connects people who need money (borrowers) with people who have money (investors). As an investor, you would want to invest in people whose profile shows a high probability of paying you back. We will try to create a model that helps predict this.
Lending Club had a very interesting year in 2016, so let’s check out some of their data and keep the context in mind. This data is from before they even went public.
We will use lending data from 2007-2010 and try to classify and predict whether or not the borrower paid back their loan in full. You can download the data from here or just use the CSV already provided. It’s recommended you use the CSV provided as it has been cleaned of NA values.
Here are what the columns represent:
credit.policy: 1 if the customer meets the credit underwriting criteria of LendingClub.com, and 0 otherwise.
purpose: The purpose of the loan (takes values “credit_card”, “debt_consolidation”, “educational”, “major_purchase”, “small_business”, and “all_other”).
int.rate: The interest rate of the loan, as a proportion (a rate of 11% would be stored as 0.11). Borrowers judged by LendingClub.com to be riskier are assigned higher interest rates.
installment: The monthly installments owed by the borrower if the loan is funded.
log.annual.inc: The natural log of the self-reported annual income of the borrower.
dti: The debt-to-income ratio of the borrower (amount of debt divided by annual income).
fico: The FICO credit score of the borrower.
days.with.cr.line: The number of days the borrower has had a credit line.
revol.bal: The borrower’s revolving balance (amount unpaid at the end of the credit card billing cycle).
revol.util: The borrower’s revolving line utilization rate (the amount of the credit line used relative to total credit available).
inq.last.6mths: The borrower’s number of inquiries by creditors in the last 6 months.
delinq.2yrs: The number of times the borrower had been 30+ days past due on a payment in the past 2 years.
pub.rec: The borrower’s number of derogatory public records (bankruptcy filings, tax liens, or judgments).
First_Package
This is a package that contains a single library.
It contains a function that is useful for grabbing the hour from the DateTime column of the dataset.
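The package itself is written in R and its function is not shown here, so the following Python sketch only illustrates the general idea of pulling the hour out of a date-time string. The name `grab_hour` and the default timestamp format are hypothetical.

```python
from datetime import datetime

def grab_hour(timestamp, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the hour (0-23) of a date-time string in the given format."""
    return datetime.strptime(timestamp, fmt).hour

# Apply the helper to a whole column of timestamps.
column = ["2011-01-20 00:00:00", "2011-01-20 17:00:00", "2011-01-20 23:00:00"]
hours = [grab_hour(value) for value in column]
print(hours)
```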
Decision Tree and Random forest
For this project we will be exploring the use of tree methods to classify schools as Private or Public based off their features.
Let's start by getting the data which is included in the ISLR library, the College data frame.
A data frame with 777 observations on the following 18 variables.
Private - A factor with levels No and Yes indicating private or public university
Apps - Number of applications received
Accept - Number of applications accepted
Enroll - Number of new students enrolled
Top10perc - Pct. new students from top 10% of H.S. class
Top25perc - Pct. new students from top 25% of H.S. class
F.Undergrad - Number of fulltime undergraduates
P.Undergrad - Number of parttime undergraduates
Outstate - Out-of-state tuition
Room.Board - Room and board costs
Books - Estimated book costs
Personal - Estimated personal spending
PhD - Pct. of faculty with Ph.D.’s
Terminal - Pct. of faculty with terminal degree
S.F.Ratio - Student/faculty ratio
perc.alumni - Pct. alumni who donate
Expend - Instructional expenditure per student
Grad.Rate - Graduation rate
Linear regression (Bikeshare)
Linear Regression Project
For this project you will be doing the Bike Sharing Demand Kaggle challenge! We won't submit any results to the competition, but feel free to explore Kaggle in more depth. The main point of this project is to get you feeling comfortable with Exploratory Data Analysis and to begin to understand that certain models are sometimes not a good choice for a data set. In this case, we will discover that Linear Regression may not be the best choice given our data!
Instructions
Just complete the tasks outlined below.
Get the Data
You can download the data or just use the supplied csv in the repository. The data has the following features:
datetime - hourly date + timestamp
season - 1 = spring, 2 = summer, 3 = fall, 4 = winter
holiday - whether the day is considered a holiday
workingday - whether the day is neither a weekend nor holiday
weather -
1: Clear, Few clouds, Partly cloudy
2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
temp - temperature in Celsius
atemp - "feels like" temperature in Celsius
humidity - relative humidity
windspeed - wind speed
casual - number of non-registered user rentals initiated
registered - number of registered user rentals initiated
count - number of total rentals
Classification problem (KNN)
Data Set Information:
This is perhaps the best-known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
Predicted attribute: class of iris plant.
This is an exceedingly simple domain.
This data differs from the data presented in Fisher's article (identified by Steve Chadwick, spchadwick '@' espeedaz.net). The 35th sample should be: 4.9,3.1,1.5,0.2,"Iris-setosa", where the error is in the fourth feature. The 38th sample should be: 4.9,3.6,1.4,0.1,"Iris-setosa", where the errors are in the second and third features.
Attribute Information:
1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class:
-- Iris Setosa
-- Iris Versicolour
-- Iris Virginica
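Using the four attributes and three classes listed above, a k-nearest-neighbours classifier can be sketched in a few lines of Python. This is a minimal majority-vote version with Euclidean distance; the function name is hypothetical, and the small training sample is written in the style of the iris data purely for illustration.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Predict the class of `query` by majority vote among the k nearest rows."""
    neighbours = sorted(train, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# (sepal length, sepal width, petal length, petal width) in cm, with class.
train = [
    ((5.1, 3.5, 1.4, 0.2), "Iris-setosa"),
    ((4.9, 3.0, 1.4, 0.2), "Iris-setosa"),
    ((4.7, 3.2, 1.3, 0.2), "Iris-setosa"),
    ((7.0, 3.2, 4.7, 1.4), "Iris-versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "Iris-versicolor"),
    ((6.9, 3.1, 4.9, 1.5), "Iris-versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "Iris-virginica"),
    ((5.8, 2.7, 5.1, 1.9), "Iris-virginica"),
    ((7.1, 3.0, 5.9, 2.1), "Iris-virginica"),
]

small_flower = knn_predict(train, (5.0, 3.4, 1.5, 0.2))
large_flower = knn_predict(train, (6.5, 3.0, 5.8, 2.2))
print(small_flower, large_flower)
```

In practice the features are usually standardized first, since KNN is sensitive to the scale of each attribute.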
your first Animation
animation using gganimate
Logistic Regression
In this project we will be working with the UCI adult dataset. We will be attempting to predict if people in the data set belong in a certain class by salary, either making <=50k or >50k per year.
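As a hedged illustration of how a logistic regression model turns features into one of the two salary classes, the Python sketch below applies the sigmoid function to a weighted sum and thresholds the resulting probability. The features (age in decades, years of education), the weights, and the bias are hypothetical values, not coefficients fitted on the adult dataset.

```python
from math import exp

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))

def predict_income_class(features, weights, bias, threshold=0.5):
    """Return (class label, probability of '>50k') for one person."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    p = sigmoid(z)
    return (">50k" if p >= threshold else "<=50k"), p

# Hypothetical coefficients for (age in decades, years of education).
weights = (0.4, 0.3)
bias = -6.0

label_a, prob_a = predict_income_class((4.5, 16), weights, bias)
label_b, prob_b = predict_income_class((2.0, 10), weights, bias)
print(label_a, round(prob_a, 3))
print(label_b, round(prob_b, 3))
```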
Document
Salary prediction
Plot
Correlation
HTML
mtcars dataframe (animation with plotly)
Corruption and human Development
A visual that shows corruption and human development in the following countries.