The primary objective of this project is to analyze the worldwide spread of the coronavirus (COVID-19) and identify which countries and states were most affected within the given time frame. The dataset covers 22 January 2020 to 9 February 2020, and the CSV file is taken from the WHO (World Health Organization).
This Markdown document contains some analysis of Stack Overflow Developer Survey data.
In the following document, I will use the SVM classification technique to predict heart disease (angiographic disease status). Of the 14 variables in the dataset, the most important predictors of heart disease are whether there is a reversible thalassemia defect, followed by whether asymptomatic chest pain occurs.
This section analyzes the professional lives of the participants: their major degree, the time they spend studying data science topics, the job titles they hold, the ML methods they actually use in industry, the blogs they prefer for studying data science, and what they consider the most necessary skills for a data scientist.
This part focuses on the survey participants' answers to more specific data science and machine learning questions, such as their most preferred tool for implementing data science, the most-used ML methods, their learning platforms, and the languages they recommend.
This is a data analytics project for mining, analyzing, and visualizing the data collected by the Kaggle Data Science Survey conducted in 2017. This part specifically focuses on participant attributes such as country of origin, gender, and employment status.
This article describes the concept behind the k-means clustering algorithm and discusses its merits and demerits. Synthetic data is then generated to demonstrate how to cluster data in R using the k-means algorithm.
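As a one-formula sketch of the idea behind k-means: given $K$ clusters, the algorithm seeks an assignment of points to clusters $C_1,\dots,C_K$ that minimizes the total within-cluster variation,

$$
\min_{C_1,\dots,C_K} \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \mu_k \rVert^2, \qquad \mu_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i
$$

where $\mu_k$ is the centroid of cluster $C_k$. The standard (Lloyd's) algorithm alternates between assigning each point to its nearest centroid and recomputing the centroids, converging to a local optimum.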
Classify messages as Spam or Ham using a simple Naive Bayes classifier and some natural language processing.
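The classifier named above rests on one formula: under the "naive" assumption that the words $w_1,\dots,w_n$ of a message are conditionally independent given the class,

$$
P(\text{spam} \mid w_1,\dots,w_n) \;\propto\; P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam})
$$

The same quantity is computed for ham, and the message is assigned to whichever class scores higher. In practice Laplace smoothing is applied to the word probabilities so that an unseen word does not force a probability of zero.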
This article explains the mathematics behind principal component analysis (PCA), a well-known dimensionality reduction technique and pre-processing tool for supervised learning. The article also gives an example of how to implement it in R.
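In brief, the mathematics referred to above: after centering the data matrix $X$, the principal components are the eigenvectors of the sample covariance matrix,

$$
\Sigma = \frac{1}{n-1} X^{\top} X, \qquad \Sigma v_k = \lambda_k v_k
$$

The first component $v_1$ is the unit vector maximizing the variance of the projection $X v$, the second maximizes it subject to being orthogonal to $v_1$, and so on; the eigenvalues $\lambda_k$ give the variance explained by each component.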
This is an analysis and report of the terrorist activities that took place across the globe from 1970 to 2016. It mainly aims at answering questions such as which country had the most attacks and which countries had the most successfully executed attacks, then drills down into individual countries such as India to examine all the terrorist activities that took place there.
This article gives an overview of the radial kernel and the use of the kernel trick in SVMs to apply non-linear transformations to data that is not linearly separable and to generate non-linear decision boundaries.
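For reference, the radial (RBF) kernel discussed above is

$$
K(x, x') = \exp\!\left(-\gamma \lVert x - x' \rVert^2\right)
$$

The kernel trick consists of replacing every inner product in the SVM optimization with $K(x, x')$, which implicitly maps the data into an infinite-dimensional feature space without ever computing that mapping; larger $\gamma$ yields more local, more flexible decision boundaries.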
This article gives an abstract view of what Support Vector Machines are, together with an implementation of a linear Support Vector Machine in R that is used to linearly separate the data.
Network analysis is the study of relationships and dependencies between objects. I will use directed acyclic graphs to plot the relationships in R. This is Part 1 of the complete project, because it is quite a big one; in this article we first scrape, process, and transform the data, prepare it for feeding into the graphs, and plot directed acyclic graphs of the families and houses of GOT.
Random forest is an ensemble technique similar to the well-known bagging method, but with a different tweak. In random forests, the idea is to decorrelate the several trees that are grown on different bootstrapped samples of the training data, and then reduce the variance by averaging the trees. In this example I predict the median housing values from the Boston housing dataset.
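A one-line justification of the averaging step described above: for $B$ identically distributed trees, each with variance $\sigma^2$ and pairwise correlation $\rho$, the variance of their average is

$$
\operatorname{Var}\!\left(\frac{1}{B} \sum_{b=1}^{B} \hat{f}_b(x)\right) = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
$$

As $B$ grows, the second term vanishes but the first does not, so averaging alone cannot remove the correlated part of the variance. Random forests attack the first term directly: by considering only a random subset of predictors at each split, they decorrelate the trees (shrink $\rho$), which is exactly the "tweak" that distinguishes them from bagging.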
Trees involve stratifying or segmenting the predictor space ($X_i$) into a number of simple regions. Tree-based methods generate a set of $splitting \ rules$ that are used to segment the predictor space. These techniques of segmenting and stratifying data into different regions $R_j$ are called __Decision Trees__.
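In symbols, once the predictor space has been partitioned into regions $R_1,\dots,R_J$, a regression tree predicts the mean response of whichever region an observation falls into:

$$
\hat{f}(x) = \sum_{j=1}^{J} \bar{y}_{R_j}\, \mathbf{1}(x \in R_j)
$$

where $\bar{y}_{R_j}$ is the mean of the training responses in $R_j$. The splitting rules are chosen greedily, at each step picking the predictor and cut-point that most reduce the residual sum of squares.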
The main aim of this project is to study which US states had the most obese populations among adults, children, and teens. A second objective is to learn how to scrape data from an HTML page in R using the rvest package and to generate attractive maps using the ggplot2 and maps packages.
This is a short presentation on how to implement deep learning in R using the Keras and TensorFlow packages. I performed digit recognition on the MNIST dataset, and the results were quite good.
This article covers implementing generalized additive models (GAMs) in R and shows how they can capture non-linear interactions in the data by generating models that are linear in form yet non-linear in their predictors (X), capturing non-linearities between the input and output variables. It also discusses the mathematical form of generalized additive models and how they differ from other methods. In this example I predict the salaries of employees as a function of age, years of experience, and education.
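The mathematical form mentioned above: a GAM keeps the additive structure of a linear model but replaces each linear term $\beta_j x_j$ with a smooth function of the predictor,

$$
y = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_p(x_p) + \epsilon
$$

Each $f_j$ (often a spline) captures a non-linear effect of its predictor, while the additive structure means the contribution of each variable can still be examined separately, preserving much of the interpretability of a linear model.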
This article gives a step-by-step tutorial on implementing smoothing splines and cubic splines, along with the mathematical forms these methods use to learn non-linear functions.
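The key mathematical form behind the smoothing spline mentioned above is a penalized least-squares problem: find the function $g$ minimizing

$$
\sum_{i=1}^{n} \left(y_i - g(x_i)\right)^2 + \lambda \int g''(t)^2 \, dt
$$

The first term rewards fit to the data, the second penalizes roughness, and $\lambda \ge 0$ controls the bias-variance trade-off ($\lambda \to \infty$ forces $g$ toward a straight line). Remarkably, the minimizer is always a natural cubic spline with knots at the observed $x_i$.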
This document shows a step-by-step implementation of polynomial regression in R, i.e. fitting non-linear models that are more flexible and make it easy to model non-linearities in the data.
This is a human resource analytics project that mainly focuses on descriptive analytics of the data, using various statistical methods, extensive data visualizations and plots, and R packages such as 'dplyr', 'tidyr', and 'ggplot2'. The dataset, downloaded from Kaggle, is an artificial human resource dataset, but it still covers most of the questions one might ask when analyzing a firm's HR data.