MaryamM

Maryam

Recently Published

CW-11 23-04-2024
CW-10 09-04-2024
CW-9 02-04-2024
CW-11 16-04-2024
Linear Regression
Project 3 Using Regression Trees: Titanic Data Analysis
For each question, write the corresponding R command and attach the relevant graphs.
1) Import the dataset as a CSV from https://goo.gl/At238b
2) Load it into R in a variable called T3.
3) Build a new dataset, titanic, by selecting these features: survived, embarked, age, sex, sibsp, parch and fare.
4) Perform a statistical analysis of the titanic dataset.
5) Survived is the dependent variable; find its proportion in the dataset.
6) Remove NAs, if any.
7) Convert Survived, embarked and sex to factors.
8) Find the correlation matrix between survival and the other features.
9) Plot survival against the other features to see whether any correlation exists.
10) Set a seed of 1000 and use it to split the dataset into 80% training and 20% testing.
11) Build your training (till index 1046) and test (till index 1308) datasets.
12) Install the rpart package (“Recursive Partitioning and Regression Trees”) if it is not already in your R installation, and load it into your environment.
13) Train your learner on the training dataset and save your model in a variable fit: fit <- rpart(Survived ~ Sex + Age + SibSp + Parch + Fare + Embarked, data=train, method="class")
14) Plot your regression tree and save the plot to an image file.
15) The plot you obtain will not be very readable. To get a better tree, install the following packages and load them into your environment: rattle, rpart.plot and RColorBrewer.
16) Type the R command fancyRpartPlot(fit) to obtain a more readable plot.
17) Examine the tree you obtained. What is the most “important” feature, the one on which the tree first splits? Do you agree with the Titanic slogan “Women and children first!”?
18) Use the predict function with your model fit to make predictions on the test dataset and save them in a variable Prediction: Prediction <- predict(fit, test, type = "class")
19) Save the results in a data frame Results with two columns: PassengerSex = test$Sex and Survived = Prediction.
20) Save your data frame to a .csv file using write.csv(Results, file = "Titanicdtree.csv", row.names = FALSE)
21) Submit your R code as a link to Rpubs and the Titanicdtree.csv file on Blackboard by Tuesday, April 13th 2024.
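The split, fit, predict and export steps of this project can be sketched as follows. This is a minimal illustration under stated assumptions, not the assignment solution: it uses R's built-in `Titanic` contingency table as a stand-in for the course CSV, so the features (`Class`, `Sex`, `Age` as Child/Adult) differ from those of T3, and the output is written to a temporary directory. With the course data, the formula would be the one given in step 13.

```r
library(rpart)  # recursive partitioning; a recommended package bundled with R

# Stand-in data (assumption): expand R's built-in Titanic contingency table
# into one row per person. The course CSV (T3) has different columns.
counts <- as.data.frame(Titanic)
people <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                 c("Class", "Sex", "Age", "Survived")]
rownames(people) <- NULL

# Proportion of each outcome of the dependent variable.
print(prop.table(table(people$Survived)))

# Random 80/20 split with a fixed seed so the split is reproducible.
set.seed(1000)
train_idx <- sample(nrow(people), size = floor(0.8 * nrow(people)))
train <- people[train_idx, ]
test  <- people[-train_idx, ]

# Classification tree; the predictors here are the stand-in's columns.
fit <- rpart(Survived ~ Class + Sex + Age, data = train, method = "class")

# Predict classes for the held-out rows and collect them in a data frame.
Prediction <- predict(fit, test, type = "class")
Results <- data.frame(PassengerSex = test$Sex, Survived = Prediction)

# Write the results; tempdir() is used here only to avoid cluttering the
# working directory.
out_file <- file.path(tempdir(), "Titanicdtree.csv")
write.csv(Results, file = out_file, row.names = FALSE)

# Share of test rows whose predicted class matches the recorded outcome.
accuracy <- mean(Prediction == test$Survived)
print(accuracy)
```

Printing `fit` shows which variable the tree splits on first, which is the question step 17 asks about.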
CW-7,8 Credit Decision Trees
Project 2- Data Mining
Using the dataset provided to your group, perform an analysis of your dataset and apply unsupervised learning (clustering).
1- Import the dataset into R.
2- Perform a statistical analysis of the dataset.
3- Find whether any entries are NA and remove them.
4- Normalize/standardize your dataset (scale function).
5- Apply hierarchical clustering (try many cluster values).
6- Apply k-means clustering (try many cluster values).
7- Since the dataset is not labeled, you cannot use external validation/evaluation procedures. However, you can use the internal evaluation procedures seen in class (silhouette coefficient and Dunn’s coefficient).
8- Analyze your results in terms of the number of clusters: which k value gives the best result? Can you find a meaning in these clusters?
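A minimal sketch of steps 3 to 8, assuming the numeric columns of the built-in `iris` data as a stand-in, since the group dataset is not given here. The helpers `avg_sil` and `dunn_index` are hypothetical names introduced for illustration. `dunn_index` computes the Dunn index by hand (smallest between-cluster distance divided by largest within-cluster distance), which may or may not be the exact definition used in class.

```r
library(cluster)  # silhouette(); a recommended package bundled with R

# Stand-in data (assumption): numeric columns of iris.
df <- na.omit(iris[, 1:4])  # drop rows containing NA
x  <- scale(df)             # standardize each column (mean 0, sd 1)
d  <- dist(x)               # Euclidean distances between rows

# Hierarchical clustering; Ward linkage is one choice among several.
hc <- hclust(d, method = "ward.D2")

# Average silhouette width for a vector of cluster labels.
avg_sil <- function(cl) mean(silhouette(cl, d)[, "sil_width"])

# Dunn index: min distance between clusters / max distance within a cluster.
dunn_index <- function(cl) {
  m <- as.matrix(d)
  same <- outer(cl, cl, "==")
  min(m[!same]) / max(m[same])
}

set.seed(1)  # k-means starts from random centers
ks <- 2:6
km_labels <- lapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$cluster)
hc_labels <- lapply(ks, function(k) cutree(hc, k))

results <- data.frame(
  k           = ks,
  sil_hclust  = sapply(hc_labels, avg_sil),
  sil_kmeans  = sapply(km_labels, avg_sil),
  dunn_hclust = sapply(hc_labels, dunn_index),
  dunn_kmeans = sapply(km_labels, dunn_index)
)
print(results)
```

For both measures, larger values indicate more compact and better separated clusters, so the rows of `results` can be compared to choose a candidate k.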
CW-6 05-03-2024
wisc_bc_dataset k-Nearest Neighbors (k-NN) algorithm
CW-5 27-02-2024
Seeds_dataset Analysis
Project 1_S24
#6: Load Your Machine Learning Datasets
#7: Understand Your Data Using Descriptive Statistics
#8: Understand Your Data Using Data Visualization
#9: Prepare Your Data For Machine Learning With Pre-Processing
20-02-2024 CW-4
Document
13-02-2024 CW-3
02-07-2024 CW-2
CW-1
Document