
Ilyashaikall

Muhamad Ilyas Haikal

Recently Published

LBB Time Series
Clustering of Wine
Causes of Loan Default
House Prices Prediction
For predicting housing prices in King County, USA, multiple linear regression performs poorly: it was unable to capture all of the patterns in the data. The best multiple linear model reached an adjusted R² of 0.6999 with an RMSE of 149611.2. Removing all outliers and log-transforming the numerical features did not help much either: it lowered the adjusted R² to 0.562 while reducing the RMSE to 136126.8, which is still high. All of the models also violate several assumptions of linear regression. Other machine learning methods, such as decision trees, random forests, or deep learning, are recommended for this data.
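The two metrics quoted above can be computed as follows. This is a minimal sketch in plain Python, not the code used in the project; the function names `rmse` and `adjusted_r2` and the toy numbers are hypothetical and are not the King County data. Adjusted R² penalizes R² for the number of predictors p relative to the sample size n, while RMSE is expressed in the units of the target (here, price).

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2)), in the units of y."""
    n = len(y_true)
    squared_errors = ((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared_errors) / n)

def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), with p = n_features."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    r2 = 1 - ss_res / ss_tot
    # Shrink R^2 according to how many predictors were used for n observations
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

# Hypothetical toy example (made-up values)
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
print(rmse(y_true, y_pred))            # about 0.1414
print(adjusted_r2(y_true, y_pred, 1))  # about 0.9867
```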
Credit Fraud Detection
Credit card fraud detection is a challenging problem that requires analyzing large volumes of transaction data to identify patterns of fraud. In this project, I trained two models on the same classification task and compared their results to choose the final model with the highest accuracy. The credit card transaction dataset is highly imbalanced, with fraudulent cases making up only a small fraction of all transactions, so I applied a sampling technique to balance it. Because the data were balanced before training, I evaluated the models with both the accuracy from the confusion matrix and the Area Under the Precision-Recall Curve (AUC). Although the Nearest Neighbors model also performed well, the Logistic Regression model achieved the highest accuracy, 0.9324324, with an AUC of 0.9732012, so I present Logistic Regression as my final model. The hope is that more accurate fraud detection systems can be developed in the near future to help fraud investigators detect fraudulent transactions and propose better policies for real-world regulation.
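The balancing and evaluation steps described above might look roughly like this. It is a minimal sketch in plain Python under stated assumptions: the abstract does not name the sampling technique, so random undersampling of the majority class is assumed here, and the area under the precision-recall curve is estimated with average precision (a common step-wise estimate). All function names are hypothetical, and labels are assumed to be 1 for fraud and 0 for legitimate transactions.

```python
import random

def random_undersample(rows, labels, seed=0):
    """Assumed technique: drop majority-class rows at random until classes are equal."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]

def confusion_accuracy(y_true, y_pred):
    """Accuracy from the confusion matrix: (TP + TN) / (TP + TN + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return (tp + tn) / (tp + tn + fp + fn)

def average_precision(y_true, scores):
    """Step-wise area under the PR curve: sum of (R_k - R_{k-1}) * P_k.

    Requires at least one positive label; tied scores are treated as one threshold.
    """
    pairs = sorted(zip(scores, y_true), key=lambda t: -t[0])
    total_pos = sum(y_true)
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample whose score equals the current threshold
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# Hypothetical usage with made-up labels and model scores
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]
print(confusion_accuracy(labels, predictions))
print(average_precision(labels, scores))
```

Undersampling is chosen here only because it is the simplest balancing method to show; oversampling or synthetic sampling would fit the same description in the abstract.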
Document
Divedeeper Exercise