
s3256292

Binh Le

Recently Published

Assignment 3 - Binh Le - Story Telling with Open Data
Vietnamese Sign Language development snapshot
MATH2270 - Data Visualisation - Assignment 2
Binh Le (s3256292)
Data Preprocessing Assignment 3 - Vinh Loi Chau - s3699871 | Binh Chon Nhut Le - s3256292
In this markdown, we process data from the Australian Bureau of Statistics (ABS) on income and method of transportation. The cleaning steps include checking for null and infinite values and transposing rows into columns. They leave us with three columns, two of which (Region Name and Code) form a composite primary key used to join the two datasets together*. We also remove redundant columns such as year, since all of our data was collected in 2016, and we relabel the data to make it easier to understand. We then try to identify outliers. However, after considering the nature of our data, we decide that removing them would not be reasonable. Finally, the income variable for over 3000 (I3000) is selected for transformation because it has a right-skewed distribution. A Box-Cox transformation is applied because it performed better for this attribute than the other methods tried.

*Note: For these datasets, the join can only be performed after tidying, once both are in the right format for combining. Therefore, the merging step has been moved to the Tidy section.
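The cleaning-then-join order described above can be illustrated with a small Python sketch. This is not the authors' code, and the assignment's own language and tooling are not stated here. The records below are made-up toy values, not ABS figures, and the helper names `check_values`, `pivot_wide` and `inner_join` are hypothetical. The sketch rejects null, NaN and infinite values, pivots long-format rows into one row per region, and then inner-joins the two tables on the composite key (region code, region name).

```python
import math

# Hypothetical long-format records (toy values, not ABS figures):
# (region_code, region_name, variable, value)
income_long = [
    ("101", "Region A", "I3000", 120.0),
    ("101", "Region A", "I1000", 300.0),
    ("102", "Region B", "I3000", 45.0),
    ("102", "Region B", "I1000", 410.0),
]
transport_long = [
    ("101", "Region A", "Car", 5200.0),
    ("102", "Region B", "Car", 3100.0),
]


def check_values(records):
    """Raise ValueError if any value is missing (None), NaN or infinite."""
    for code, name, var, value in records:
        if value is None or math.isnan(value) or math.isinf(value):
            raise ValueError(f"bad value for {code}/{name}/{var}: {value}")


def pivot_wide(records):
    """Turn (key, variable, value) rows into one dict per region,
    keyed by the composite key (region_code, region_name)."""
    wide = {}
    for code, name, var, value in records:
        wide.setdefault((code, name), {})[var] = value
    return wide


def inner_join(left, right):
    """Inner join two wide tables on their composite key."""
    return {key: {**left[key], **right[key]} for key in left if key in right}


check_values(income_long)
check_values(transport_long)
merged = inner_join(pivot_wide(income_long), pivot_wide(transport_long))
for (code, name), row in sorted(merged.items()):
    print(code, name, row)
```

The join only works once both tables are in wide form and share the same key. This matches the note that merging has to wait until after tidying.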
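The source does not say how outliers were identified. Tukey's 1.5 × IQR fences are one common choice, so the sketch below uses that rule as an assumption. It only flags values and never drops them, in line with the decision to keep the outliers. The quartiles come from `statistics.quantiles(..., method="inclusive")`, the same linear-interpolation rule as R's default `quantile()`. The `i3000` list contains hypothetical toy values, not ABS data.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Values are only flagged here, not removed from the data.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical I3000 counts per region (toy values, not ABS data).
i3000 = [12, 15, 18, 20, 22, 25, 30, 35, 40, 250]
print(iqr_outliers(i3000))  # prints [250]
```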
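A Box-Cox transformation maps a strictly positive value y to (y^λ − 1)/λ, or to log(y) when λ = 0. The sketch below picks λ by maximising the normal profile log-likelihood over a grid. That estimation method is an assumption, since the source does not say how λ was chosen; for example, R's `forecast::BoxCox.lambda` defaults to Guerrero's method instead. The helper names and the sample data are hypothetical. The usage lines also compute the skewness of log and square-root transforms, which shows one way the "performed better than other methods" comparison could be made.

```python
import math


def boxcox(values, lam):
    """Box-Cox transform; all values must be strictly positive."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def boxcox_loglik(values, lam):
    """Profile log-likelihood of lambda, assuming the transformed data are normal."""
    n = len(values)
    y = boxcox(values, lam)
    mean = sum(y) / n
    var = sum((t - mean) ** 2 for t in y) / n
    return -n / 2 * math.log(var) + (lam - 1) * sum(math.log(v) for v in values)


def best_lambda(values, lo=-2.0, hi=2.0, step=0.01):
    """Grid-search the lambda in [lo, hi] that maximises the log-likelihood."""
    count = int(round((hi - lo) / step)) + 1
    # Rounding keeps the grid clean, so 0.0 hits the log branch exactly.
    grid = [round(lo + i * step, 10) for i in range(count)]
    return max(grid, key=lambda lam: boxcox_loglik(values, lam))


def skewness(values):
    """Moment-based sample skewness (positive means right-skewed)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed sample (toy values, not ABS data).
data = [1, 2, 2, 3, 3, 4, 5, 7, 10, 20, 45]
lam = best_lambda(data)
print("lambda:", lam)
print("raw skew:", round(skewness(data), 3))
print("log skew:", round(skewness(boxcox(data, 0)), 3))
print("sqrt skew:", round(skewness([math.sqrt(v) for v in data]), 3))
print("Box-Cox skew:", round(skewness(boxcox(data, lam)), 3))
```

The transform needs strictly positive inputs, so any zero counts would have to be shifted before use.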