Recently Published
Rice (Cammeo and Osmancik) Data Set
https://archive.ics.uci.edu/ml/datasets/Rice+%28Cammeo+and+Osmancik%29#
Data Set Information:
Among the certified rice varieties grown in Turkey, two species were selected for the study: Osmancik, which has had a large planting area since 1997, and Cammeo, which has been grown since 2014. In general, grains of the Osmancik species have a wide, long, glassy and dull appearance. Grains of the Cammeo species are likewise wide and long, with a glassy and dull appearance. A total of 3810 rice grain images were taken for the two species and processed, and features were extracted from them. Seven morphological features were obtained for each grain of rice.
Attribute Information:
1.) Area: Returns the number of pixels within the boundaries of the rice grain.
2.) Perimeter: Calculates the circumference as the distance between pixels around the boundaries of the rice grain.
3.) Major Axis Length: Gives the length of the longest line that can be drawn on the rice grain, i.e. the major axis distance.
4.) Minor Axis Length: Gives the length of the shortest line that can be drawn on the rice grain, i.e. the minor axis distance.
5.) Eccentricity: Measures how round the ellipse that has the same moments as the rice grain is.
6.) Convex Area: Returns the pixel count of the smallest convex shell (convex hull) of the region formed by the rice grain.
7.) Extent: Returns the ratio of the region formed by the rice grain to the bounding box pixels.
8.) Class: Cammeo and Osmancik rice.
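To make the eccentricity and extent attributes more concrete, here is a minimal sketch of how such values are commonly computed. It assumes the standard ellipse eccentricity formula, sqrt(1 - (minor/major)^2), and takes extent as region area divided by bounding box area. The function names and the sample numbers are illustrative and are not taken from the data set.

```python
import math


def eccentricity(major_axis: float, minor_axis: float) -> float:
    """Eccentricity of an ellipse from its full axis lengths.

    0 means a circle; values approaching 1 mean a very elongated ellipse.
    """
    if minor_axis <= 0 or major_axis < minor_axis:
        raise ValueError("expected 0 < minor_axis <= major_axis")
    return math.sqrt(1.0 - (minor_axis / major_axis) ** 2)


def extent(area: float, bbox_width: float, bbox_height: float) -> float:
    """Ratio of the region's pixel area to its bounding box area."""
    if bbox_width <= 0 or bbox_height <= 0:
        raise ValueError("bounding box sides must be positive")
    return area / (bbox_width * bbox_height)


# Hypothetical grain: axes of 5 and 3 units, 50 pixels inside a 10 x 10 box.
print(eccentricity(5.0, 3.0))
print(extent(50.0, 10.0, 10.0))
```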
HCV data Data Set
The target attribute for classification is Category: blood donors vs. Hepatitis C, including its progression ('just' Hepatitis C, Fibrosis, Cirrhosis).
https://archive.ics.uci.edu/ml/datasets/HCV+data#
Homework No 7
Multidimensional data: projections:
1) Find principal components of your data (use: Matlab: pca, R: princomp, or other).
2) Visualize the data in the projection onto two principal components.
3) Make the attribute axis representation.
4) Select the most informative and the least informative attributes according to the lengths of the axes (shortest / longest).
5) Create different subsets of attributes (informative only, non-informative only, all except the non-informative attributes, etc.) and visualize the data using the nonlinear projection method MDS.
6) Present different visualizations, comment on the result.
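The steps above can be sketched with NumPy as follows. This is only an illustration under stated assumptions, not the required solution: PCA is computed through an SVD of the centered data, the "attribute axes" are taken to be the component loadings of each attribute in the projection plane, and MDS is the classical (Torgerson) variant. Classical MDS on Euclidean distances reproduces the PCA scores up to sign, so a genuinely nonlinear projection would need an iterative stress-minimizing MDS instead. All function names and the sample matrix are hypothetical.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via SVD of the centered data.

    Returns the projected points, the component vectors and the share of
    variance explained by each kept component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                     # center every attribute
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]              # shape (k, n_attributes)
    scores = Xc @ components.T                  # shape (n_objects, k)
    explained = S**2 / np.sum(S**2)
    return scores, components, explained[:n_components]


def axis_lengths(components):
    """Length of each attribute axis in the projection plane.

    Longer axes mark attributes that contribute more to the projection.
    """
    return np.linalg.norm(components, axis=0)


def classical_mds(X, n_components=2):
    """Classical MDS on Euclidean distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff**2, axis=2)           # squared pairwise distances
    n = sq_dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    B = -0.5 * J @ sq_dist @ J                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_components]
    vals = np.clip(vals[order], 0.0, None)      # guard against tiny negatives
    return vecs[:, order] * np.sqrt(vals)


# Hypothetical data: 5 objects described by 3 attributes.
X = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 3.0],
              [1.0, 4.0, 0.0],
              [3.0, 2.0, 2.0],
              [5.0, 1.0, 1.0]])
scores, components, ratio = pca(X)
print(ratio)
print(axis_lengths(components))
print(classical_mds(X))
```

When attributes are measured on very different scales, standardizing them before calling `pca` (subtract the mean, divide by the standard deviation) is usually worth considering, since otherwise the largest-scale attribute tends to dominate the first component.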
Homework No.5
Goal: calculate distances/similarities of multidimensional objects using the following techniques:
a. Manhattan
b. Euclidean
c. Cosine
d. SMC
e. Jaccard
Meaningfully apply appropriate distance/similarity techniques to get the most similar subgroup of objects.
Comment on the results. Are the subgroups of similar objects obtained using different techniques composed of the same objects?
For any distance/similarity technique that cannot be applied to your data, select a test dataset.
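The five measures listed above can be sketched in plain Python as follows. This is an illustrative sketch only: Manhattan, Euclidean and cosine are applied to numeric vectors, while SMC (simple matching coefficient) and Jaccard are applied to binary 0/1 vectors. Returning 1.0 for Jaccard when both vectors are all zeros is a convention assumed here, not something the assignment specifies.

```python
import math


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def smc(a, b):
    """Simple matching coefficient: share of positions with equal values."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def jaccard(a, b):
    """Jaccard similarity for 0/1 vectors: 1-1 matches over non 0-0 positions."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either == 0:
        return 1.0  # assumption: two all-zero vectors count as identical
    return both / either


# Hypothetical objects.
print(manhattan([1, 2, 3], [4, 6, 3]))
print(euclidean([1, 2, 3], [4, 6, 3]))
print(cosine_similarity([1, 2], [2, 4]))
print(smc([1, 0, 0, 1], [1, 1, 0, 0]))
print(jaccard([1, 0, 0, 1], [1, 1, 0, 0]))
```

SMC counts shared zeros as agreement while Jaccard ignores them, so the two can rank the same pair of sparse binary objects quite differently.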
Homework No.6
Choose two methods from Multidimensional data: direct methods
Make visualizations
Prepare a presentation and upload it.
Homework No.4
Describe Your Data:
4 erroneous visualizations and their correct versions.
4 lying/misleading visualizations and their correct versions.
Homework No.3
CAR INSURANCE CLAIM DATA
Describe Your Data:
Choose 4 simple visualization methods (boxplot must be included) for your data visualization.
Present and upload your results.
Homework assignment No.1
CAR INSURANCE CLAIM DATA
1. Choose a data set (the number of data attributes should be more than 5) and explain why it is important or interesting to you.
2. Formulate research questions (for which you expect to find the answers).
3. Make some visualizations for the formulated questions.
4. Prepare a presentation (where you explain the data, questions, problems, results) and upload it.
Homework assignment No.2
CAR INSURANCE CLAIM DATA
Describe Your Data:
Data types.
Statistics (mean, min, max, etc., depending on the data types); use box plots and other similar plots to illustrate them.
Create basic visualizations of your data.
Check for periodicity in your data, show it (if there is no seasonality, show that there is no seasonality).
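One simple way to check for periodicity is the sample autocorrelation at different lags. A clear peak at some lag suggests a period of that length, while values near zero at every lag suggest no seasonality. The sketch below is only an illustration under that assumption. The function name and the synthetic 12-step series are hypothetical and are not derived from the car insurance data.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of a numeric sequence at the given lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


# Synthetic series with a period of 12 steps (e.g. monthly data over 10 years).
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
for lag in (3, 6, 12):
    print(lag, round(autocorrelation(series, lag), 3))
```

For this synthetic series the value at lag 12 is strongly positive and the value at lag 6 is strongly negative, which is the pattern a 12-step cycle produces.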