RPubs

by RStudio

oaxacamatt

Matt Curcio

Recently Published

Convert Square Meters to Square Feet

First RStudio Cloud Test

over 6 years ago

Keras - First Test Demo

The Fashion MNIST dataset https://tensorflow.rstudio.com/keras/articles/tutorial_basic_classification.html

over 6 years ago

Produce % AA Composition

From small test harness sample of O2 binding proteins

about 7 years ago

Multiple Caret Run: Naive Bayes, KNN, C5.0 with AA Test-harness dataset

Using `caret` and `caretEnsemble`

about 7 years ago

EM test

Title: EM test Summary: Model based clustering:Expectation Maximization

about 7 years ago

Iris-caret-6ML-models

KNN=modelknn, C50=modelc50, SVMR=modelSvm, NB=modelnb, CART=modelCart, LDA=modellda

about 7 years ago

Three-ML-Model-Run-Using-Caret

over 7 years ago

* Calculate correlation matrix * Find attributes that are highly corrected (ideally > 0.75) * Learning Vector Quantization (LVQ) model * Construct an Learning Vector Quantization (LVQ) model. The varImp is then used to estimate the variable importance, which is printed and plotted. * Estimate variable importance using filterVarImp * Recursive feature selection

over 7 years ago

Tuning RF Using ntree

over 7 years ago

Tuning RF with ‘mtry’

over 7 years ago

Random-Forest-of-AA-on-Test-Harness-w-Scale-Center

Random forest using 7 x 500 test-harness with 5 x 10 fold cross-validation.

over 7 years ago

Train RF with Test-Harrness dataset

over 7 years ago

Produce_Single_AA_Percent_Composition_For_Test-Harness.rmd

Take a test-harness (randomly sampled file of protein classes) and produce the percent amino acid composition from the proteins listed. In this case there are 3500 proteins.

over 7 years ago

Alt-500-Test-Harness.rmd

This program samples 500 polypeptides from each of the files and combines them into a test-harness file. This test harness file (4MB) is much smaller than if I were to run the full dataset of 27MB.

over 7 years ago

EDA-Part2-Oxygen-binding.rmd

A. Plot of Variances (x 10^5) Vs Amino Acid Type B. Plot of Means (% of Total) Vs Amino Acid Type

over 7 years ago

EDA-Part1-Oxygen-Binding-Proteins

One important question which I would like to answer with this exploratory data analysis, is this data set sufficient to successfully classify the seven classes of protein using a Random Forest machine learning approach?

over 7 years ago

is.palindrome

over 7 years ago

Fibonacci Numbers

What is Fib(37)?

over 7 years ago

The Monty Hall Problem

Who Wins?

over 7 years ago

Pima Indian Diabetes data

over 7 years ago

cli_test_generator

Produces 5000 files in the FASTA format then compresses them into a bz2 archive. This program will be the basis for producing all CLI Tests for any number of students.

almost 8 years ago

Bioconductor TM

See: https://rpubs.com/oaxacamatt/R-cran-TM This notebook is a continuation from part A which looks at the words from R-cran and does analysis on them. However this notebook looks at the words from Bioconductor packages. We will then take the output from these two sets of data and do a simple set comparison, looking A intersect B, A-B and B-A.

almost 8 years ago

Text mining R-cran Descriptions - Part A

## Text mining R-cran Work with the text from ONLY the second column of 'R-cran' text found on [R-cran package short description](https://cran.r-project.org/) followed by 'Packages' then 'Table of available packages, sorted by name' For Part B see: https://rpubs.com/oaxacamatt/Bioconductor-TM

almost 8 years ago

Microbenchmarking for a random number generator

The take home message is that 'ceiling' is better than 'as.integer' due to the fact that 'as.integer' returns NAs by coercion. Also using 'ceiling' is slightly faster. ;)

about 8 years ago

Expression quantitative trait loci, Part 1

about 8 years ago

Generate Random RNA / FASTA

over 8 years ago

Generate Random Protein / FASTA

over 8 years ago

Generate Random DNA Sequences

over 8 years ago

Plotting with Expressions in Labels

Brownian Motion

over 8 years ago

Stratified Means

over 8 years ago

Math Pix

over 8 years ago

Basic Histogram

Basic Histogram - you need to practice this, using GGplot.

over 8 years ago

Joy Division - Unknown Pleasures Album cover

I found this script on Rbloggers.com and I liked it and tweaked it a little bit.

over 8 years ago

Publish Document

title: "M.L. Using O2-Binding AA" subtitle: "1. Exploratory Investigation of Percent AA composition" author: "Matthew Curcio"

over 8 years ago

Use of Machine Learning (CART) wrt Oxygen-Binding Proteins

over 8 years ago

Use of Machine Learning wrt Oxygen-Binding Proteins

This module uses Random Forest against the %AA composition (using 20AA Not Dipeptides) of Oxygen binding proteins to predict classification

over 8 years ago

Exploratory Data Analysis on MONO Peptides

almost 9 years ago

Random Forest Investigation

This is the first portion of my oxygen binding proteins machine learning project. This portion only displays the random forest generation using the percent amino acid composition.

almost 9 years ago

O2 Binders

This is the start of my project to ?? first step produce data frames of o2 binders.

almost 9 years ago

04_10_eQTL

This does not give the same values as the script produced by J Leek.

almost 9 years ago

Global temperatures since 1880

almost 9 years ago

JLeek_GLM 3_14

"GLM_3_14_pvalues_multiple_testing"

about 9 years ago

Coursera Stats for Genomics, Quiz 3, Ques3

about 9 years ago

Coursera Stats for Genomics, Quiz 3, Ques 1

about 9 years ago

coursera_genomics_stats_wk2_quiz_ques5

What is the dimension of the residual matrix, the effects matrix and the coefficients matrix?

about 9 years ago

coursera_genomics_stats_wk2_quiz_ques4

Linear Regression Modelling

about 9 years ago

coursera_genomics_stats_wk2_quiz_ques2

Set the seed to 333 and use k-means to cluster the samples into two clusters. Use svd to calculate the singular vectors.

about 9 years ago

coursera_genomics_stats_wk2_quiz_ques4

Linear Models with Indicator variable for M sex.

about 9 years ago

coursera_genomics_stats_wk2_quiz_ques3

Linear Models

about 9 years ago

coursera_genomics_stats_wk2_quiz_ques1

Singular Value Analysis and Percent Variance

about 9 years ago

Quiz Wk 1, Question #9

Statistics for Genomic Data Science

about 9 years ago

MCC Bioconductor Project 4

over 9 years ago

Stratified Means and Totals Using Proportions

almost 10 years ago

Test 3

almost 10 years ago

Basic GRanges & AnnotationHub

almost 10 years ago

Bioconductor Exam 2

almost 10 years ago

Coursera Bioconductor Quiz 2

Epigenomics roadmap data

almost 10 years ago

Bioconductor for Genomic Data Science, Quiz 1

Coursera author: "MCC" date: "September 30, 2016"

almost 10 years ago

Bioconductor Outline Course, Quiz # 1

almost 10 years ago

Model fitting for an interesting puzzle

See: https://youtu.be/i4VqXRRXi68

almost 10 years ago

Example 1 of Sentiment Analysis Using Natural Language Processing

This example sentiment analysis uses Natural Language Processing (NLP) using R and the library TM (text mining package, https://cran.r-project.org/web/packages/tm/). The corpus uses 81 job postings using the key words, computation biology or bioinfromatics from http://www.biospace.com.

over 10 years ago

Sign In

oaxacamatt

Matt Curcio

Recently Published