Recently Published
Feature Selection for Regression Problem
We develop a demo, entirely contrived but we hope convincing, to show that, given a large number of variables, a suitable feature selection library in Python can find not only the significant variables but also their coefficients in a regression problem. For this demonstration we want a relatively small percentage, say 5%, of the given variables to be significant. In R, we create a data set of 200 variables and 200 observations; we need at least 200 observations because there are 200 variables. We then call a Python function defined in a Python script, which calls SelectKBest with a suitable *score* function. We are able to find the desired variables and the prescribed coefficients. We expect each application will require customization, and even different methods.
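The Python side of such a demo might look like the minimal sketch below. It is not the author's script: the data are simulated in Python rather than passed in from R, and the helper names (`make_data`, `select_and_fit`), the coefficient range, the noise level, the choice of `f_regression` as the score function, and the final `LinearRegression` refit used to estimate coefficients are all assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression


def make_data(n_obs=200, n_vars=200, n_signif=10, noise_sd=0.1, seed=0):
    """Simulate y as a linear combination of a few columns of X plus noise.

    All sizes and the coefficient range are illustrative assumptions;
    n_signif=10 corresponds to 5% of 200 variables.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_obs, n_vars))
    true_idx = np.sort(rng.choice(n_vars, size=n_signif, replace=False))
    true_coef = rng.uniform(1.0, 5.0, size=n_signif)
    y = X[:, true_idx] @ true_coef + rng.normal(scale=noise_sd, size=n_obs)
    return X, y, true_idx, true_coef


def select_and_fit(X, y, k):
    """Keep the k columns with the highest univariate F-scores, then refit."""
    selector = SelectKBest(score_func=f_regression, k=k).fit(X, y)
    chosen = selector.get_support(indices=True)  # sorted column indices
    # Assumed second step: ordinary least squares on the kept columns only.
    model = LinearRegression().fit(X[:, chosen], y)
    return chosen, model.coef_


X, y, true_idx, true_coef = make_data()
chosen, coef = select_and_fit(X, y, k=10)
print("true columns:       ", true_idx)
print("selected columns:   ", chosen)
print("fitted coefficients:", np.round(coef, 2))
```

Because `f_regression` scores each variable on its own, the selected columns are not guaranteed to match the true ones exactly with only 200 observations. Comparing `chosen` against `true_idx` shows how well the screening did for a given seed, and a different score function or value of `k` may be needed in a real application.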
Slide Deck for Flight Performance App
Quickly compare airlines using actual elapsed times and delays.
Instantly obtain the expected delay on arrival.
Highly flexible, with a full spectrum of options for cities, airlines and flight times.
Fast, responsive and efficient, with a professionally designed layout.
User-friendly GUI with a familiar click-and-scroll interface.
Report of Study using BTS Databases
A study to develop a useful Shiny web app that compares delays across airlines and calculates the expected delay at the destination, given only the delay at the origin and the scheduled elapsed time.
Software Development in R Programming Language
An overview of software development in R under RStudio. These slides are intended to highlight software development, especially from a data scientist's point of view. They are not intended to be a comprehensive treatment.
Chat Assistant App
A Shiny web app to predict the next word using the last few words. Automatic spell checking is provided. The transformed text used for prediction is also shown, which may help in understanding text pasted from chat.
Chat Assistant App Report
Provides an executive summary, discusses background and prior work, presents some of the code and displays graphical results.
Presentation
Slides for a Shiny application with an interactive S&P 500 chart
SNP Index with Volume
Plots the index and volume over a five-year period using plotly
Analysis of Events from NOAA Storm Database
The basic goal of this assignment is to explore the NOAA Storm Database and answer the following questions:
Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?