
grigory

Gregory E Kanevsky

Recently Published

Dallas Animal Services Shelters - Exploratory and Survival Analysis
Published on Dallas OpenData, the Dallas Animal Shelter Data pertains to operational processes carried out by shelter personnel who assist citizens by receiving surrendered and stray animals, facilitating adoptions, transferring animals to rescue groups, and providing daily care to the animals in the shelter. Shelter personnel document their work using Chameleon, an animal shelter management program. The Dallas Animal Shelter Data is updated daily to help citizens better understand the operational processes that shelter personnel carry out for the animals and citizens of the City of Dallas.
Dallas Animal Shelter Dog Stays: Intake Types of Admission to Outcome of Discharge
“Helping Dallas be a safe, compassionate, and healthy place for people and animals”. The Dallas Animal Shelter Data pertains to operational processes carried out by shelter personnel who assist citizens by receiving surrendered and stray animals, facilitating adoptions, transferring animals to rescue groups, and providing daily care to the animals in the shelter. Shelter personnel document their work using Chameleon, an animal shelter management program. The Dallas Animal Shelter Data is updated daily to help citizens better understand the operational processes that shelter personnel carry out for the animals and citizens of the City of Dallas. The data cover the period from October 1, 2014 through the end of July 2017.
Survival Analysis and Cox PH Regression with R and Aster R
Cox proportional-hazards regression has become a popular method for predicting failure and survival events in a variety of business use cases. We discuss the method and its applications, with examples in the R and Aster R environments.
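To make the mechanics of Cox regression concrete, here is a minimal sketch of a one-covariate Cox fit in Python, assuming right-censored data given as follow-up times, event indicators (1 = event, 0 = censored), and a single numeric covariate. It maximizes the partial log-likelihood with Newton-Raphson, treating tied event times with the Breslow approximation. This is an illustration of the technique, not the code from the post; in R the usual tool is `coxph()` from the survival package. The function names `cox_score_info` and `fit_cox_1d` are hypothetical.

```python
import math


def cox_score_info(beta, times, events, x):
    """Score U(beta) and observed information I(beta) of the partial
    log-likelihood for a one-covariate Cox model (Breslow ties)."""
    score, info = 0.0, 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: every subject still under observation at time t_i.
        s0 = s1 = s2 = 0.0
        for t_j, x_j in zip(times, x):
            if t_j >= t_i:
                w = math.exp(beta * x_j)
                s0 += w
                s1 += w * x_j
                s2 += w * x_j * x_j
        mean = s1 / s0  # risk-weighted mean of x in the risk set
        score += x[i] - mean
        info += s2 / s0 - mean * mean  # risk-weighted variance of x
    return score, info


def fit_cox_1d(times, events, x, tol=1e-10, max_iter=50):
    """Newton-Raphson on the concave partial log-likelihood, starting at 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(beta, times, events, x)
        step = score / info  # assumes x varies within at least one risk set
        beta += step
        if abs(step) < tol:
            break
    return beta


times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
x = [1, 0, 1, 0, 1, 0]  # subjects with x = 1 tend to fail earlier
beta = fit_cox_1d(times, events, x)
hazard_ratio = math.exp(beta)  # > 1: x = 1 raises the hazard
```

The estimate is finite only when the covariate does not perfectly separate early from late failures; with perfect separation the partial likelihood keeps increasing and Newton steps diverge.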
Data Science Pipelines with R and Aster - Use Cases with Examples
Scalable data science with the Teradata Aster analytical platform and R culminates in the design, development, and deployment of data science pipelines. This second presentation on data science pipelines with R discusses use cases and examples, including data manipulation similar to dplyr or Spark, summarization techniques, complex in-database computations and execution, in-database principal component analysis (PCA) and logistic regression with R and SQL-MR, and more.
Sentiment Analysis of Reviews on iTunes - PodCruncher
iPhone app developers care most about their apps' rankings in the App Store. One of the main factors contributing to rankings is user ratings. Together with a rating, the user always submits a review that usually describes the key points behind the rating. Found in both iTunes and the App Store, app ratings and reviews are rich content for better understanding what drives user satisfaction (or dissatisfaction, or both). In this document we continue the analysis (see part I http://rpubs.com/grigory/PodCruncher) of the popular iPhone podcast player PodCruncher, using sentiment extracted from 500 reviews and applying different methods and visualizations.
Analysis of PodCruncher App Ratings and Reviews in iTunes
iPhone app developers care most about their apps' rankings in the App Store. One of the main factors contributing to rankings is user ratings. Together with a rating, the user always submits a review that usually describes the key points behind the rating. Found in both iTunes and the App Store, app ratings and reviews are rich content for better understanding what drives user satisfaction (or dissatisfaction, or both). In this document we analyze ratings and reviews for the popular iPhone podcast player PodCruncher and illustrate how a lack of new releases affects its user base.
Data Science Pipeline with Teradata Aster R
Scalable data science with the Teradata Aster analytical platform and Aster R: designing, developing, and deploying data science pipelines at scale and across enterprise data sources.
How To Be Part of Digital Media Ecosystem for Data Scientists
A simple illustration of how a data scientist can contribute to, enhance, and complement digital news content. A simple but complete example walks through a practical, realistic workflow: a news story flows into data, the data becomes a visualization, and the visualization contributes back to the story.
Correlations on big data with Aster R, in-database R, and toaster
The simple goal of computing correlations on large data sets can be a daunting task in R. Having the Aster database do the job is simple with Aster R, but there is no single way to get it done. With examples using Aster R, in-database R, and toaster, we demonstrate the advantages of each. At the very least, the examples give the reader a better understanding of how Aster R addresses analytics on big data.
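One general reason a database can compute correlations at scale is that Pearson's r depends only on a handful of additive sums, so each partition can return its sums and a client combines them. The Python sketch below illustrates that idea under the assumption that each partition yields paired lists of numbers; it is not how Aster R, in-database R, or toaster are implemented, and the helper names `partial_sums`, `merge`, and `pearson_from_sums` are hypothetical.

```python
import math


def partial_sums(xs, ys):
    """Sufficient statistics one partition would return:
    (n, sum x, sum y, sum x^2, sum y^2, sum xy)."""
    return (
        len(xs),
        sum(xs),
        sum(ys),
        sum(v * v for v in xs),
        sum(v * v for v in ys),
        sum(a * b for a, b in zip(xs, ys)),
    )


def merge(stats_list):
    """Combine partitions by adding their statistics element-wise."""
    return tuple(sum(col) for col in zip(*stats_list))


def pearson_from_sums(stats):
    """Pearson correlation from the merged sums (textbook formula)."""
    n, sx, sy, sxx, syy, sxy = stats
    cov = sxy - sx * sy / n
    var_x = sxx - sx * sx / n
    var_y = syy - sy * sy / n
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
# Two "partitions" processed independently, then merged.
r_chunked = pearson_from_sums(merge([partial_sums(x[:3], y[:3]),
                                     partial_sums(x[3:], y[3:])]))
r_single = pearson_from_sums(partial_sums(x, y))
# both ≈ 0.8286 (= 14.5 / 17.5)
```

The sum-of-squares formula is easy to distribute but can lose precision when values are large relative to their spread; a more careful version would merge per-partition means and centered sums instead.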
Intro to Aster with toaster
An extensive set of examples and tips on working with the Teradata Aster big data platform using R and the toaster package.
Scripting Parallel Jobs on Aster in R
It is hardly a surprise to anyone that Teradata Aster runs each SQL, SQL-MR, and SQL-GR command in parallel on many clusters and across distributed data. But when faced with running several independent jobs at once, we have to do extra work to parallelize them in Aster. For example, cross-validation of linear regression or other models divides into independent jobs, each working with its own partition. These jobs can run in parallel in Aster with a little help from R. This article illustrates how to run K linear regression models in parallel in Aster as part of the K-fold cross-validation procedure.
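The structure described above, K independent fold jobs submitted at once and their results collected, can be sketched in Python with `concurrent.futures`. Assumptions: a single predictor, ordinary least squares, folds assigned by row index modulo K, and an in-process thread pool standing in for jobs that the article runs in Aster. The names `fit_line`, `run_fold`, and `k_fold_cv` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def run_fold(fold, k, xs, ys):
    """One independent job: train on every fold except `fold`,
    then return the mean squared error on `fold`."""
    train = [i for i in range(len(xs)) if i % k != fold]
    test = [i for i in range(len(xs)) if i % k == fold]
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in test]
    return sum(errors) / len(errors)


def k_fold_cv(xs, ys, k=5):
    """Submit all k fold jobs at once and collect their test MSEs in order."""
    with ThreadPoolExecutor(max_workers=k) as pool:
        futures = [pool.submit(run_fold, fold, k, xs, ys) for fold in range(k)]
        return [fut.result() for fut in futures]


xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
mses = k_fold_cv(xs, ys, k=5)
cv_error = sum(mses) / len(mses)
```

Threads fit here because, when the real work happens in the database, the client mostly waits on queries; for CPU-bound pure-Python fits a process pool would be the better choice.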
Graphs with Aster and toaster
The Aster graph functions use the Aster Database SQL-GR framework, which allows large-scale graph analysis in-database. While these functions offer a wide range of specialized graph analyses, their integration and visualization are beyond the scope of the Aster database. At the same time, a few competing graph products, such as neo4j, offer both. toaster attempts to fill this gap by offering comprehensive graph functionality that expands and integrates Aster functions. It also exposes Aster graph data inside R as network (graph) objects and expands the set of graph functionality. As always, toaster does this by taking advantage of Aster's in-database performance, scalability, and distributed architecture.
Kmeans with toaster package
K-means clustering is a powerful unsupervised learning technique available out of the box with the Aster database. But it still requires a certain amount of data preparation, handling, and custom coding to achieve the same results as the core R functions kmeans, scale, etc. toaster streamlines and simplifies k-means clustering with Aster through its family of k-means functions for computation and visualization.
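For readers less familiar with the steps that R's `kmeans(scale(x), centers = k)` combines, here is a minimal Python sketch: column-wise standardization in the style of R's `scale()` (sample standard deviation), followed by Lloyd's k-means algorithm. Note that R's `kmeans` defaults to the Hartigan-Wong algorithm, and the deterministic initialization used here (evenly spaced rows) is an assumption chosen for reproducibility; this is neither toaster's nor Aster's implementation, and `scale` and `kmeans` below are hypothetical helpers.

```python
import statistics


def scale(rows):
    """Center and scale each column: (x - mean) / sd, with sd the sample
    standard deviation (n - 1 denominator), as R's scale() does."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm, initialized with k evenly spaced rows."""
    n = len(points)
    centers = [list(points[i * n // k]) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(pt, centers[c]))
                      for pt in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [statistics.mean(col) for col in zip(*members)]
    return labels, centers


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(scale(data), k=2)
```

Scaling first keeps a column measured in large units from dominating the distance; like any Lloyd-style method, the result depends on the initial centers and may be a local optimum.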