Recently Published
DREAM-High Breast Cancer Patient Data_test
We will load and examine a data frame that contains clinical information from over 1,000 breast cancer patients from The Cancer Genome Atlas (TCGA).
Predictive Modeling for Cancer Prognosis
This activity will introduce you to loading and manipulating the data from the breast cancer METABRIC dataset, visualizing and working with gene expression measurements, and building predictive models based on the expression of many different genes (and clinical data too).
TCGA Heatmaps and Clustering
In the earlier module, Breast_Cancer_Expression_Data, we examined the mRNA levels for 18,351 genes across 1,082 breast cancer patients. In another activity, Heatmaps, we worked with the smaller data set `mtcars` and saw how the function `heatmap()` can reorder the objects in a data set to reveal patterns: objects and features in the same cluster are more similar to each other than to those in other clusters. Here, we apply these concepts and skills to the TCGA clinical and gene expression features.
Breast Cancer Expression Data
We will load and examine R dataframe objects that contain data from over 1,000 breast cancer (BRCA) patients from The Cancer Genome Atlas (TCGA).
The objects include:
-- clinical measurements on the patients and the patients' tumors, such as gender, age, and estrogen, progesterone, and HER2 receptor status. We examined this data in detail in our previous activity, Breast_Cancer_Patient_Data.Rmd.
-- gene expression data, which tells us how many messenger RNAs (mRNAs) per gene are present in a patient sample. The amount of a gene's mRNA corresponds (roughly) to the amount of that gene's protein in the sample.
Heatmaps
Heatmaps are a way to colorize, visualize, and organize a data set with the goal of intuiting relationships among observations and features.
We will use heatmaps in this course to find patterns in the gene expression data for the 1K breast cancer patients from The Cancer Genome Atlas. Here, we focus on what heatmaps are and how to create them by practicing with a small dataset.
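As a rough illustration of practicing with a small dataset, the sketch below draws a heatmap of R's built-in `mtcars` data. The choice of `mtcars`, the variable names, and the decision to standardize the columns before clustering are assumptions made for this example, not necessarily the steps used in the activity.

```r
# mtcars ships with R: 32 cars (rows) by 11 numeric features (columns).
# Standardize each column so features with large units (e.g. disp, hp)
# do not dominate the distances used for clustering.
car_matrix <- scale(as.matrix(mtcars))

# heatmap() clusters rows and columns, reorders them, and colors each cell.
# scale = "none" because the columns were already standardized above.
hm <- heatmap(car_matrix, scale = "none", margins = c(5, 8))

# The (invisible) return value records the row and column order in the plot.
reordered_cars <- rownames(car_matrix)[hm$rowInd]
head(reordered_cars)
```

Cars that end up next to each other in `reordered_cars` have similar standardized feature profiles, which is the pattern the heatmap makes visible.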
Breast Cancer Cell Lines
We work with data from experiments with human cancer cell lines from the Physical Sciences in Oncology (PS-ON) Cell Line Characterization Study.
Cancer cell lines are cancer cells that keep dividing and growing over time, under certain conditions in a laboratory. Cancer cell lines are used in research to study the biology of cancer and to test cancer treatments.
The PS-ON Study includes imaging- and microscopy-based measurements of physical properties of the cells, such as morphology (shape) and motility (movement). We will examine:
-- the expression levels of genes, and
-- how fast the cells move.
DREAM-High Breast Cancer Patient Data
We will load and examine a data frame that contains clinical information from over 1,000 breast cancer patients from The Cancer Genome Atlas (TCGA).
DREAM-High Breast Cancer Patient Data
RStudio activity for examining breast cancer patient data from The Cancer Genome Atlas.
Breast Cancer Patient Data
In this activity, we will learn new skills in R with a large real-life dataset!
We will load and examine a data frame that contains clinical information from over 1,000 breast cancer patients from The Cancer Genome Atlas (TCGA). TCGA used genomics to characterize over 20,000 cancer samples spanning 33 cancer types. Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes. Throughout this course, we will examine some of the different data types and the computational analyses that were performed to decipher breast cancer data from TCGA.
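A minimal sketch of what examining a clinical data frame can look like in R is given here. The data frame `clinical`, its column names, and its four rows are hypothetical toy values invented for illustration; they are not taken from the TCGA data.

```r
# Hypothetical toy data: made-up patients, not real TCGA records.
clinical <- data.frame(
  patient_id = c("P1", "P2", "P3", "P4"),
  age        = c(45, 62, 58, 71),
  er_status  = c("Positive", "Negative", "Positive", "Positive"),
  stringsAsFactors = FALSE
)

dim(clinical)      # number of rows (patients) and columns (features)
str(clinical)      # column names and their types
head(clinical, 2)  # first two rows

# Summarize a categorical and a numeric column.
er_counts <- table(clinical$er_status)
mean_age  <- mean(clinical$age)
```

The same calls (`dim()`, `str()`, `head()`, `table()`, `mean()`) apply unchanged to a data frame with over 1,000 rows.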
Module 5: Genes for Enrichment Analysis
The TCGA breast cancer data set contains the expression levels of 18K genes from 1K patient samples. We rearranged this data so that similar samples and similar genes are "clustered" together. We found interesting clinical properties for the sample clusters, such as ER+ and ER- samples. Here, we will use Enrichment Analysis to find biological themes for the gene clusters.
Module 4: Patterns in Breast Cancer Gene Expression
The TCGA breast cancer data set contains the expression levels of 18K genes from 1K patient samples. We will rearrange this data so that similar samples and similar genes are "clustered" together and look for clinical themes in the patient sample clusters.
Module 3: Breast Cancer Expression Data
The TCGA breast cancer data set contains the expression levels of 18K genes from 1K patient samples. We will learn how to manipulate this data and perform preliminary analyses.
Module 7: Breast Cancer Cell Lines: Part 2
We previously worked with gene expression data from patient samples. We will apply what we learned to data from human cancer cell lines. Human cancer cell lines are widely used as experimental models of cancer and often provide additional information related to their physical properties.
Student Project
A high school student applies his new skills and knowledge to create a new way of examining the TCGA breast cancer data set.
Module 2: Breast Cancer Patient Data
In this activity, we will put our new skills in R to use with a large real-life dataset. We will load and examine an R data frame that contains clinical information from over 1,000 breast cancer patients from The Cancer Genome Atlas (TCGA).
Module 6: Breast Cancer Cell Lines: Part 1
We previously worked with gene expression data from patient samples. We will apply what we learned to data from human cancer cell lines. Human cancer cell lines are widely used as experimental models of cancer and often provide additional information related to their physical properties.