The Cancer Genome Computational Analysis (CGCA) group, a central component of the Broad Institute’s Cancer Program, addresses unanswered questions in cancer biology and genomics by developing computational methods and tools, together with platforms, datasets, and resources. Specifically, the group works to understand cancer by characterizing and interpreting genomic data:
Characterization: fully describing the genomic events (including somatic and germline events, at DNA, RNA, and proteomic levels) in tumor and normal samples coming from individual patients
Interpretation: analysis of characterization data across populations or cohorts with the aims of identifying a) genes, regions, and pathways that are altered beyond what is expected by chance, and b) subtypes of disease
CGCA works closely with many groups within the Broad Institute, including the institute’s Genomics, Genetic Perturbation, and Data Sciences platforms. CGCA members also engage with collaborators from the Broad Institute’s partner institutions and from outside organizations such as IBM. The team also participates in several National Institutes of Health-funded national consortia.
The CGCA team has created a number of powerful genomic analysis tools and platforms for the cancer research community, including:
FireCloud, a cloud-based cancer genomics analysis platform developed with the Broad Institute’s Data Science Platform. FireCloud houses the full dataset generated by TCGA and a suite of robust cancer genomics workflows containing CGCA-developed tools, such as:
, which estimates purity/ploidy, and computes absolute copy-number and mutation multiplicities.
, a tool for identifying somatic rearrangements as clusters of aberrant paired-end sequencing reads in a tumor sample.
, which identifies genes in a dataset that are mutated more often than would be expected by chance.
, which infers HLA types from whole exome sequence data.
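To make the idea of "mutated more often than would be expected by chance" concrete, here is a minimal sketch of a one-sided binomial test for a single gene. It is not the method used by any CGCA tool. It assumes one uniform background mutation rate per base and independent samples, whereas real significance tools model much richer, gene-specific and patient-specific backgrounds. The function names and the example numbers are hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def gene_p_value(mutated_samples: int, total_samples: int,
                 gene_length_bp: int, background_rate: float) -> float:
    """One-sided p-value that a gene is mutated in at least `mutated_samples`
    of `total_samples` tumors purely by chance.

    Assumption (illustrative only): every base mutates independently at the
    same `background_rate` per sample.
    """
    # Chance that one sample carries at least one mutation anywhere in the gene.
    p_sample = 1 - (1 - background_rate) ** gene_length_bp
    return binomial_tail(mutated_samples, total_samples, p_sample)


if __name__ == "__main__":
    # Hypothetical cohort: 100 tumors, a 1,500 bp gene, 1e-6 mutations per base.
    for observed in (1, 3, 12):
        p = gene_p_value(observed, 100, 1500, 1e-6)
        print(f"{observed:>2} mutated samples -> p = {p:.3g}")
```

In practice, a p-value like this would be computed for every gene and then corrected for multiple hypothesis testing before any gene is called significantly mutated.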
CGCA has also built and maintains several key genomic data resources, such as:
, a comprehensive mutational dataset comprising exome mutation data from 21 cancer types
FireBrowse, a user-friendly, web-based entry point to downloadable TCGA datasets, summary reports, and graphical tools. FireBrowse sits atop , an application providing access to TCGA datasets and a robust selection of tools and pipelines for analyzing cancer genome data, as well as thousands of data analysis archives.
, a comprehensive atlas and open database of gene expression and gene regulation across human tissues that provides a “normal” dataset against which to compare tumor-based expression and regulation data.