The Clinical Proteomic Tumor Analysis Consortium (CPTAC) of the National Cancer Institute (NCI), part of the National Institutes of Health (NIH), brings together leading centers nationwide in a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through large-scale proteome and genome analysis, or proteogenomics, and to address mechanisms of treatment response, resistance, or toxicity.
CPTAC has generated a comprehensive dataset that standardizes genomic, proteomic, imaging, and clinical data from individual studies of more than 1,000 tumors across 10 cancer types.
This CPTAC pan-cancer proteogenomic dataset, described in a paper published in Cancer Cell, builds on decades of technological advances in proteomic science.
Researchers from around the world can use this publicly available resource to uncover new molecular insights into how cancers develop and progress.
The launch of this dataset supports the Biden-Harris Administration’s Cancer Moonshot℠ goal of accelerating cancer research through improved sharing of data.
Two additional research papers published in Cell by CPTAC investigators provide an initial demonstration of the dataset’s potential as a valuable resource for scientific discovery. In the first paper, multi-omic analyses are used to link cancer driver mutations with protein patterns. The second paper delves into protein modifications that regulate cell signaling and physiology to show associations with DNA repair, metabolism, and immunity across different tumor types.
The pan-cancer proteogenomic dataset is publicly available through the NCI Cancer Research Data Commons repositories. Proteomic and genomic data can be accessed using our popular R/Bioconductor tool, TCGAbiolinks, which we have recently expanded to stream CPTAC pan-cancer data.
In addition to leveraging the numerous software tools available within Bioconductor, TCGAbiolinks facilitates access to molecular data from multiple international consortia such as TCGA, GENIE, MET500, GTEx, GEO, and IHEC.
Because TCGAbiolinks' internal functions harmonize data from diverse consortia, end users can explore and validate hypotheses on a comprehensive library of reference datasets using shareable and reproducible code.
Pan-cancer proteogenomics connects oncogenic drivers to functional states
https://doi.org/10.1016/j.cell.2023.07.014
Pan-cancer analysis of post-translational modifications reveals shared patterns of protein regulation
https://doi.org/10.1016/j.cell.2023.07.013
Deep learning integrates histopathology and proteogenomics at a pan-cancer level
https://doi.org/10.1016/j.xcrm.2023.101173
Integrative multi-omic cancer profiling reveals DNA methylation patterns associated with therapeutic vulnerability and cell-of-origin
https://doi.org/10.1016/j.ccell.2023.07.013
Proteogenomic data and resources for pan-cancer analysis
https://doi.org/10.1016/j.ccell.2023.06.009
TCGAbiolinks: An R/Bioconductor package for integrative analysis of TCGA data
http://doi.org/10.1093/nar/gkv1507