Introduction to the Fred Hutch instance of cBioPortal and a demonstration of its usage in the context of research data.

Learning Objectives

After reading this article, you will learn:

What can you do with cBioPortal?

cBioPortal hosts a large collection of cancer genomics datasets that you can explore in many ways. The example below demonstrates how you can use some of its capabilities to support your own research.

Note: The examples below use the public instance of cBioPortal but can be replicated in the Fred Hutch instance as well.

As an example, let us say you identified KRAS as an important gene in cancer. Let us see how cBioPortal can expand on this observation using publicly available datasets.

Question 1: How often is KRAS mutated in cancers?

With cBioPortal, you can explore published multi-omics data without downloading raw data files, processing them, and manually creating visualizations. It is also a one-stop shop for evaluating specific mutations, because it references external databases. The platform makes the entire process seamless and user-friendly.
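For intuition about what an alteration frequency means, here is a minimal sketch, not cBioPortal's own code. It assumes mutation records that use the MAF column names `Hugo_Symbol` and `Tumor_Sample_Barcode`, and counts a sample once no matter how many KRAS mutations it carries. The sample IDs and rows are hypothetical.

```python
def mutation_frequency(mutation_rows, profiled_samples, gene):
    """Return the fraction of profiled samples with at least one mutation in `gene`."""
    profiled = set(profiled_samples)
    if not profiled:
        raise ValueError("no profiled samples")
    # A set ensures each sample is counted once, even with several hits.
    mutated = {
        row["Tumor_Sample_Barcode"]
        for row in mutation_rows
        if row["Hugo_Symbol"] == gene and row["Tumor_Sample_Barcode"] in profiled
    }
    return len(mutated) / len(profiled)


# Hypothetical toy data: four profiled samples, two of them KRAS-mutated.
rows = [
    {"Hugo_Symbol": "KRAS", "Tumor_Sample_Barcode": "S1"},
    {"Hugo_Symbol": "KRAS", "Tumor_Sample_Barcode": "S1"},  # second hit, same sample
    {"Hugo_Symbol": "TP53", "Tumor_Sample_Barcode": "S2"},
    {"Hugo_Symbol": "KRAS", "Tumor_Sample_Barcode": "S3"},
]
samples = ["S1", "S2", "S3", "S4"]
freq = mutation_frequency(rows, samples, "KRAS")
print(f"KRAS altered in {freq:.0%} of samples")
```

The denominator is the set of profiled samples, not the samples that appear in the mutation table, so samples with no mutations at all still count.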

Question 2: Are mutations in KRAS associated with any clinical parameters such as sex, age, etc.?

Overlaying clinical data with genomic data is crucial for discovering novel associations between genetic mutations and patient outcomes. This integration can help you identify patterns that may not be apparent from analyzing genetic data alone. By combining both types of data, it becomes easier to uncover critical insights that can lead to better predictions of survival and improved treatment strategies.

Question 3: Do KRAS mutations co-occur with other kinds of mutations?

These analyses can reveal alternative therapeutic targets by identifying genes whose alterations tend to appear together with KRAS mutations in the same samples (co-occurrence) or rarely appear together (mutual exclusivity, which can hint at synthetic lethal interactions). Ultimately, these findings can improve predictions of patient outcomes and guide research toward personalized, effective therapies.
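To make the two patterns concrete, the sketch below computes a log2 odds ratio and a one-sided Fisher exact p-value from a 2x2 table of sample counts. It illustrates the kind of statistic involved and is not necessarily identical to what cBioPortal computes. The function names, the 0.5 correction for empty cells, and the counts are assumptions made for this example.

```python
import math


def log2_odds_ratio(both, only_a, only_b, neither):
    """log2 odds ratio; > 0 suggests co-occurrence, < 0 mutual exclusivity."""
    cells = [both, only_a, only_b, neither]
    if 0 in cells:
        # Assumption of this sketch: add 0.5 to every cell to avoid division by zero.
        cells = [c + 0.5 for c in cells]
    both, only_a, only_b, neither = cells
    return math.log2((both * neither) / (only_a * only_b))


def fisher_one_sided(both, only_a, only_b, neither, alternative):
    """One-sided Fisher exact p-value from the hypergeometric distribution."""
    n = both + only_a + only_b + neither
    altered_a = both + only_a  # samples altered in gene A
    altered_b = both + only_b  # samples altered in gene B

    def pmf(k):
        # Probability of exactly k doubly-altered samples with fixed margins.
        return (math.comb(altered_b, k) * math.comb(n - altered_b, altered_a - k)
                / math.comb(n, altered_a))

    lowest = max(0, altered_a + altered_b - n)
    highest = min(altered_a, altered_b)
    if alternative == "co-occurrence":
        ks = range(both, highest + 1)   # as many or more overlaps than observed
    else:
        ks = range(lowest, both + 1)    # as few or fewer overlaps than observed
    return sum(pmf(k) for k in ks)


# Hypothetical counts for KRAS (gene A) and a second gene (gene B) in 100 samples.
lor = log2_odds_ratio(both=2, only_a=18, only_b=28, neither=52)
p = fisher_one_sided(2, 18, 28, 52, alternative="mutual exclusivity")
print(f"log2 odds ratio = {lor:.2f}, one-sided p = {p:.3f}")
```

A negative log2 odds ratio with a small p-value points toward mutual exclusivity; a positive one points toward co-occurrence. With many gene pairs, the p-values would also need a multiple-testing correction.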

Question 4: Do KRAS mutations affect survival or disease progression?

cBioPortal provides a platform for assessing overall survival and disease-free or progression-free survival by integrating genomic alterations with clinical outcomes. The platform allows you to visualize survival curves, stratify patients by genetic features, and correlate these with clinical variables like treatment response. This helps in identifying potential biomarkers that could predict prognosis or guide personalized treatment strategies, making cBioPortal a valuable resource for cancer research and clinical decision-making.
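Survival curves of this kind are typically Kaplan-Meier estimates. The sketch below shows how such a curve is built from follow-up times and event flags. It is a plain-Python illustration with hypothetical data, not cBioPortal's implementation. In a cBioPortal study these values would come from clinical attributes such as `OS_MONTHS` and `OS_STATUS`.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with at least one event.

    times  -- follow-up time per patient (for example, months)
    events -- 1 if the event (for example, death) was observed, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data for five patients (months, event flag).
months = [5, 8, 8, 12, 20]
observed = [1, 1, 0, 1, 0]
curve = kaplan_meier(months, observed)
for t, s in curve:
    print(f"month {t}: estimated survival {s:.2f}")
```

To compare KRAS-mutated and wild-type patients, you would compute one curve per group; a log-rank test is the usual way to compare the curves.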

Other examples of what you can explore with cBioPortal:

How do I prepare my data for upload into cBioPortal?

Before you begin

Required/Optional Study Files

Only a few files are required; all others are optional. Below is an overview of the required files and some of the optional ones.

Note: Version 6 of cBioPortal currently also requires at least one non-clinical file to be uploaded. See the instructions below on where to find a dummy table that you can modify if you are only uploading clinical data.

| Type | Requirement | Filename Example | Required Format | Purpose | Detailed Instructions | Example |
| --- | --- | --- | --- | --- | --- | --- |
| Cancer Study | Required | meta_study.txt | Text file | Overall information about the study | Readme | Example |
| Cancer Type | Optional | meta_cancer_type.txt | Text file | A meta file describing the file that defines a new cancer type. Required if your cancer type does not exist in the database. | Readme | Example |
| Cancer Type | Optional | cancer_type.txt | Tab Separated Value (TSV) | Details about a new cancer type not found in the cBioPortal database. Required if your cancer type does not exist in the database. | Readme | Example |
| Clinical Sample | Required | meta_clinical_sample.txt | Text file | A meta file with information about the clinical sample data | Readme | Example |
| Clinical Sample | Required | data_clinical_sample.txt | Tab Separated Value (TSV) | File with the sample-level clinical covariates/metadata | Readme | Example |
| Clinical Patient | Optional | meta_clinical_patient.txt | Multi-line text file | A meta file with information about the clinical patient data | Readme | Example |
| Clinical Patient | Optional | data_clinical_patient.txt | Tab Separated Value (TSV) | File with the patient-level clinical covariates/metadata | Readme | Example |
| Panel | Optional | meta_gene_panel_matrix.txt | Multi-line text file | A meta file describing the gene panel matrix file | Readme | Example |
| Panel | Optional | data_gene_panel_matrix.txt | Tab Separated Value (TSV) | Sample-level details of the gene panel used for each sample | Readme | Example |
| Mutation | Optional | meta_mutations.txt | Multi-line text file | A meta file describing the mutation file | Readme | Example |
| Mutation | Optional | data_mutations.txt | Tab Separated Value (TSV) | File with mutation data | Readme | Example |
| Case Lists | Required | case_lists/cases_sequenced.txt | Multi-line text file | Helps cBioPortal identify which samples have data. Required if uploading data files beyond clinical data. | Readme | Example |
| Structural Variant | Optional | meta_sv.txt | Multi-line text file | A meta file describing the structural variant data file | Readme | Example |
| Structural Variant | Optional | data_sv.txt | Tab Separated Value (TSV) | File with structural variant data | Readme | Example |
| Generic Assays: Arm-level CNA | Optional | meta_armlevel_CNA.txt | Multi-line text file | A meta file for arm-level copy number alteration data | Readme | Example |
| Generic Assays: Arm-level CNA | Optional | data_armlevel_CNA.txt | Tab Separated Value (TSV) | Arm-level copy number alteration data | Readme | Example |
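Before uploading, it can help to confirm that a study folder contains the files marked Required in the table above. The sketch below is one way to do that and is not an official validator. The helper names, the study identifier, and the study name are hypothetical. The `key: value` layout and the four field names in `meta_study.txt` follow the cBioPortal file format documentation as best understood here, so check the linked Readme before relying on them.

```python
from pathlib import Path
import tempfile

# Files the table above marks as required. The case list is needed once
# non-clinical data files are included in the study.
REQUIRED_FILES = [
    "meta_study.txt",
    "meta_clinical_sample.txt",
    "data_clinical_sample.txt",
    "case_lists/cases_sequenced.txt",
]


def missing_required_files(study_dir):
    """Return the required files that are absent from study_dir."""
    root = Path(study_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def write_meta_file(path, fields):
    """Write a meta file as one 'key: value' pair per line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(f"{key}: {value}\n" for key, value in fields.items()))


with tempfile.TemporaryDirectory() as tmp:
    study = Path(tmp) / "my_study"
    write_meta_file(study / "meta_study.txt", {
        "type_of_cancer": "luad",
        "cancer_study_identifier": "luad_example_2024",  # hypothetical identifier
        "name": "Example Lung Adenocarcinoma Study",      # hypothetical name
        "description": "Placeholder study illustrating the folder layout",
    })
    meta_text = (study / "meta_study.txt").read_text()
    still_missing = missing_required_files(study)

print(meta_text)
print("Still missing:", still_missing)
```

This only checks that the files exist. The contents still need to pass cBioPortal's own validation before import.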

Publicly available tools for data formatting

Many publicly available tools can help with formatting. Which one works best depends on the tools you are comfortable with and the kinds of data you are uploading, but here are a few options to get you started:

| Tool Name | Description | Advantages | Disadvantages | Fred Hutch Repository Link |
| --- | --- | --- | --- | --- |
| Data-processor | Formats clinical data tables in multi-tab Excel files to cBioPortal format | Useful for varied clinical data fields; supports multi-tab Excel files; easy terminal execution | Does not seem to work to generate clinical data files; requires adherence to specific clinical data variable names from the cBioPortal Clinical Data Dictionary | Data_processor |
| cbpManager | An R-based Shiny app that allows users to create and upload cBioPortal-formatted studies | A relatively easy-to-run R-based (Shiny) app; allows you to create clinical data files, timeline-related files, and mutation files; allows users to run the validation of their formatted study folders | Currently only helps to create clinical and mutation data related files; if using the app to create the files, can only update one patient at a time | cbpManager |
| CaisisTools (a Fred Hutch tool) | Takes clinical data in the form of an Excel workbook and converts it to cBioPortal format | Helpful for processing clinical data; can be used for data from REDCap | Data must either be obtained from Caisis or be in the same format | CaisisTools |
| Varan 2.0 | Takes genomic data and an existing study folder to process and upload into cBioPortal | Useful for validating an existing cBioPortal study folder; can concatenate from multi-sample VCF files; can be used to filter genomic files | Has several local dependencies (vcf2maf, VEP, and samtools); folder preparations restricted to CNV, SNV, SV, and clinical data | Varan |
| cBioPortal-BS-Lab | Helpful scripts to take data from REDCap and convert it to clinical data files | Good for demonstrating how to take data stored in REDCap and format it | Mostly useful for clinical data files | cBioPortal-BS-Lab |
| cBioPortal_Importer | Python script to prepare data for uploads into cBioPortal. Mostly genomics data. | Helpful scripts to transform specific data types into cBioPortal format | Accepts very specific output files; requires threshold setting for copy number data, etc. | cBioPortal_Importer |
| cbpConverter | R Shiny app to convert Excel sheets into cBioPortal format | Seems like a simple Shiny app to convert clinical data into cBioPortal format | Looks untested but might have helpful scripts; again, only clinical data | cbpConverter |
| gdc-et-pipeline | Converts data from the GDC repository to cBioPortal format | If data is available on GDC, this might be useful | Written in Java; file formats have to be in the GDC format; folder preparations restricted to CNV, SNV, expression, and clinical data | gdc-et-pipeline |
| kf-cbioportal-etl | Specific to this study: CAVATICA and Data Warehouse | Helpful scripts that can be leveraged | These scripts might be specific to the format of files found in this study | kf-cbioportal-etl |
| mutational-signature-converter | Very specifically converts mutational signature data into cBioPortal format | Helps to convert mutational signature data into cBioPortal format | Has not been updated in a few years; simple Python script | mutational-signature-converter |
| shah-cbioportal-tools | Specifically for formatting copy number data; expects a seg file and TITAN output | Could potentially be used for tools other than TITAN that generate a seg file | Specific for TITAN outputs | shah-cbioportal-tools |
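Most of the clinical-data tools above do the same basic job: they turn a flat table into a tab-separated file with cBioPortal's header layout. The sketch below shows that step in plain Python so you can see what the tools produce. It assumes the layout described in the cBioPortal file format documentation: four `#`-prefixed rows for display names, descriptions, datatypes, and priorities, followed by a row of attribute IDs. The function name, the sample records, and the use of `NA` for missing values are choices made for this illustration.

```python
def to_clinical_tsv(records, columns):
    """Render records as cBioPortal-style clinical data text.

    columns -- list of (ATTRIBUTE_ID, display name, description, datatype, priority)
    records -- list of dicts keyed by ATTRIBUTE_ID
    """
    lines = []
    # Rows 1-4: display names, descriptions, datatypes, priorities ('#'-prefixed).
    for idx in (1, 2, 3, 4):
        lines.append("#" + "\t".join(str(col[idx]) for col in columns))
    # Row 5: attribute IDs used as column headers.
    lines.append("\t".join(col[0] for col in columns))
    for rec in records:
        # Assumption of this sketch: missing values are written as NA.
        lines.append("\t".join(str(rec.get(col[0], "NA")) for col in columns))
    return "\n".join(lines) + "\n"


columns = [
    ("PATIENT_ID", "Patient Identifier", "Identifier of the patient", "STRING", 1),
    ("SAMPLE_ID", "Sample Identifier", "Identifier of the sample", "STRING", 1),
    ("SAMPLE_TYPE", "Sample Type", "Primary or metastatic sample", "STRING", 1),
]
# Hypothetical records, for example rows exported from a spreadsheet.
records = [
    {"PATIENT_ID": "P1", "SAMPLE_ID": "P1-T1", "SAMPLE_TYPE": "Primary"},
    {"PATIENT_ID": "P2", "SAMPLE_ID": "P2-T1"},
]
clinical_text = to_clinical_tsv(records, columns)
print(clinical_text)
```

Writing `clinical_text` to `data_clinical_sample.txt` gives a starting point. The matching `meta_clinical_sample.txt` and a validation run are still needed before upload.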

Help

Report bugs or issues

To report bugs or issues with the Fred Hutch instance of cBioPortal, please file an issue here. For questions about using the tool and formatting your data for upload, schedule a data house call using the link above.