Introduction to the Fred Hutch instance of cBioPortal and a demonstration of its usage in the context of research data.
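After reading this article, you will know how to explore public cancer genomics data with cBioPortal, what the Fred Hutch instance of cBioPortal is, and how to format and upload your own study to it.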
There are many ways to use the large collection of data on cBioPortal. The example below demonstrates how some of cBioPortal’s capabilities can support your own research.
Note: The examples below use the public instance of cBioPortal but can be replicated in the Fred Hutch instance as well.
As an example, let us say you identified KRAS as an important gene in cancer. Let’s see how we can use cBioPortal to expand this observation leveraging publicly available datasets.
Question 1: How often is KRAS mutated in cancers?
Using cBioPortal, you can explore the frequency of KRAS mutations in different cancers. For example, here we look at colorectal, lung, and pancreatic cancers. This analysis shows that KRAS is mutated in 26% of patients across these studies. You can similarly investigate other studies (over 400 publicly available datasets) on cBioPortal.
With cBioPortal, you can explore the different types of mutations in KRAS, such as single-nucleotide variations, insertions, deletions, and copy number changes. These can be visualized in OncoPrint format like above (showing mutations in genes for each sample), lollipop format like below (showing where mutations occur on the protein), and many others.
Additionally, cBioPortal’s direct integration with external databases such as ClinVar, COSMIC, etc. allows you to evaluate the impact of these genomic alterations. Here you can see each mutation in KRAS and its prediction of pathogenicity by OncoKB and COSMIC databases.
Thus with cBioPortal, exploring published multi-omics data becomes easier without needing to download raw data files, process them, and manually create visualizations. It also provides you with a one-stop shop to evaluate specific mutations by referencing external databases as well. The platform makes the entire process seamless and user-friendly.
Question 2: Are mutations in KRAS associated with any clinical parameters such as sex, age, etc.?
With cBioPortal, you can visualize genomic data, such as mutations, alongside clinical factors like sex, age at diagnosis, or smoking history. This allows for a deeper understanding of how these clinical traits may correlate with specific genetic changes. In the example below, we can add tracks representing patient sex, age at diagnosis, and history of smoking in subjects with mutations in KRAS. You can see that KRAS mutations are frequent in individuals with a history of smoking.
You can also test whether specific clinical covariates are associated with certain mutation types or expression, as in the screenshot below. If you prefer to work with a tool you’re more comfortable with, you can download the underlying data and analyze it outside of cBioPortal.
Overlaying clinical data with genomic data is crucial for discovering novel associations between genetic mutations and patient outcomes. This integration can help you identify patterns that may not be apparent from analyzing genetic data alone. By combining both types of data, it becomes easier to uncover critical insights that can lead to better predictions of survival and improved treatment strategies.
Question 3: Do KRAS mutations co-occur with other kinds of mutations?
Co-occurrence and mutual exclusivity analyses can reveal alternative therapeutic targets by identifying genes whose alterations often appear together with KRAS mutations (co-occurrence) or rarely appear in the same tumor (mutual exclusivity, a pattern that can hint at synthetic lethal interactions). Ultimately, these findings can improve predictions for patient outcomes and guide research toward personalized, effective therapies.
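One common way to quantify such a tendency is an odds ratio computed from a 2×2 table of sample counts. The sketch below uses our own notation, not symbols from cBioPortal’s documentation: for two genes A and B, n_both is the number of samples altered in both genes, n_neither the number altered in neither, and n_A-only and n_B-only the numbers altered in exactly one. To our understanding, cBioPortal reports a log2 odds ratio of this kind together with a p-value, but treat the exact definition as an assumption and check the portal’s own documentation.

```latex
\[
\mathrm{OR} \;=\; \frac{n_{\text{both}} \, n_{\text{neither}}}{n_{A\text{-only}} \, n_{B\text{-only}}},
\qquad
\log_2 \mathrm{OR} > 0 \;\Rightarrow\; \text{tendency toward co-occurrence},
\qquad
\log_2 \mathrm{OR} < 0 \;\Rightarrow\; \text{tendency toward mutual exclusivity}.
\]
```

The ratio is undefined when either single-gene count is zero, so results from small cohorts need extra care.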
Question 4: Do KRAS mutations affect survival or disease progression?
Using cBioPortal, you can explore the relationship between KRAS mutations and overall survival in the three cohorts. In the analysis below, patients with KRAS-mutant tumors appear to have worse overall survival than patients with unaltered KRAS.
You can also explore whether KRAS mutations affect disease-free survival. In this case, patients with KRAS-mutant tumors have lower disease-free survival, which is an important metric when considering oncogenic targets for therapeutic intervention.
cBioPortal provides a powerful platform for assessing overall survival and disease-free survival by integrating genomic alterations with clinical outcomes. The platform allows you to visualize survival curves, stratify patients by genetic features, and correlate these with clinical variables like treatment response. This helps in identifying potential biomarkers that could predict prognosis or guide personalized treatment strategies, making cBioPortal an invaluable resource for cancer research and clinical decision-making.
Other examples of what you can explore with cBioPortal:
The Fred Hutch instance is a local installation of cBioPortal within the Fred Hutch computational infrastructure.
If you are interested in uploading your own data into the Fred Hutch instance of cBioPortal, here are the steps you need to follow:
Note: You can contact the cBioPortal team via the #cbioportal-support channel on FH-Data Slack, or reach out directly to the DaSL Data Governance Team at dataprotection@fredhutch.org.
Note: Study approval is specific to the IRB and datatypes specified in the REDCap form. While periodic auditing will be performed by the DaSL team, it is the lab’s responsibility to ensure that only the approved datatypes are uploaded to the platform. Failure to do so will result in study removal from the FH cBioPortal platform. If you would like to upload any new datatypes for an existing study, please reach out to the Data Governance team as this will require additional review. If you would like to create a new project covered under a different IRB, please submit another response to the REDCap Form.
In the meantime, get AWS credentials by emailing the Fred Hutch help desk. Note: Make sure to include your PI in this email request as they’ll need a lab-based account as well. Once Fred Hutch help desk emails you back with your credentials, make sure to test them to ensure they are functioning correctly. If you already have AWS credentials, skip this step.
Get access to the fh-dasl-cbio S3 bucket by emailing the cBioPortal team your AWS (Amazon Web Services) Account ID number and AWS username. Once the cBioPortal team gives you access, you will receive a confirmation email. Test your access to the cBioPortal S3 bucket by following these steps in a terminal window:
# Test that you have the correct access to the fh-dasl-cbio S3 bucket.
# You should only be able to write files to and list files in this bucket.
# ssh into rhino and follow the instructions here to configure AWS CLI (https://sciwiki.fredhutch.org/scicomputing/access_credentials/#configure-aws-cli)
ssh user@rhino
module load awscli
aws configure
# aws configure prompts for the values below. The keys shown are placeholders; enter your own.
# AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
# AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
# Default region name [None]: us-west-2
# Default output format [None]:
# You should be able to write a file into the S3 bucket
# Write a simple text file into the s3 bucket
echo hello | aws s3 cp - s3://fh-dasl-cbio/hello.txt
# List
aws s3 ls s3://fh-dasl-cbio
# You should NOT be able to retrieve/delete any study data (even your own).
# aws s3 cp s3://fh-dasl-cbio/hello.txt hello.txt # Should error out...
Compress (zip) your study directory so it is ready to upload to the fh-dasl-cbio S3 bucket. On a Mac, right-click on your study directory and click “Compress”. On Windows, right-click on your study directory, select “Send to”, then “Compressed (zipped) folder”. If you prefer to use the command line, you can zip the folder using this command:
# Go into your study folder
cd /path/to/directory/cancer_study_identifier
# Zip the folder contents recursively, writing the archive to the parent directory (/path/to/directory)
zip -r ../cancer_study_identifier.zip .
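Before uploading, it can help to confirm that the study folder is internally consistent, since the study ID has to be the same in every meta file. The following is a minimal sketch of such a check, not part of cBioPortal or the Fred Hutch upload pipeline: `check_study_id` is a hypothetical helper, and it assumes that meta files are named `meta_*.txt` and that case lists live in a `case_lists/` subfolder.

```shell
# Sketch: confirm that all meta and case-list files in a study folder share one study ID.
# check_study_id is a hypothetical helper, not part of cBioPortal or the Fred Hutch pipeline.
# Assumes meta files are named meta_*.txt and case lists live in case_lists/.
check_study_id() {
  local study_dir="$1"
  local ids count
  # Collect the distinct values of every "cancer_study_identifier:" line.
  # Files without such a line (for example meta_cancer_type.txt) contribute nothing.
  ids=$(find "$study_dir" -type f \( -name 'meta_*.txt' -o -path '*/case_lists/*' \) \
          -exec grep -h '^cancer_study_identifier:' {} + \
        | sed -e 's/^cancer_study_identifier:[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | sort -u)
  if [ -z "$ids" ]; then
    echo "No cancer_study_identifier found under $study_dir" >&2
    return 1
  fi
  count=$(printf '%s\n' "$ids" | grep -c '')
  if [ "$count" -ne 1 ]; then
    echo "Conflicting study IDs:" >&2
    printf '%s\n' "$ids" | sed 's/^/  /' >&2
    return 1
  fi
  # Exactly one ID: print it so it can be compared with the folder and zip name.
  printf '%s\n' "$ids"
}

# Example (placeholder path):
# check_study_id /path/to/directory/cancer_study_identifier
```

The function prints the single study ID it finds, or lists the conflicting values and returns a non-zero status, so it can also be used in a script that stops before the upload step.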
Upload the zipped study to the fh-dasl-cbio S3 bucket. You can do that in one of these 3 ways:
Connect to the fh-dasl-cbio S3 bucket by following these steps.
Open the fh-dasl-cbio bucket in Finder.
In the fh-dasl-cbio tab or window, right-click and select “Paste”.
Use the AWS CLI to copy the zipped study into the fh-dasl-cbio S3 bucket:
aws s3 cp /path/to/directory/cancer_study_identifier.zip s3://fh-dasl-cbio/
Our Airflow automation scripts will take care of the rest of the upload process from there and send you an email notification with details about the outcome. If upload was unsuccessful, the email will contain a detailed HTML file called a “validation report” that identifies which parts of your study are causing the issue and need updating. Note: If you do not receive an email notification indicating the success/failure of your study upload within an hour, reach out to the cBioPortal Team for help identifying the issue.
Go have fun and explore your data on the Fred Hutch instance of cBioPortal!
To update an existing study, upload the revised zip file to the fh-dasl-cbio S3 bucket using the instructions in step 6. Any subsequent uploads of the same study will overwrite the previous study data.
Note: If you would like to upload any of the available public studies into the Fred Hutch instance of cBioPortal, please contact the cBioPortal team via the #cbioportal-support channel on FH-Data Slack.
The cancer_study_identifier value in your meta files should also be this same study ID. Note: The study ID can be updated if desired; just reach out to the Data Governance team with your preferred ID.
To check whether your study is formatted correctly, upload it to the fh-dasl-cbio bucket. As mentioned above, the automated upload process will send you a validation report via email outlining any issues that may have come up.
There are a few files that are required, while all other files are optional. Below is an overview of the required files and some optional files.
Note: Version 6 of cBioPortal currently also requires at least one non-clinical file to be uploaded. See the instructions below on where to find a dummy table that you can modify in case you are only uploading clinical data.
Type | Requirement | Filename Example | Required Format | Purpose | Detailed Instructions | Example |
---|---|---|---|---|---|---|
Cancer Study | Required | meta_study.txt | Text file | Overall information about the study | Readme | Example |
Cancer Type | Optional | meta_cancer_type.txt | Text file | A meta file describing the file that defines a new cancer type. Required if your cancer type does not exist in the database. | Readme | Example |
Cancer Type | Optional | cancer_type.txt | Tab Separated Value (TSV) | Details about a new cancer type not found in the cBioPortal database. Required if your cancer type does not exist in the database. | Readme | Example |
Clinical Sample | Required | meta_clinical_sample.txt | Text file | A meta file with information about the clinical samples | Readme | Example |
Clinical Sample | Required | data_clinical_sample.txt | Tab Separated Value (TSV) | File with the sample-level clinical covariates/metadata | Readme | Example |
Clinical Patient | Optional | meta_clinical_patient.txt | Multi-line text file | A meta file with information about the clinical patient | Readme | Example |
Clinical Patient | Optional | data_clinical_patient.txt | Tab Separated Value (TSV) | File with the patient-level clinical covariates/metadata | Readme | Example |
Panel | Optional | meta_gene_panel_matrix.txt | Multi-line text file | A meta file for describing the gene panel matrix file | Readme | Example |
Panel | Optional | data_gene_panel_matrix.txt | Tab Separated Value (TSV) | Sample level details of the gene panel used for the different samples | Readme | Example |
Mutation | Optional | meta_mutations.txt | Multi-line text file | A meta file describing information about the mutation file. | Readme | Example |
Mutation | Optional | data_mutations.txt | Tab Separated Value (TSV) | File with mutation data | Readme | Example |
Case Lists | Required | case_lists/cases_sequenced.txt | Multi-line text file | It helps cBioPortal identify which samples have data. Required if uploading data files beyond clinical data. | Readme | Example |
Structural Variant | Optional | meta_sv.txt | Multi-line text file | A meta file for describing the structural variant data file | Readme | Example |
Structural Variant | Optional | data_sv.txt | Tab Separated Value (TSV) | File with structural variant data | Readme | Example |
Generic Assays: Arm-level CNA | Optional | meta_armlevel_CNA.txt | Multi-line text file | A meta file for arm-level copy number alteration data | Readme | Example |
Generic Assays: Arm-level CNA | Optional | data_armlevel_CNA.txt | Tab Separated Value (TSV) | Arm-level copy number alteration data | Readme | Example |
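To make the table more concrete, here is a minimal sketch that scaffolds the files marked Required above. It is an illustration under stated assumptions, not the Fred Hutch team’s template: `make_minimal_study` is a hypothetical helper, the field names follow the public cBioPortal file format documentation as we understand it, and every value written (cancer type `mixed`, study name, patient and sample IDs) is a placeholder to replace with your own. It does not create a non-clinical data file, so a study built this way still needs one added before upload.

```shell
# Sketch: scaffold the required files of a minimal study folder.
# make_minimal_study is a hypothetical helper; every value it writes is a placeholder.
make_minimal_study() {
  local study_id="$1"   # should match the study ID used in all meta files
  local out_dir="$2"    # study folder to create
  mkdir -p "$out_dir/case_lists"

  # meta_study.txt: overall information about the study
  printf '%s\n' \
    "type_of_cancer: mixed" \
    "cancer_study_identifier: $study_id" \
    "name: Placeholder study name" \
    "description: Placeholder study description" \
    > "$out_dir/meta_study.txt"

  # meta_clinical_sample.txt: describes the sample-level clinical data file
  printf '%s\n' \
    "cancer_study_identifier: $study_id" \
    "genetic_alteration_type: CLINICAL" \
    "datatype: SAMPLE_ATTRIBUTES" \
    "data_filename: data_clinical_sample.txt" \
    > "$out_dir/meta_clinical_sample.txt"

  # data_clinical_sample.txt: four '#' header rows (display name, description,
  # datatype, priority), then the attribute names, then one row per sample.
  printf '%s\t%s\t%s\n' \
    "#Patient Identifier" "Sample Identifier" "Cancer Type" \
    "#Patient identifier" "Sample identifier" "Cancer type of the sample" \
    "#STRING" "STRING" "STRING" \
    "#1" "1" "1" \
    "PATIENT_ID" "SAMPLE_ID" "CANCER_TYPE" \
    "PLACEHOLDER_P1" "PLACEHOLDER_P1_S1" "Placeholder" \
    > "$out_dir/data_clinical_sample.txt"

  # case_lists/cases_sequenced.txt: tells cBioPortal which samples have data
  printf '%s\n' \
    "cancer_study_identifier: $study_id" \
    "stable_id: ${study_id}_sequenced" \
    "case_list_name: Sequenced samples" \
    "case_list_description: Samples with sequencing data" \
    "case_list_ids: PLACEHOLDER_P1_S1" \
    > "$out_dir/case_lists/cases_sequenced.txt"
}

# Example (hypothetical study ID and path):
# make_minimal_study my_study_id ./my_study_id
```

Writing the files with `printf` keeps the tab separators explicit, which matters because the data files must be tab-separated rather than space-separated.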
There are many publicly available tools that may help with the formatting process. Which one works best for you depends on what tools you’re comfortable with and what kinds of data you’re uploading, but here are a few options that might help get you started:
Tool Name | Description | Advantages | Disadvantages | Fred Hutch Repository Link |
---|---|---|---|---|
Data-processor | Formats clinical data tables in multi-tab Excel files to cBioPortal format | Useful for varied clinical data fields; supports multi-tab Excel files; easy terminal execution | Does not seem to work to generate clinical data files; requires adherence to specific clinical data variable names from the cBioPortal Clinical Data Dictionary | Data_processor |
cbpManager | An R-based Shiny app that allows users to create and upload cBioPortal-formatted studies | A relatively easy to run R-based (Shiny) app; allows you to create clinical data files, timeline-related files, and mutation files; allows users to run the validation of their formatted study folders | Currently only helps to create clinical and mutation data related files; if using the app to create the files, you can only update one patient at a time | cbpManager |
CaisisTools (a Fred Hutch tool) | Takes clinical data in the form of an Excel workbook and converts it to cBioPortal format | Helpful for processing clinical data; can be used for data from REDCap | Data either must be obtained from Caisis or should be in the same format | CaisisTools |
Varan 2.0 | Takes genomic data and an existing study folder to process and upload into cBioPortal | Useful for validating an existing cBioPortal study folder; can concatenate from multi-sample vcf files; can be used to do filtering of genomic files | Has several local dependencies (vcf2maf, VEP, and samtools); folder preparations restricted to CNV, SNV, SV, and clinical data | Varan |
cBioPortal-BS-Lab | Helpful scripts to take data from REDCap and convert it to clinical data files | Good for demonstrating how to take data stored in REDCap and format it | Mostly would be useful for clinical data files | cBioPortal-BS-Lab |
cBioPortal_Importer | Python script to prepare data for uploads into cBioPortal. Mostly genomics data. | Helpful scripts to transform specific data types into cBioPortal format | Accepts very specific output files; requires threshold setting for copy number data, etc. | cBioPortal_Importer |
cbpConverter | R Shiny app to convert Excel sheets into cBioPortal format | Seems like a simple Shiny app to convert clinical data into cBioPortal format | Looks untested but might have helpful scripts; again only clinical data | cbpConverter |
gdc-et-pipeline | Converts data from the GDC repository to cBioPortal format | If data is available on GDC, this might be useful | Written in Java; file formats have to be in the GDC format; folder preparations restricted to CNV, SNV, Expression, and clinical data | gdc-et-pipeline |
kf-cbioportal-etl | Specific to this study: CAVATICA and Data Warehouse | Helpful scripts that can be leveraged | These scripts might be specific to the format of files found in this study | kf-cbioportal-etl |
mutational-signature-converter | Very specifically converts the mutational signature data into cBioPortal format | Helps to convert mutational signature data into cBioPortal format | Has not been updated in a few years; simple Python script | mutational-signature-converter |
shah-cbioportal-tools | Specifically for formatting copy number data; expects a seg file and TITAN output | Could potentially be used for tools other than TITAN that generate a seg file | Specific for TITAN outputs | shah-cbioportal-tools |
To report bugs or issues with the Fred Hutch instance of cBioPortal, please file an issue here. For questions about using the tool and formatting your data for upload, schedule a data house call using the link above.