This page provides guidance on sharing research data for publication or as required by funding agencies. Many journals have data and code availability requirements, under which data must be made promptly available and accessible to readers upon publication.

For information about the 2023 NIH Data Management and Sharing Policy, see NIH Data Sharing.

Choosing a Data Repository (NIH Guidance)

The type of data, your funder’s requirements, and your field of research will all influence which repository is right for your project.

Common Data Repositories

  • cBioPortal - An open-access resource for exploring, visualizing, and analyzing cancer genomics data
  • dbGaP - NIH’s database of Genotypes and Phenotypes, offering both public and controlled-access individual-level genomic data
  • GEO - Gene Expression Omnibus, a public functional genomics data repository for array- and sequence-based data
  • TCGA - The Cancer Genome Atlas, which molecularly characterized more than 20,000 primary cancer and matched normal samples spanning 33 cancer types
  • GTEx - The Genotype-Tissue Expression project, studying tissue- and cell-specific gene expression and regulation
  • 1000 Genomes - A resource for genetic variants in human populations
  • gnomAD - The Genome Aggregation Database, aggregating and harmonizing exome and genome sequencing data across multiple studies
  • TOPMed - Trans-Omics for Precision Medicine, an NIH/NHLBI program focused on heart, lung, blood, and sleep (HLBS) disorders
  • dbSNP - Database of single nucleotide variations, microsatellites, and small-scale insertions and deletions, along with population frequency and other information
  • UK Biobank - Prospective cohort study with genetic and health data on 500,000 participants
  • Sage’s Synapse.org - Platform for sharing research data privately or publicly, hosting several open datasets and DREAM Challenges

dbGaP-Specific Guidance

See the dbGaP Study Submission Guide

The NIH is committed to respecting the privacy and intentions of research participants. Data access is intended only for scientific investigators pursuing research questions consistent with participants' informed consent agreements. Investigators must use appropriate controls and abide by each dataset's Data Use Limitations.

NIH repositories like dbGaP provide two access levels:

  • Public Access: Non-individual-level data, such as study descriptions and documentation, can be accessed by anyone through the repository website
  • Controlled Access: Individual-level data submitted to NIH repositories must be de-identified (stripped of names and other identifying information). However, genotype data are inherently unique to each person and cannot be fully de-identified. Therefore, all individual-level data are distributed only through the NIH Authorized Access System

Data Sharing Best Practices

Genomic data requires special consideration because it is deeply personal and differs from other research data in several ways. Genomic data:

  • Is often stored indefinitely
  • Changes in relevance over time
  • Carries uncertain risks
  • Raises privacy concerns due to re-identification risks
  • Can reveal unexpected health susceptibilities
  • Has implications for family members and reproductive decisions

Please consult the appropriate administrative authority (e.g., your IRB) before submitting or accessing controlled-access data.

When sharing genomic and phenotypic data, investigators should:

  • Use informed consent documents with appropriate language regarding data sharing and future use
  • Share de-identified data by default
  • Use requested datasets solely for the research project described in the approved data request or protocol
  • Make no attempt to identify or contact individual participants without appropriate IRB approvals
  • Not distribute data to any entity or individual beyond those specified in the approved data request or protocol
  • Harmonize data collection and archiving methods where possible to support scientific quality and validation
  • Adhere to computer security practices that ensure only authorized individuals can access data files and otherwise meet institutional security requirements
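As one illustration of the last point, on a shared Linux or macOS system you can keep downloaded controlled-access files readable only by your own account. The sketch below assumes POSIX file permissions (on Windows, `os.chmod` only toggles the read-only flag), and the helper names `lock_down_tree` and `group_or_other_can_access` are hypothetical, not part of any NIH or dbGaP tool. Permissions are only one layer; institutional requirements may also call for encryption or approved storage locations.

```python
import os
import stat
from pathlib import Path

# Owner-only permission bits (POSIX): files rw------- (0o600), directories rwx------ (0o700).
OWNER_FILE_MODE = stat.S_IRUSR | stat.S_IWUSR
OWNER_DIR_MODE = stat.S_IRWXU


def lock_down_tree(root: Path) -> None:
    """Remove group and other access from root and everything beneath it.

    Symbolic links are skipped so that files outside the tree are never modified.
    """
    os.chmod(root, OWNER_DIR_MODE)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                os.chmod(path, OWNER_DIR_MODE)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                os.chmod(path, OWNER_FILE_MODE)


def group_or_other_can_access(path: Path) -> bool:
    """Return True if any permission bit is set for the group or for other users."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))
```

For example, `lock_down_tree(Path("~/dbgap_data").expanduser())` would restrict a hypothetical download directory. Directories keep the owner's execute bit so you can still traverse them; everyone else loses all access.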

Questions?

Consortium members (Fred Hutch, UW, Children’s) can schedule a Data House Call for support.
