Genomics Platforms and Data Types

Updated: October 25, 2018

This guide highlights some of the genomics platforms available through the Genomics Shared Resource at Fred Hutch. This guide is intended to give general context to each platform. Access to many of the submission processes involved in using the Genomics Shared Resource is via Hutchbase.

Note: As technologies and reagents change, the relative costs of performing these experiments do as well. It is important to discuss your particular experiment and needs with the Genomics and Bioinformatics Shared Resource during the planning stages of your project. You can contact them by emailing genomics.

Sequencing Based Platforms

Sequencing-based platforms currently available via the Genomics Core at Fred Hutch include the Illumina sequencers (HiSeq/MiSeq) and the Pacific Biosciences Long Range sequencer. These platforms can be used to sequence a variety of assay material types via different library preparation processes. In the RNA or DNA Approaches pages, we discuss different options for creating sequencing libraries from either nucleic acid type and for different research questions. Choosing the appropriate assay material QC and library preparation reagents depends in part on how the libraries will be sequenced. Thus it is important to verify that all phases of the process use compatible techniques.

Illumina Sequencers

The Illumina sequencers are high-read-number, short-read sequencers that support many different upstream library types. Their primary approach is to sequence very large numbers of the individual DNA fragments generated by the library preparation process, and then to reconstruct the sequences in the mixture, for example by aligning the short reads to a reference or by another bioinformatic approach. The Illumina HiSeq at Fred Hutch currently has two options: High Output mode and Rapid Run mode. The basic details for each mode, along with the corresponding information for the MiSeq, are included below. We have provided cost ranges to give a basic idea of the relative costs of these sequencing platforms, but the exact cost of a sequencing run will depend on the read lengths, whether the sequencing is paired end or single end, and your affiliation (Fred Hutch vs external).

| Sequencer | Mode | Read Lengths (bp) | Approx. Reads per Lane | Lanes per Run |
|---|---|---|---|---|
| NovaSeq 6000 | Depends on Chip Type | 50-150, depending on Chip Type | Variable | Variable |
| HiSeq 2500 | High Output | 50, 100, 125 | 250M | 8* |
| HiSeq 2500 | Rapid Run | 50, 100, 150, 200, 250 | 150M | 2 |
| MiSeq | Version 2 Reagents | up to 250 | 15M | 1 |
| MiSeq | Version 3 Reagents | up to 300 | 25M | 1 |

* High Output mode requires all 8 lanes to be run simultaneously; partial runs are not possible.

When deciding how much sequencing a set of libraries needs to reach sufficient read depth (the number of reads covering each genomic location represented in the library), consider the intended data type, sample type and quality, library preparation type, total number of samples, and whether multiplexing is applicable. Consulting with the Genomics Core can help provide more clarity for individual projects.
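As a rough planning aid, average read depth can be approximated as total sequenced bases divided by the size of the targeted region, and the number of lanes as the total reads required divided by the reads per lane. The sketch below uses the approximate reads-per-lane figures listed for each platform in this guide; the function names, the example project (24 samples at 30M reads each), and the assumption that a paired-end "read" counts both mates of one fragment are illustrative only, not Genomics Core guidance.

```python
import math

# Approximate reads per lane, as listed for each platform in this guide.
READS_PER_LANE = {
    "HiSeq 2500 High Output": 250_000_000,
    "HiSeq 2500 Rapid Run": 150_000_000,
    "MiSeq v2": 15_000_000,
    "MiSeq v3": 25_000_000,
}

def mean_depth(num_reads, read_length, target_size, paired_end=False):
    """Average depth = total sequenced bases / size of the targeted region.

    Assumption: for paired-end runs, num_reads counts fragments, each
    sequenced from both ends.
    """
    bases_per_read = read_length * (2 if paired_end else 1)
    return num_reads * bases_per_read / target_size

def lanes_needed(reads_per_sample, num_samples, reads_per_lane):
    """Whole lanes required when all samples are multiplexed together."""
    return math.ceil(reads_per_sample * num_samples / reads_per_lane)

# Hypothetical project: 24 samples, 30M reads each, on High Output lanes.
lanes = lanes_needed(30_000_000, 24, READS_PER_LANE["HiSeq 2500 High Output"])
print(lanes)  # 3
```

Real yields vary with library quality and loading, so estimates like these are a starting point for a conversation with the Genomics Core rather than a substitute for it.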

More from Illumina about Illumina Sequencing can be found here.

Pacific Biosciences (PacBio) Long Range Sequencer

The PacBio SMRT sequencer works differently than the Illumina sequencers in that the read length is not specified by the platform but is limited by the library itself, and confidence in the sequence decreases as reads get longer. However, instead of being limited to sequencing short fragments of DNA, PacBio sequencing can provide long stretches of sequence read from a single fragment. This allows for analyses such as full length isoform discovery, de novo small genome sequencing, assessing structural variants/translocations, and allele phasing. On average the PacBio sequencer aims to provide up to 15kb of read length.

  • 300-600k reads per SMRT cell
  • Insert size for sequencing of 200bp minimum up to 40kb fragments
  • Library prep
    • Reagents purchased by lab from PacBio and prepped by lab, brought to Genomics as completed library (see PacBio website for more info)
    • OR
    • Genomics Shared Resource can help with library prep, for a service fee, by processing a QC’d sample into a PacBio library, depending on library type:
      • For amplicon library prep up to 5kb
      • For large insert library prep up to 20kb
  • Multiple multiplexing schemas (in-line or ligated) - discuss details with Genomics Shared Resource to plan the approach (email genomics).

More about PacBio SMRT Sequencing.

Array Based Platforms

Microarrays are a sometimes less costly option that can in some cases be substituted for a wide variety of sequencing types; for example, there are SNP, gene expression, and whole exome arrays. While microarrays are not useful for discovery of novel targets, for well-established targets, assay chemistries and data analysis pipelines are well-vetted. A discussion with the Genomics Core can be useful in helping you decide the best technologies for your work.

Single Nucleotide Polymorphism (SNP) Arrays or Methylation Arrays

The Genomics lab is equipped to run all Illumina genotyping and methylation BeadChips (Illumina Microarrays). BeadChip kits are sold in a variety of sample sizes, so it is important to plan total sample numbers and to randomize samples across batches to minimize batch-specific bias. BeadChip kits need to be purchased by the investigator and should be drop-shipped to the Genomics lab. The Genomics lab then charges a processing fee that covers all non-Illumina supplies, reagents, and labor. For genotyping arrays, investigators should plan to provide genomic DNA that has been quantified by a dsDNA-specific method such as PicoGreen. For methylation arrays, investigators may submit either genomic DNA or bisulfite converted DNA. Again, the starting DNA should be quantified by PicoGreen.
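One simple way to randomize samples across batches is to shuffle the full sample list and then fill each chip in order. The sketch below is a minimal illustration of that idea; the function name, the sample IDs, and the chip capacity of 8 are assumptions made for the example (actual capacity depends on the BeadChip purchased), and it performs plain randomization rather than stratifying by case/control status or other covariates.

```python
import random

def assign_to_chips(sample_ids, samples_per_chip, seed=0):
    """Shuffle samples, then fill chips in order so each chip gets a random mix."""
    shuffled = list(sample_ids)
    # A fixed seed makes the assignment reproducible and easy to document.
    random.Random(seed).shuffle(shuffled)
    return [
        shuffled[i:i + samples_per_chip]
        for i in range(0, len(shuffled), samples_per_chip)
    ]

# Hypothetical study: 12 cases and 12 controls on chips holding 8 samples each.
samples = [f"case_{i}" for i in range(12)] + [f"control_{i}" for i in range(12)]
chips = assign_to_chips(samples, samples_per_chip=8, seed=42)
for chip_number, chip in enumerate(chips, start=1):
    print(chip_number, chip)
```

Recording the seed alongside the sample manifest lets the same layout be regenerated later when batch effects are assessed.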

Genotyping BeadChip kits vary in price ($50/sample - $600/sample), depending on content. The Genomics Shared Resource processes samples for a service fee that does not depend on the BeadChip kit; for methylation arrays, DNA samples that require bisulfite conversion incur an additional fee.

Nanostring Hybridization Arrays for Gene Expression

Reagents for Nanostring arrays can be purchased from Nanostring and total RNA ready to be run can be brought to the Genomics Shared Resource for processing.

Library Preparation Reagents and Methods

The choice of genomics platform will dictate the needs of the library preparation method. It is important to understand how library preparation can impact the final data, for example if biases towards detection of specific types of nucleic acids are introduced. Meeting with an NGS specialist at the Genomics Core will help to guide your path.

Library Preparation for Sequencing

The four main steps in preparing RNA or DNA for NGS analysis are:

  1. fragmenting and/or sizing the target sequences to a desired length (via physical, enzymatic, or chemical methods)
  2. converting target to double-stranded DNA (if RNA)
  3. attaching oligonucleotide adapters to the ends of target fragments. These adapters are multi-purpose: they are used to index/barcode the fragments and to allow the fragments to attach to the flow cell of the sequencer (where the sequencing of the fragments occurs)
  4. quantitating the final library product for sequencing

10x Genomics Single Cell Library Preparation System

To obtain single cell gene expression data from RNA-seq, the Genomics lab uses the 10x Genomics Single Cell Expression platform. Starting with a cell suspension, this process partitions cells into droplets for cDNA library preparation. After library prep, the contents of the droplets are pooled, then sequenced on an Illumina sequencer. Cell barcodes added during library prep allow the sequencing results to be computationally traced back to individual cells, and unique molecular identifiers (UMIs) allow reads derived from the same original molecule to be counted once.
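In droplet-based protocols of this kind, each read typically carries a cell barcode identifying the droplet it came from and a UMI identifying the original molecule. The toy sketch below shows how those two tags can be used together to count molecules per gene per cell. The function name and the barcode, UMI, and gene values are made up for illustration; real pipelines also correct sequencing errors in barcodes and work from aligned reads.

```python
from collections import defaultdict

def count_molecules(reads):
    """Count unique (UMI) molecules per gene for each cell barcode.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples.
    """
    seen = defaultdict(set)  # (cell_barcode, gene) -> set of UMIs observed
    for cell_barcode, umi, gene in reads:
        seen[(cell_barcode, gene)].add(umi)
    counts = defaultdict(dict)
    for (cell_barcode, gene), umis in seen.items():
        counts[cell_barcode][gene] = len(umis)
    return dict(counts)

# Toy reads; the barcode, UMI, and gene values are made up.
reads = [
    ("AAAC", "TTG", "GAPDH"),
    ("AAAC", "TTG", "GAPDH"),  # same UMI: a PCR duplicate, counted once
    ("AAAC", "CCA", "GAPDH"),
    ("GGGT", "TTG", "ACTB"),
]
print(count_molecules(reads))
# {'AAAC': {'GAPDH': 2}, 'GGGT': {'ACTB': 1}}
```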

Overview of the Sequencing Process

  1. The adapter-ligated DNA library is loaded onto a flow cell.
  2. The fragments are hybridized to the flow cell surface.
  3. Each bound fragment is amplified into a cluster. This step is known as bridge amplification.
  4. Fluorescently-labeled nucleotides and sequencing reagents are added; the flow cell is fluorescently imaged after the incorporation of each nucleotide. The color of the fluorescent dyes identifies which base was incorporated. This cycle of nucleotide incorporation and imaging is repeated n times to produce a read n bases long.
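The last step can be pictured with a toy base caller: for each cycle, the brightest fluorescence channel at a cluster determines the base, and the number of cycles sets the read length. The sketch below assumes four channels in the order A, C, G, T and uses made-up intensity values; the channel-to-base mapping and the function name are assumptions for illustration and do not describe Illumina's actual chemistry or software.

```python
# Channel order is an assumption made for this sketch, not Illumina's actual dye mapping.
CHANNEL_BASES = "ACGT"

def call_bases(cycle_intensities):
    """Call one base per cycle by picking the brightest of four channels."""
    read = []
    for intensities in cycle_intensities:
        brightest = max(range(4), key=lambda channel: intensities[channel])
        read.append(CHANNEL_BASES[brightest])
    return "".join(read)

# Made-up intensities for one cluster over 5 cycles (one row per cycle).
cluster = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.0, 0.8, 0.2),
    (0.0, 0.1, 0.1, 0.9),
    (0.2, 0.7, 0.1, 0.0),
    (0.8, 0.1, 0.1, 0.1),
]
print(call_bases(cluster))  # AGTCA
```

Five cycles yield a five-base read here, in the same way that n cycles on the instrument yield a read n bases long.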

Available Resources

  • The Hutch Genomics Shared Resource: The genomics core is VERY helpful if you need guidance about reagent and platform choice for your samples. Email genomics to discuss your particular project.
  • Submission of some types of samples for services to the Hutch Genomics Shared Resource is via Hutchbase.
  • Genohub has a compendium of library prep kits, organized by NGS application type here.
  • Illumina has a few interactive methods guides to help you find the most appropriate library prep reagents and sequencing methods to use in your experiments.
  • A helpful reference in considering library prep methods is Ordoukhanian’s 2014 paper.
