Short Reads vs Long Reads: What’s the Difference and Why It Matters
Not all sequencing technologies are created equal. If you’ve ever wondered why some people swear by Illumina and others rave about Oxford Nanopore or PacBio, this post is for you. Let’s break down the differences between short-read and long-read sequencing and why you should care.
Summary of Different Technologies
Feature | Illumina | MGI | Element Biosciences | Ultima Genomics | PacBio | ONT |
---|---|---|---|---|---|---|
Read Length | 50–300 bp | 50–300 bp | 75–300 bp | 225–400 bp | 10–25 kb | 10 kb – 4 Mb |
Accuracy | 90% of bases > Q30 (99.9%) | 85% of bases > Q30 (99.9%) | 90% of bases > Q30 (99.9%) | 75% of bases > Q30 (99.9%) | 95% of bases > Q30 (99.9%) | Q20–Q30 (99–99.9%) |
Throughput (per flowcell) | Up to 26B reads (7.8 Tb) | Up to 40B reads (12.0 Tb) | Up to 1B reads (300 Gb) | Up to 12B reads (3.0 Tb) | Up to 30 Gb | Up to 290 Gb |
Approx. Cost per Gb | ~$2–$3 | ~$1.50–$2 | ~$5–$7 | ~$1–$1.50 | ~$10–$20 | ~$8–$15 |
Flagship Instruments | MiSeq, NextSeq, NovaSeq | DNBSEQ-T20x2 | AVITI | UG 100 | Revio, Sequel II | MinION, PromethION 48 |
Best For | Population WGS, RNA-seq, WES | High-throughput WGS, RNA-seq | Mid-throughput labs, targeted WGS | Population-scale WGS at ultra-low cost | de novo assembly, structural variants | Ultra-long reads, structural variants |
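The accuracy row above quotes Phred quality scores (Q20, Q30, Q40). As a quick reference, here is a minimal Python sketch of the standard Phred conversion, Q = -10 · log10(p), where p is the probability that a base call is wrong. The function names are made up for this post.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def phred_to_accuracy(q: float) -> float:
    """Per-base accuracy implied by Phred score q."""
    return 1.0 - phred_to_error_prob(q)


def error_prob_to_phred(p: float) -> float:
    """Phred score implied by a per-base error probability p."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (20, 30, 40):
        errors_per = round(1 / phred_to_error_prob(q))
        print(f"Q{q}: accuracy {phred_to_accuracy(q):.4f}, "
              f"about 1 error per {errors_per:,} bases")
```

So "Q30 (99.9%)" means roughly one wrong base in a thousand, and each additional 10 points cuts the error rate by a factor of ten.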
What Are Short Reads?
Short-read sequencing refers to platforms that generate relatively small snippets of DNA, typically 50 to 300 base pairs long. These reads are then aligned or assembled computationally to reconstruct genomes or transcriptomes.
Key Players:
- Illumina - The dominant force in short-read sequencing since 2007
- MGI - A competitive player out of China offering DNBSEQ technology since 2016
- Element Biosciences - A US-based newcomer offering high-accuracy short-read sequencing at lower operating costs for mid-throughput labs since 2022
- Ultima Genomics - Entered the market in 2022 with a high-throughput short-read sequencer aiming to dramatically lower sequencing costs, with a stated goal of enabling $100 human genomes at scale
Strengths:
- Accuracy: The error rate is very low, which makes short reads the gold standard for variant calling and quantification.
- Cost-effective: High throughput and low per-base cost make it ideal for large-scale studies like Genome-Wide Association Studies (GWAS) or RNA-seq.
- Tool ecosystem: Most software tools are optimized for short-read data.
Weaknesses:
- Repetitive regions: Short reads struggle to resolve large structural variations and repetitive elements.
- Context loss: Without long reads, it’s hard to understand haplotypes, phasing, and full-length transcripts.
- Assembly challenges: Assembling a genome de novo with short reads is like putting together a puzzle with too many identical pieces.
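To make the "identical puzzle pieces" problem concrete, here is a toy Python sketch. The reference sequence and both reads are invented for illustration, and exact string matching stands in for a real aligner. A read that falls entirely inside a repeat matches in more than one place, while a longer read that reaches into unique flanking sequence matches only once.

```python
def find_all(reference: str, read: str) -> list[int]:
    """Return every 0-based position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)  # allow overlapping hits
    return positions


# Hypothetical toy reference: two copies of a repeat separated by unique sequence.
REPEAT = "ACGTACGTAC"
reference = "TTGCA" + REPEAT + "GGATCCA" + REPEAT + "CTTAG"

short_read = REPEAT[2:8]      # lies entirely inside the repeat
long_read = reference[3:24]   # spans the first repeat copy plus unique flanks

if __name__ == "__main__":
    print(find_all(reference, short_read))  # → [7, 24]: ambiguous placement
    print(find_all(reference, long_read))   # → [3]: unique placement
```

Real genomes contain repeats thousands of bases long, which is why a read has to be longer than the repeat to place it unambiguously.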
What Are Long Reads?
Long-read sequencing generates reads that can span tens of thousands (or even millions!) of bases. These are useful for studying structural complexity and building more complete assemblies.
Key Players:
- Oxford Nanopore Technologies (ONT) - Known for portability (MinION), real-time sequencing, and extreme read lengths
- PacBio - Uses circular consensus sequencing to combine long reads with very high accuracy
Strengths:
- Structural resolution: Excellent for detecting insertions, deletions, inversions, translocations
- Assembly: Makes genome assembly easier and more contiguous
- Transcriptomics: Full-length isoform sequencing (Iso-Seq, cDNA or direct RNA)
Weaknesses:
- Cost: Higher cost per gigabase (though rapidly decreasing)
- Throughput: Lower than short-read platforms for most instruments
- Computational overhead: Larger files and different error models require more demanding processing (ONT basecalling, for example, generally runs far faster on a GPU than on a CPU) and sometimes custom pipelines
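The "more contiguous" assemblies mentioned in the strengths above are usually measured with N50: the contig length at which contigs of that size or larger contain at least half of all assembled bases. Here is a minimal Python sketch. The two lists of contig lengths are invented, not real assembly results.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for a non-empty list of positive lengths")


# Hypothetical contig lengths (bp) for the same small genome.
fragmented_assembly = [2_000, 3_000, 5_000, 8_000, 12_000]
contiguous_assembly = [400_000, 250_000, 50_000]

if __name__ == "__main__":
    print(n50(fragmented_assembly))  # → 8000
    print(n50(contiguous_assembly))  # → 400000
```

The same function works for read lengths, which is how read N50 is typically reported for long-read runs.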
Use Cases in the Wild
Application | Ideal Read Type | Why? |
---|---|---|
Single Nucleotide Polymorphism (SNP) calling | Short reads | Cheap, accurate, scalable |
Differential expression (RNA-seq) | Short reads | High accuracy and precise quantification at low cost |
Full-length transcript detection | Long reads | Preserves isoform structure |
Bacterial genome sequencing | Long or hybrid | De novo or complete assemblies |
Cancer structural variant detection | Long reads | Detect large rearrangements |
Metagenomics profiling | Short reads | High throughput, cost-efficient |
What About Hybrid Approaches?
One of the most effective strategies today is to combine both short and long reads.
- Long reads can scaffold the genome and resolve complex regions
- Short reads polish the sequence to improve base-level accuracy
Hybrid sequencing also helps with variant detection: short reads give reliable SNP calls, while long reads capture the structural variants that short reads miss.
Tools like Unicycler, Pilon, or MaSuRCA allow you to integrate both datasets for high-quality assemblies or variant calls.
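To illustrate what "polishing" means, here is a deliberately simplified Python sketch: a majority vote over short reads that are assumed to be already aligned to a long-read draft, with no gaps. This is not how Pilon or any of the tools above works internally. The function name, the `min_depth` threshold, and the sequences are all invented, and real polishers also handle indels, base qualities, and mapping ambiguity.

```python
from collections import Counter


def polish(draft: str, aligned_reads: list[tuple[int, str]], min_depth: int = 3) -> str:
    """Replace each draft base with the strict-majority base of the reads covering it.

    `aligned_reads` holds (start, sequence) pairs: gap-free alignments to the draft.
    Positions with fewer than `min_depth` covering reads keep the draft base.
    """
    pileup = [Counter() for _ in draft]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(draft):
                pileup[pos][base] += 1

    polished = []
    for draft_base, counts in zip(draft, pileup):
        depth = sum(counts.values())
        if depth >= min_depth:
            best, best_count = counts.most_common(1)[0]
            if 2 * best_count > depth:  # only override on a strict majority
                polished.append(best)
                continue
        polished.append(draft_base)
    return "".join(polished)


# Hypothetical example: the draft carries one wrong base at position 4.
truth = "ACGTAGCATGCA"
draft = "ACGTTGCATGCA"
short_reads = [(s, truth[s:s + 7]) for s in (0, 2, 3, 4)]

if __name__ == "__main__":
    print(polish(draft, short_reads))  # → ACGTAGCATGCA
```

The depth threshold is there so that a single stray read cannot overwrite the draft, which is the same reason real polishers need reasonable short-read coverage.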
Things to Consider Before Choosing
- Budget: Short-read sequencing remains cheaper per Gb, though ONT narrows the gap on total cost because its instruments are much cheaper to buy
- Computational skills: Long reads need different QC, alignment, and polishing tools
- Downstream needs: Are you calling SNPs or assembling new genomes?
If you’re just getting started in genomics, short reads might be more accessible, but don’t underestimate the power of long reads for solving complex questions.
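For the budget question, a back-of-the-envelope estimate is often enough: required yield is genome size times target coverage, and sequencing cost is yield times cost per Gb. The Python sketch below uses the approximate per-Gb ranges from the summary table in this post and assumes a human genome of about 3.1 Gb. It covers sequencing cost only, so library prep, instruments, labor, and compute are left out.

```python
HUMAN_GENOME_GB = 3.1  # assumed approximate human genome size in gigabases

# (low, high) USD per Gb, copied from the summary table in this post.
COST_PER_GB = {
    "Illumina": (2.0, 3.0),
    "MGI": (1.5, 2.0),
    "Element Biosciences": (5.0, 7.0),
    "Ultima Genomics": (1.0, 1.5),
    "PacBio": (10.0, 20.0),
    "ONT": (8.0, 15.0),
}


def required_gb(genome_size_gb: float, coverage: float) -> float:
    """Sequencing yield (Gb) needed to reach the target mean coverage."""
    return genome_size_gb * coverage


def cost_range(platform: str, genome_size_gb: float, coverage: float) -> tuple[float, float]:
    """Rough (low, high) sequencing cost in USD for one sample."""
    low, high = COST_PER_GB[platform]
    gb = required_gb(genome_size_gb, coverage)
    return gb * low, gb * high


if __name__ == "__main__":
    for platform in COST_PER_GB:
        low, high = cost_range(platform, HUMAN_GENOME_GB, coverage=30)
        print(f"{platform:20s} 30x human genome: ${low:,.0f} to ${high:,.0f}")
```

Swap in your own genome size and coverage target. A 5 Mb bacterial genome at 100x needs only 0.5 Gb, which changes the calculation completely.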
How Do These Technologies Actually Work?
Each sequencing platform takes a different approach to reading DNA. Here’s a breakdown of how the six major players do it:
Illumina - Sequencing by Synthesis (SBS)
Illumina uses sequencing by synthesis, where fluorescently labeled nucleotides are added one base at a time. As each base is incorporated, a signal is recorded, allowing the instrument to “read” the DNA.
- Strengths: Accuracy, throughput, and cost efficiency
- Limitations: Short reads (typically 150-300 bp), limited structural context
MGI - DNA Nanoball Sequencing
MGI (from BGI) uses DNBSEQ, which involves amplifying DNA into nanoballs and sequencing them via combinatorial probe-anchor synthesis (cPAS).
- Strengths: Low duplication rates, reduced index hopping, high output
- Limitations: Still short-read technology, with ecosystem locked to MGI software/hardware
Element Biosciences - Avidity Sequencing
Element uses Avidity Sequencing, a twist on sequencing by synthesis that separates the steps of nucleotide incorporation and signal detection. It uses “avidites”, multivalent binding complexes, to improve accuracy and reduce reagent use.
- Strengths: Very high accuracy (Q40+), lower reagent costs, flexible throughput for mid-scale labs
- Limitations: Currently limited to short reads (up to 300 bp) and still building a large install base
Ultima Genomics - Open Substrate SBS
Ultima’s platform uses an open substrate and continuous sequencing-by-synthesis chemistry, designed to massively scale throughput while cutting reagent costs. The system runs on large circular wafers rather than flow cells.
- Strengths: Extremely low projected cost per genome (goal: $100 WGS at scale), high throughput per run
- Limitations: Still short reads (~300-400 bp), limited public performance data as adoption ramps up
Oxford Nanopore - Electrical Signal Detection
ONT devices pass single-stranded DNA through a nanopore and detect changes in ionic current, which correspond to different base sequences.
- Strengths: Ultra-long reads, real-time sequencing, portable devices (like MinION)
- Limitations: Historically lower accuracy, sensitive to base modifications and homopolymer runs
PacBio - Single Molecule Real-Time (SMRT) Sequencing
PacBio’s HiFi technology uses circular consensus sequencing: a polymerase reads a circularized template multiple times to generate a consensus with very high accuracy.
- Strengths: Long reads with Illumina-like accuracy (>99.9%)
- Limitations: Lower throughput compared to short-read platforms, higher cost per Gb
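Why does reading the same molecule several times help so much? Here is a simplified Python sketch of the intuition. It assumes each pass miscalls a given base independently with the same probability, and that the consensus is wrong only when more than half of the passes are wrong. PacBio's actual consensus algorithm is a probabilistic model rather than a plain majority vote, and the 10% per-pass error rate is purely illustrative.

```python
from math import comb


def majority_error(per_pass_error: float, passes: int) -> float:
    """Probability that more than half of `passes` independent reads miscall a base."""
    if passes < 1 or passes % 2 == 0:
        raise ValueError("use an odd, positive number of passes to avoid ties")
    needed = passes // 2 + 1  # smallest number of wrong passes that forms a majority
    e = per_pass_error
    return sum(
        comb(passes, k) * e**k * (1 - e) ** (passes - k)
        for k in range(needed, passes + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5, 9, 15):
        print(f"{n:2d} passes: consensus error ~ {majority_error(0.10, n):.2e}")
```

Under these assumptions the consensus error falls quickly as passes are added, which is the basic reason a noisy single-molecule read can end up with short-read-like accuracy.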
What About Roche?
Roche has re-entered the sequencing game in a big way, investing in technologies that aim to combine the strengths of both short and long reads. Roche has announced that in 2026 it plans to launch a short-read platform using its new Sequencing by Expansion (SBX) chemistry. This technology is designed to deliver high-accuracy (Q40+) short reads with faster run times, potentially reshaping the clinical sequencing market.
While full specifications aren’t public yet, Roche is positioning SBX as a complement to its long-read HiFi offerings through AVENIO. If SBX delivers on its promises, it could become a serious contender in both research and clinical genomics, offering an alternative to Illumina, MGI, Element, and Ultima in the short-read space. For now, it’s one to watch, and we’ll be keeping an eye out for its performance data when it hits the market.
What’s Coming Next?
This post kicks off a mini-series on reference-based vs de novo assembly strategies, where we’ll look at:
- When and how to use short, long, or hybrid approaches in practice
- How reference-based methods align to existing genomes
- Why de novo assemblies are important (and hard)
Stay tuned for the next post, and if you’ve got questions about your own dataset, feel free to drop them in the comments.