This is the first of a two-part series on the human genome.
In 2001, the first draft sequence of the entire human genome was published with the support of the Human Genome Project. This endeavor took more than a decade and cost nearly 3 billion dollars. Since then, high-throughput sequencing technology, also known as Next Generation Sequencing (NGS), has been developed to reduce the time and cost of human genome sequencing. In 2005, 454 Life Sciences released the first NGS sequencer to the market, reducing the cost of human genome sequencing by 50,000-fold.
Over the past decade, NGS technologies have made remarkable progress and brought the cost of sequencing a human genome down to about $1,000. Even more exciting, NGS has brought compact instruments to the market, about the size of a microwave oven, that cut sequencing time to several hours and make it possible to sequence patients' genomes for diagnosis in clinical settings. These advances have drawn the attention of entrepreneurs and investors to genome sequencing technologies. Currently, the most popular NGS technologies are based on so-called "short-read sequencing", which delivers reads of up to 1,000 base pairs (bp). The following is a patent overview of representative short-read NGS technologies. (The relevant patents are listed in the table at the end of this article.)
454 Life Sciences (acquired by Roche)
In 2005, 454 Life Sciences, a Connecticut-based company, launched the first NGS sequencer. In 2007, the company released the sequence of the genome of James Watson, co-discoverer of the structure of DNA. In the same year, Roche acquired 454 Life Sciences for $154.9 million. 454 Life Sciences' sequencing technology is based on pyrosequencing, a form of sequencing by synthesis with single-nucleotide addition (SNA). In this method, the DNA is first broken into fragments of up to 1,000 bp. These fragments are immobilized on streptavidin-coated magnetic beads, one fragment per bead, and amplified by emulsion PCR to generate millions of copies on each bead (US 7,323,305). The beads are then distributed into the individual wells of a glass array, where the sequencing reaction is monitored. During sequencing, the four deoxynucleotides (dNTPs), i.e., dATP, dCTP, dGTP, and dTTP, are added one at a time. When a nucleotide is incorporated into the growing strand, the pyrophosphate it releases drives an enzymatic reaction (via ATP sulfurylase and luciferase) that emits light, and the light intensity is proportional to the number of identical nucleotides incorporated in that step. The light intensities are recorded to determine the sequence of the DNA fragment (US 7,335,762, US 7,211,390). After each reaction, the excess dNTPs are washed away and the next dNTP is added. The sequence is read out by cycling through the four dNTPs repeatedly.
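The step that turns light intensities into bases can be illustrated with a short decoding sketch. It assumes a repeating flow order of T, A, C, G and intensities already normalized so that one incorporated nucleotide gives about 1.0 unit; the flow order, the function name, and the example numbers are illustrative assumptions, not taken from the cited patents. Real base callers also correct for background, signal decay, and incomplete extension, which this toy ignores.

```python
# Hypothetical sketch: decode a pyrosequencing "flowgram" into bases.
# Assumptions: a cyclic flow order of "TACG", and intensities normalised
# so that one incorporated nucleotide gives roughly 1.0 signal units.

FLOW_ORDER = "TACG"  # assumed dNTP flow order, repeated every cycle


def decode_flowgram(intensities: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Return the read implied by per-flow light intensities.

    Each flow supplies one dNTP type. Light is taken to be proportional
    to how many identical bases were incorporated, so the rounded
    intensity is the homopolymer length for that flow (0 = no extension).
    """
    bases = []
    for i, signal in enumerate(intensities):
        base = flow_order[i % len(flow_order)]
        count = max(0, round(signal))  # clamp small negative noise to 0
        bases.append(base * count)
    return "".join(bases)


if __name__ == "__main__":
    # Made-up intensities for eight flows (two cycles of T, A, C, G).
    flowgram = [1.02, 0.05, 2.10, 0.96, 0.03, 0.98, 0.07, 3.05]
    print(decode_flowgram(flowgram))  # TCCGAGGG
```

The third flow (C) reads about 2.1, so two C's are called at once; this is why pyrosequencing accuracy depends on how well signal intensity tracks homopolymer length.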
Illumina
Another major company in the field is Illumina, whose sequencing technology is based on solid-phase bridge amplification and sequencing by synthesis with cyclic reversible termination (CRT). As with the 454 technology, the DNA is first fragmented, here into pieces of approximately 200 bp. The fragments are attached to a patterned solid slide, with one specific DNA sequence per patterned spot (US 7,790,418). After bridge PCR amplification, each spot contains on the order of a thousand identical copies of that DNA sequence, which are separated into single strands before sequencing. In each sequencing cycle, a mixture of all four dNTPs, each labelled with its own fluorophore and blocked at the 3' end, is added. Because the 3' block stops further extension, only a single nucleotide is incorporated per cycle, and the color of the fluorescent signal identifies that base. After imaging, the terminators and fluorophores are removed, and the strand is ready to incorporate the next nucleotide (US 7,115,400). The series of bases recorded cycle by cycle gives the sequence of the DNA fragment.
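Because the 3' block limits each cycle to one incorporation, base calling in CRT reduces, in its simplest form, to picking the brightest dye channel in each cycle. The sketch below assumes a four-channel readout with a hypothetical channel order (A, C, G, T) and made-up intensities; the function name and the "purity" score are illustrative, and production base callers additionally correct for dye cross-talk and phasing.

```python
# Hypothetical sketch of per-cycle base calling for cyclic reversible
# termination. Assumption: each cycle yields one intensity per dye
# channel, and each channel corresponds to one base (four-colour scheme).

CHANNEL_TO_BASE = ("A", "C", "G", "T")  # assumed channel order


def call_bases(cycles: list[tuple[float, float, float, float]]) -> tuple[str, list[float]]:
    """Call one base per cycle from four-channel fluorescence readings.

    Only one nucleotide is incorporated per cycle, so the brightest
    channel identifies that cycle's base. Also returns a simple purity
    score per cycle: brightest channel / total signal.
    """
    read = []
    purity = []
    for channels in cycles:
        brightest = max(range(len(channels)), key=lambda i: channels[i])
        total = sum(channels)
        read.append(CHANNEL_TO_BASE[brightest])
        purity.append(channels[brightest] / total if total > 0 else 0.0)
    return "".join(read), purity


if __name__ == "__main__":
    # Made-up intensities for three cycles.
    cycles = [(0.9, 0.1, 0.0, 0.1), (0.1, 0.05, 0.8, 0.05), (0.0, 0.1, 0.1, 0.8)]
    read, purity = call_bases(cycles)
    print(read)  # AGT
```

A low purity score flags cycles where two channels are similarly bright, which is one simple way to attach a confidence to each call.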
Ion Torrent
Ion Torrent developed the first NGS sequencer that relies on chemical, specifically ionic, signals rather than optical ones. In Ion Torrent's method, the DNA is fragmented into pieces of approximately 200 bp, which are captured on beads and amplified by emulsion PCR. Each bead is then deposited into its own well on a semiconductor chip for sequencing. As in the 454 technology, the dNTPs are flowed into the wells one type at a time. The recorded signal is the pH change in each well caused by the H+ ions released as nucleotides are incorporated into the DNA strand (US 7,948,015); a run of identical bases is incorporated in a single flow and produces a proportionally larger signal.
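Since each incorporated nucleotide releases one H+ ion, the expected signal in each flow can be modelled by counting how many bases the polymerase adds during that flow. This forward-model sketch assumes the sequence of the strand being synthesized is known and uses an illustrative flow order; the function name and parameters are hypothetical.

```python
# Hypothetical forward model for a flow-based run: how many nucleotides
# (and so roughly how many H+ ions) are incorporated in each flow.
# Assumptions: `sequence` is the strand being synthesized, and
# `flow_order` is an illustrative repeating order of dNTP flows.


def incorporations_per_flow(sequence: str, flow_order: str, n_flows: int) -> list[int]:
    counts = []
    pos = 0  # index of the next base still to be synthesized
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        run = 0
        # The polymerase keeps extending while the next base matches the
        # dNTP in this flow, so a homopolymer run is added in one flow.
        while pos < len(sequence) and sequence[pos] == base:
            run += 1
            pos += 1
        counts.append(run)
    return counts


if __name__ == "__main__":
    print(incorporations_per_flow("TTACGGA", "TACG", 8))  # [2, 1, 1, 2, 0, 1, 0, 0]
```

Under this model the pH signal in a well scales with the count for that flow, which is why long homopolymers are harder to call exactly: the difference between, say, 6 and 7 protons' worth of signal is proportionally small.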
Beijing Genomics Institute
Beijing Genomics Institute (BGI) acquired Complete Genomics and improved Complete Genomics' technologies for the BGISEQ-500. BGI uses DNA nanoball generation to amplify DNA and combinatorial probe-anchor synthesis (cPAS) to sequence it. DNA nanoball generation is currently the only solution-phase DNA amplification technology available. In this technology, DNA is fragmented into pieces of approximately 100 bp, and a first adapter (adapter 1) is ligated to both ends of each fragment. The ligated fragments are amplified and then circularized by joining the adapters at their two ends. The circular DNA is then cleaved, a second adapter (adapter 2) is inserted, and the amplification and circularization are repeated. Once four adapters (adapters 1, 2, 3, and 4) have been inserted into the DNA fragment, the final circular DNA is amplified by rolling circle amplification to generate a DNA nanoball template. The DNA nanoballs are immobilized on a patterned array for subsequent sequencing (US 7,709,197, US 8,445,197). cPAS is a modification of combinatorial probe-anchor ligation sequencing (cPAL) (US 8,617,811) intended to increase read length, although cPAS itself has not been disclosed in detail.
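Rolling circle amplification can be modelled in a few lines: the polymerase travels around the circular template many times, producing one long strand made of tandem copies complementary to the circle. The sketch assumes the circle is written 5' to 3' as a string and that the polymerase makes a fixed number of laps, and it ignores where the primer binds (which would only rotate the repeat unit). The helper names are hypothetical.

```python
# Hypothetical sketch of rolling circle amplification (RCA).
# Assumptions: the circular template is given 5'->3' as a string,
# the polymerase completes a fixed number of laps, and the primer
# position is ignored (it would only rotate the repeat unit).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse to read the new strand 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def rolling_circle_product(circle: str, laps: int) -> str:
    """Return the single-stranded RCA product, written 5'->3'.

    The polymerase copies the circle again and again without releasing
    it, so the product is a concatemer of tandem copies complementary
    to the circle; in DNA nanoball methods this long strand coils into
    a compact ball.
    """
    return reverse_complement(circle) * laps


if __name__ == "__main__":
    print(rolling_circle_product("AACG", 3))  # CGTTCGTTCGTT
```

Because every repeat unit in the product carries the same insert and adapters, a single nanoball presents many copies of one fragment at one array position, which plays the role that a bead or a cluster plays in the other platforms.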
Applied Biosystems (now Thermo Fisher Scientific)
Among the platforms discussed here, Applied Biosystems' (ABI) Sequencing by Oligonucleotide Ligation and Detection (SOLiD) is the only one that uses sequencing by ligation. In ABI SOLiD, the DNA fragments are annealed to the surface of magnetic beads and amplified by emulsion PCR. The beads are then covalently bound to a glass slide for sequencing. A two-base encoding method is used to sequence the DNA fragments. In sequencing by ligation, a complex with an anchor part and a probe part is ligated to the DNA fragment. The anchor has a known sequence that is complementary to an adapter on the DNA fragment and initiates ligation of the probe, which carries a fluorophore-labelled dinucleotide. Once the probe is ligated to the DNA fragment, an image is collected to determine the two-base combination. Because each color corresponds to several possible two-base combinations, one of the two bases must already be known in order to identify the other. After imaging, the fluorophore is removed and a new two-base probe is ligated to the DNA fragment for the next reading (US 8,329,404, US 9,217,177). The advantage of two-base encoding is that each base is read twice, in two overlapping probes, which reduces the error rate: a true single-base variant changes two adjacent colors, whereas a random measurement error usually changes only one.
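Two-base encoding is easiest to see in code. The sketch assumes the commonly described SOLiD-style color code in which each color is the XOR of 2-bit base codes (A=0, C=1, G=2, T=3), so AA, CC, GG, and TT all share color 0. It shows why the first base must be known before the colors can be decoded, and how a true single-base change alters two adjacent colors. The function names are hypothetical.

```python
# Sketch of two-base ("colour-space") encoding in the style of SOLiD.
# Assumption: colour = XOR of 2-bit base codes with A=0, C=1, G=2, T=3,
# so 16 dinucleotides map onto 4 colours (4 dinucleotides per colour).

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode(seq: str) -> list[int]:
    """One colour per overlapping dinucleotide, so each base is seen twice."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colours: list[int]) -> str:
    """Recover the bases from colours; the first base must already be known."""
    bases = [first_base]
    for colour in colours:
        # XOR is its own inverse: next = previous XOR colour.
        bases.append(BASE[CODE[bases[-1]] ^ colour])
    return "".join(bases)


if __name__ == "__main__":
    ref = "TACGGA"
    snp = "TACTGA"  # true single-base change at index 3
    print(encode(ref))  # [3, 1, 3, 0, 2]
    print(encode(snp))  # [3, 1, 2, 1, 2]  (two adjacent colours differ)
    print(decode("T", encode(ref)))  # TACGGA
```

Decoding a color list with a single corrupted color, for example `decode("T", [3, 1, 0, 0, 2])`, gives `"TACCCT"`: every base after the error is wrong. That pattern differs from the two-adjacent-color signature of a real variant, which is what lets two-base encoding separate measurement errors from true single-base changes.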
Limitations of short-read NGS and the Next Generation
Although short-read NGS can read at most about 1,000 bp of each DNA fragment, it processes millions to billions of reads in parallel, so a run of just several hours can generate several gigabytes of sequence data. This greatly increases the throughput of genome sequencing and makes it possible to use NGS in clinical diagnosis and treatment. However, genomes are highly complex and contain many long repetitive sequences. A read that is shorter than a repeat cannot be placed unambiguously within it, which makes it difficult to accurately reconstruct the whole genome or to detect alterations in these regions. New technologies are being developed to address this limitation of short-read sequencing, and we will discuss them in the next installment.
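A toy example makes the repeat problem concrete: an exact-match search places a read that lies entirely inside a repeat at two positions, whereas a longer read that spans the repeat and its unique flanks lands in exactly one place. The reference, the reads, and the helper name are made up for illustration, and real aligners tolerate mismatches; this only illustrates the ambiguity.

```python
# Toy sketch of why short reads struggle with repeats: find every place
# in a reference where a read matches exactly. The reference and reads
# are invented for illustration only.


def match_positions(reference: str, read: str) -> list[int]:
    """All start positions where `read` occurs exactly (overlaps allowed)."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


if __name__ == "__main__":
    repeat = "ACGTTGCA"  # repeat unit that occurs twice in the toy reference
    reference = "GGAT" + repeat + "CCTA" + repeat + "TTAG"
    short_read = "GTTGC"          # lies entirely inside the repeat
    long_read = "GATACGTTGCACCT"  # spans the repeat plus unique flanks
    print(match_positions(reference, short_read))  # [6, 18]  (ambiguous)
    print(match_positions(reference, long_read))   # [1]      (unique)
```

Scaled up to repeats that are thousands of bases long, the same effect means no short read can bridge the repeat, which is the gap that longer-read technologies aim to close.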
| Patent No. | Title | Assignee | Inventors |
|---|---|---|---|
| US 7,323,305 | Methods of amplifying and sequencing nucleic acids | 454 Life Sciences Corporation | John H. Leamon; Kenton L. Lohman; Jonathan M. Rothberg; Michael P. Weiner |
| US 7,335,762 | Apparatus and method for sequencing a nucleic acid | 454 Life Sciences Corporation | Jonathan M. Rothberg; Joel S. Bader; Scott B. Dewell; Keith McDade; John W. Simpson; Jan Berka; Christopher M. Colangelo; Michael P. Weiner |
| US 7,211,390 | Method of sequencing a nucleic acid | 454 Life Sciences Corporation | Jonathan M. Rothberg; Joel S. Bader; Scott B. Dewell; Keith McDade; John W. Simpson; Jan Berka; Christopher M. Colangelo |
| US 7,790,418 | Isothermal amplification of nucleic acids on a solid support | Illumina Cambridge Limited; Illumina, Inc. | Pascal Mayer |
| US 7,115,400 | Methods of nucleic acid amplification and sequencing | Solexa Ltd. | Celine Adessi; Eric Kawashima; Pascal Mayer; Jean-Jacques Mermod; Gerardo Turcatti |
| US 7,948,015 | Methods and apparatus for measuring analytes using large scale FET arrays | Life Technologies Corporation | Jonathan M. Rothberg; Wolfgang Hinz; Kim L. Johnson; James Bustillo |
| US 7,709,197 | Nucleic acid analysis by random mixtures of non-overlapping fragments | Callida Genomics, Inc. | Radoje Drmanac |
| US 8,445,197 | Single molecule arrays for genetic and chemical analysis | Callida Genomics, Inc. | Radoje Drmanac; Matthew J. Callow; Snezana Drmanac; Brian K. Hauser; George Yeung |
| US 8,617,811 | Methods and compositions for efficient base calling in sequencing reactions | Complete Genomics, Inc. | Radoje Drmanac |
| US 8,329,404 | Reagents, methods, and libraries for bead-based sequencing | Applied Biosystems LLC | Kevin McKernan; Alan Blanchard; Lev Kotler; Gina Costa |
| US 9,217,177 | Methods for bead-based sequencing | Applied Biosystems, LLC | Kevin McKernan; Alan Blanchard; Lev Kotler; Gina Costa |