Microsoft's DNA Storage Research Just Hit A Huge Milestone
Microsoft has detailed a major breakthrough in its work on synthetic DNA storage, specifically on improving data throughput. The proof-of-concept is the subject of a new study from Microsoft Research and a team at the University of Washington's Molecular Information Systems Laboratory (MISL), and it could help pave the way for a future in which the world's data is stored on lab-made DNA, not tapes and hard drives.
Old tech still dominates
Microsoft has spent years working on synthetic DNA data storage, a promising technology that aims to address growing storage demands. The company paints an elaborate, if not mind-boggling, picture centered around present-day and future data needs — the huge quantity of information that already exists, the amount produced every day, and growth predictions over the next two years.
Assuming those predictions are accurate, there will be approximately 8.9 zettabytes of data in storage around the world by 2024, according to IDC. That works out to around 9 million petabytes of data, which is still more than the average person can visualize. Microsoft translates that figure into a more relatable context: a single zettabyte would be equivalent to installing Windows 11 on more than 15 billion computers.
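For readers who want to check the scale, the arithmetic behind those comparisons can be sketched in a few lines of Python. This is only a rough check: it uses decimal (SI) units and assumes each Windows 11 installation takes up 64GB, which is the operating system's minimum storage requirement. The article does not say which figure Microsoft actually used, so treat that constant as an assumption.

```python
# Rough sanity check of the storage figures, using decimal (SI) units.
ZETTABYTE = 10**21
PETABYTE = 10**15
GIGABYTE = 10**9

# IDC's projection: 8.9 zettabytes, kept in integers to avoid float rounding.
projected_bytes = 89 * ZETTABYTE // 10
projected_petabytes = projected_bytes // PETABYTE

# Assumption: one Windows 11 install is counted as 64GB, the OS's minimum
# storage requirement. Microsoft's exact figure is not given in the article.
WINDOWS_11_INSTALL_BYTES = 64 * GIGABYTE
installs_per_zettabyte = ZETTABYTE // WINDOWS_11_INSTALL_BYTES

print(f"8.9 ZB = {projected_petabytes:,} PB")
print(f"1 ZB holds about {installs_per_zettabyte:,} Windows 11 installs")
```

Under those assumptions, the script reports 8,900,000 petabytes and roughly 15.6 billion installs per zettabyte, in line with the "around 9 million petabytes" and "more than 15 billion computers" figures quoted above.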
Multiple types of data storage are commonly used, and though they seem positively archaic at this point, tape cartridges remain the most appealing commercial option due to their density (via IBM).
Magnetic tape has been around for several decades and offers some distinct benefits for companies that produce vast amounts of data: it helps keep information secured away from hackers and can pack hundreds of terabytes of data into a small form factor. IBM says one tape cartridge utilizing its latest tech has a 580TB capacity, an amount of data that would take more than three-quarters of a million CDs to hold.
Using tape cartridges for data archival is a practice that will stick around for years, but there's strong demand for a modern alternative that offers even greater density while eliminating many of the old tech's problems. That, Microsoft says, is where synthetic DNA data storage comes in.
Why DNA?
Data on tape cartridges needs to be rewritten after roughly three decades at most, which is a short period of time when it comes to long-term data archiving. Synthetic DNA, on the other hand, is far more durable, Microsoft says, with the potential to preserve data for thousands of years. On top of that, synthetic DNA will likely drastically reduce the environmental impact of data centers, with Microsoft citing evidence that indicates lower water and energy use, as well as decreased greenhouse gas emissions.
Synthetic DNA data storage can only be a viable option if certain big hurdles are addressed, however. The technology is currently limited by low data throughput, specifically the rate at which data can be written. This, Microsoft notes, is a big stumbling block to large-scale synthetic DNA storage, not to mention the costs associated with the tech at this point.
The newly announced breakthrough revolves around throughput, presenting a proof-of-concept molecular controller. The researchers describe this innovation as a "tiny DNA storage writing mechanism on a chip," which drastically improves how tightly DNA-synthesis spots are packed. The result is proof that higher levels of writing throughput are possible.
At its core, synthetic DNA storage involves moving data back and forth from molecules to bits. Microsoft explains that two things are critical for making DNA a viable commercial-scale storage option:
The first is translating digital bits (ones and zeros) into strands of synthetic DNA that represent those bits, using encoding software and a DNA synthesizer. The second is reading the DNA and decoding it back into bits, recovering the information in digital form with a DNA sequencer and decoding software.
The company goes into extensive detail about the new development and the wider processes involved in synthetic DNA storage in a new blog post. Storing data in DNA requires the information (in the form of digital bits) to be embedded in a DNA sequence's A/C/T/G bases. The DNA chain is then synthesized, which typically involves a photochemical process.
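To make the bits-to-bases idea concrete, here is a minimal Python sketch that maps every two bits to one of the four bases and back again. It is purely illustrative and is not Microsoft's encoding scheme: the two-bits-per-base mapping, the base ordering, and the function names are all assumptions made for this example, and a real system would add error correction and other safeguards on top.

```python
# Toy illustration only: a naive 2-bits-per-base mapping, not Microsoft's codec.
# Assumed mapping (hypothetical): 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Turn bytes into a base string, four bases per byte (high bits first)."""
    strand = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            strand.append(BASES[(byte >> shift) & 0b11])
    return "".join(strand)


def decode(strand: str) -> bytes:
    """Turn a base string produced by encode() back into the original bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4 bases")
    data = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # BASES.index raises ValueError for anything other than A/C/G/T.
            byte = (byte << 2) | BASES.index(base)
        data.append(byte)
    return bytes(data)


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

The round trip mirrors the two halves of the process described above: encode() stands in for the encoding software that feeds a DNA synthesizer, and decode() stands in for the decoding software that follows a DNA sequencer.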
Microsoft goes on to explain that electrochemical DNA synthesis side-steps some of the limitations inherent to photochemistry; it relies on an array of electrodes (anodes and cathodes). The new work details a synthesis method that successfully increased the rate at which data was written in synthetic DNA, therefore boosting the throughput and, by extension, decreasing the costs associated with synthesizing the DNA.
Though synthetic DNA storage isn't yet ready to replace magnetic tape, Microsoft sees this latest development as a key step toward that reality. In its blog post detailing the study, Microsoft explained:
A natural next step is to embed digital logic in the chip to allow individual control of millions of electrode spots to write kilobytes per second of data in DNA. From there, we foresee the technology reaching arrays containing billions of electrodes capable of storing megabytes per second of data in DNA. This will bring DNA data storage performance and cost significantly closer to tape.