In late December 2019, eight pages of genetic code were sent to computers at the National Institutes of Health in Bethesda, Md.
Unbeknown to American officials at the time, the genetic map that had landed on their doorstep contained critical clues about the virus that would soon touch off a pandemic.
The genetic code, submitted by Chinese scientists to a vast public repository of sequencing data run by the U.S. government, described a mysterious new virus that had infected a 65-year-old man weeks earlier in Wuhan. At the time the code was sent, Chinese officials had not yet warned of the unexplained pneumonia sickening patients in the central city of Wuhan.
But the U.S. repository, which was designed to help scientists share run-of-the-mill research data, never added the submission it received on Dec. 28, 2019, to its database. Instead, it asked the Chinese scientists three days later to resubmit the code with certain additional technical details. That request went unanswered.
It took almost another two weeks for a separate pair of virologists, one Australian and the other Chinese, to work together to post the genetic code of the new coronavirus online, setting off a frantic global effort to save lives by building tests and vaccines.
The initial attempt by Chinese scientists to publicize the crucial code was revealed for the first time in documents released on Wednesday by House Republicans investigating Covid’s origins. The documents reinforced questions circulating since early 2020 about when China learned of the virus that was causing its unexplained outbreak — and also drew attention to gaps in the American system of monitoring for dangerous new pathogens.
The Chinese government has said it promptly shared the virus’s genetic code with global health officials. House Republicans said the new documents suggested that was untrue. News accounts and Chinese social media posts have long reported that the virus was first sequenced in late December 2019.
But lawmakers and independent scientists said that the documents did offer tantalizing new details about when and how scientists first tried to share those sequences globally, illustrating the difficulty the United States has with picking worrisome pathogens out of the thousands of humdrum genetic sequences that are submitted to its repository every day.
“You’d never have an ambulance sitting in normal 3 p.m. traffic,” said Jeremy Kamil, a virologist at Louisiana State University Health Sciences Center Shreveport. Referring to the coronavirus code from 2019, he said, “Why would you allow this sequence to sit there under the same process as a sequence I just got from a new snail species I found in a ravine?”
A spokeswoman for the Department of Health and Human Services, which includes the N.I.H., said in a statement on Wednesday that the genetic code was not published because it “was unable to be verified, despite follow-ups by N.I.H. to the Chinese scientist for more information and a response.”
In an earlier letter to House Republicans, Melanie Anne Egorin, a senior Health Department official, said that the sequence had initially been subjected to a “technical, but not scientific or public health,” review, as was customary. After not hearing back from the Chinese scientists about its requested corrections, the database, known as GenBank, automatically deleted the submission from its queue of unpublished sequences on Jan. 16, 2020.
It is not clear why the Chinese scientists did not respond. One of the submitters, Lili Ren, who worked at a pathogen institute within the state-affiliated Chinese Academy of Medical Sciences in Beijing, did not respond to a request for comment. The Chinese embassy said China’s response was “science-based, effective and consistent with China’s national realities.”
But the same sequence that Dr. Ren’s group sent to GenBank was made public on a different online database, known as GISAID, on Jan. 12, 2020, shortly after other scientists had posted the first coronavirus code. Dr. Ren’s group also resubmitted a corrected version of the code to GenBank in early February and published a paper describing its work.
The two-week gap between the code first being sent to the American database and China sharing the sequence with global health officials “underscores why we cannot trust any of the so-called ‘facts’ or data” from the Chinese government, the Republican leaders of the House Energy and Commerce Committee said.
Jesse Bloom, a virologist at the Fred Hutchinson Cancer Center in Seattle, said that the genetic sequence would have strongly suggested to anyone reviewing it in late December 2019 that a new coronavirus was causing the mysterious pneumonia cases in Wuhan. Instead, official Chinese timelines indicate the government did not make that diagnosis until early January.
“If this sequence had been made available, probably the prototype vaccines could’ve been started right away, and that was two weeks earlier than they were started,” Dr. Bloom said.
The documents, first reported by The Wall Street Journal, do not provide insight into the origins of the virus, Dr. Bloom and other scientists said, given that the sequence did not contain special clues about the virus’s evolution and was later made public anyway.
But they do offer new details about the pace at which Dr. Ren’s team worked to sequence the virus. The swab containing the virus they analyzed was taken from the 65-year-old patient, a vendor at the large market where the illness was first seen spreading, on Dec. 24, 2019. Within four days, scientists sent that virus’s genetic data to GenBank.
“That’s incredibly fast,” said Kristian Andersen, a virologist at the Scripps Research Institute.
At the time, finding a new coronavirus in the patient’s sample would not have proven that it was that pathogen, and not a different virus or a bacterium, causing his illness, Dr. Andersen said, though it would have been a reasonable hypothesis.
That consideration appeared to weigh on the Chinese scientists studying samples from early patients. One researcher at a Chinese commercial laboratory that worked with Dr. Ren wrote on a blog in late January 2020 that while she had identified a new virus in hospital samples, that alone did not demonstrate that the virus was causing pneumonia cases, a caveat that slowed down an official announcement.
In early 2020, the Chinese government also issued directives discouraging certain lines of scientific research and restricted the release of data about the virus.
Even once the virus’s genetic code was sent to the U.S. repository, it would have been difficult for American officials staffing the research-oriented database to take notice. The repository holds hundreds of millions of genetic sequences. Much of the process for screening them is automated.
And at least until Chinese officials started sounding an alarm at the very end of December 2019, almost no one would have known to look for a new coronavirus within the heaps of submissions.
“At the time, there was no way that anyone at N.C.B.I. would realize the importance of that,” said Alexander Crits-Christoph, a computational biologist, referring to the N.I.H. center that runs GenBank. Beyond that, he said, genetic repositories like GenBank have to be mindful about publicly blasting out sequences, given that researchers are often using the same data to prepare journal articles.
Still, some scientists believe that American and global health officials have been slow to retrofit databases like GenBank to allow them to seize on sequences that could have critical public health implications.
Such a database could, for example, automatically scan for new pathogens whose genetic codes overlap with those known to be dangerous, Dr. Kamil said. And it could ensure that those sequences are circulated more widely, even as health officials wait for missing details or revisions.
“Give those sequences concierge care, my gosh,” he said. “Why haven’t the agencies in charge of public health or global health stepped up their game and said, ‘This is the year 2024, we need to be safer so stuff like this doesn’t happen again?’”
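For readers curious what the kind of automated scan Dr. Kamil describes might look like, here is a minimal sketch. It is not how GenBank or any real repository works. The function names, the watchlist, the toy sequences and the threshold are all hypothetical, and real screening would rely on alignment tools and curated reference genomes rather than this simple comparison of short overlapping fragments (known as k-mers).

```python
# Illustrative only: flag a submitted sequence that shares many short
# fragments (k-mers) with sequences on a watchlist of known pathogens.
# All names, sequences and thresholds here are assumptions for the example.

def kmers(seq, k=8):
    """Return the set of all length-k fragments in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_overlap(a, b, k=8):
    """Jaccard similarity of the two sequences' k-mer sets (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def flag_submission(submission, watchlist, k=8, threshold=0.5):
    """Return (name, score) pairs for watchlist entries at or above the
    threshold, highest score first. An empty list means nothing was flagged."""
    hits = []
    for name, reference in watchlist.items():
        score = kmer_overlap(submission, reference, k)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data, not real genomes.
watchlist = {"toy_reference": "ACGTACGTTAGCCGATAGGCTA"}
flagged = flag_submission("ACGTACGTTAGCCGATAGGCTA", watchlist)
unflagged = flag_submission("TTTTTTTTTTTTTTTTTTTTTT", watchlist)
```

In this sketch, a flagged submission would be routed to a person for faster review instead of waiting in the ordinary queue. Where to set the threshold, and which sequences belong on the watchlist, are judgment calls the sketch does not address.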