A Scientist Tracked Down Chinese Coronavirus Sequences That Had Disappeared Online


Thirteen genetic sequences — isolated from people with COVID-19 infections in the early days of the pandemic in China — were mysteriously deleted from an online database last year but have now been recovered.
Jesse Bloom, a computational biologist and specialist in viral evolution at the Fred Hutchinson Cancer Research Center, found that the sequences had been removed from an online database at the request of scientists in Wuhan, China. But with some internet sleuthing, he was able to recover copies of the data stored on Google Cloud.
The sequences don’t fundamentally change scientists’ understanding of the origins of COVID-19 — including the fraught question of whether the coronavirus spread naturally from animals to people or escaped in a laboratory accident. But their deletion adds to concerns that secrecy from the Chinese government has obstructed international efforts to understand how COVID-19 emerged.
Bloom’s results were published in a preprint paper, not yet peer-reviewed by other scientists, released on Tuesday. “I think it’s certainly consistent with an attempt to hide the sequences,” he told BuzzFeed News.
Bloom learned about the deleted data after reading a paper from a team led by Carlos Farkas at the University of Manitoba in Canada about some of the earliest genetic sequences of SARS-CoV-2. Farkas’s paper described sequences sampled from hospital outpatients in a project by researchers in Wuhan who were developing diagnostic tests for the virus. But when Bloom tried to download the sequences from the Sequence Read Archive, an online database run by the US National Institutes of Health, he was given error messages showing they had been removed.
Bloom realized that copies of SRA data are also maintained on servers run by Google, and was able to puzzle out the URLs where the missing sequences could be found in the cloud. In this way, he recovered 13 genetic sequences that may help answer questions about how the coronavirus evolved and where it came from.
Bloom found that the deleted sequences, like others collected at later dates outside the city, were more similar to bat coronaviruses — presumed to be the ultimate ancestors of the virus that causes COVID-19 — than sequences linked to the Huanan Seafood Market in Wuhan. This adds to earlier suggestions that the seafood market may have been an early victim of COVID-19, rather than the place where the coronavirus first jumped over from animals into people.
“This is a very interesting study performed by Dr. Bloom, and in my opinion the analysis is totally correct,” Farkas told BuzzFeed News by email. Scott Gottlieb, formerly head of the Food and Drug Administration, also praised the findings on Twitter.
But some scientists were less impressed. “It really adds nothing to the origins debate,” Robert Garry of Tulane University in New Orleans told BuzzFeed News by email. Garry argued that the Huanan market or other markets in Wuhan could still be the source of COVID-19.
Bloom is one of 18 scientists who in May published a letter criticizing the WHO and China’s study into the origins of SARS-CoV-2. The scientists argued the WHO–China report failed to give “balanced consideration” to the competing ideas that the coronavirus spread naturally from animals to people or escaped from a lab, a theory the report judged to be “extremely unlikely.” After the WHO–China report was published, the US and 13 other governments complained that the study “lacked access to complete, original data and samples.”
The deleted virus sequences were first uploaded to the SRA in early March 2020, around the time that researchers led by Yan Li and Tiangang Liu of Wuhan University published a preprint describing their work using genetic sequencing to diagnose COVID-19. Just days before, China’s State Council had ordered that all papers related to COVID-19 be centrally approved.
The sequences were then withdrawn from the SRA in June, around the time that the final version of the paper appeared in a scientific journal. According to the NIH, the authors asked for the sequences to be removed. “The requestor indicated the sequence information had been updated, was being submitted to another database, and wanted the data removed from SRA to avoid version control issues,” NIH spokesperson Amanda Fine told BuzzFeed News by email.
However, it’s unclear whether the sequences have since been posted online in another database.
“There is no plausible scientific reason for the deletion,” Bloom wrote in his preprint, arguing the sequences were likely “deleted to obscure their existence.” That suggested, he wrote, “a less than wholehearted effort to trace early spread of the epidemic.”
Although the sequences were deleted, Garry pointed out that key genetic mutations they contained were still published in a table in the final paper from the Wuhan team. “Jesse Bloom found exactly nothing new that is not already part of the scientific literature,” Garry told BuzzFeed News, accusing Bloom of writing his preprint in an “inflammatory way that is unscientific and unnecessary.”
Bloom wrote to the Wuhan researchers asking them why the sequences had been deleted but received no reply. Li and Liu similarly did not immediately respond to a query from BuzzFeed News.
This is not the first time scientists have raised concerns about the removal of data that may help answer questions about the origins of COVID-19. The main database containing information on coronavirus sequences maintained by the Wuhan Institute of Virology, which is the focus of speculation about a possible “lab leak” of the virus, was taken offline in September 2019. When members of the WHO–China team that studied the origins of the pandemic visited the institute in February, they were told the database, which reportedly included data on 22,000 coronavirus samples and sequence records, had been removed after repeated hacking attempts.
