The COVID Tracking Project exists because every person, newsroom, and government agency in the United States deserves access to the most complete picture of COVID-19 testing data that can be assembled.
In the early stages of the COVID-19 pandemic, many wealthy countries pursued a strategy of widespread testing, which allowed several of them to identify regional outbreaks early enough to contain them. Others, including the US, have been much, much slower to implement mass testing. As has been extensively documented elsewhere, the US testing effort started very late, rolled out slowly and unevenly, and has yet to scale widely in most parts of the country.
Alongside the failure to test early or scale up quickly, central authorities have elected not to publish complete testing data. The CDC publishes a case count (identified cases of COVID-19 confirmed by testing) but no complete accounting of how many people have been tested. The CDC does offer an incomplete, lagging, national-level count of “specimens tested.” However, it reports totals as specimens (usually more than one specimen is processed per person tested) and identified positives as people. Because the two figures use different units, they can’t be coherently matched to infer a complete picture.
Case counts do have obvious value, and Johns Hopkins University maintains the gold-standard count of identified positive cases in the US. The problem with relying on case counts at the national and regional level is that a simple case count doesn’t show the true location or comparative severity of outbreaks—it shows where people are being tested, not where they are sick. A state that identifies only 3 cases after testing 2,000 people is probably in a very different stage of its outbreak from a state that identifies 3 cases after testing only 20 people.
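To make the arithmetic in that comparison concrete, here is a minimal Python sketch that computes the share of tests that came back positive (commonly called the test positivity rate) for the two hypothetical states. The `positivity_rate` helper is a hypothetical illustration, not part of any published tooling. It assumes that positives and tests are counted in the same unit (people). That consistency is exactly what the CDC’s specimen-based totals lack.

```python
def positivity_rate(positives: int, tests: int) -> float:
    """Hypothetical helper: share of tests that came back positive.

    Assumes positives and tests are counted in the same unit (people).
    """
    if tests <= 0:
        raise ValueError("tests must be a positive number")
    return positives / tests


# The two hypothetical states from the comparison above.
state_a = positivity_rate(3, 2000)  # 3 cases after testing 2,000 people
state_b = positivity_rate(3, 20)    # 3 cases after testing only 20 people

print(f"State A: {state_a:.2%} positive, State B: {state_b:.2%} positive")
```

Both states report the same three cases. Yet the share of positive tests is about 0.15% in one and 15% in the other, a hundredfold gap that a bare case count hides.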
Understanding the shape, speed, and location of regional outbreaks requires not just the number of positive tests, but the entire testing picture: how many people have actually been tested, where, and with what results. Unless we can show exactly where and when testing was done and how many people were tested, there’s no way to evaluate what a given area’s case counts and patient outcomes actually depict.
The work of continuously evaluating ever-changing official state/territory data sources, ingesting the data, verifying it, fortifying it with reporting when official sources go dark, and publishing it all requires a fast-moving, dedicated organization with a wide range of skills—not something easily replicated within busy newsrooms and agencies. Every day, we use scrapers and trackers to alert us to changes, but most importantly, we get multiple sets of human eyes on each state and territory’s official numbers. The work of data-gathering from official sources is now supplemented by a fast-growing group of local reporters who are constantly pushing authorities to release more complete information.
When we started the project, building on two independently created reporting spreadsheets, we expected to be updating the dataset for a few days, maybe a week, until complete federal data emerged. It never did, so we’re still here, and we’ll keep at it until we’re replaced by something better.
We also recognize that part of our work is creating and maintaining a historical record of the US government’s response, which now includes the accompanying patient outcomes. As the pandemic imprints itself on the country, we are building an accurate record of what actually happened, day by day, state by state. Until that work is done elsewhere, we’ll keep doing the counts.