Researchers from Stanford, Boston University, and the University of Minnesota trained a machine learning model on US death certificates from March 2020 to December 2021. Their finding: roughly 155,500 COVID-19 deaths went unrecognized, meaning the official count missed about 19% of the pandemic's true death toll. The study, published in Science Advances, used patterns in how hospitals coded deaths to estimate how many deaths outside hospital walls, where testing and attribution were spottier, were actually caused by COVID-19.
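To put the two headline figures side by side, here is a back-of-the-envelope sketch of the totals they imply. It uses only the 155,500 and 19% quoted above. The function name `implied_totals` and its `relative_to` switch are made up for illustration, and the outputs are implied values, not numbers reported by the study. The second reading, where 19% is measured against the official count, is purely a what-if that shows how much the denominator matters.

```python
def implied_totals(unrecognized, share, relative_to="true"):
    """Return (official, true) death totals implied by an undercount share.

    relative_to="true":     share = unrecognized / true toll
    relative_to="official": share = unrecognized / official count (what-if)
    """
    if relative_to == "true":
        true_toll = unrecognized / share
        official = true_toll - unrecognized
    elif relative_to == "official":
        official = unrecognized / share
        true_toll = official + unrecognized
    else:
        raise ValueError("relative_to must be 'true' or 'official'")
    return official, true_toll


UNRECOGNIZED = 155_500  # figure quoted in the article
SHARE = 0.19            # "about 19%", so every result below is approximate

for basis in ("true", "official"):
    official, true_toll = implied_totals(UNRECOGNIZED, SHARE, basis)
    print(f"{basis:>8}: official ~{official:,.0f}, true ~{true_toll:,.0f}")
```

Because "about 19%" is a rounded figure, the implied totals are only good to the nearest few thousand.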
The undercount wasn't random. The model found that unrecognized COVID-19 deaths clustered among people with less than a high school education, as well as those identified as Hispanic, American Indian, Alaska Native, Asian, and Black. Southern counties, lower-income areas, and places with worse baseline health also saw higher rates of missed deaths. Co-author Andrew Stokes and colleagues argue this means the US death investigation system masked real health inequities during the pandemic.
Here's the catch. The model assumes hospital COVID-19 death data was accurate, using it as ground truth to train predictions for out-of-hospital deaths. But hospitals had a financial incentive to identify COVID-19 cases: the CARES Act authorized a 20% Medicare reimbursement bump for COVID-19 patients. Some critics worry hospitals may have "upcoded" deaths, counting patients who died with COVID as dying from COVID. If that happened, the model's training data was itself contaminated, and a model that learned from inflated hospital labels would tend to overestimate COVID-19 deaths outside hospitals.
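A toy calculation makes the contamination worry concrete. The sketch below is not the study's model. It swaps in a much simpler stand-in: take the rate at which hospital deaths were coded as COVID-19 within each broad certificate pattern, then apply those rates to out-of-hospital deaths. Every number, the pattern names, and the `upcoded_share` parameter are invented for illustration.

```python
# Toy illustration only: all counts are invented, and this rate-based
# stand-in is far simpler than the study's machine learning model.

# Hypothetical in-hospital deaths: pattern -> (total deaths, coded as COVID-19)
hospital = {
    "respiratory": (1000, 600),
    "cardiac": (1000, 100),
    "other": (1000, 50),
}

# Hypothetical out-of-hospital deaths: pattern -> (total deaths, coded as COVID-19)
out_of_hospital = {
    "respiratory": (500, 150),
    "cardiac": (800, 20),
    "other": (700, 10),
}


def estimate_unrecognized(hospital, out_of_hospital, upcoded_share=0.0):
    """Estimate unrecognized out-of-hospital COVID-19 deaths.

    upcoded_share is the assumed fraction of hospital COVID-19 labels that
    were "died with" rather than "died from"; those labels are discounted
    before hospital rates are applied outside the hospital.
    """
    predicted = 0.0
    recorded = 0
    for pattern, (n_out, covid_out) in out_of_hospital.items():
        n_hosp, covid_hosp = hospital[pattern]
        rate = covid_hosp * (1.0 - upcoded_share) / n_hosp
        predicted += n_out * rate   # expected COVID-19 deaths at hospital rates
        recorded += covid_out       # deaths already attributed to COVID-19
    return predicted - recorded


as_coded = estimate_unrecognized(hospital, out_of_hospital)
if_upcoded = estimate_unrecognized(hospital, out_of_hospital, upcoded_share=0.2)
print(f"labels taken at face value: {as_coded:.0f}")
print(f"if 20% of labels were upcoded: {if_upcoded:.0f}")
```

With these invented inputs, discounting a fifth of the hospital labels cuts the estimate from 235 to 152. The size of that drop means nothing here; the point is the direction. The more hospital labels overstate COVID-19, the more a model trained on them overstates the out-of-hospital undercount.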