In December, at the NeurIPS Machine Learning in Public Health Workshop, we and our colleagues at Amazon and the University of California, San Diego, won a best-paper award for a new approach to modeling the spread of COVID-19 infections.
In the past, researchers have used two different approaches to predicting COVID spread. One is “compartment” models, which use differential equations to compute a population’s transitions between different disease states (or compartments), such as susceptible, exposed, and infected.
The other is deep-learning models, which analyze large volumes of training data to identify factors that predict future spread. Compartment models tend to do a better job of predicting infection and recovery, but deep-learning models do a better job of predicting fatalities.
In our paper, “AutoODE: Bridging physics-based and data-driven modeling for COVID-19 forecasting”, we propose a hybrid approach that uses both ordinary differential equations (ODEs) and a simple, linear machine learning model. In experiments, our approach reduced mean absolute error by 36.5% relative to earlier compartment models and by 57.4% relative to deep-learning models.
A typical compartment model sorts populations into a handful of categories. Our hybrid model is a variation of the SEIR model, in which the categories are susceptible, exposed, infected, and removed, a category that includes both recovery and death. We vary this schema slightly by adding deaths as a separate category.
Compartment models describe transitions between these categories using ODEs, whose parameters are fixed values for transmission, incubation, discovery, and recovery rates. The chief innovation of our model is to learn those parameters from data, rather than basing them on statistical analyses. We call our model AutoODE, since it automatically learns ODEs.
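To make the compartment idea concrete, here is a minimal sketch of a generic SEIR model with a separate death compartment, integrated with forward Euler steps. It is not the exact system of equations from the paper: the function name `simulate_seird`, the rate names `beta` (transmission), `sigma` (incubation), `gamma` (recovery), and `mu` (death), and all numeric values are assumptions chosen for illustration. In a fixed-parameter compartment model these rates are set by hand, whereas a learned model would fit them to observed case counts.

```python
def simulate_seird(s, e, i, r, d, beta, sigma, gamma, mu, days, steps_per_day=10):
    """Integrate a generic SEIR-plus-deaths model with forward Euler.

    All rate parameters are per-day values and are illustrative assumptions,
    not the fitted values or the exact equations used by AutoODE.
    Returns one (S, E, I, R, D) tuple per day, starting with day 0.
    """
    n = s + e + i + r + d          # total population, constant over time
    dt = 1.0 / steps_per_day       # Euler step size in days
    history = [(s, e, i, r, d)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infected = sigma * e * dt         # E -> I
            new_recovered = gamma * i * dt        # I -> R
            new_dead = mu * i * dt                # I -> D
            s -= new_exposed
            e += new_exposed - new_infected
            i += new_infected - new_recovered - new_dead
            r += new_recovered
            d += new_dead
        history.append((s, e, i, r, d))
    return history


# Hypothetical population of 1,000 with a few initial exposures and infections.
trajectory = simulate_seird(990.0, 5.0, 5.0, 0.0, 0.0,
                            beta=0.4, sigma=0.2, gamma=0.1, mu=0.01, days=30)
```

Because every transition moves people from one compartment to another, the five values in each tuple always sum to the starting population.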
Fine-grained transmission rates
AutoODE’s approach to estimating transmission rate is more nuanced than that of existing compartment models. First, we assume that transmission rate varies by U.S. state, as different states have enacted different policies for dealing with COVID.
To learn transmission rates, we start with a 50-by-50 adjacency matrix, which maps all 50 U.S. states against all 50 U.S. states. The entries in the matrix’s cells are simply binary indicators of whether two states are adjacent.
AutoODE then learns a correlation matrix, which is multiplied by the adjacency matrix to produce a matrix of specific transmission rates, both within states and across state lines. To improve computational efficiency, we use low-rank approximation in the computation of the correlation matrix.
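The matrix construction described above can be sketched on a toy example. This is a sketch under several assumptions that the text does not spell out: the product is taken element-wise, so the binary adjacency matrix acts as a mask; each state is treated as adjacent to itself so that within-state rates survive the mask; and the low-rank approximation is the product of two thin factor matrices. The names `low_rank_matrix`, `transmission_matrix`, `u`, and `v`, the four-state map, and all numbers are hypothetical.

```python
def low_rank_matrix(u, v):
    """Return C = U V^T, where u and v are n-by-k matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(u_row, v_row)) for v_row in v] for u_row in u]


def transmission_matrix(adjacency, u, v):
    """Mask the low-rank matrix element-wise with the binary adjacency matrix.

    Assumption: the product is element-wise, so entries for non-adjacent
    pairs of states become zero.
    """
    c = low_rank_matrix(u, v)
    return [[a * x for a, x in zip(a_row, c_row)]
            for a_row, c_row in zip(adjacency, c)]


# Toy map: four hypothetical states arranged in a chain 0-1-2-3.
# The diagonal is 1 so that within-state transmission is kept (an assumption).
adjacency = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
# Rank-2 factors; in a learned model these would be trainable parameters.
u = [[0.5, 0.1], [0.4, 0.2], [0.3, 0.3], [0.2, 0.4]]
v = [[0.6, 0.1], [0.5, 0.2], [0.4, 0.3], [0.3, 0.4]]

rates = transmission_matrix(adjacency, u, v)
```

With 50 states and rank k, the two factors hold 2 × 50 × k learned values rather than the 50 × 50 = 2,500 entries of a full matrix, which is where the efficiency gain comes from when k is small.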
We also assume that transmission rates change over time, as state policies change. So we train our model to identify inflection points in the data, which introduce new rates. The learning algorithm weights training data according to distance in time, with more recent measurements receiving greater weight than older ones.
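One simple way to give recent measurements more weight is an exponentially decaying weight on each day's error. The sketch below assumes that scheme purely for illustration; the text above does not state the exact weighting function, and the name `time_weighted_mae` and the `decay` parameter are hypothetical.

```python
def time_weighted_mae(predictions, observations, decay=0.9):
    """Weighted mean absolute error over a time series ordered oldest to newest.

    The newest point gets weight 1, and each step further back in time
    multiplies the weight by `decay` (an assumed scheme, 0 < decay <= 1).
    """
    if len(predictions) != len(observations):
        raise ValueError("predictions and observations must have equal length")
    if not observations:
        raise ValueError("at least one observation is required")
    n = len(observations)
    weights = [decay ** (n - 1 - t) for t in range(n)]
    weighted_errors = [w * abs(p - y)
                       for w, p, y in zip(weights, predictions, observations)]
    return sum(weighted_errors) / sum(weights)
```

Under this weighting, the same one-unit error costs more when it falls on the latest day than when it falls on the earliest one, so the fit is pulled toward the most recent regime.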
Whereas deep-learning models are computationally intensive, our simple linear model is very efficient to train, so it can be updated regularly as new data arrives.
In our experiments, we compared AutoODE to six different deep-learning models and one state-of-the-art compartment model. The models were evaluated on their predictions of new infections, removals (either recoveries or fatalities), and deaths over three different seven-day periods.
On the task of predicting deaths, deep-learning models were slightly more accurate than AutoODE. But the margins were small, and AutoODE’s performance was better than the compartment model’s. On the other two tasks, predicting infections and removals, AutoODE outperformed all seven baselines for all three seven-day periods.
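For readers who want to reproduce this kind of comparison, the sketch below shows how mean absolute error over a seven-day forecast window and the relative reduction between two models can be computed. The function names and every number are made up for illustration; none of them are results from the paper.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between forecasts and observed counts."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error improves on baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Made-up seven-day window of daily counts and two hypothetical forecasts.
actual = [100, 110, 120, 130, 140, 150, 160]
baseline_forecast = [110, 120, 130, 140, 150, 160, 170]  # off by 10 every day
new_forecast = [105, 115, 125, 135, 145, 155, 165]       # off by 5 every day

baseline_mae = mean_absolute_error(baseline_forecast, actual)
new_mae = mean_absolute_error(new_forecast, actual)
reduction = relative_reduction(baseline_mae, new_mae)
```

In this toy case the baseline error is 10 cases per day and the new error is 5, so the relative reduction is 0.5, or 50%.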