Dean Foster is in the forecasting business. More specifically, he is in the business of ensuring the forecasts Amazon makes for its supply chain are as accurate as possible.
Foster, a research scientist, works in the company’s Supply Chain Optimization Technologies (SCOT) organization. “Our main focus is trying to predict what customers will buy before they buy it,” he said. “We need to make sure that we know what people want so we can find it, get it moved across the country, and have it sitting there waiting when a customer places an order.”
Forecasting what customers might want at any given time, at scale, is inherently complex. One way those forecasts are strengthened is through a concept known as calibration, a topic Foster has researched extensively. In fact, a paper he co-authored with Rakesh Vohra 23 years ago, “Calibrated Learning and Correlated Equilibrium”, was honored this week with the Test of Time Award at the 21st ACM Conference on Economics and Computation.
The award, presented by the conference’s award committee, “recognizes the author or authors of an influential paper or series of papers published between ten and twenty-five years ago that has significantly impacted research or applications exemplifying the interplay of economics and computation.”
Foster and Vohra’s paper “spurred a sizeable theoretical literature,” notes Steve Tadelis, an Amazon Scholar and economist. It has also won praise for its influence on games played by learning agents, addressing a question raised by an idea proposed by the famed mathematician John Forbes Nash Jr.
“If we have two different agents learning to play each other, they learn to play an equilibrium,” Foster said. “Nash came up with a fixed point and argued equilibria exist. But the question, ‘Why would people play them?’ was open.” In other words, the actions of human beings are neither neat nor uniformly predictable. “You are not a simple creature, so modeling your behavior as if you’re going to do the exact same thing today as you’ve done every other day of your life is just wrong,” he said.
Foster and his coauthor, Rakesh Vohra, now a professor at the University of Pennsylvania, set out to account for some of that complexity by allowing for arbitrary sequences of outcomes when constructing calibrated forecasts.
Calibration, in this context, involves comparing a prediction against its actual outcome, measuring the difference between the two, and then adjusting as needed. By learning from previous comparisons, prediction models can be calibrated to match outcomes more accurately.
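Concretely, one common way to check calibration is to group forecasts by their stated probability and compare each group’s average prediction with how often the event actually occurred. The sketch below illustrates that idea only; it is not Foster and Vohra’s algorithm, and the function name and toy rain data are invented for the example.

```python
import numpy as np

def calibration_table(predicted_probs, outcomes, n_bins=10):
    """Group forecasts into probability bins and compare each bin's
    average prediction with the observed frequency of the event."""
    predicted_probs = np.asarray(predicted_probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (predicted_probs >= lo) & (predicted_probs < hi)
        if mask.any():
            rows.append((lo, hi,
                         predicted_probs[mask].mean(),  # what we forecast
                         outcomes[mask].mean(),         # what actually happened
                         int(mask.sum())))
    return rows

# Toy usage: forecasts of "rain tomorrow" versus what nature did.
probs = [0.7, 0.7, 0.7, 0.2, 0.2, 0.9]
rain  = [1,   1,   0,   0,   1,   1]
for lo, hi, mean_pred, freq, n in calibration_table(probs, rain, n_bins=5):
    print(f"bin [{lo:.1f}, {hi:.1f}): predicted {mean_pred:.2f}, observed {freq:.2f}, n={n}")
```

A forecast is well calibrated when, in every bin, the predicted and observed columns roughly agree; persistent gaps are the signal to adjust.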
“Not being calibrated is an embarrassment,” Foster said. “You should fix it! If I were trying to predict what nature was going to do, say rain or shine tomorrow, and suppose nature only has one goal in life—make me look stupid—in spite of that, I can still use calibration to figure out an accurate forecast.”
Foster noted that while that explanation may seem hyperbolic, it is particularly relevant to machine learning.
“That idea of calibration, and of making predictions when the world is out to get you, is now relatively standard in machine learning. It grew out of connections with a lot of computer science,” he explained. “In computer science, for most things that you can prove, you can show the worst case and the average case are about the same. So the worst possible data for a sorting algorithm is about as hard to sort as a typical problem for sorting. We took that model and said, ‘Well, is it true in statistics?’ And that’s where this idea came from. That you can make these predictions that are every bit as good, even when nature is out there trying to fool you.”
That idea has roots in game theory, an area in which calibration is particularly useful. Game theory assumes your opponent is attempting to fool you: a chess player, for example, who wants you to think your queen is in danger when it really isn’t. Making errant predictions (or forecasts) means you will lose more frequently. Conversely, calibrated predictions can help you win more often.
“With a calibrated forecast, if I believe someone will take some action two-thirds of the time and two-thirds of the time they actually do that, I can now know two-thirds is the right answer,” he said. “And I can trust that I won’t have to go back and say, ‘Well, most of the time, it landed the other way…’ It’s landing the way I thought it was going to land.”
Foster noted that calibration also helps ensure that forecasts aren’t tripped up by issues such as sample size. “A way to describe calibration for our forecast is that, after we announce the forecast, someone else shouldn’t be able to come along and say, ‘Hey, if you made that forecast 20% larger it would be more accurate.’ Calibration is a check to make sure that you haven’t left an easy, low-hanging-fruit modification around.”
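One way to read that check in code is to try a handful of simple rescalings of the forecast and see whether any of them beats the original on a standard error measure. The snippet below is a hypothetical sketch of such a check (the function name, scale factors, and demand numbers are invented for illustration); a well-calibrated forecast should not be improved by any simple scaling.

```python
import numpy as np

def rescaling_check(forecasts, actuals, factors=(0.8, 0.9, 1.0, 1.1, 1.2)):
    """Sanity check: does simply rescaling the forecast (for example,
    making it 20% larger) reduce the mean squared error? If any factor
    other than 1.0 wins, the forecast is leaving an easy fix on the table."""
    forecasts = np.asarray(forecasts, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    errors = {f: float(np.mean((f * forecasts - actuals) ** 2)) for f in factors}
    best = min(errors, key=errors.get)
    return best, errors

# Toy usage with made-up demand forecasts and observed demand.
best, errors = rescaling_check([100, 80, 120, 90], [115, 95, 140, 100])
print("best scale factor:", best)   # anything other than 1.0 flags a miscalibration
```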
As to the future of forecasting, when it comes to supply chain optimization, Foster sees lots of potential. He is particularly excited about Amazon’s expanded usage of reinforcement learning, a machine learning approach focused on maximizing cumulative reward.
“I’ve been working on how to take a more economic viewpoint,” Foster said. “How do we connect the forecast to the economic decisions that we make so there’s a more integrated approach? We’re trying to do this by applying more reinforcement learning techniques.”
For all of the citations and accolades his paper has earned, it didn’t start smoothly. “When we first tried to publish the paper, it was enough of a shock in statistics journals that it was rejected several times,” Foster recalled. “One referee said, ‘You’re trying to predict the unpredictable.’” In a way, he still is.