Lihong Li, a senior principal scientist in Amazon Ads, has won the 2023 Seoul Test of Time Award for the 2010 paper “A Contextual-Bandit Approach to Personalized News Article Recommendation.” The paper, coauthored by Wei Chu, John Langford, and Robert E. Schapire, introduced an innovative approach to personalized recommendation engines.
The Seoul Test of Time Award “is awarded annually to the author or authors of a paper presented at a previous World Wide Web conference that has, as the name suggests, stood the test of time.”
“The paper tackles an important problem from a novel angle that turned out to be one of the fundamental techniques in the years to come after publication,” said Li. “The paper considers recommendation as a reinforcement learning problem, which was not a popular view at that time.”
Li and his colleagues, who worked at Yahoo! Labs in 2010, introduced a new way of thinking about personalized recommendation engines. The team addressed the challenge of building an engine that directly maximizes a utility function measuring user satisfaction.
Recommender systems at the time relied on past user activities to provide meaningful recommendations at an individual level. However, the paper notes, “in many web-based scenarios, the content universe undergoes frequent changes, with content popularity changing over time as well. Furthermore, there are new visitors to a website with no historical consumption record.”
“These issues make traditional recommender-system approaches difficult to apply,” the paper states. “It thus becomes indispensable to learn the goodness of match between user interests and content from user interactions, when one or both of them are new.”
Contextual bandits
The paper proposed a contextual-bandit approach to driving personalized recommendations in news content “in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.”
“News content changes every hour within the day,” said Li. “That’s why we need a solution to quickly adapt to changing content, and recommend the best content to users.” In doing so, the solution has to balance two competing goals: maximizing user satisfaction now and gathering information about the “goodness of match” between user interest and content. Contextual bandits are a special class of reinforcement learning problems that are well suited to this scenario.
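The loop described above (observe context, pick an article, record the click, adapt) can be sketched in a few lines of Python. This is a toy illustration only, not the algorithm from the paper: it swaps in a simple epsilon-greedy rule with one click-rate estimate per (context, article) pair. The user segments, article names, click probabilities, and every function and class name below are hypothetical.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Toy contextual bandit: one click-rate estimate per (context, article)."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon          # fraction of rounds spent exploring
        self.rng = random.Random(seed)
        self.shows = defaultdict(int)   # (context, article) -> times served
        self.clicks = defaultdict(int)  # (context, article) -> clicks observed

    def estimate(self, context, article):
        """Observed click-through rate so far (0.0 if never served)."""
        shown = self.shows.get((context, article), 0)
        if shown == 0:
            return 0.0
        return self.clicks.get((context, article), 0) / shown

    def select(self, context, articles):
        # Explore: occasionally serve a random article to gather information.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(articles)
        # Exploit: otherwise serve the article with the best estimate.
        return max(articles, key=lambda a: self.estimate(context, a))

    def update(self, context, article, clicked):
        # Adapt the estimates using the click feedback for the served article.
        self.shows[(context, article)] += 1
        self.clicks[(context, article)] += int(clicked)


def simulate(rounds=5000, seed=0):
    """Run the bandit against made-up click probabilities (hypothetical data)."""
    true_ctr = {
        ("sports_fan", "sports"): 0.30, ("sports_fan", "finance"): 0.05,
        ("investor", "sports"): 0.05, ("investor", "finance"): 0.30,
    }
    articles = ["sports", "finance"]
    env = random.Random(seed)
    bandit = EpsilonGreedyContextualBandit(epsilon=0.1, seed=seed + 1)
    total_clicks = 0
    for _ in range(rounds):
        context = env.choice(["sports_fan", "investor"])
        article = bandit.select(context, articles)
        clicked = env.random() < true_ctr[(context, article)]
        bandit.update(context, article, clicked)
        total_clicks += int(clicked)
    return bandit, total_clicks / rounds


if __name__ == "__main__":
    bandit, ctr = simulate()
    print("overall click-through rate:", round(ctr, 3))
```

In this sketch the `epsilon` parameter is the knob for the trade-off Li describes: a higher value gathers more information about new content, while a lower value serves the currently best-looking article more often.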
The paper develops practical contextual bandit algorithms that optimize user-engagement metrics such as click-through rates, downstream revenue, or other business impacts. Li later worked on extending this approach to scenarios in which utility is measured in terms of long-term user engagement.
“In reality, decisions change the behavior of the user and, in turn, change the future way they interact with the website and the future utility,” said Li. “So a system should be able to take these long-term impacts into account and make a decision to maximize long-term utility instead of short-term.”
The authors reported that their “computationally efficient contextual bandit algorithm” not only drove higher click-through rates but also addressed the scaling challenge, because it could be “reliably evaluated offline using previously recorded random traffic.” The evaluation technique itself has also found uses in other web-based scenarios.
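One common way to evaluate a policy offline on randomly served traffic is a replay-style estimate: step through the log, keep only the events where the policy under test would have served the same article that was actually shown, and average the clicks on those events. The sketch below assumes that scheme, a fixed (non-learning) policy, and a log in which articles were chosen uniformly at random; the article does not spell out the paper's exact procedure, and the log format, segments, click probabilities, and function names are all hypothetical.

```python
import random


def make_random_log(n, seed=0):
    """Build a made-up log in which the shown article was picked uniformly at random."""
    true_ctr = {
        ("sports_fan", "sports"): 0.30, ("sports_fan", "finance"): 0.05,
        ("investor", "sports"): 0.05, ("investor", "finance"): 0.30,
    }
    articles = ("sports", "finance")
    rng = random.Random(seed)
    log = []
    for _ in range(n):
        context = rng.choice(("sports_fan", "investor"))
        shown = rng.choice(articles)  # random traffic: no policy involved
        clicked = rng.random() < true_ctr[(context, shown)]
        log.append((context, articles, shown, clicked))
    return log


def replay_evaluate(policy, logged_events):
    """Estimate a fixed policy's click-through rate from randomly served traffic.

    Events where the policy disagrees with the logged choice are skipped,
    because the log holds no feedback for the article the policy wanted.
    Returns (estimated CTR, number of events that were usable).
    """
    matched = 0
    clicks = 0
    for context, articles, shown, clicked in logged_events:
        if policy(context, articles) == shown:
            matched += 1
            clicks += int(clicked)
    ctr = clicks / matched if matched else 0.0
    return ctr, matched


def matched_policy(context, articles):
    """Hypothetical policy: serve each segment the topic it prefers."""
    return "sports" if context == "sports_fan" else "finance"


def mismatched_policy(context, articles):
    """Hypothetical policy: serve each segment the other topic."""
    return "finance" if context == "sports_fan" else "sports"


if __name__ == "__main__":
    log = make_random_log(20000)
    print("matched policy:", replay_evaluate(matched_policy, log))
    print("mismatched policy:", replay_evaluate(mismatched_policy, log))
```

Because only a fraction of logged events match the policy's choice, the usable sample shrinks as the number of candidate articles grows, which is one reason a large volume of random traffic helps.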
The path to the prize
Li received a bachelor of engineering in computer science and technology at Tsinghua University in Beijing, then went on to earn a master of science in computing science at the University of Alberta. He earned his PhD in computer science from Rutgers University, working in the area of reinforcement learning.
During his time at Rutgers, Li met two mentors who would later become coauthors on the award-winning paper. Schapire was a Princeton professor on Li’s thesis defense committee, and Langford was Li’s internship mentor at Yahoo! in 2007. In October 2020, Li joined Amazon as a senior principal scientist.
“One thing that attracted me is the customer obsession culture of Amazon that uses solid science technologies and solutions to tackle deep customer questions,” Li said. “Contextual bandits and, more generally, reinforcement learning techniques can help Amazon fulfill customer needs in shopping, entertainment, and beyond, as well as play a key role in improving large language models.”
Li and his colleagues received the Seoul Test of Time Award at the Web Conference 2023 in Austin, Texas.
“I was thrilled, and winning was totally unexpected,” said Li.
The Web Conference (formerly known as the International World Wide Web Conference, abbreviated as WWW) is a yearly international academic conference on the future directions of the World Wide Web, which was first conceived in 1989 by Tim Berners-Lee at CERN in Geneva.
“Scientists often publish innovation in papers. When the invention stays on paper and doesn’t reach the real world, it doesn’t feel like the story is complete,” Li said. “This award is a recognition that the invention has had a long-lasting impact, not just on the problem we worked on, but also in the field and in other parts of the industry. I’m grateful to be a recipient of the award and am gratified to see that this 13-year-old work continues to be useful.”