Anwar Walid, an Amazon applied science manager, recently won the 2022 IEEE INFOCOM Test of Time Award for his 2010 paper on distributed caching algorithms for content distribution networks. The award “recognizes papers published between 10 to 12 years ago in the INFOCOM proceedings that have been most cited and widely recognized to have a significant impact on the research community.”
“It’s a great honor,” Walid said. “I have seen a lot of feedback on the paper, and its citations keep increasing, so I think the paper has inspired a lot of work.”
Walid’s paper, “Distributed Caching Algorithms for Content Distribution Networks”, co-authored by Sem Borst, a professor of stochastic operations research at Eindhoven University of Technology, and Varun Gupta, an associate professor of operations management at The University of Chicago, provided a solution to optimizing content distribution in large networks.
In 2010, content distributors and service providers were struggling with how to improve the user experience given the sudden explosion of streaming video. Overall bandwidth demands were increasing by orders of magnitude.
At that time, large video objects were stored in central servers that were often located far from users, which resulted in lengthy video download times as well as delays or interruptions in streaming videos to viewers. Placing caches at different locations within network providers' infrastructure offered a way to bring content closer to viewers.
The paper details distributed algorithms for managing video caches, which temporarily store frequently accessed data close to network users. Because video does not have to travel the entire length of the network, cache optimization helps accelerate the delivery of content. The algorithms adapt to the changing dynamics of content popularity and ingestion rate.
“The idea is to use those distributed caches to achieve three things,” Walid explained. “One is reducing delay, so customers have better quality of service when viewing videos. The second one is improving throughput for downloading large content, and the third is reliability.”
“Caching strategies provide an effective mechanism for mitigating these massive bandwidth requirements by replicating the most popular content in the right location closer to the network edge, rather than storing it in a central site,” Walid wrote in his paper.
In the last decade, various caching solutions and enhanced proposals for online video content delivery have used the ideas Walid and his co-authors laid out as a foundation for their approach to improving the viewer experience.
Service provider networks have multiple candidate locations for caches. A distributed policy makes decisions on whether to cache an object or evict it from the cache based on a utility function attached to each object.
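To make the idea of a utility-driven cache decision concrete, here is a minimal single-cache sketch in Python. It is not the algorithm from the paper: the class name `UtilityCache`, the `request` method, and the utility function itself (request count divided by object size) are all hypothetical choices made for illustration. An object is admitted only if enough lower-utility objects can be evicted to make room for it.

```python
from dataclasses import dataclass, field


@dataclass
class UtilityCache:
    """Toy cache that admits and evicts objects by a per-object utility score."""
    capacity: int                                  # total storage units available
    utility: dict = field(default_factory=dict)    # object id -> utility score
    sizes: dict = field(default_factory=dict)      # cached object id -> size

    def used(self) -> int:
        return sum(self.sizes.values())

    def request(self, obj_id: str, size: int) -> bool:
        """Record a request for obj_id; return True on a cache hit."""
        # Assumed utility: number of requests per unit of size.
        self.utility[obj_id] = self.utility.get(obj_id, 0.0) + 1.0 / size
        if obj_id in self.sizes:
            return True
        self._maybe_admit(obj_id, size)
        return False

    def _maybe_admit(self, obj_id: str, size: int) -> None:
        if size > self.capacity:
            return  # object can never fit
        free = self.capacity - self.used()
        victims = []
        # Consider cached objects from lowest to highest utility.
        for victim in sorted(self.sizes, key=lambda o: self.utility[o]):
            if free >= size:
                break
            if self.utility[victim] >= self.utility[obj_id]:
                return  # everything left is at least as valuable: do not cache
            victims.append(victim)
            free += self.sizes[victim]
        if free < size:
            return
        for victim in victims:
            del self.sizes[victim]
        self.sizes[obj_id] = size


cache = UtilityCache(capacity=2)
for name in ["a", "b", "c", "c", "c"]:
    hit = cache.request(name, size=1)
    print(name, "hit" if hit else "miss", sorted(cache.sizes))
```

In this toy run, the first request for "c" is not cached because it is no more valuable than what is already stored; a second request raises its utility enough to displace "a". A distributed version would run a decision like this at each cache location, which is well beyond what this sketch attempts.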
“Our solution provides data-driven distributed algorithms for managing those caches,” Walid said. “Decisions include, for example, which objects to cache and where.”
Walid, who has served as an adjunct professor at Columbia University since 2009, is also an IEEE Fellow and the senior editor for the IEEE Journal on Selected Areas in Communications.
Walid joined Amazon in September 2021 and works in Amazon Ads where, among other things, his team builds machine learning models that predict click-through rates for advertisements. It is a role where his experience quickly proved valuable.
“In these models we deal with very large datasets, and caching can drastically accelerate machine learning workflows,” he explained. “When I started, the question arose as to which dataset to keep local versus remote. My background was useful in contributing to the design of a distributed caching solution that helped to solve this problem.”
IEEE, or the Institute of Electrical and Electronics Engineers, is the world’s largest technical professional organization dedicated to advancing technology to benefit humanity. IEEE INFOCOM is a major conference for researchers to present and exchange ideas in theoretical and systems research in the field of networking and related areas.