There are millions of items in the Amazon catalog. To ensure a great customer experience along a customer’s shopping journey, it is important to surface the right products or information at the right time.
Suppose you type the query “digital cameras” on Amazon.com. The system ought to recognize this as the beginning of your shopping journey. A variety of options to help you conduct your research might be better than just showing the top-selling products in the camera category. The results could include a list of editorial recommendations that call out the “must-have” features at the most affordable prices. As another example, consider two consecutive queries made by a customer: “shorts for boys” followed by “red adidas shorts for toddler boy”.
Between them, these queries specify age, gender, color, brand and product type, but the information arrives progressively as the customer moves through their shopping mission. To ensure a good customer experience, the shopping engine should display progressively more relevant results that stay closely aligned with the customer's intent, which is likely to evolve as the customer interacts with the store.
Enter Amazon Fellow Inderjit Dhillon, who is using his expertise in machine learning to invent and deploy new AI methods to help customers along their shopping journey.
Context-aware artificial intelligence (AI) – where agents make decisions based on a broader awareness of the customer’s actions – is one of the most challenging problems in machine learning. Developing machine learning frameworks that can enhance context-aware AI has been an area of focus for Dhillon’s entire career.
“I have been passionate about, and involved in, machine learning research for over 20 years both in academia and in the industry,” says Dhillon. “After getting my PhD from UC Berkeley, I worked at IBM Research, where I developed machine learning methods to enhance user-facing systems – in particular, call center and search systems. This customer-centric work helped drive new technical research, namely, the development of new document and graph clustering and co-clustering algorithms that are effective and efficient in high-dimensional spaces.”
Dhillon’s interest in machine learning motivated him to join academia.
“After I graduated, I felt the incessant need to read up on the existing machine learning literature. I figured that the best way to stay up to date with the latest in machine learning was to teach it. So I became a professor at UT Austin, and have taught machine learning to bright-eyed and incredibly smart and motivated students since then.”
Dhillon joined Amazon as an Amazon Fellow in 2017. Amazon Fellows are a select group of scientists from academia who work at the company to solve hard science problems that can deliver a broad impact.
“Throughout my research career, I have been passionate about developing impactful open-source software. As part of my graduate studies, I wrote mathematical software based on my PhD dissertation that became part of the ubiquitous open-source LAPACK package. My software is now used by millions of users of the popular R software package when they need to perform principal components analysis or do other eigenvalue computations. My group’s open-source software for recommender systems, high-dimensional document clustering and co-clustering, graph clustering and inverse covariance estimation has also been widely adopted.”
Dhillon was impressed by Amazon’s approach to customer-obsessed science.
“While my software had been used in fields as diverse as healthcare, computational chemistry, social-network analysis and recommender systems, what was missing was direct interaction with the end customer. So Amazon was a natural destination.”
One of the innovations that Dhillon is driving is the Prediction for Enormous and Correlated Output Spaces (PECOS) project. The PECOS machine learning framework aims to find relevant results from an enormous output space of potential candidates.
A typical shopping query has an enormous number of potential outputs, and many of the results displayed after a customer's search are infrequent ‘tail’ items. Surfacing these tail items is tricky because training data is available primarily for the most common ‘head’ outputs. Given this inherent paucity of training data for most items, developing machine learning models that perform well on output spaces of this size is challenging.
Dhillon’s team is tackling the problem in three phases that together select the items most relevant to a shopper from Amazon’s vast catalog.
“In the semantic indexing phase, we form a data structure based on training data to organize the enormous but correlated output space. In the next matching phase, we use this index to efficiently reduce the output space to a much smaller candidate set for a given input. Finally, a more computationally expensive ranking step identifies the best outputs from the smaller match set. For each of these phases, we are developing new state-of-the-art machine learning approaches that are able to effectively transfer data that is observed for the head items to the tail items, thereby returning more relevant results. Moreover, the inference phase in PECOS needs to have incredibly low latency in order to be responsive to the evolving customer intent. Accomplishing all the above is not straightforward; indeed our work has brought together scientists and engineers working in machine learning, computer science, statistics and high-performance computing.”
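To make the index, match and rank flow concrete, here is a deliberately tiny sketch. Everything in it is an assumption for illustration and not PECOS's actual API: items and queries are represented by hand-picked embedding vectors, the "semantic index" is a single level of spherical k-means clusters, matching keeps only the items in the clusters closest to the query, and ranking uses plain cosine similarity as a stand-in for a more computationally expensive model. The function names `build_index`, `match` and `rank`, and the toy catalog, are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> List[float]:
    norm = math.sqrt(dot(v, v)) or 1.0  # avoid dividing by zero for an all-zero vector
    return [x / norm for x in v]


def assign(vecs: List[List[float]], centroids: List[List[float]]) -> List[List[int]]:
    """Put each item in the cluster whose centroid has the highest cosine similarity."""
    clusters: List[List[int]] = [[] for _ in centroids]
    for i, v in enumerate(vecs):
        best = max(range(len(centroids)), key=lambda c: dot(v, centroids[c]))
        clusters[best].append(i)
    return clusters


def build_index(items: Dict[str, Vector], k: int, iters: int = 10) -> Tuple[List[List[float]], List[List[str]]]:
    """Indexing (hypothetical): group item embeddings into k clusters with spherical k-means."""
    assert 0 < k <= len(items)
    names = sorted(items)
    vecs = [normalize(items[n]) for n in names]
    centroids = [list(v) for v in vecs[:k]]  # deterministic init: the first k items
    for _ in range(iters):
        for c, members in enumerate(assign(vecs, centroids)):
            if members:  # an empty cluster keeps its previous centroid
                total = [sum(vecs[i][d] for i in members) for d in range(len(vecs[0]))]
                centroids[c] = normalize(total)
    buckets = [[names[i] for i in members] for members in assign(vecs, centroids)]
    return centroids, buckets


def match(query: Vector, centroids: List[List[float]], buckets: List[List[str]], beam: int) -> List[str]:
    """Matching (hypothetical): keep only the items in the `beam` clusters closest to the query."""
    q = normalize(query)
    order = sorted(range(len(centroids)), key=lambda c: dot(q, centroids[c]), reverse=True)
    return [name for c in order[:beam] for name in buckets[c]]


def rank(query: Vector, candidates: List[str], items: Dict[str, Vector], top_k: int) -> List[str]:
    """Ranking (hypothetical): score only the shortlist; cosine similarity stands in for a costlier model."""
    q = normalize(query)
    return sorted(candidates, key=lambda n: dot(q, normalize(items[n])), reverse=True)[:top_k]


# Toy catalog with made-up 3-dimensional embeddings.
items = {
    "boys shorts": [1.0, 0.1, 0.0],
    "toddler shorts": [0.9, 0.3, 0.0],
    "red adidas shorts": [0.8, 0.2, 0.6],
    "digital camera": [0.0, 1.0, 0.0],
    "camera lens": [0.1, 0.9, 0.1],
    "camera tripod": [0.0, 0.8, 0.3],
}
centroids, buckets = build_index(items, k=2)
query = [0.85, 0.2, 0.5]  # made-up embedding for "red adidas shorts for toddler boy"
shortlist = match(query, centroids, buckets, beam=1)
top = rank(query, shortlist, items, top_k=2)
print(top)  # ['red adidas shorts', 'toddler shorts']
```

The design point the sketch tries to show is that the ranker only ever scores the shortlist returned by matching, so the expensive step touches a small fraction of the catalog. A production system would differ in almost every detail, from how embeddings are learned to how the index is structured.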
Dhillon says that he is looking to solve similar problems that have large-scale impact by developing further collaborations between Amazon and academia. In doing so, he wants to build on his track record of open-sourcing general-purpose software that helps Amazon’s customers while also advancing the state of the art in machine learning.
“Developing truly impactful solutions necessitates collaboration with the wider research community. To this end, we’re working with the Berkeley Artificial Intelligence Research Lab at UC Berkeley on open research projects.”
Amazon is part of the BAIR Open Research Commons’ Joint Collaboration Tier. Other industry sponsors of BAIR include Google, Facebook, and Samsung. BAIR brings together UC Berkeley researchers across the areas of machine learning, natural language processing, reinforcement learning, computer vision, and robotics. These include 30 faculty members (such as Michael I. Jordan, Amazon Scholar and professor at UC Berkeley), and more than 200 graduate students and postdocs pursuing research in cross-cutting themes including multi-modal deep learning, human-compatible artificial intelligence (AI), and connecting AI with other scientific disciplines and the humanities.
“Making an impact at scale has been important to me throughout my career,” says Dhillon. “It has been a personal barometer of satisfaction, and I am delighted that I am able to achieve these goals at Amazon.”