Leveraging transformers to improve product retrieval results


When a customer clicks on an item in a list of product-search results, it implies that the clicked item is more useful than those not clicked. "Learning to rank" models leverage such implicit feedback to improve search results, comparing clicked and unclicked results in either "pairwise" fashion (comparing pairs of results) or "listwise" fashion (judging a result's position within the whole list).

A problem with this approach is the lack of absolute feedback. For instance, if no items in the selection are clicked, it’s a signal that none of the results was useful. But without clicked items for comparison, learning-to-rank models can do nothing with that information. Similarly, if a customer clicks on all the items in a list, it could indicate that all the results were useful — but it could also indicate a fruitless search to find even one useful result. Again, learning-to-rank models can’t tell the difference.
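To make the limitation concrete, here is a minimal sketch of a pairwise hinge loss of the kind such models use. The function name and margin are illustrative, not from the paper. Note that a list with no clicks, or with all clicks, yields no comparable pairs and therefore no training signal:

```python
import numpy as np

def pairwise_hinge_loss(scores, clicks, margin=1.0):
    """Toy pairwise learning-to-rank loss: for every (clicked, unclicked)
    pair in one result list, penalize the model when the clicked item's
    score does not exceed the unclicked item's score by `margin`."""
    scores = np.asarray(scores, dtype=float)
    clicks = np.asarray(clicks, dtype=bool)
    pos = scores[clicks]   # scores of clicked items
    neg = scores[~clicks]  # scores of unclicked items
    if pos.size == 0 or neg.size == 0:
        # all-clicked or all-unclicked list: no pairs, no gradient signal
        return 0.0
    # score differences over all (clicked, unclicked) pairs
    diffs = pos[:, None] - neg[None, :]
    return float(np.maximum(0.0, margin - diffs).mean())

# A well-separated list incurs no loss; an inverted one does.
pairwise_hinge_loss([3.0, 1.0], [1, 0])  # → 0.0
pairwise_hinge_loss([1.0, 2.0], [1, 0])  # → 2.0
# But an all-clicked (or all-unclicked) list contributes nothing:
pairwise_hinge_loss([1.0, 2.0], [1, 1])  # → 0.0
```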

In a paper we’re presenting at this year’s International Conference on Knowledge Discovery and Data Mining (KDD), we describe a new approach to learning to rank that factors in absolute feedback. It also uses the type of transformer models so popular in natural-language processing to attend to differences among items in the same list to predict their relative likelihood of being clicked.

In experiments, we compared our approach to a standard neural-network model and to a model that used gradient-boosted decision trees (GBDTs), which have historically outperformed neural models on learning-to-rank tasks. On three public datasets, the GBDTs did come out on top, although our model outperformed the baseline neural model.

On a large set of internal Amazon search data, however, our approach outperformed the baselines across the board. We hypothesize that this is because the public datasets contain only simple features and that neural rankers become state-of-the-art only when the dataset is large and has a large number of features with complicated distributions.

Absolute feedback

Although the items in the public datasets are scored according to how well they match various search queries, we are chiefly interested in learning from implicit feedback, since that scales much better than learning from labeled data.

We thus assign each item in our datasets a value of 0 if it isn’t clicked on, a value of 1 if it is clicked on, and a value of 2 if it’s purchased. We define the absolute value of a list of items as the value of its single highest-value member, on the assumption that the purpose of a product query is to identify a single item for purchase. A list with one item that results in a purchase thus has a higher value than a list all of whose items were clicked without purchase.
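The labeling scheme above can be sketched in a few lines (the helper names are ours, not the paper's):

```python
def item_value(clicked, purchased):
    """Per-item absolute feedback: 0 = no click, 1 = click, 2 = purchase."""
    return 2 if purchased else (1 if clicked else 0)

def list_value(items):
    """Value of a result list = value of its single highest-value item,
    on the assumption that a product query aims at a single purchase.
    `items` is a sequence of (clicked, purchased) pairs; an empty or
    clickless list has value 0."""
    return max((item_value(c, p) for c, p in items), default=0)
```

Under this definition, a list with one purchase scores 2 even if nothing else was clicked, while a list where every item was clicked but nothing bought scores only 1.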

As input, our transformer model receives information about each product in a list of products, but it also receives a class token. For each input, it generates a vector representation: the representations of the products capture information useful for assessing how well they match a product query, but the representation of the class token captures information about the list as a whole.

The representations pass to a set of scoring heads, which score them according to their relevance to the current query. During training, however, the score of the class token and the product scores are optimized according to separate loss functions.

In addition to product features (x_i), the inputs to our model include a classification token, x_[CLS]. The transformer outputs (z_i and z_[CLS]) pass to scoring heads (h_s and h_d). During training, the product scores (s(x_i)) and the class token score (d(x_[CLS])) are optimized according to different loss functions (L_y and L_t).
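The architecture can be sketched with a single self-attention layer and two linear scoring heads. This is a minimal illustration with random weights, not the paper's implementation; the training losses (L_y and L_t) are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class ListwiseTransformerSketch:
    """One self-attention layer over a result list plus a prepended
    class token, followed by two scoring heads: h_s scores each product
    and h_d scores the class token for the list-level absolute signal."""

    def __init__(self, d):
        self.d = d
        self.cls = rng.normal(size=d)  # learned x_[CLS] embedding (assumed)
        self.Wq, self.Wk, self.Wv = (
            rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)
        )
        self.hs = rng.normal(size=d) / np.sqrt(d)  # product scoring head h_s
        self.hd = rng.normal(size=d) / np.sqrt(d)  # class-token scoring head h_d

    def forward(self, items):
        """`items` is an (n, d) array of product feature vectors."""
        x = np.vstack([self.cls, items])           # (n+1, d), [CLS] first
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(self.d))  # each row attends to the whole list
        z = attn @ v                               # contextualized representations
        list_score = z[0] @ self.hd                # d(x_[CLS]): list-level score
        item_scores = z[1:] @ self.hs              # s(x_i): per-product scores
        return item_scores, list_score
```

Because every row of the attention matrix spans the whole list, each product's score depends on the other products shown alongside it, and the class token's representation summarizes the list as a whole.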

The transformer’s key design feature is its attention mechanism, which learns how heavily to weight different input features according to context. In our setting, the attention mechanism determines which product features are of particular import given a product query.

As an example, transformers are capable of learning that a 10-dollar item should be contextualized differently in a list of 20-dollar items than in a list of five-dollar items. Like the scoring of the class token, this contextualization is trained on the overall, absolute feedback, which allows our model to learn from lists that generate no clicks or purchases.
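A toy, hand-built illustration of that idea (this mimics the effect of attention on a single price feature; it is not the model's learned mechanism):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def contextualize_price(list_prices, item_price):
    """Attention-style weighting turns a raw price into a context-dependent
    feature: the item's price is compared against an attention-weighted
    summary of the list it appears in. The same 10-dollar item looks cheap
    in a 20-dollar list and expensive in a 5-dollar list."""
    list_prices = np.asarray(list_prices, dtype=float)
    # attention weights: list items with similar prices weigh more
    weights = softmax(-np.abs(list_prices - item_price))
    # contextualized feature: deviation from the weighted list average
    return item_price - weights @ list_prices

contextualize_price([20.0, 20.0, 20.0], 10.0)  # negative: item reads as cheap
contextualize_price([5.0, 5.0, 5.0], 10.0)     # positive: item reads as expensive
```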

Does it help?

Although our results on proprietary data were more impressive, we evaluated our proposal on publicly available datasets so that the research community can verify our results. On the internal data at Amazon Search, where a richer set of features is available, our model achieves better performance than any of the other methods, including strong GBDT models.

On the strength of those results, we are encouraged to keep learning from customer feedback. The user’s perspective is central to ranking problems, and click and purchase data appears to be a signal ripe for further research.




