Ensuring that customers don’t miss out on trending products


Recommendation systems, naturally, tend to recommend products that have proved popular over time — whether in general or among customers with particular taste profiles.

But popularity-based recommendations can miss trending products, or products that, while they haven’t yet reached high purchase volumes, are rapidly increasing in popularity. Customers executing a query today may well feel short-changed if they miss out on a new product that, two or three days from now, will turn out to have been a game-changing entry in the product space.

We envision that Amazon customers, when performing product queries, will receive not only a list of matches based on historical data but also a list of trending matches, so they will have as much information as possible when making purchase decisions. Because we want to catch the trend as it’s happening — not after the fact, when it shows up in the data — we use time series forecasting to predict which products will be trending in the near future.

We describe our method in a paper we presented at this year’s ACM Conference on Recommender Systems (RecSys). First, we rigorously define trending in terms of velocity — the number of customer interactions with product pages at each time step — and acceleration — the rate at which the velocity changes from time step to time step. Then we propose a novel machine learning scheme in which we pretrain a model on the task of next-item recommendation, so it can learn product features that correlate with interaction patterns. After pretraining, the model is fine-tuned on the task of predicting trends (acceleration rates).
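To make those definitions concrete, here is a minimal sketch (not code from the paper; the time-step bucketing and variable names are our own illustrative choices) of how velocity and acceleration can be computed from raw interaction counts:

```python
import numpy as np

def velocity_and_acceleration(interaction_counts):
    """Compute per-step velocity and acceleration for one product.

    interaction_counts: customer interactions with the product's page,
    bucketed into equal time steps (e.g., one count per day).
    """
    # Velocity: number of interactions at each time step.
    velocity = np.asarray(interaction_counts, dtype=float)
    # Acceleration: change in velocity from one time step to the next.
    acceleration = np.diff(velocity)
    return velocity, acceleration

# Example: a product whose interactions are ramping up quickly.
v, a = velocity_and_acceleration([10, 12, 20, 35, 60])
print(v)  # [10. 12. 20. 35. 60.]
print(a)  # [ 2.  8. 15. 25.]  -- positive, growing acceleration = trending
```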

To evaluate our model — TrendRec — we compared it to three baselines: one was a simple Markov model, which assumes a constant rate of increase in transaction volume from one time step to the next (constant acceleration); one was an exponential-moving-average model, which predicts the acceleration at the next time step based on a weighted sum of accelerations in the past eight time steps; and one was a neural network trained on time series of acceleration rates.
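For illustration, the two heuristic baselines could be sketched as follows. This is our reconstruction from the descriptions above, not the authors' implementations, and the decay factor in the exponential moving average is an arbitrary choice:

```python
import numpy as np

def constant_acceleration_forecast(accelerations):
    """Constant-acceleration baseline: assume the next acceleration
    equals the most recently observed one."""
    return accelerations[-1]

def ema_forecast(accelerations, window=8, alpha=0.5):
    """Exponential-moving-average baseline: predict the next acceleration
    as a weighted average of the last `window` accelerations, with weights
    decaying geometrically for older observations (alpha is illustrative)."""
    recent = np.asarray(accelerations[-window:], dtype=float)
    weights = alpha ** np.arange(len(recent))[::-1]  # newest value gets weight 1
    return float(np.sum(weights * recent) / np.sum(weights))

past_acc = [1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 5.5, 7.0]
print(constant_acceleration_forecast(past_acc))  # 7.0
print(ema_forecast(past_acc))                    # weighted toward the recent values
```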

We tested all four models on five datasets, using two different metrics. We found that our model, with its additional knowledge about the correlation between product features and sales patterns, outperformed the baseline neural model across the board — by 15% and 16% on the two metrics for one dataset. Both neural models dramatically outperformed the heuristic baselines, indicating that changes in acceleration rates follow nonlinear patterns discernible in the data.

Representation learning

The goal of our pretraining procedure is to teach the model to produce product representations that will be useful for the trend prediction task. The assumption is that products that the same subgroups of customers interact with will exhibit similar popularity trends over time. If the model learns to correlate product features with particular subgroups, it can learn to correlate the same features with particular trend patterns.

Our pretraining procedure is motivated by the assumption that products (items C and D) that the same group of customers (blue) interact with will exhibit similar popularity trends over time.

Accordingly, the pretraining task is to predict which product a given customer will interact with next, based on that customer’s interaction history. The model receives customers’ interaction histories as input, and it learns to produce two vector representations (embeddings): one of a customer’s tastes and one of product features.
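As a rough sketch of this pretraining stage (the GRU encoder, embedding size, and loss below are illustrative assumptions, not details from the paper), a next-item model can learn the two embeddings jointly:

```python
import torch
import torch.nn as nn

class NextItemModel(nn.Module):
    """Sketch of pretraining: encode a customer's interaction history into a
    taste embedding and score it against product embeddings to predict the
    next interaction. Architecture choices here are our own assumptions."""

    def __init__(self, num_products, dim=64):
        super().__init__()
        self.product_emb = nn.Embedding(num_products, dim)    # product features (Vjt)
        self.history_encoder = nn.GRU(dim, dim, batch_first=True)

    def forward(self, history):                    # history (Sit): (batch, seq_len)
        embedded = self.product_emb(history)       # (batch, seq_len, dim)
        _, taste = self.history_encoder(embedded)  # customer tastes (Uit): (1, batch, dim)
        taste = taste.squeeze(0)                   # (batch, dim)
        # Score every product as the candidate next interaction (Rijt).
        return taste @ self.product_emb.weight.T   # (batch, num_products)

model = NextItemModel(num_products=1000)
history = torch.randint(0, 1000, (4, 10))   # four customers, ten past interactions each
next_item = torch.randint(0, 1000, (4,))    # the product each customer interacted with next
loss = nn.functional.cross_entropy(model(history), next_item)
loss.backward()
```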

A probabilistic graphical model of our training procedure. The model is pretrained to predict the customer’s next product interaction (Rijt), given that customer’s interaction history (Sit). During pretraining, the model learns two embeddings: one of the customer’s tastes (Uit) and one of product features (Vjt). Then the model is fine-tuned to predict the next acceleration rate (Aj(t+1)), given past acceleration rates (Aj,0:t) and the product embedding.

After pretraining, the model is fine-tuned on the task of predicting future acceleration rate from past acceleration rates. The product feature embedding is still used, but the customer taste embedding is not. The assumption is that the taste embedding will have influenced the product embedding, since they were trained together.
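A correspondingly rough sketch of the fine-tuning stage, again with an illustrative network and history length rather than the paper's, might look like this:

```python
import torch
import torch.nn as nn

class TrendHead(nn.Module):
    """Sketch of fine-tuning: predict the next acceleration Aj(t+1) from past
    accelerations Aj,0:t and the pretrained product embedding Vjt. The MLP
    structure and history length are illustrative assumptions."""

    def __init__(self, product_emb, history_len=8):
        super().__init__()
        self.product_emb = product_emb            # embeddings carried over from pretraining
        dim = product_emb.embedding_dim
        self.regressor = nn.Sequential(
            nn.Linear(dim + history_len, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, product_ids, past_accelerations):
        features = torch.cat([self.product_emb(product_ids), past_accelerations], dim=-1)
        return self.regressor(features).squeeze(-1)   # predicted Aj(t+1)

# Stand-in for the product embeddings learned during pretraining.
pretrained_product_emb = nn.Embedding(1000, 64)
head = TrendHead(pretrained_product_emb)
pred = head(torch.tensor([3, 7]), torch.randn(2, 8))
loss = nn.functional.mse_loss(pred, torch.tensor([1.2, -0.4]))
loss.backward()
```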

The architecture of the TrendRec model. Sit is customer purchase history; Uit is customer taste embedding; Rijt is predicted product interaction; Vjt is product embedding; and Aj,0:t and Aj(t+1) are past accelerations and projected acceleration, respectively.

Time intervals

An important consideration in our training procedure is the time interval over which we evaluate a model’s performance. We want to train the model to predict acceleration rate — but acceleration rate over what span of time? An hour? A day? A week?

We conjectured that there’s a relationship between the time interval over which we predict acceleration and the feasibility of learning a predictive model. If the interval is too short, the data is too noisy: for instance, the acceleration rate might happen to be flat or even negative for the first 15 minutes of the prediction period, even though it’s very high for the next three hours. Conversely, if the time interval is too long, by its end, the surge in popularity may have died down, so the overall acceleration rate looks artificially low.

The conjectured relationship between the time interval of the acceleration projection and the feasibility of learning a predictive model.

When training a trend prediction model, our goal is to find the sweet spot between too short and too long a time interval — a value that can vary widely across datasets. In our experiments, however, we found that the simplest of our baseline heuristics — the model that assumes constant acceleration — provides a good enough estimate of the shape of the feasibility-versus-time-interval curve to enable time interval selection. And because the heuristic is so simple, the estimate can be computed efficiently.
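One simplified way to picture that selection step (our reading of the procedure; the candidate intervals, the error measure, and the helper names below are hypothetical) is to sweep candidate intervals and score each with the cheap constant-acceleration predictor:

```python
import numpy as np

def constant_acceleration_error(counts, interval):
    """Rough feasibility score for a candidate time interval: re-bucket the
    interaction counts at that interval, predict each step's acceleration as
    equal to the previous step's (constant-acceleration heuristic), and return
    the mean absolute error, scaled by the mean velocity (lower = more predictable)."""
    n = len(counts) // interval
    bucketed = np.add.reduceat(counts[: n * interval], np.arange(0, n * interval, interval))
    acc = np.diff(bucketed)                   # acceleration per bucket
    if len(acc) < 2:
        return np.inf
    return float(np.mean(np.abs(acc[1:] - acc[:-1])) / (np.mean(bucketed) + 1e-9))

def select_interval(counts, candidates=(1, 2, 4, 8, 24)):
    """Pick the candidate interval (in base time steps, e.g., hours)
    with the lowest heuristic error."""
    errors = {h: constant_acceleration_error(counts, h) for h in candidates}
    return min(errors, key=errors.get), errors

hourly_counts = np.random.poisson(lam=20, size=24 * 14)  # two weeks of synthetic hourly counts
best, errors = select_interval(hourly_counts)
print(best, errors)
```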

Accuracy-versus-time-interval curves for three different datasets, validating our conjecture about the relationship between time interval and model accuracy. (In the Netflix dataset (b), the time granularity of the data is one day; effectively, the upslope of the curve is to the left of the first data point.)

In our experimental setup, TrendRec and all three baselines predicted the ten products with the highest acceleration rates over the time interval selected using the constant-acceleration heuristic. We evaluated the results using two metrics: the cumulative acceleration of the top 10 results, normalized according to min-max scaling, and trendiness-normalized discounted cumulative gain (TNDCG), which evaluates the order of the top 10 results, assigning greater weight to the top results.
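For a concrete picture of the two metrics, here is a hedged sketch; the exact gain and discount functions of TNDCG are not spelled out above, so the standard NDCG-style choices below are an assumption on our part:

```python
import numpy as np

def minmax_cumulative_acceleration(top10_acc, all_acc):
    """Cumulative acceleration of the top-10 recommendations, min-max
    normalized against the dataset-wide acceleration range (our reading)."""
    lo, hi = np.min(all_acc), np.max(all_acc)
    normalized = (np.asarray(top10_acc) - lo) / (hi - lo)
    return float(np.sum(normalized))

def tndcg(predicted_acc, all_acc):
    """Trendiness-normalized DCG: log-discounted gain over the ranked top-10
    list, divided by the gain of the ideal ordering (standard NDCG form)."""
    discounts = 1.0 / np.log2(np.arange(2, len(predicted_acc) + 2))
    dcg = np.sum(np.asarray(predicted_acc) * discounts)
    idcg = np.sum(np.sort(all_acc)[::-1][: len(predicted_acc)] * discounts)
    return float(dcg / idcg)

all_acc = np.array([0.1, 0.4, 2.0, 1.5, 0.9, 3.2, 0.2, 1.1, 2.8, 0.6, 1.9, 0.3])
# Accelerations of the ten products a hypothetical model ranked highest, in ranked order.
predicted_top10 = np.array([3.2, 2.0, 2.8, 1.9, 1.5, 1.1, 0.9, 0.6, 0.4, 0.3])
print(minmax_cumulative_acceleration(predicted_top10, all_acc))
print(tndcg(predicted_top10, all_acc))
```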

Both neural models significantly outperformed the heuristic baselines, but TrendRec yielded the best results across the board.




