Recommendation systems, naturally, tend to recommend products that have proved popular over time — whether in general or among customers with particular taste profiles.
But popularity-based recommendations can miss trending products, or products that, while they haven’t yet reached high purchase volumes, are rapidly increasing in popularity. Customers executing a query today may well feel short-changed if they miss out on a new product that, two or three days from now, will turn out to have been a game-changing entry in the product space.
We envision that Amazon customers, when performing product queries, will receive not only a list of matches based on historical data but also a list of trending matches, so they will have as much information as possible when making purchase decisions. Because we want to catch the trend as it’s happening — not after the fact, when it shows up in the data — we use time series forecasting to predict which products will be trending in the near future.
We describe our method in a paper we presented at this year’s ACM Conference on Recommender Systems (RecSys). First, we rigorously define trending in terms of velocity (the number of customer interactions with product pages at each time step) and acceleration (the rate at which the velocity changes from time step to time step). Then we propose a novel machine learning scheme in which we pretrain a model on the task of next-item recommendation, so it can learn product features that correlate with interaction patterns. After pretraining, the model is fine-tuned on the task of predicting trends (acceleration rate).
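To make the two definitions concrete, here is a minimal sketch in Python. It assumes that velocity is a per-time-step interaction count and that acceleration is the first difference of velocity. The function name and the sample counts are illustrative and are not taken from the paper.

```python
import numpy as np

def velocity_and_acceleration(interaction_counts):
    """Return (velocity, acceleration) for a single product.

    velocity[t] is the number of interactions at time step t.
    acceleration[t] is velocity[t + 1] - velocity[t], so it has one
    fewer entry than velocity.
    """
    velocity = np.asarray(interaction_counts, dtype=float)
    acceleration = np.diff(velocity)
    return velocity, acceleration

# Illustrative counts for five consecutive time steps.
velocity, acceleration = velocity_and_acceleration([10, 12, 18, 30, 31])
print(acceleration.tolist())  # [2.0, 6.0, 12.0, 1.0]
```

In this toy series the product is still gaining interactions at the last step, but its acceleration has dropped from 12 to 1, which is the kind of change the trend predictor is meant to anticipate.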
To evaluate our model — TrendRec — we compared it to three baselines: one was a simple Markov model, which assumes a constant rate of increase in transaction volume from one time step to the next (constant acceleration); one was an exponential-moving-average model, which predicts the acceleration at the next time step based on a weighted sum of accelerations in the past eight time steps; and one was a neural network trained on time series of acceleration rates.
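The two heuristic baselines are simple enough to sketch in a few lines. The version below is an illustration under assumptions: the smoothing factor `alpha` and the weight normalization are hypothetical choices, since the text specifies only that the moving average is a weighted sum over the past eight time steps.

```python
import numpy as np

def predict_constant(accelerations):
    """Constant-acceleration baseline: the next value equals the latest one."""
    return float(accelerations[-1])

def predict_ema(accelerations, window=8, alpha=0.5):
    """Exponentially weighted sum of the last `window` accelerations.

    `alpha` is a hypothetical smoothing factor. Weights decay with age
    and are normalized to sum to 1.
    """
    recent = np.asarray(accelerations[-window:], dtype=float)
    # ages[i] counts how many steps back recent[i] lies (0 = most recent).
    ages = np.arange(len(recent) - 1, -1, -1)
    weights = alpha * (1.0 - alpha) ** ages
    weights /= weights.sum()
    return float(np.dot(weights, recent))

history = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0, 8.0, 7.0, 9.0]
print(predict_constant(history))
print(predict_ema(history))
```

Neither heuristic has trainable parameters, which is what makes them cheap reference points for the neural models.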
We tested all four models on five datasets, using two different metrics. We found that our model, with its additional knowledge about the correlation between product features and sales patterns, outperformed the baseline neural model across the board; on one dataset, the margins were 15% and 16% on the two metrics, respectively. Both neural models dramatically outperformed the heuristic baselines, indicating that changes in acceleration rates follow nonlinear patterns discernible in the data.
Representation learning
The goal of our pretraining procedure is to teach the model to produce product representations that will be useful for the trend prediction task. The assumption is that products that the same subgroups of customers interact with will exhibit similar popularity trends over time. If the model learns to correlate product features with particular subgroups, it can learn to correlate the same features with particular trend patterns.
Accordingly, the pretraining task is to predict which product a given customer will interact with next, based on that customer’s interaction history. The model receives customers’ interaction histories as input, and it learns to produce two vector representations (embeddings): one of a customer’s tastes and one of product features.
After pretraining, the model is fine-tuned on the task of predicting future acceleration rate from past acceleration rates. The product feature embedding is still used, but the customer taste embedding is not. The assumption is that the taste embedding will have influenced the product embedding, since they were trained together.
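The sketch below illustrates how a product embedding could be shared between the two stages. It shows forward passes only, with random weights and no training loop. The architecture is a hypothetical stand-in and not the TrendRec model itself: the mean-pooled taste encoder, the dot-product scoring, and the linear prediction head are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_products, dim, history_len = 6, 8, 5

# Product embedding table, used in both stages.
product_emb = rng.normal(size=(n_products, dim))
# Hypothetical taste encoder: projects the mean of the history's product embeddings.
taste_proj = rng.normal(size=(dim, dim))
# Hypothetical fine-tuning head over [product embedding, past accelerations].
head_weights = rng.normal(size=dim + history_len)

def customer_taste(history):
    """Taste embedding computed from a customer's interaction history."""
    return np.tanh(taste_proj @ product_emb[history].mean(axis=0))

def next_item_probs(history):
    """Pretraining stage: softmax over dot-product scores for every product."""
    scores = product_emb @ customer_taste(history)
    scores = scores - scores.max()  # for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

def predict_acceleration(product_id, past_accelerations):
    """Fine-tuning stage: uses the product embedding but no taste embedding."""
    features = np.concatenate([product_emb[product_id], past_accelerations])
    return float(features @ head_weights)

print(next_item_probs([0, 2, 3]).round(3))
print(predict_acceleration(1, [0.0, 1.0, 3.0, 2.0, 4.0]))
```

In a trained version, gradients from the next-item objective would update `product_emb` and the taste encoder together, which is how customer-side signal would reach the product embeddings that the second stage reuses.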
Time intervals
An important consideration in our training procedure is the time interval over which we evaluate a model’s performance. We want to train the model to predict acceleration rate — but acceleration rate over what span of time? An hour? A day? A week?
We conjectured that there’s a relationship between the time interval over which we predict acceleration and the feasibility of learning a predictive model. If the interval is too short, the data is too noisy: for instance, the acceleration rate might happen to be flat or even negative for the first 15 minutes of the prediction period, even though it’s very high for the next three hours. Conversely, if the time interval is too long, by its end, the surge in popularity may have died down, so the overall acceleration rate looks artificially low.
When training a trend prediction model, our goal is to find the sweet spot between too short and too long a time interval — a figure that can vary widely across datasets. In our experiments, however, we found that the simplest of our baseline heuristics — the model that assumes constant acceleration — provides a good enough estimate of the shape of the feasibility-versus-time-interval curve to enable time interval selection. And because the heuristic is so simple, the estimate can be computed efficiently.
In our experimental setup, TrendRec and all three baselines each predicted the ten products with the highest acceleration rates over the prediction time interval, which was selected using the constant-acceleration heuristic. We evaluated the results using two metrics: the cumulative acceleration of the top 10 results, normalized according to min-max scaling, and trendiness-normalized discounted cumulative gain (TNDCG), which evaluates the order of the top 10 results, assigning greater weight to the top results.
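The two metrics can be approximated as follows. This is a sketch under assumptions rather than the paper's exact definitions: accelerations are min-max scaled across all candidate products, and TNDCG is computed as a standard NDCG that uses the scaled acceleration as the gain.

```python
import numpy as np

def min_max(values):
    """Scale values to [0, 1]; returns all zeros if every value is equal."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    return (values - values.min()) / span if span > 0 else np.zeros_like(values)

def cumulative_acceleration(true_acc, ranked_ids, k=10):
    """Sum of min-max-scaled true accelerations over the predicted top k."""
    gains = min_max(true_acc)
    top = np.asarray(ranked_ids[:k], dtype=int)
    return float(gains[top].sum())

def tndcg(true_acc, ranked_ids, k=10):
    """NDCG-style score with scaled acceleration as the gain (assumed form)."""
    gains = min_max(true_acc)
    top = np.asarray(ranked_ids[:k], dtype=int)
    # Position i (0-based) is discounted by 1 / log2(i + 2).
    discounts = 1.0 / np.log2(np.arange(2, len(top) + 2))
    dcg = float((gains[top] * discounts).sum())
    ideal = np.sort(gains)[::-1][: len(top)]
    idcg = float((ideal * discounts).sum())
    return dcg / idcg if idcg > 0 else 0.0

true_acc = [0.0, 5.0, 10.0, 2.0]        # illustrative accelerations per product id
print(tndcg(true_acc, [2, 1, 3, 0], k=2))  # 1.0 for the ideal ordering
print(tndcg(true_acc, [0, 3, 1, 2], k=2))
```

The first metric rewards picking high-acceleration products regardless of order, while the second also rewards placing the fastest-accelerating products at the top of the list.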
Both neural models significantly outperformed the heuristic baselines, but TrendRec yielded the best results across the board.