Adapting language model architectures for time series forecasting

April 28, 2024

Time series forecasting is essential for decision making across industries such as retail, energy, finance, and health care. However, developing accurate machine-learning-based forecasting models has traditionally required substantial dataset-specific tuning and model customization.

In a paper we have just posted to arXiv, we present Chronos, a family of pretrained time series models based on language model architectures. Like large language models or vision-language models, Chronos is a foundation model, which learns from large datasets how to produce general representations useful for a wide range of tasks.

The key insight behind Chronos is treating time series data as a language to be modeled by off-the-shelf transformer architectures. To tokenize real-valued time series observations into a fixed vocabulary, we scale the time series by its absolute mean and then quantize the scaled time series into a fixed number of uniformly spaced bins.
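The scaling and quantization steps can be illustrated with a minimal NumPy sketch. The function names, the bin count (4094), and the bin range of -15 to 15 are illustrative assumptions, not values stated in this article; values outside the range are simply clipped to the first or last bin.

```python
import numpy as np

def tokenize(series, num_bins=4094, low=-15.0, high=15.0):
    """Scale a series by its absolute mean, then map each value to a uniform bin index."""
    series = np.asarray(series, dtype=float)
    scale = float(np.mean(np.abs(series)))
    if scale == 0.0:
        scale = 1.0  # assumption: leave an all-zero series unscaled
    scaled = series / scale
    edges = np.linspace(low, high, num_bins + 1)  # uniformly spaced bin edges
    # Using only the interior edges yields indices 0..num_bins-1;
    # out-of-range values fall into the first or last bin.
    tokens = np.digitize(scaled, edges[1:-1])
    return tokens, scale

def detokenize(tokens, scale, num_bins=4094, low=-15.0, high=15.0):
    """Map bin indices back to real values via bin centers, then undo the scaling."""
    edges = np.linspace(low, high, num_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2.0
    return centers[np.asarray(tokens)] * scale

tokens, scale = tokenize([10.0, 20.0, 30.0, 40.0])
reconstructed = detokenize(tokens, scale)
```

The round trip is lossy by design: each reconstructed value differs from the original by at most half a bin width times the scale.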

In addition to these bin tokens, we add two special tokens, PAD and EOS, to denote padding/missing values and end-of-sequence, respectively. We can then train standard language models like T5 on such a “language of time series” using the conventional cross-entropy loss function, with no changes to the model architecture itself.
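As a rough sketch of how the special tokens and the loss fit together, the snippet below shifts bin indices past two reserved IDs, marks missing values as PAD, appends EOS, and computes a categorical cross-entropy. The token IDs, the helper names, and the choice to ignore PAD positions in the loss are assumptions made for illustration; the logits here would come from the language model in practice.

```python
import numpy as np

PAD_ID, EOS_ID = 0, 1  # assumed IDs for the two special tokens
NUM_SPECIAL = 2        # bin tokens start after the special tokens

def build_sequence(bin_tokens, missing_mask, max_len):
    """Offset bin indices, replace missing values with PAD, append EOS, right-pad."""
    ids = np.asarray(bin_tokens, dtype=int) + NUM_SPECIAL
    ids = np.where(np.asarray(missing_mask, dtype=bool), PAD_ID, ids)
    ids = np.append(ids, EOS_ID)
    if len(ids) > max_len:
        raise ValueError("sequence longer than max_len")
    padded = np.full(max_len, PAD_ID, dtype=int)
    padded[: len(ids)] = ids
    return padded

def cross_entropy(logits, targets):
    """Mean cross-entropy over non-PAD targets; logits has shape (length, vocab)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = log_probs[np.arange(len(targets)), targets]
    keep = targets != PAD_ID
    return float(-picked[keep].mean())

sequence = build_sequence([3, 5, 7], [False, True, False], max_len=6)
loss = cross_entropy(np.zeros((6, 12)), sequence)  # uniform logits over 12 tokens
```

Because the targets are token IDs rather than real values, the loss treats forecasting as classification over the vocabulary, which is what lets the model architecture stay unchanged.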

High-level depiction of Chronos. Left: Input time series is scaled and quantized to obtain a sequence of tokens. Center: The tokens are fed into a language model, which is trained using the cross-entropy loss. Right: During inference, tokens are sampled autoregressively from the model and mapped back to numerical values.

Despite its simplicity, Chronos is remarkably accurate. In a comprehensive evaluation involving 42 datasets, Chronos significantly outperformed classical statistical methods, as well as specialized deep-learning models, on data held out from its training sets. More important, on entirely new datasets, Chronos’s zero-shot performance was comparable to, and occasionally superior to, that of models trained directly on those datasets.

A core strength of Chronos is its ability to leverage diverse time series data from different domains to improve generalization. To enhance the model’s robustness, we augmented the public data sources used for pretraining with random mixtures of real samples (TSMix) and with a synthetically generated dataset based on Gaussian processes (KernelSynth).
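The two augmentation ideas can be sketched in a few lines of NumPy. This is an illustrative approximation, not the procedure from the paper: the Dirichlet mixing weights, the small kernel bank, its hyperparameters, and the function names are all assumptions chosen to keep the example short.

```python
import numpy as np

def mix_series(series_list, rng):
    """TSMix-style sketch: random convex combination of mean-scaled, equal-length series."""
    scaled = []
    for s in series_list:
        s = np.asarray(s, dtype=float)
        scale = np.mean(np.abs(s))
        scaled.append(s / scale if scale > 0 else s)
    weights = rng.dirichlet(np.ones(len(scaled)))  # nonnegative, sums to 1
    return weights @ np.stack(scaled)

def rbf(t, length_scale):
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def periodic(t, period):
    d = t[:, None] - t[None, :]
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2)

def linear(t):
    return np.outer(t, t)

def sample_gp_series(length, rng, max_extra_kernels=3):
    """KernelSynth-style sketch: compose random kernels, then draw from the GP prior."""
    t = np.linspace(0.0, 1.0, length)
    # Hypothetical kernel bank; each entry builds a covariance matrix on demand.
    bank = [
        lambda: rbf(t, 0.1),
        lambda: rbf(t, 1.0),
        lambda: periodic(t, 0.25),
        lambda: linear(t),
    ]
    cov = bank[int(rng.integers(len(bank)))]()
    for _ in range(int(rng.integers(0, max_extra_kernels + 1))):
        other = bank[int(rng.integers(len(bank)))]()
        # Sums and elementwise products of valid kernels are valid kernels.
        cov = cov + other if rng.random() < 0.5 else cov * other
    cov = cov + 1e-6 * np.eye(length)  # jitter for numerical stability
    return rng.multivariate_normal(np.zeros(length), cov)

rng = np.random.default_rng(0)
mixed = mix_series([[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]], rng)
synthetic = sample_gp_series(64, rng)
```

Mixing operates on scaled series so that no single source dominates because of its magnitude, while the composed kernels let the synthetic data combine trends, smooth variation, and seasonality.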