Time Series as Language: A Universal Tokenizer for General-Purpose Time Series Foundation Models
Yunhao Zhang, Ruiying Qi, Jiale Zheng, Jianfeng Zhang, Lujia Pan, Junchi Yan
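AI summary: Proposes UniTok, a universal tokenizer that converts time series into discrete tokens, and UniTok-FM, a foundation model pretrained on those tokens via NTP. The model supports zero-shot forecasting, prompt-boosted forecasting, and few-shot generation and classification without task-specific modifications.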
While Next-Token Prediction (NTP) has unified LLM pretraining, its adaptation to unbounded, continuous time series (TS) remains an open problem. To bridge this gap, we introduce UniTok, a universal tokenizer that transforms TS into discrete tokens, and UniTok-FM, a foundation model pretrained via NTP on these tokens. UniTok-FM is a general-purpose foundation model that supports zero-shot and prompt-boosted forecasting, as well as few-shot generation and classification via training-free in-context inference, a capability not achieved by prior work. Technically, UniTok is a vector-quantized autoencoder that combines prefix normalization for scale stabilization, a progressive-resolution causal architecture for encoding and decoding, and a structure-preserving reconstruction loss for training. UniTok-FM adopts an off-the-shelf LLM architecture without TS-specific modifications. Instead of pretraining on isolated TS, it performs NTP on context windows formed from multiple series with similar patterns, aiming to capture their shared dynamics. Experiments on forecasting, generation, and classification show that a single unified UniTok-FM consistently outperforms statistical and supervised baselines, performs competitively with task-specific foundation models, and uniquely enables training-free in-context inference across tasks.
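To make the tokenization idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that prefix normalization means rescaling a series with the mean and standard deviation of its first `prefix_len` points, and it replaces the learned encoder, decoder, and codebook with a fixed random codebook and a nearest-neighbor lookup over non-overlapping patches. The function names, `prefix_len`, the patch length, and the codebook size are all hypothetical; the progressive-resolution causal architecture and the structure-preserving loss are omitted entirely.

```python
import numpy as np

def prefix_normalize(series, prefix_len):
    """Rescale a series using statistics of its first `prefix_len` points only.

    Assumed form of prefix normalization; the paper's exact definition may differ.
    """
    prefix = series[:prefix_len]
    mean = prefix.mean()
    std = prefix.std() + 1e-8  # avoid division by zero on flat prefixes
    return (series - mean) / std, mean, std

def quantize(patches, codebook):
    """Return, for each patch, the index of the nearest codebook vector."""
    # Squared Euclidean distances, shape (num_patches, codebook_size).
    dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def tokenize(series, codebook, prefix_len=16):
    """Normalize, cut into non-overlapping patches, and map patches to token ids."""
    patch_len = codebook.shape[1]
    normed, mean, std = prefix_normalize(series, prefix_len)
    num_patches = len(normed) // patch_len
    patches = normed[: num_patches * patch_len].reshape(num_patches, patch_len)
    return quantize(patches, codebook), mean, std

def detokenize(tokens, codebook, mean, std):
    """Look up codebook vectors and undo the prefix normalization."""
    return codebook[tokens].reshape(-1) * std + mean

# Hypothetical setup: a random (untrained) codebook of 64 vectors of length 4.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 4))
series = 100.0 + 5.0 * np.sin(np.linspace(0.0, 8.0 * np.pi, 64))

tokens, mean, std = tokenize(series, codebook)
recon = detokenize(tokens, codebook, mean, std)
```

Here `tokens` is a sequence of integer ids that a standard language model could consume for next-token prediction, and `recon` is a coarse reconstruction at the original scale. Because the normalization statistics come only from the prefix, shifting or rescaling the whole series should leave the token ids essentially unchanged under this assumed scheme.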