ChatTS: A Synthetic Data‑Driven Multimodal LLM that Natively Understands Time Series
ChatTS is a time‑series‑native multimodal large language model trained on purely synthetic data. It delivers stronger understanding and reasoning over both real and synthetic time‑series datasets, outperforming existing LLM baselines across alignment and reasoning tasks.
Background
Recent advances in multimodal large language models (MLLMs) have achieved breakthroughs in the image, video, and audio domains, but systematic research combining time‑series data with LLMs remains scarce. Existing work such as TimeLLM focuses mainly on forecasting and cannot meet more complex understanding and reasoning requirements.
Motivation
As LLMs are increasingly applied to enterprise AIOps, financial analysis, and other scenarios that require processing time‑series data, the ability to "understand time series" becomes a fundamental capability for multimodal intelligent systems. To address this, we propose ChatTS, a multimodal LLM fine‑tuned on purely synthetic data that possesses native time‑series understanding and reasoning abilities and achieves significant performance gains on multiple real and synthetic benchmarks.
Challenges
Data scarcity: Unlike the image‑text or speech‑text domains, large‑scale aligned time‑series‑text datasets are virtually nonexistent.
Highly structured modality: Time series contain rich patterns such as trends, seasonality, local fluctuations, and noise, requiring precise alignment strategies.
Multivariate and variable‑length inputs: Coordinated variations across multiple variables increase the difficulty of understanding.
Insufficient evaluation benchmarks: Existing benchmarks do not cover multimodal time‑series modeling tasks, limiting training and validation.
Existing Methods
Current attempts to apply LLMs to time series fall into three categories: (1) Text‑Based methods that encode series as long text strings, preserving exact values but suffering from context‑length limits and poor multivariate support; (2) Vision‑Based methods that convert series to plots for visual LLMs, which lose fine‑grained details; (3) Agent‑Based methods that combine LLMs with external analysis tools, leading to long tool‑calling chains and hallucinations.
Our Approach
We introduce ChatTS, a native multimodal LLM that directly accepts time‑series arrays as input, learns to align numerical fluctuations with language, and performs question answering and reasoning in natural language. The model has attracted attention from HuggingFace and SparkNLP project leaders.
Synthetic Data Generation
We define an attribute‑based time‑series generation framework covering four categories: Trend, Seasonality, Local Fluctuation, and Noise. Each attribute has explicit semantics and parameters, forming an "attribute pool". By sampling combinations of attributes, we generate diverse series paired with high‑quality natural‑language descriptions, ensuring tight alignment between data and text.
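The attribute‑pool idea can be sketched as follows. This is a minimal illustration, not the paper's actual generator: the attribute names, parameter ranges, and description templates here are assumptions, and the real framework covers far richer combinations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical attribute pools -- illustrative names and values only.
TRENDS = {"increasing": 0.02, "decreasing": -0.02, "flat": 0.0}
PERIODS = {"daily": 24, "none": 0}

def sample_series(length=256):
    """Sample one attribute combination; return (series, description).

    Because the series is rendered from explicit attributes, the
    paired text description is aligned with the data by construction.
    """
    t = np.arange(length, dtype=float)
    trend = rng.choice(list(TRENDS))
    period = rng.choice(list(PERIODS))
    noise_std = rng.uniform(0.01, 0.1)

    y = TRENDS[trend] * t                      # trend component
    if PERIODS[period]:
        y += np.sin(2 * np.pi * t / PERIODS[period])  # seasonality
    spike_pos = int(rng.integers(10, length - 10))
    y[spike_pos] += rng.uniform(1.0, 3.0)      # one local fluctuation
    y += rng.normal(0, noise_std, length)      # noise

    desc = (f"The series shows a {trend} trend"
            + (f" with a period of {PERIODS[period]} points" if PERIODS[period] else "")
            + f", a sudden upward spike around point {spike_pos},"
              f" and noise with std about {noise_std:.2f}.")
    return y, desc

series, description = sample_series()
print(description)
```

The key design point is that the description is derived from the same sampled parameters that render the series, so data and text cannot drift apart.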
Time Series Evol‑Instruct (TSEvol)
To equip the model with complex questioning, comparison, and reasoning abilities, we extend the Evol‑Instruct framework to time series. Starting from a seed set of Q&A pairs, TSEvol iteratively evolves new questions by increasing variable count, switching task types (recognition → comparison → reasoning), and injecting realistic business contexts such as database performance fault scenarios.
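The evolution loop can be sketched as below. In the actual TSEvol pipeline an LLM rewrites each question; here the three strategies named above (more variables, task-type switching, business context) are stubbed as plain string transforms, so everything in this snippet is an illustrative assumption.

```python
import random

random.seed(0)

# Stub operators standing in for LLM-driven rewrites.
def add_variable(q):
    return q + " Also compare it against a second related metric."

def switch_to_reasoning(q):
    return q + " Explain the likely root cause of what you observe."

def inject_context(q):
    return "In a database performance fault scenario: " + q

OPERATORS = [add_variable, switch_to_reasoning, inject_context]

def tsevol(seed_questions, rounds=2):
    """Iteratively evolve a seed Q&A set into harder questions,
    keeping both the originals and the evolved variants."""
    pool = list(seed_questions)
    for _ in range(rounds):
        evolved = [random.choice(OPERATORS)(q) for q in pool]
        pool.extend(evolved)
    return pool

questions = tsevol(["Does this CPU-usage series show an upward trend?"])
```

Each round doubles the pool, so a small seed set grows into a diverse instruction set spanning recognition, comparison, and reasoning.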
Native Multimodal Model Design
Based on Qwen2.5‑14B‑Instruct, we design a time‑series‑native input structure. The series is split into small patches, encoded by a lightweight MLP, and embedded into the original text context. This patch‑level insertion preserves the raw temporal structure while allowing the language model to locate specific series references.
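A toy version of the patch encoder might look like this. The patch length, hidden width, and embedding dimension are placeholder values (the real encoder projects into Qwen2.5‑14B's hidden size and is trained jointly with the LLM); random weights are used only to show the shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

PATCH_LEN = 16   # assumed patch size; the paper's value may differ
D_MODEL = 64     # toy width; real models use the LLM's hidden size

# One-hidden-layer MLP as the lightweight patch encoder.
W1 = rng.normal(0, 0.02, (PATCH_LEN, 128))
W2 = rng.normal(0, 0.02, (128, D_MODEL))

def encode_patches(series):
    """Split a series into fixed-size patches and project each patch
    to an embedding that can be interleaved with text-token embeddings
    in the LLM's input sequence."""
    n = len(series) // PATCH_LEN * PATCH_LEN
    patches = series[:n].reshape(-1, PATCH_LEN)   # (num_patches, PATCH_LEN)
    hidden = np.maximum(patches @ W1, 0)          # ReLU
    return hidden @ W2                            # (num_patches, D_MODEL)

emb = encode_patches(rng.normal(size=256))
print(emb.shape)  # (16, 64)
```

Because each patch becomes one embedding slot in the text context, the model can point back to specific regions of the series when answering questions.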
We also introduce a "value‑preserving normalization" mechanism: during 0‑1 scaling, the original min/max parameters are retained in the prompt as text, enabling the model to learn series shape without losing absolute magnitude information.
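A minimal sketch of this normalization, with illustrative prompt wording (the exact prompt template is an assumption):

```python
import numpy as np

def normalize_with_prompt(series):
    """0-1 scale the series, but keep the original min/max as text in
    the prompt so absolute magnitudes are not lost after scaling."""
    lo, hi = float(series.min()), float(series.max())
    scaled = (series - lo) / (hi - lo) if hi > lo else np.zeros_like(series)
    prompt = (f"The series below is scaled to [0, 1]; "
              f"original min={lo:.2f}, max={hi:.2f}.")
    return scaled, prompt

s = np.array([10.0, 20.0, 15.0, 30.0])
scaled, prompt = normalize_with_prompt(s)
print(prompt)
```

The model thus learns shape from the normalized values while recovering absolute magnitudes from the textual min/max when a question asks for exact numbers.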
Training Procedure
ChatTS is trained in two stages using fully synthetic data. The first stage, large‑scale modality alignment, employs three datasets: UTS (single‑variable global/local attribute recognition), MTS‑Shape (multivariate trend correlation), and MTS‑Local (multivariate local fluctuation correlation), totaling 100k QA pairs. The second stage, supervised fine‑tuning, uses TSEvol‑generated complex QA covering induction, deduction, comparison, and causal reasoning, as well as instruction‑following data to preserve format understanding.
Experimental Results
We evaluate ChatTS on a comprehensive suite of alignment and reasoning tasks, including trend identification, period analysis, noise detection, local fluctuation detection, and multivariate correlation inference. On both real and synthetic test sets, ChatTS outperforms GPT‑4o and other baselines, achieving 46%–75% higher F1 on classification tasks and over 80% improvement on numeric prediction tasks.
For multivariate scenarios, ChatTS surpasses text‑based methods (limited by prompt length), vision‑based methods (limited by image resolution), and agent‑based methods (prone to tool‑calling errors), delivering accurate joint variable analysis with minimal token cost.
Case Studies
Real‑world case studies on CPU usage, e‑commerce request volume, and financial price series demonstrate ChatTS’s ability to perform both coarse shape analysis and fine‑grained numeric extraction, as well as to assist in fault diagnosis when combined with expert knowledge.
Thoughts and Outlook
ChatTS demonstrates a new paradigm: using controllable synthetic data to train a multimodal LLM that genuinely understands time series. The success suggests that "data generation + modality alignment" is a powerful approach for time‑series AI. Future work will extend ChatTS to forecasting, classification, and integration with external knowledge bases or expert rules, and explore hybrid architectures that combine ChatTS’s native understanding with high‑precision external tools.