TF-CoDiT: A New Approach to Synthesizing Treasury Futures Data

TF-CoDiT introduces a diffusion‑Transformer framework that converts multi‑channel treasury futures time series into discrete wavelet coefficients, encodes cross‑channel dependencies with a U‑shaped VAE, conditions generation on a structured FinMAP prompt, and achieves state‑of‑the‑art MSE and MAE scores across multiple contracts and horizons.

Bighead's Algorithm Notes

Background

Treasury futures are a cornerstone of the fixed‑income market, but generating realistic synthetic data is difficult because of sparse samples, strong market dependencies, and complex multivariate correlations. Existing generative methods such as GANs and diffusion models focus on unconstrained generation and struggle to capture these nuances.

Problem Definition

Given a treasury‑futures time series X = {x_1, x_2, …, x_T}, where each x_t is an 8‑dimensional vector (open, close, high, low, settlement, turnover, volume, open interest), the goal is to learn a conditional distribution p(X|c), where c is a natural‑language description standardized by the Financial Market Attribute Protocol (FinMAP).

Method

Signal Conversion

The raw series are converted into a matrix of discrete wavelet transform (DWT) coefficients using Haar wavelets. For each channel i, the signal x^i is passed through low‑pass (scaling) filters φ_k and high‑pass (wavelet) filters ψ_k up to decomposition level J, yielding approximation coefficients a_{J,k} (low‑frequency trend) and detail coefficients d_{j,k} (high‑frequency shocks). These coefficients are stacked into a three‑dimensional tensor W, so the multivariate series is effectively treated as a multi‑channel 2‑D image. Because the DWT is invertible, the inverse DWT (IDWT) later reconstructs the time‑domain series from the generated coefficients.
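The decomposition and its exact inverse can be illustrated with a small NumPy sketch. This is not the paper's implementation: the function names `haar_dwt` and `haar_idwt`, the choice of 3 levels, and the random-walk stand-in data are all assumptions made for illustration, and the layout used to stack the coefficients into W is not shown because the summary does not specify it.

```python
import numpy as np

def haar_dwt(x, levels):
    """Multi-level Haar DWT of a 1-D signal (length divisible by 2**levels)."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # high-pass: local differences
        approx = (even + odd) / np.sqrt(2.0)         # low-pass: local averages
    return approx, details  # details[0] is the finest scale

def haar_idwt(approx, details):
    """Invert haar_dwt by undoing one level at a time, coarsest first."""
    approx = np.asarray(approx, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx

# Hypothetical stand-in data: 8 channels, 32 trading days of random walks.
rng = np.random.default_rng(0)
series = rng.normal(size=(8, 32)).cumsum(axis=1)

coeffs = [haar_dwt(channel, levels=3) for channel in series]
recon = np.stack([haar_idwt(a, d) for a, d in coeffs])
print(np.allclose(recon, series))  # reconstruction is exact up to float error
```

Each level halves the number of approximation coefficients, so a 32-day channel at 3 levels yields 4 approximation coefficients plus 16, 8, and 4 detail coefficients.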

Backbone LLM

The backbone follows the FuseDiT architecture: a pretrained large language model (LLM) both encodes the textual condition c and performs the denoising diffusion. At each attention layer l, the text hidden state h_l^{text} and the latent hidden state h_l^{latent} are concatenated. Shared self‑attention with asymmetric masks (a causal mask for text tokens, a full mask for latent tokens) processes the combined sequence, while layer normalization (LN) and modulation (M) are applied separately to each modality.
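One way to picture the asymmetric mask is as a boolean matrix over the concatenated sequence. The sketch below is an assumption-laden illustration, not the FuseDiT code: the helper name `joint_attention_mask` is hypothetical, and it assumes text tokens come first and never attend to latent tokens, which the summary does not state explicitly.

```python
import numpy as np

def joint_attention_mask(n_text, n_latent):
    """mask[i, j] is True when query token i may attend to key token j."""
    n = n_text + n_latent
    mask = np.zeros((n, n), dtype=bool)
    # Text queries: causal attention over text tokens only (assumption).
    mask[:n_text, :n_text] = np.tril(np.ones((n_text, n_text), dtype=bool))
    # Latent queries: full attention over every text and latent token.
    mask[n_text:, :] = True
    return mask

mask = joint_attention_mask(n_text=3, n_latent=2)
print(mask.astype(int))
```

Keeping the text rows causal leaves the language model's original attention pattern over the prompt unchanged, while the latent rows can read the entire prompt and each other.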

U‑shaped Variational Auto‑Encoder (U‑VAE)

The encoder splits the wavelet matrix W into non‑overlapping patches of size P_f × P_t per channel, projects each patch to a vector, adds a 2‑D positional embedding E_{pos}, and feeds the resulting sequence through L Transformer layers with latent query attention (LQA). The deepest hidden state H^L is mapped to a Gaussian latent distribution with mean μ and variance σ², from which z_0 is sampled. The decoder mirrors the encoder, using up‑sampling latent query attention to reconstruct the wavelet tensor, followed by a projection and reshaping step.
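The two mechanical steps here, patch splitting and sampling from the Gaussian latent, can be sketched in NumPy. Everything below is illustrative: the function names, the (channels, frequency, time) layout of W, and the log-variance parameterization are assumptions, and the Transformer layers, LQA, and learned projections are omitted entirely.

```python
import numpy as np

def patchify(W, p_f, p_t):
    """Split W of shape (C, F, T) into non-overlapping p_f x p_t patches per channel."""
    C, F, T = W.shape
    assert F % p_f == 0 and T % p_t == 0, "patch size must divide the grid"
    x = W.reshape(C, F // p_f, p_f, T // p_t, p_t)
    x = x.transpose(0, 1, 3, 2, 4)  # (C, F-blocks, T-blocks, p_f, p_t)
    return x.reshape(C, (F // p_f) * (T // p_t), p_f * p_t)

def unpatchify(patches, F, T, p_f, p_t):
    """Inverse of patchify: rebuild the (C, F, T) tensor from flattened patches."""
    C = patches.shape[0]
    x = patches.reshape(C, F // p_f, T // p_t, p_f, p_t)
    return x.transpose(0, 1, 3, 2, 4).reshape(C, F, T)

def sample_latent(mu, log_var, rng):
    """Reparameterized draw z = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

W = np.arange(8 * 4 * 16, dtype=float).reshape(8, 4, 16)  # toy wavelet tensor
patches = patchify(W, p_f=2, p_t=4)
print(patches.shape)  # (8, 8, 8): 8 channels, 8 patches each, 8 values per patch

rng = np.random.default_rng(0)
z0 = sample_latent(np.zeros(16), np.zeros(16), rng)  # unit-variance example draw
```

Writing the sample as μ plus scaled noise keeps the draw differentiable with respect to μ and σ, which is the usual reason VAEs use this form.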

FinMAP (Financial Market Attribute Protocol)

FinMAP provides a hierarchical description system that extracts 17–23 economic indicators from daily and periodic market narratives. These structured prompts guide the LLM, ensuring that generated series respect real‑world market dynamics.

Inference

Generation proceeds in three steps: (1) iterative denoising starts from Gaussian noise z_T; (2) the LLM, conditioned on FinMAP prompts, predicts the clean latent z_0 using a DPM‑Solver to solve the reverse stochastic differential equation; (3) the pretrained VAE decoder maps z_0 back to wavelet coefficients, which are finally transformed to the time domain via IDWT, yielding an 8‑variable treasury‑futures sequence.
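The shape of the denoising loop in steps (1) and (2) can be sketched as follows. Note the substitution: this uses a plain first-order deterministic (DDIM-style) update as a simpler stand-in for DPM‑Solver, and the denoiser is a dummy oracle rather than a prompt-conditioned LLM. The names `ddim_sample`, `predict_x0`, and `alpha_bar`, as well as the linear noise schedule, are assumptions for illustration only.

```python
import numpy as np

def ddim_sample(predict_x0, shape, alpha_bar, rng):
    """Deterministic reverse process from z_T down to z_0.

    alpha_bar[t] is the cumulative signal level at step t, with
    alpha_bar[0] = 1 (clean) and alpha_bar[-1] close to 0 (almost pure noise).
    predict_x0(z, t) stands in for the conditioned denoiser.
    """
    z = rng.standard_normal(shape)  # step (1): start from Gaussian noise z_T
    for t in range(len(alpha_bar) - 1, 0, -1):
        a_t, a_prev = alpha_bar[t], alpha_bar[t - 1]
        x0_hat = predict_x0(z, t)  # step (2): estimate of the clean latent
        eps_hat = (z - np.sqrt(a_t) * x0_hat) / np.sqrt(1.0 - a_t)
        # Move to the previous noise level, reusing the implied noise estimate.
        z = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat
    return z

rng = np.random.default_rng(0)
target = rng.standard_normal((4, 16))    # pretend "true" clean latent
alpha_bar = np.linspace(1.0, 0.01, 21)   # hypothetical 20-step schedule
z0 = ddim_sample(lambda z, t: target, target.shape, alpha_bar, rng)
print(np.allclose(z0, target))  # an oracle denoiser recovers the target exactly
```

In the full pipeline, the returned z_0 would then go through the VAE decoder and the IDWT described in step (3).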

Experiments

Dataset

Four contracts (TS, TF, T, TL) covering 2015‑03‑20 to 2025‑12‑31 were collected from Wind and internal databases. Daily market comments were aggregated and automatically parsed into FinMAP‑structured prompt‑time‑series pairs.

Setup

Generation lengths of L = 8, 32, 64, and 128 days were evaluated. For each length, the average MSE and MAE of the OHLC values were reported on the last 200 days of the dataset. Prompts were built using a hierarchical time‑aggregation pipeline matching the task horizon.
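As a minimal sketch of how such metrics can be computed, the snippet below averages squared and absolute errors over the four OHLC channels only. The helper name `ohlc_errors` is hypothetical, the channel order is assumed to follow the problem definition (open, close, high, low in positions 0 to 3), and any normalization applied before scoring is not specified in this summary.

```python
import numpy as np

def ohlc_errors(generated, actual, ohlc_idx=(0, 1, 2, 3)):
    """Return (MSE, MAE) over the OHLC channels of (..., time, 8) arrays."""
    g = np.asarray(generated, dtype=float)[..., list(ohlc_idx)]
    a = np.asarray(actual, dtype=float)[..., list(ohlc_idx)]
    diff = g - a
    return float(np.mean(diff ** 2)), float(np.mean(np.abs(diff)))

actual = np.zeros((32, 8))           # toy 32-day window, 8 variables
generated = np.full((32, 8), 0.5)    # constant 0.5 error on every channel
generated[:, 4:] = 9.0               # errors on non-OHLC channels are ignored
mse, mae = ohlc_errors(generated, actual)
print(mse, mae)  # 0.25 0.5
```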

Baselines

TF‑CoDiT was compared against three groups: (i) time‑series GANs (TimeGAN, WGAN, QuantGAN); (ii) diffusion models (GBMDiff, FinDDPM, TimeDiT, T2S); (iii) foundation models (TimeLLM, CALF).

Results

Across the 8‑ to 64‑day horizons, GAN‑based models suffered high errors (MSE > 0.45, MAE ≈ 0.5). Diffusion and foundation models performed better, but TF‑CoDiT consistently achieved the lowest errors, improving average MSE by 13.4 % and MAE by 12.8 % over the best prior model (T2S). For the 128‑day ultra‑long horizon, TF‑CoDiT remained robust (MSE = 0.41, MAE = 0.42) while most baselines degraded sharply.

Conditional vs. Unconditional

Removing the FinMAP prompt increased MSE from 0.30 to 0.44 and MAE from 0.27 to 0.43 for the 32‑day task, matching the performance of the unconditional FinDDPM baseline.

Ablation of FinMAP

Daily‑level FinMAP contributed the most to performance: using only daily prompts already reduced errors relative to unconditional generation, and combining daily with periodic prompts yielded the best results.

Case Study

On the 10‑year contract (T) for 32‑ and 64‑day generation, TF‑CoDiT achieved the lowest cumulative reconstruction error, with the gap to TimeDiT more than doubling at 64 days, demonstrating superior control of variance explosion in long‑term synthesis.

Discussion

MSE vs. L1 Loss

Replacing the standard MSE loss with an L1 loss reduced reconstruction error on approximation coefficients by 1.24 % and on detail coefficients by 10.94 %, because treasury futures exhibit heavy‑tailed distributions, on which MSE over‑penalizes outliers.
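A toy calculation makes the over-penalization concrete. The residual values below are invented for illustration and are not taken from the experiments: four small errors and one large shock.

```python
import numpy as np

residuals = np.array([0.1, -0.2, 0.1, 0.2, 3.0])  # last entry is the outlier

squared = residuals ** 2
absolute = np.abs(residuals)

share_mse = squared[-1] / squared.sum()    # outlier's share of the MSE loss
share_l1 = absolute[-1] / absolute.sum()   # outlier's share of the L1 loss
print(round(share_mse, 3), round(share_l1, 3))  # 0.989 0.833

# Per-sample gradient magnitudes: MSE grows with the error, L1 stays constant.
grad_mse = np.abs(2.0 * residuals)
grad_l1 = np.abs(np.sign(residuals))
print(grad_mse[-1], grad_l1[-1])  # 6.0 1.0
```

Under MSE the single shock accounts for almost the entire loss and produces the largest gradient, whereas under L1 every sample pulls on the model with the same magnitude.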

U‑VAE vs. Conv‑VAE

On four multivariate time‑series benchmarks (ETT, WTH, QLIB, MIMIC), U‑VAE outperformed Conv‑VAE on three of the four datasets, supporting the importance of modeling cross‑channel correlations.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: benchmark, diffusion transformer, wavelet transform, financial time series synthesis, FinMAP, TF-CoDiT, U-VAE