Introducing TTFA: The University of Hong Kong's Open-Source FASTER Gives VLA Models Instant Reactions

The paper identifies real-time latency as the main obstacle to deploying VLA models on robots, proposes the TTFA metric and the FASTER framework (combining a Horizon-Aware Schedule, mixed scheduling, and streaming inference), and demonstrates through extensive GPU and task experiments that TTFA and reaction time can be cut by up to three-fold without sacrificing motion quality.

Machine Heart

Problem

Vision-Language-Action (VLA) models such as π0.5 and X-VLA generate action chunks containing dozens of future motions. On real-world robots, the controller runs at a fixed frequency (e.g., 30 Hz → 33.3 ms per cycle), and model inference often exceeds this period, so the robot must pause until the whole chunk is produced. Synchronous inference therefore causes frequent stalls, while ordinary asynchronous inference only modestly reduces the delay.
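A minimal sketch of the stall arithmetic, in Python. The chunk length, inference latency, and cycle model are illustrative assumptions, not measurements from the paper:

```python
# Why synchronous inference stalls a fixed-rate controller.
# All numbers below are illustrative assumptions.

CONTROL_HZ = 30
CONTROL_PERIOD_MS = 1000 / CONTROL_HZ   # ~33.3 ms per cycle

CHUNK_LEN = 20          # actions per chunk (assumed)
INFERENCE_MS = 400.0    # full-chunk inference latency (assumed)

# Synchronously, the robot executes CHUNK_LEN actions, then freezes
# for the entire inference before the next chunk can start.
execute_ms = CHUNK_LEN * CONTROL_PERIOD_MS
stall_ms = INFERENCE_MS
duty_cycle = execute_ms / (execute_ms + stall_ms)

print(f"control period: {CONTROL_PERIOD_MS:.1f} ms")
print(f"stall per chunk: {stall_ms:.0f} ms "
      f"({stall_ms / CONTROL_PERIOD_MS:.1f} missed control cycles)")
print(f"fraction of time actually moving: {duty_cycle:.0%}")
```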

New Metric: TTFA

The authors argue that reaction time is not a constant equal to model latency, because external events arrive at unpredictable moments. They introduce TTFA (Time to First Action), analogous to TTFT in language models, to measure how quickly the first actionable motion can be generated after an event.
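To see why reaction time is a distribution rather than a constant, here is a toy simulation assuming a synchronous pipeline that alternates chunk execution and full-chunk inference; the latency figures are placeholders, not the paper's:

```python
# TTFA vs. raw model latency: an event lands at a random phase of the
# execute/infer cycle, so the wait before the first new action varies.
import random

INFERENCE_MS = 400.0   # full-chunk latency (assumed)
EXECUTE_MS = 666.7     # one chunk executed at 30 Hz (assumed)

def ttfa_synchronous(event_time_ms: float) -> float:
    """Simplified model: the event waits out the rest of the current
    execute+infer cycle, then one full inference produces the first
    action that can reflect it."""
    cycle = EXECUTE_MS + INFERENCE_MS
    remaining = cycle - (event_time_ms % cycle)
    return remaining + INFERENCE_MS

samples = [ttfa_synchronous(random.uniform(0, 10_000))
           for _ in range(10_000)]
print(f"mean TTFA: {sum(samples) / len(samples):.0f} ms "
      f"(vs. raw latency {INFERENCE_MS:.0f} ms)")
```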

Analysis of Existing Pipelines

Current VLA pipelines use a constant-time-step schedule: every action in a chunk is sampled with the same number of sampling steps (e.g., 10). This forces the robot to wait for the entire chunk even though only the first action is needed to start moving, creating a bottleneck.
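A small worked example of this bottleneck, assuming each sampling step processes the whole chunk jointly (the per-step latency is a made-up figure):

```python
# Under a constant schedule, the first action is only usable after
# every sampling step has run over the whole chunk.
NUM_STEPS = 10    # sampling steps per chunk (the example from the text)
STEP_MS = 40.0    # latency of one full-chunk sampling step (assumed)

ttfa_constant = NUM_STEPS * STEP_MS
print(f"TTFA under a constant schedule: {ttfa_constant:.0f} ms")
# If the first action could finish after, say, 3 steps and be streamed
# out immediately, TTFA would shrink to 3 * STEP_MS = 120 ms --
# exactly the gap FASTER targets.
```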

Using a Straightness metric, the authors show that early-stage actions follow near-linear sampling trajectories, have smaller estimation errors, and require fewer sampling steps than later actions. Early actions are therefore easier to predict.
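A minimal sketch of one common straightness measure: the ratio of net displacement to total path length along the sampling trajectory (1.0 for a perfectly straight path). The paper's exact definition may differ; this is an assumption for illustration:

```python
import numpy as np

def straightness(traj: np.ndarray) -> float:
    """traj: (T, D) array of intermediate states along a sampling path.
    Returns displacement / path length; 1.0 means perfectly linear."""
    net = np.linalg.norm(traj[-1] - traj[0])
    path = np.linalg.norm(np.diff(traj, axis=0), axis=1).sum()
    return float(net / path) if path > 0 else 1.0

# Near-linear paths can be integrated accurately with few sampling
# steps; curved paths need more.
t = np.linspace(0, 1, 50)[:, None]
linear = t * np.ones((1, 3))                       # straight line in 3-D
curved = np.hstack([t, np.sin(3 * t), np.cos(3 * t)])
print(f"linear path: {straightness(linear):.3f}")
print(f"curved path: {straightness(curved):.3f}")
```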

FASTER Framework

FASTER removes the bottleneck with three complementary techniques:

Horizon-Aware Schedule (HAS): each action in a chunk receives a distinct hit time, so near-term actions are allocated fewer sampling steps while far-term actions retain the full sampling budget, preserving overall trajectory quality (see the sketch after this list).

Mixed Scheduling Strategy: during fine-tuning, each training sample uses HAS with probability p and the original constant schedule with probability 1 − p. This preserves pretrained knowledge and avoids a large domain gap.

Streaming client-server interface + early stop: as soon as an action is sampled, it is sent to the robot controller while the model continues sampling subsequent actions. Once all actions required for the current execution window are ready, the remaining steps are aborted, shortening the inference-execution cycle.

All components are plug‑and‑play: they require no changes to model architecture and incur no extra training cost.
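To make the interplay concrete, here is a minimal Python sketch of how the three pieces could compose. The linear step ramp, the probability p, the `sample_step` callback, and the per-action sampling loop are all illustrative assumptions; a real flow-matching sampler would denoise the chunk jointly:

```python
import random
from typing import Callable, Iterator, List

def horizon_aware_steps(chunk_len: int, min_steps: int = 3,
                        max_steps: int = 10) -> List[int]:
    """HAS (sketch): near-term actions get few steps, far-term ones
    keep the full budget; here the budget grows linearly."""
    return [round(min_steps + (max_steps - min_steps) * i / (chunk_len - 1))
            for i in range(chunk_len)]

def training_schedule(chunk_len: int, p: float = 0.5) -> List[int]:
    """Mixed scheduling (sketch): HAS with probability p, otherwise
    the original constant schedule, to limit the domain gap."""
    if random.random() < p:
        return horizon_aware_steps(chunk_len)
    return [10] * chunk_len  # constant schedule

def stream_actions(steps: List[int], window: int,
                   sample_step: Callable[[int, int], object]) -> Iterator[object]:
    """Streaming + early stop (sketch): emit each action the moment its
    step budget is exhausted; abort once the execution window is full."""
    for i, budget in enumerate(steps):
        if i >= window:              # early stop: rest not needed yet
            return
        action = None
        for step in range(budget):
            action = sample_step(i, step)   # one sampling step (assumed API)
        yield action                        # send to controller immediately

# Toy usage with a fake model whose "sampling" just echoes its inputs.
steps = horizon_aware_steps(chunk_len=8)
print("per-action step budgets:", steps)
for a in stream_actions(steps, window=4, sample_step=lambda i, s: (i, s)):
    print("emitted action:", a)
```

With this shape, the first action leaves the server after only `min_steps` sampling steps instead of the full budget, which is where the TTFA reduction comes from.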

Experimental Evaluation

Real‑world latency tests on two GPUs:

RTX 4060, X-VLA: TTFA drops from 399.5 ms to 129.2 ms (≈3× speed-up); expected reaction time drops from 599.5 ms to 229.2 ms (≈2.6×).

Similar improvements are observed for π0.5.

A probability analysis under randomly timed external events shows that FASTER reacts faster than synchronous inference more than 80% of the time and also outperforms ordinary asynchronous inference, winning in 100% of cases on X-VLA.
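A toy Monte Carlo version of that comparison, reusing the RTX 4060 / X-VLA latency figures from above with an assumed execution window; this is an illustrative sketch, not the paper's analysis:

```python
import random

SYNC_TTFA_MS = 399.5     # full-chunk latency (synchronous, from above)
FASTER_TTFA_MS = 129.2   # first-action latency with FASTER (from above)
EXECUTE_MS = 200.0       # execution window per cycle (assumed)

def reaction(event_ms: float, ttfa_ms: float) -> float:
    """Event waits out the rest of the current cycle, then one TTFA."""
    cycle = EXECUTE_MS + ttfa_ms
    return (cycle - event_ms % cycle) + ttfa_ms

wins, trials = 0, 100_000
for _ in range(trials):
    t = random.uniform(0, 10_000)   # event at a random moment
    if reaction(t, FASTER_TTFA_MS) < reaction(t, SYNC_TTFA_MS):
        wins += 1
print(f"FASTER reacts first in {wins / trials:.1%} of random events")
```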

Task‑level tests:

High-speed ping-pong: synchronous inference fails to hit the ball, asynchronous inference yields poor paddle angles, while FASTER enables earlier paddle adjustment and significantly higher scores.

Everyday manipulation (grasping and placing drinks, towel stacking): FASTER reduces pause-induced inefficiencies and improves motion stability.

Simulation benchmarks on LIBERO and CALVIN confirm that HAS does not degrade task performance; quality remains comparable to the original models with only minor drops on a few tasks.

Conclusion

FASTER demonstrates that improving VLA deployment is not about accelerating the generation of the entire action chunk but about delivering the first critical action as quickly as possible. By compressing TTFA, increasing the inference‑execution loop frequency, and preserving overall motion quality, the framework provides a simple, effective path for real‑time embodied AI on both high‑end and consumer‑grade hardware.

Paper: https://arxiv.org/abs/2603.19199

Project page: https://innovator-zero.github.io/FASTER/

Code: https://github.com/innovator-zero/FASTER
