Can TISA’s Tile‑Level Dynamic Scheduling Double AI Inference Efficiency?

A paper from Yixing Intelligent detailing the Tile‑level Instruction Set Architecture (TISA) dynamic scheduling framework has been accepted at ISCA 2026. It shows how a compiler‑hardware co‑design can dramatically improve AI accelerator utilization and outperform static scheduling approaches across major models.

The AI accelerator market is dominated by a race for higher raw compute power—more GPUs, larger chips, higher TFLOPS—but actual utilization often falls far short of theoretical peaks. Conventional static scheduling compiles a fixed execution order before runtime, leaving hardware unable to adapt to dynamic conditions, which creates pipeline bubbles and wasted cycles.

Why Static Scheduling Falls Short

Modern AI chips consist of matrix units, vector units, and DMA engines that should operate in a tightly coupled pipeline. Static compilers schedule all tasks ahead of time, so any deviation—such as a unit becoming idle or data arriving early—cannot be exploited, leading to idle periods and reduced effective throughput.
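To make the bubble concrete, here is a minimal Python simulation of a barrier‑separated static schedule; the task names and cycle counts are hypothetical, chosen only for illustration, not taken from the paper.

```python
# Minimal sketch (illustrative numbers): a compiler-fixed schedule with
# global barriers between phases leaves execution units idle.

# Each phase drains completely on its unit before the barrier releases
# the next phase; durations are hypothetical cycle counts.
PHASES = [
    ("dma",    [("load_A", 4), ("load_B", 4)]),
    ("matrix", [("matmul_A", 6), ("matmul_B", 6)]),
    ("vector", [("softmax_A", 3), ("softmax_B", 3)]),
]

clock = 0
for unit, tasks in PHASES:
    for name, dur in tasks:
        print(f"{name:9s} on {unit:6s}: cycles {clock:2d}-{clock + dur:2d}")
        clock += dur
    # implicit barrier: nothing downstream starts until the phase drains

print(f"static total: {clock} cycles")
# matmul_A was ready at cycle 4 (load_A done) but could not start until
# the barrier at cycle 8: a 4-cycle pipeline bubble on the matrix unit.
```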

TISA: Tile‑level Instruction Set Architecture

The Yixing Intelligent team proposes TISA, a three‑component dynamic scheduling system:

Semantic‑preserving compiler: Retains the full semantic context of the model (operator types, dependencies, data flow) when lowering it to instructions, giving the hardware rich information for scheduling decisions.

Tile‑level ISA: Each tile (a sub‑task derived from a larger operator) carries a standardized “task card” describing its type, required hardware resources, data dependencies, and parallelism constraints, enabling the chip to make informed decisions at runtime (a descriptor sketch follows this list).

Conflict‑aware runtime scheduler: Continuously monitors the status of the matrix, vector, and DMA units. When a unit becomes free, the scheduler immediately dispatches a ready tile without waiting for compiler‑defined synchronization points, eliminating pipeline bubbles (a dispatch‑loop sketch follows the next paragraph).
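For illustration, such a task card might look like the Python sketch below; the field names are assumptions made for exposition, not the paper's actual encoding.

```python
# Hypothetical tile "task card": metadata the hardware scheduler reads
# at runtime (field names are assumptions, not the paper's format).
from dataclasses import dataclass

@dataclass(frozen=True)
class TileDescriptor:
    tile_id: int
    op_type: str                  # e.g. "matmul", "softmax", "dma_load"
    unit: str                     # execution unit the tile requires
    depends_on: tuple = ()        # tile_ids that must complete first
    parallelizable: bool = True   # may run alongside independent tiles

# one tile of a larger matmul operator, waiting on a DMA load (tile 0)
tile = TileDescriptor(tile_id=1, op_type="matmul", unit="matrix",
                      depends_on=(0,))
print(tile)
```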

The scheduler’s decision latency is measured in nanoseconds, adding negligible overhead while significantly reducing idle time.
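A minimal sketch of such a dispatch loop, reusing the hypothetical workload from the static example above: whenever any unit is free, the scheduler issues the ready tile that can start earliest.

```python
# Sketch of a conflict-aware dispatch loop (one plausible policy, not
# necessarily the paper's): issue any tile whose dependencies are done.

# (name, unit, duration, dependencies) -- hypothetical workload
TILES = [
    ("load_A",    "dma",    4, []),
    ("load_B",    "dma",    4, []),
    ("matmul_A",  "matrix", 6, ["load_A"]),
    ("matmul_B",  "matrix", 6, ["load_B"]),
    ("softmax_A", "vector", 3, ["matmul_A"]),
    ("softmax_B", "vector", 3, ["matmul_B"]),
]

unit_free = {"dma": 0, "matrix": 0, "vector": 0}
finish = {}
pending = list(TILES)

def start_time(tile):
    """Earliest cycle a tile could start (inf if a dependency is pending)."""
    _, unit, _, deps = tile
    return max([unit_free[unit]] + [finish.get(d, float("inf")) for d in deps])

while pending:
    tile = min(pending, key=start_time)   # greedy: earliest-ready tile
    name, unit, dur, deps = tile
    start = start_time(tile)
    finish[name] = start + dur
    unit_free[unit] = finish[name]
    pending.remove(tile)
    print(f"{name:9s} on {unit:6s}: cycles {start:2d}-{finish[name]:2d}")

print(f"dynamic total: {max(finish.values())} cycles")  # 19 vs 26 static
```

On this toy workload the dynamic loop finishes in 19 cycles versus 26 for the barrier version, because matmul_A starts the moment load_A retires rather than waiting for the phase barrier. Real hardware would implement this priority logic in dedicated circuitry, which is what keeps the decision latency in the nanosecond range.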

Real‑World Evaluation with FlashAttention‑3

FlashAttention‑3, the state‑of‑the‑art attention implementation for large‑model inference, serves as a demanding benchmark. The static version relies on explicit synchronization primitives such as bar.sync() and wgmma.wait() to enforce ordering, causing unavoidable stalls.

In contrast, the TISA‑enabled version removes these barriers; the ISA semantics implicitly encode dependencies, allowing the hardware to schedule overlapping matrix multiplication and softmax operations across iterations. This eliminates the need for manual sync calls.
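The contrast can be sketched as follows; the instruction encodings here are illustrative pseudocode, not real PTX and not the paper's actual ISA format.

```python
# Static stream: ordering is enforced by explicit waits (stalls).
static_stream = [
    "wgmma  S = Q @ K_i",    # matrix unit
    "wgmma.wait()",          # stall until the matmul retires
    "softmax P = f(S)",      # vector unit
    "bar.sync()",            # stall: block-wide synchronization
    "wgmma  O += P @ V_i",
]

# TISA-style stream: each instruction declares what it reads and writes,
# so the hardware derives ordering itself and overlaps across iterations.
tisa_stream = [
    {"op": "matmul",  "writes": "S_i", "reads": ["Q", "K_i"]},
    {"op": "softmax", "writes": "P_i", "reads": ["S_i"]},
    {"op": "matmul",  "writes": "O_i", "reads": ["P_i", "V_i"]},
]

def can_overlap(a, b):
    """Two instructions are independent if neither touches the other's output."""
    conflicts = (({a["writes"], *a["reads"]} & {b["writes"]})
                 | ({b["writes"], *b["reads"]} & {a["writes"]}))
    return not conflicts

# iteration i's softmax can run alongside iteration j's first matmul
next_matmul = {"op": "matmul", "writes": "S_j", "reads": ["Q", "K_j"]}
print(can_overlap(tisa_stream[1], next_matmul))  # True
```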

Performance Results

Across ResNet‑50, BERT, GPT‑J, and LLaMA‑2, the TISA implementation achieved:

~30% reduction in code size

~50% fewer synchronization calls

95% of the hand‑tuned baseline performance

On the company’s own EPOCH chip, TISA delivered an average 1.46× inference latency improvement despite lower peak FLOPS compared to competing hardware, demonstrating that higher utilization can outweigh raw compute advantages.
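A back‑of‑the‑envelope calculation shows why: effective throughput is peak FLOPS times achieved utilization. The numbers below are hypothetical, chosen only to illustrate how a gap of roughly 1.46× can arise; they are not figures from the paper.

```python
# Effective throughput = peak FLOPS x achieved utilization.
# All numbers are hypothetical, for illustration only.
rival_peak, rival_util = 128e12, 0.35   # higher peak, bubbly static pipeline
epoch_peak, epoch_util = 96e12, 0.68    # lower peak, TISA keeps units busy

rival_eff = rival_peak * rival_util     # 44.8 effective TFLOPS
epoch_eff = epoch_peak * epoch_util     # ~65.3 effective TFLOPS
print(f"latency improvement ~ {epoch_eff / rival_eff:.2f}x")  # ~1.46x
```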

Broader Implications

TISA represents a paradigm shift from “compiler‑fixed, hardware‑passive” static execution to a “compiler describes intent, hardware decides in real time” dynamic model, echoing the way dynamically scheduled superscalar CPUs historically prevailed over VLIW architectures in general‑purpose computing. This shift promises a new efficiency frontier for AI accelerators.

[Figure: TISA dynamic scheduling concept diagram]

Tags: AI, dynamic scheduling, chip architecture, ISCA 2026, Tile‑level ISA
Written by

Architects' Tech Alliance

Sharing project experience and insights into cutting‑edge architectures, with a focus on cloud computing, microservices, big data, hyper‑convergence, storage, data protection, artificial intelligence, and industry practices and solutions.
