Can Tiny LLMs Compute Accurately? WorldModel‑Qwen Inference‑Time WASM Execution

The article details how the small Qwen‑0.6B model was adapted to generate and run WebAssembly code during inference, achieving deterministic calculations and revealing both the promise and current limitations of integrating world‑model reasoning into tiny LLMs.

AI Engineering

From <think> Tags to Integrated Code Execution

The initial approach used <think> and <model> tags so that the model generated Python code, which an external tool then executed. Every computation required a round trip between the model and the executor, which reduced efficiency.
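The tag-based round trip can be sketched as follows. This is only an illustration under stated assumptions: the <think> and <model> tag names come from the article, but the helper `run_model_blocks`, the sample reply, and the use of `exec` are hypothetical stand-ins for whatever external executor the original setup used.

```python
import contextlib
import io
import re

# Matches the code the model placed between <model> and </model>.
MODEL_BLOCK = re.compile(r"<model>(.*?)</model>", re.DOTALL)


def run_model_blocks(reply: str) -> list[str]:
    """Run each <model> block and return what it printed (hypothetical helper)."""
    outputs = []
    for code in MODEL_BLOCK.findall(reply):
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            # Assumption: plain exec for illustration only. It is NOT a sandbox.
            exec(code.strip(), {})
        outputs.append(buffer.getvalue().strip())
    return outputs


# Hypothetical model reply in the tag format described above.
reply = (
    "<think>I need to calculate 12 * 7...</think>\n"
    "<model>\nprint(12 * 7)\n</model>"
)
results = run_model_blocks(reply)
print(results)
```

In a loop like this the executor's output would have to be passed back to the model before it could continue, which is the extra hop the integrated design tries to remove.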

WASM: A Sandbox Computing Environment

WebAssembly (WASM) was selected for its simplicity and sandboxing. The Qwen‑0.6B model was modified to emit both natural‑language tokens and code in the WebAssembly text format (.wat), so the code can be compiled and executed immediately.

User: Calculate 12 * 7
Assistant: <think>I need to calculate 12 * 7...</think>
<wat_model>
(module
  (func $compute (param f64 f64) (result f64)
    local.get 0
    local.get 1
    f64.mul))
</wat_model>
<computed>84</computed>
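To make the stack semantics of the example concrete, here is a minimal pure-Python sketch that extracts the <wat_model> block and evaluates only the instructions shown (local.get plus f64.add, f64.sub, and f64.mul). It is a toy stand-in, not a WASM runtime, and the helper `eval_wat` is hypothetical. The transcript does not show how the operands reach $compute, so passing 12 and 7 as arguments is an assumption.

```python
import re

WAT_BLOCK = re.compile(r"<wat_model>(.*?)</wat_model>", re.DOTALL)
# Recognizes only `local.get N` and the three binary f64 operations.
INSTRUCTION = re.compile(r"(local\.get)\s+(\d+)|(f64\.(?:add|sub|mul))")

BINARY_OPS = {
    "f64.add": lambda a, b: a + b,
    "f64.sub": lambda a, b: a - b,
    "f64.mul": lambda a, b: a * b,
}


def eval_wat(wat: str, args: list[float]) -> float:
    """Evaluate a single straight-line f64 function body on a value stack."""
    stack: list[float] = []
    for match in INSTRUCTION.finditer(wat):
        if match.group(1):
            # local.get N pushes the N-th parameter.
            stack.append(float(args[int(match.group(2))]))
        else:
            # Binary ops pop the right operand first, then the left.
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[match.group(3)](left, right))
    if len(stack) != 1:
        raise ValueError("function body must leave exactly one value")
    return stack[0]


transcript = """<think>I need to calculate 12 * 7...</think>
<wat_model>
(module
  (func $compute (param f64 f64) (result f64)
    local.get 0
    local.get 1
    f64.mul))
</wat_model>"""

wat_source = WAT_BLOCK.search(transcript).group(1)
wat_result = eval_wat(wat_source, [12.0, 7.0])  # operands assumed, see above
print(wat_result)
```

Because the result comes from executing the instructions rather than from sampling tokens, the same module and inputs always give the same value.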

Multi‑Layer WASM Architecture

During training, three “WASM layers” were created, analogous to the visual layers of a vision‑language model. Each layer specialized in one basic arithmetic operation (addition, multiplication, or subtraction), forming a mixture of computational experts. The model uses a Flamingo‑style cross‑attention mechanism to generate .wat code, and a scoring system then selects the best candidate, all within the inference step.

Experimental Results

After 30 training rounds the model’s outputs for the task 12 × 11 were:

Layer  3:   144.000000 (multiply) [score: 3.80]
Layer  7:   132.000000 (multiply) [score: 3.44]  # correct answer
Layer 11: SKIPPED [score: 3.07]

Layer 7 produced the correct result (132), but the selector chose layer 3 because its score was higher (3.80 vs. 3.44), so the final answer was 144. The author notes the gap and suggests expanding the training data and improving the selection mechanism.
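The selection failure can be reproduced with the reported numbers. The sketch below assumes the selector is a plain highest-score pick among layers that actually executed. The article only says the higher-scoring layer was chosen, so the `Candidate` class and `select` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    layer: int
    value: Optional[float]  # None when the layer was skipped
    score: float


def select(candidates: list[Candidate]) -> Candidate:
    """Pick the executed candidate with the highest score (assumed rule)."""
    executed = [c for c in candidates if c.value is not None]
    return max(executed, key=lambda c: c.score)


# Values and scores as reported for the task 12 x 11.
candidates = [
    Candidate(layer=3, value=144.0, score=3.80),
    Candidate(layer=7, value=132.0, score=3.44),
    Candidate(layer=11, value=None, score=3.07),
]

chosen = select(candidates)
is_correct = chosen.value == float(12 * 11)
print(chosen.layer, chosen.value, is_correct)
```

Under this rule the score is the only signal, so a confidently scored wrong candidate always beats a correct one with a lower score. That is why the author points at the selection mechanism as well as the training data.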

Technical Verification

Gemini was used to review the generated code. According to that review, the system turns Qwen into a multimodal model that handles both natural language and WASM: it generates .wat code via cross‑attention, scores the candidates, then compiles and executes the selected code safely in a wasmtime sandbox.

Broader Perspective

The work illustrates how adding explicit modeling and tool‑calling abilities, combined with retrieval‑augmented generation for scoring, can provide a basic world model that solves computable problems and reduces hallucinations on such tasks.

Project repository: https://github.com/bigattichouse/worldmodel

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

AI Engineering
