Can Neural Networks Replace Traditional CPUs? Inside the New Neural Computer
A new study shows how Meta AI and KAUST transformed a video‑generation model into a neural computer that unifies computation, storage, and I/O, enabling pixel‑perfect command‑line and graphical UI control while highlighting current limitations in arithmetic reasoning and long‑term program stability.
Neural Computer Concept
Traditional digital computers separate CPU, memory and storage, executing explicit instructions written by programmers. A neural computer (NC) replaces this architecture with a single massive set of learned weights that simultaneously implements computation, storage and I/O. Each user interaction—keyboard keystroke or mouse movement—updates an implicit internal state that serves both as working memory and as a representation of the current task context. The long‑term goal of the research team is a Completely Neural Computer (CNC) that can be programmed, execute stable workloads, and retain learned skills over long periods.
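The state‑update view described above can be illustrated with a toy sketch. Everything here (the shapes, the tanh update, the weight matrices) is a hypothetical illustration, not the paper's architecture: each input event nudges a hidden state vector that acts as working memory, and a decoder renders that state into pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, EVENT_DIM = 64, 16

# Hypothetical learned weights; in a real neural computer these would be
# the trained parameters that jointly implement compute, storage and I/O.
W_state = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM))
W_event = rng.normal(scale=0.1, size=(STATE_DIM, EVENT_DIM))
W_render = rng.normal(scale=0.1, size=(32 * 32, STATE_DIM))

def step(state: np.ndarray, event: np.ndarray) -> np.ndarray:
    """One interaction step: a keystroke or mouse event updates the
    implicit internal state, which doubles as working memory."""
    return np.tanh(W_state @ state + W_event @ event)

def render(state: np.ndarray) -> np.ndarray:
    """Decode the current state into the next screen frame."""
    return np.tanh(W_render @ state).reshape(32, 32)

state = np.zeros(STATE_DIM)
for _ in range(5):                      # five simulated user events
    event = rng.normal(size=EVENT_DIM)
    state = step(state, event)

frame = render(state)
print(frame.shape)  # (32, 32)
```

The point of the sketch is only the data flow: there is no separate memory bank; the same state vector carries both the task context and the content to be rendered.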
Command‑Line Pixel‑Level Probing
To evaluate feasibility, the authors selected the state‑of‑the‑art video‑generation model Wan2.1 and added a dedicated motion module. Two datasets were constructed:
Real‑world terminal recordings: captured from interactive sessions with varied commands, colors and layouts.
Synthetic script‑driven recordings: generated from fixed command scripts to provide clean, deterministic examples.
The model receives a textual prompt describing the desired terminal state and an initial screenshot. It then iteratively updates its hidden representation and renders the next frame, effectively predicting the evolution of the terminal UI.
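That rollout loop can be sketched roughly as follows. The function names and the toy frame representation are invented for illustration; the actual model is the Wan2.1 video generator, not these stand‑ins.

```python
import numpy as np

def encode_prompt(prompt: str, dim: int = 8) -> np.ndarray:
    # Stand-in text encoder: a deterministic hash-based embedding.
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.normal(size=dim)

def predict_next_frame(prompt_emb: np.ndarray, frame: np.ndarray) -> np.ndarray:
    # Stand-in for the video model's one-step prediction, conditioned
    # on the prompt embedding and the previous frame.
    return np.tanh(frame * 0.9 + prompt_emb.mean() * 0.1)

prompt = "terminal after running a directory listing, green prompt"
frame = np.zeros((24, 80))            # initial screenshot (toy resolution)
prompt_emb = encode_prompt(prompt)

frames = [frame]
for _ in range(8):                    # roll the terminal UI forward
    frame = predict_next_frame(prompt_emb, frame)
    frames.append(frame)

print(len(frames))  # 9
```

The key property mirrored here is that generation is iterative: each frame is predicted from the previous one under the same textual conditioning, so the model effectively simulates the terminal's evolution rather than drawing a single static screen.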
Key findings:
Even at a 13‑pixel font size the generated frames preserve syntax highlighting, cursor movement, progress bars and column alignment with high fidelity.
More detailed textual descriptions (e.g., explicit color names, expected output layout) improve reconstruction accuracy.
Pure arithmetic remains a weakness: baseline video models achieve single‑digit accuracy on basic calculations. When the correct answer is implicitly embedded in the prompt, accuracy jumps to 83%, indicating that the current architecture functions primarily as a high‑quality renderer rather than a symbolic reasoner.
Graphical UI Precise Control
Extending the approach to graphical user interfaces (GUIs) introduces the need for accurate cursor tracking and immediate click feedback. The authors compared three data sources:
Randomly collected interaction logs (thousands of hours).
Goal‑directed interaction logs (110 hours of scripted tasks with clear intent).
Hybrid data with explicit visual masks of the cursor.
Results:
Goal‑directed data outperformed random data despite being an order of magnitude smaller, confirming that clear intent and predictable state transitions are essential for learning UI manipulation.
Baseline cursor‑only models achieved below 10% accuracy; after adding complex feature transformations the accuracy rose to ~13%.
Providing a visual mask of the cursor as an additional input channel increased accuracy dramatically to 98.7%, demonstrating the power of explicit visual supervision.
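The cursor‑mask idea amounts to giving the model an explicit extra input channel marking the pointer location. A minimal sketch, with channel layout, resolution and mask radius chosen purely for illustration:

```python
import numpy as np

H, W = 64, 64

def add_cursor_mask(frame_rgb: np.ndarray, cursor_xy: tuple,
                    radius: int = 2) -> np.ndarray:
    """Append a binary cursor mask as a fourth input channel, giving the
    model explicit visual supervision of the pointer location."""
    ys, xs = np.mgrid[0:H, 0:W]
    cx, cy = cursor_xy
    mask = ((xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2).astype(np.float32)
    return np.concatenate([frame_rgb, mask[..., None]], axis=-1)

frame = np.zeros((H, W, 3), dtype=np.float32)   # toy RGB screenshot
inp = add_cursor_mask(frame, cursor_xy=(10, 20))
print(inp.shape)  # (64, 64, 4)
```

Because the mask localizes the cursor in the same coordinate frame as the pixels, the model no longer has to infer pointer position from subtle rendering cues, which is consistent with the large accuracy jump reported above.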
Four strategies for injecting keyboard and mouse events were evaluated (raw key streams, abstracted action commands, and two intermediate representations). Across all variants, injecting the events into deeper layers of the network ("deep injection") consistently yielded the most coherent frame rendering and the lowest response latency.
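One way to picture the shallow‑versus‑deep injection contrast is where the event embedding enters a layered network. The four‑layer stack below is a toy illustration, not the paper's design:

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 32
layers = [rng.normal(scale=0.2, size=(DIM, DIM)) for _ in range(4)]
W_evt = rng.normal(scale=0.2, size=(DIM, DIM))

def forward(x: np.ndarray, event: np.ndarray, inject_at: int) -> np.ndarray:
    """Run a toy 4-layer stack, adding the keyboard/mouse event
    embedding after layer `inject_at` (0 = shallow, 3 = deep)."""
    h = x
    for i, W in enumerate(layers):
        h = np.tanh(W @ h)
        if i == inject_at:
            h = h + W_evt @ event    # event signal enters here
    return h

x = rng.normal(size=DIM)             # encoded current frame (toy)
event = rng.normal(size=DIM)         # encoded key press / mouse move (toy)

shallow = forward(x, event, inject_at=0)
deep = forward(x, event, inject_at=3)
print(shallow.shape, deep.shape)
```

The two calls differ only in where the event enters, which is the variable the study's four variants explore; the finding is that later (deeper) entry points gave more coherent renders.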
Towards a Fully Neural Computer
The prototype demonstrates that a neural network can align input‑output streams, execute short‑term command workflows, and render visually realistic screens. However, several open challenges remain:
Stable reuse of legacy programs: current models cannot reliably load and execute existing binaries without retraining.
Complex symbolic computation: arithmetic and logical reasoning still require external assistance.
Long‑term error‑free operation: maintaining consistency over extended sessions is an unsolved problem.
Unlike conventional computers that fail catastrophically on a single instruction error, neural computers tolerate noise due to their high‑dimensional numeric representations, enabling simultaneous processing of vision, language and audio modalities.
When general‑purpose programming interfaces and persistent state are achieved, interaction with computers could be fundamentally reshaped: user actions, screenshots and spoken commands would be directly incorporated as executable code inside a continuously learning, brain‑like substrate.
Reference materials:
https://arxiv.org/pdf/2604.06425
https://metauto.ai/neuralcomputer/
https://github.com/metauto-ai/NeuralComputer
