Can Robots Navigate Unseen Spaces with Only Language? EvoNav’s Zero‑Shot Vision‑Language Breakthrough

The EvoNav framework from Nanjing University of Science and Technology tackles the last‑hundred‑meter challenge of embodied navigation by combining a Future Chain‑of‑Thought (F‑CoT) module with a Historical Experience Chain (H‑CoE), achieving significant zero‑shot gains on VLN‑CE benchmarks and in real‑world robot tests, with code released on GitHub.

AI Frontier Lectures

Task Background

Vision‑Language Navigation in continuous environments (VLN‑CE) requires an embodied agent to understand natural‑language instructions and move freely in a physical space to reach a target. Zero‑shot approaches based on large language models (LLMs) often suffer from a lack of feedback and decision hallucinations, leading to cascading failures.
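To make the failure mode concrete, below is a minimal sketch of a naive zero‑shot LLM navigation loop; this is not EvoNav itself and not the paper's code, and every helper name (get_observation, describe_scene, llm_chat, execute, reached_goal) is a placeholder for whatever perception, captioning, language‑model, and control stack an agent actually uses.

```python
# Minimal sketch of a naive zero-shot LLM navigation loop (illustrative only).
# get_observation, describe_scene, llm_chat, execute, reached_goal are placeholders.

def zero_shot_navigate(instruction: str, max_steps: int = 30) -> bool:
    history = []  # the only "memory" a naive agent keeps: its own past actions
    for _ in range(max_steps):
        obs = get_observation()          # e.g. an RGB-D panorama from the robot
        scene = describe_scene(obs)      # e.g. a captioner or object-detector summary
        prompt = (
            f"Instruction: {instruction}\n"
            f"Current scene: {scene}\n"
            f"Past actions: {history}\n"
            "Choose one action: move_forward, turn_left, turn_right, stop."
        )
        action = llm_chat(prompt)        # the LLM decides with no feedback signal
        if action == "stop":
            return reached_goal()
        execute(action)
        history.append(action)
        # Without any correction mechanism, one hallucinated choice here propagates
        # into every later prompt: the cascading failure described above.
    return False
```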

Core Contribution: EvoNav Evolutionary Paradigm

EvoNav mimics the human decision process History → Now → Future and introduces two complementary modules:

Future Chain‑of‑Thought (F‑CoT): predicts future actions and landmarks, converting complex instructions into spatio‑temporal sub‑tasks so the agent can continuously anticipate the optimal direction (see the first sketch after this list).

Historical Experience Chain (H‑CoE): maintains a dynamic experience repository that aggregates successful and failed trajectories. It provides:

Text trajectory experience: global navigation logic derived from past language‑action sequences.

Visual scene experience: uses CLIP to retrieve visually similar historical images, improving local perception reliability (see the second sketch after this list).
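To make the two modules concrete, two hedged sketches follow. First, a rough illustration of an F‑CoT‑style decomposition; the prompt wording, the llm_chat callable, and the JSON schema are assumptions made for illustration, not the paper's actual prompts.

```python
# Illustrative F-CoT-style decomposition: turn one long instruction into an ordered
# list of (landmark, action) sub-tasks. Prompt text and llm_chat are assumptions.
import json

def future_chain_of_thought(llm_chat, instruction: str) -> list[dict]:
    prompt = (
        "Decompose this navigation instruction into an ordered JSON list of sub-tasks, "
        "each with a 'landmark' (what to look for next) and an 'action' (how to move "
        "toward it). Answer with the JSON list only.\n"
        f"Instruction: {instruction}"
    )
    reply = llm_chat(prompt)
    # e.g. [{"landmark": "staircase", "action": "walk up one flight"},
    #       {"landmark": "blue sofa",  "action": "turn left and stop beside it"}]
    return json.loads(reply)
```

Second, a minimal sketch of the visual side of H‑CoE, assuming the Hugging Face transformers CLIP weights; the in‑memory experience bank and the retrieval interface are illustrative assumptions rather than EvoNav's actual implementation.

```python
# Sketch of CLIP-based retrieval of visually similar past views (illustrative only).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(image: Image.Image) -> torch.Tensor:
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        feat = model.get_image_features(**inputs)
    return feat / feat.norm(dim=-1, keepdim=True)   # unit-normalise for cosine similarity

def retrieve_similar(current: Image.Image,
                     memory: list[tuple[Image.Image, str]],
                     k: int = 3) -> list[tuple[Image.Image, str]]:
    """Return the k past (image, outcome-note) pairs most similar to the current view."""
    query = embed(current)                                    # (1, d)
    keys = torch.cat([embed(img) for img, _ in memory])       # (N, d) experience bank
    scores = (keys @ query.T).squeeze(-1)                     # cosine similarities
    top = scores.topk(min(k, len(memory))).indices.tolist()
    return [memory[i] for i in top]
```

In a real repository the embeddings would be precomputed and cached rather than re‑encoded on every query; the retrieved outcome notes can then be appended to the navigation prompt as local visual evidence.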

Figure 1: EvoNav research motivation
Figure 2: EvoNav overall architecture

Experimental Results

EvoNav was evaluated on simulated benchmarks (R2R‑CE, NavRAG‑CE) and real‑world indoor scenes.

On R2R‑CE, success rate (SR) increased by 20% and oracle success rate (OSR) by 21% compared with the Open‑Nav baseline.

On the more challenging NavRAG‑CE dataset, SR improved by a further 6%.

Real‑robot deployment on an omnidirectional wheeled platform demonstrated robust zero‑shot navigation in labs, corridors, and elevator halls.

Figure 3: Real‑robot navigation
Figure 4: Sample navigation trajectory

Implementation

Code and model checkpoints are publicly available at:

https://github.com/daiguangzhao/EvoNav.git
Tags: embodied AI, zero-shot, EvoNav, Future Chain of Thought, Historical Experience, Vision-and-Language Navigation