Can Robots Navigate Unseen Spaces with Only Language? EvoNav’s Zero‑Shot Vision‑Language Breakthrough

The EvoNav framework from Nanjing University of Science and Technology tackles the last‑hundred‑meter challenge of embodied navigation by combining a Future Chain‑of‑Thought (F‑CoT) module with a Historical Experience Chain (H‑CoE), achieving significant zero‑shot gains on VLN‑CE benchmarks and in real‑world robot tests, with code released on GitHub.

AI Frontier Lectures

Task Background

Vision‑Language Navigation in continuous environments (VLN‑CE) requires an embodied agent to understand natural‑language instructions and move freely in a physical space to reach a target. Zero‑shot approaches based on large language models (LLMs) often suffer from a lack of feedback and decision hallucinations, leading to cascading failures.
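To make the failure mode concrete, below is a minimal sketch of a naive zero‑shot LLM navigation loop; this is not EvoNav itself and not the paper's code, and every helper name (get_observation, describe_scene, llm_chat, execute, reached_goal) is a placeholder for whatever perception, captioning, language‑model, and control stack an agent actually uses.

```python
# Minimal sketch of a naive zero-shot LLM navigation loop (illustrative only).
# get_observation, describe_scene, llm_chat, execute, reached_goal are placeholders.

def zero_shot_navigate(instruction: str, max_steps: int = 30) -> bool:
    history = []  # the only "memory" a naive agent keeps: its own past actions
    for _ in range(max_steps):
        obs = get_observation()          # e.g. an RGB-D panorama from the robot
        scene = describe_scene(obs)      # e.g. a captioner or object-detector summary
        prompt = (
            f"Instruction: {instruction}\n"
            f"Current scene: {scene}\n"
            f"Past actions: {history}\n"
            "Choose one action: move_forward, turn_left, turn_right, stop."
        )
        action = llm_chat(prompt)        # the LLM decides with no feedback signal
        if action == "stop":
            return reached_goal()
        execute(action)
        history.append(action)
        # Without any correction mechanism, one hallucinated choice here propagates
        # into every later prompt: the cascading failure described above.
    return False
```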

Core Contribution: EvoNav Evolutionary Paradigm

EvoNav mimics the human decision process History → Now → Future and introduces two complementary modules:

Future Chain‑of‑Thought (F‑CoT): predicts future actions and landmarks, converting complex instructions into spatio‑temporal sub‑tasks so the agent can continuously anticipate the optimal direction (see the first sketch after this list).

Historical Experience Chain (H‑CoE): maintains a dynamic experience repository that aggregates successful and failed trajectories. It provides:

Text trajectory experience: global navigation logic derived from past language‑action sequences.

Visual scene experience: uses CLIP to retrieve visually similar historical images, improving local perception reliability (see the second sketch after this list).
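To make the two modules concrete, two hedged sketches follow. First, a rough illustration of an F‑CoT‑style decomposition; the prompt wording, the llm_chat callable, and the JSON schema are assumptions made for illustration, not the paper's actual prompts.

```python
# Illustrative F-CoT-style decomposition: turn one long instruction into an ordered
# list of (landmark, action) sub-tasks. Prompt text and llm_chat are assumptions.
import json

def future_chain_of_thought(llm_chat, instruction: str) -> list[dict]:
    prompt = (
        "Decompose this navigation instruction into an ordered JSON list of sub-tasks, "
        "each with a 'landmark' (what to look for next) and an 'action' (how to move "
        "toward it). Answer with the JSON list only.\n"
        f"Instruction: {instruction}"
    )
    reply = llm_chat(prompt)
    # e.g. [{"landmark": "staircase", "action": "walk up one flight"},
    #       {"landmark": "blue sofa",  "action": "turn left and stop beside it"}]
    return json.loads(reply)
```

Second, a minimal sketch of the visual side of H‑CoE, assuming the Hugging Face transformers CLIP weights; the in‑memory experience bank and the retrieval interface are illustrative assumptions rather than EvoNav's actual implementation.

```python
# Sketch of CLIP-based retrieval of visually similar past views (illustrative only).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(image: Image.Image) -> torch.Tensor:
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        feat = model.get_image_features(**inputs)
    return feat / feat.norm(dim=-1, keepdim=True)   # unit-normalise for cosine similarity

def retrieve_similar(current: Image.Image,
                     memory: list[tuple[Image.Image, str]],
                     k: int = 3) -> list[tuple[Image.Image, str]]:
    """Return the k past (image, outcome-note) pairs most similar to the current view."""
    query = embed(current)                                    # (1, d)
    keys = torch.cat([embed(img) for img, _ in memory])       # (N, d) experience bank
    scores = (keys @ query.T).squeeze(-1)                     # cosine similarities
    top = scores.topk(min(k, len(memory))).indices.tolist()
    return [memory[i] for i in top]
```

In a real repository the embeddings would be precomputed and cached rather than re‑encoded on every query; the retrieved outcome notes can then be appended to the navigation prompt as local visual evidence.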

Figure 1: EvoNav research motivation
Figure 2: EvoNav overall architecture

Experimental Results

EvoNav was evaluated on simulated benchmarks (R2R‑CE, NavRAG‑CE) and real‑world indoor scenes.

On R2R‑CE, success rate (SR) increased by 20% and oracle success rate (OSR) by 21% compared with the Open‑Nav baseline.

On the more challenging NavRAG‑CE dataset, SR improved by a further 6%.

Real‑robot deployment on an omnidirectional wheeled platform demonstrated robust zero‑shot navigation in labs, corridors, and elevator halls.

Figure 3: Real‑robot navigation
Figure 4: Sample navigation trajectory

Implementation

Code and model checkpoints are publicly available at:

https://github.com/daiguangzhao/EvoNav.git
Tags: embodied AI, zero-shot, EvoNav, Future Chain of Thought, Historical Experience, Vision-and-Language Navigation