Tagged articles

zero-shot

26 articles · Page 1 of 1

May 28, 2026 · Artificial Intelligence

Can a Pre‑trained Embodied Model Work Out‑of‑the‑Box? New Chinese Open‑Source VLA Model Shows Yes

The newly open‑sourced Wall‑OSS‑0.5 VLA model demonstrates that a large‑scale pre‑trained embodied robot brain can achieve strong zero‑shot performance on 17 real‑world tasks, exhibit staircase emergence with longer pre‑training, and far surpass the industry baseline after fine‑tuning, while also revealing current precision limits.

Benchmark · Embodied AI · VLA

0 likes · 15 min read

Machine Heart

May 15, 2026 · Artificial Intelligence

FreeOcc: The First Training‑Free Open‑Vocabulary 3D Occupancy Mapping System (RSS‑2026)

FreeOcc introduces a training‑free, open‑vocabulary 3D occupancy prediction framework that combines SLAM‑based pose estimation, 3D Gaussian Splatting, and pretrained vision‑language models to build globally consistent semantic maps, achieving more than twofold IoU improvements on EmbodiedOcc‑ScanNet and strong zero‑shot generalization on the new ReplicaOcc benchmark.

3D Gaussian · FreeOcc · SLAM

0 likes · 19 min read

Machine Heart

Apr 25, 2026 · Artificial Intelligence

Enabling Unseen Language QA Without Training LLMs: XBridge’s Plug‑in Multilingual Extension

XBridge combines a pre‑trained English‑centric LLM with an external multilingual NMT model via optimal‑transport alignment and a three‑stage training scheme, so the LLM itself needs no training while the combined system achieves high‑quality question answering and generation for low‑resource and unseen languages, narrowing the performance gap with high‑resource languages.

LLM · NMT · XBridge

0 likes · 8 min read

AI Frontier Lectures

Mar 5, 2026 · Artificial Intelligence

Can Robots Navigate Unseen Spaces with Only Language? EvoNav’s Zero‑Shot Vision‑Language Breakthrough

The EvoNav framework from Nanjing University of Science and Technology tackles the last‑hundred‑meter challenge of embodied navigation by integrating a Future Chain‑of‑Thought and a Historical Experience chain, achieving significant zero‑shot performance gains on VLN‑CE benchmarks and real‑world robot tests, with code released on GitHub.

Embodied AI · EvoNav · Future Chain of Thought

0 likes · 6 min read

PaperAgent

Jan 30, 2026 · Artificial Intelligence

How LLM‑in‑Sandbox Turns Large Models into General‑Purpose Agents Without Extra Training

The LLM‑in‑Sandbox framework places large language models inside a virtual machine that provides external tool access, persistent storage, and code execution, yielding up to a 24.2% performance boost across six benchmark tasks without additional training, and it scales from zero‑shot to reinforcement‑learning‑enhanced agents while remaining cost‑effective.

Efficiency · LLM · Sandbox

0 likes · 6 min read

Frontend AI Walk

Dec 5, 2025 · Artificial Intelligence

Master Prompt Engineering: From Random Chat to Precise Control with Zero-shot, Few-shot, and Chain‑of‑Thought

This article explains how to converse effectively with large language models by mastering three core prompting techniques—Zero‑shot, Few‑shot, and Chain‑of‑Thought—illustrated with front‑end analogies, code snippets, and a step‑by‑step DeepSeek JSON‑generation exercise that shows common pitfalls and best practices.

Chain-of-Thought · DeepSeek · Few-shot

0 likes · 12 min read

Amap Tech

Jul 24, 2025 · Artificial Intelligence

FingER: Fine-Grained Evaluation and Reasoning for AI-Generated Videos

The paper introduces FingER, an entity-level evaluation framework and the FingER-Instruct-60k dataset for assessing AI-generated video quality with fine-grained reasoning, and demonstrates state-of-the-art zero-shot performance on multiple benchmarks using novel training strategies.

AI-generated video · dataset · fine-grained evaluation

0 likes · 9 min read

Bilibili Tech

Jul 11, 2025 · Artificial Intelligence

IndexTTS2: Emotionally Expressive, Duration-Controlled Zero-Shot TTS

IndexTTS2 introduces a novel auto-regressive zero-shot text-to-speech model that achieves precise duration control and fine-grained emotional expression through a universal time‑encoding mechanism, decoupled voice‑style and emotion modeling, and a GPT‑style latent feature, outperforming state‑of‑the‑art baselines across multiple benchmarks.

Text‑to‑Speech · duration control · emotional synthesis

0 likes · 23 min read

Amap Tech

Jul 9, 2025 · Artificial Intelligence

VMBench: Perception-Aligned Motion Benchmark & LD‑RPS Zero‑Shot Restoration

This article introduces VMBench, the first perception‑aligned video motion generation benchmark that defines a five‑dimensional metric suite and a meta‑guided prompt generation pipeline, and presents LD‑RPS, a zero‑shot unified image restoration framework based on latent diffusion recurrent posterior sampling, together with extensive experiments validating both systems.

Benchmark · Diffusion Models · image restoration

0 likes · 14 min read

AI Algorithm Path

Jul 1, 2025 · Artificial Intelligence

Beginner’s Guide to CLIP Inference: Step‑by‑Step with Hugging Face

This tutorial walks through loading the openai/clip‑vit‑base‑patch32 model with Hugging Face, preprocessing images and text, encoding them into a shared embedding space, computing cosine similarity for zero‑shot image‑text matching, and visualizing the results, all with concrete code examples.

CLIP · Cosine Similarity · Hugging Face

0 likes · 6 min read

AIWalker

Mar 13, 2025 · Artificial Intelligence

YOLOE: Real‑Time Open‑World Object Detection and Segmentation Unveiled

The paper introduces YOLOE, a new YOLO‑based model that supports text, visual, and no‑prompt open‑world detection and segmentation, detailing its lightweight RepRTA, SAVPE, and LRPC modules and showing benchmark gains in speed and zero‑shot performance on LVIS and COCO.

Benchmark · YOLOE · computer vision

0 likes · 9 min read

AIWalker

Feb 11, 2025 · Artificial Intelligence

LLMDet: LLM‑Powered Open‑Vocabulary Detector Beats Grounding DINO

LLMDet introduces a novel training pipeline that leverages large language models to generate detailed image‑level captions and region‑level phrases, fine‑tunes an open‑vocabulary detector with the GroundingCap‑1M dataset, and achieves state‑of‑the‑art zero‑shot performance surpassing Grounding DINO across multiple benchmarks.

GroundingCap · LLMDet · Large Language Models

0 likes · 20 min read

AIWalker

Feb 4, 2025 · Artificial Intelligence

Meta’s Open‑Source MILS Enables LLMs to See and Hear Without Training – SOTA on Images, Video, and Audio

The paper introduces MILS, a training‑free multimodal iterative LLM solver that lets large language models perceive and generate across image, video, and audio domains, achieving new state‑of‑the‑art results without any task‑specific data or fine‑tuning.

AI research · LLM · MILS

0 likes · 18 min read

AIWalker

Jan 15, 2025 · Artificial Intelligence

Magic Mirror: Zero‑Shot Identity‑Preserved High‑Quality Personalized Video Generation

Magic Mirror introduces a single‑stage, zero‑shot framework that fuses dual facial embeddings with a conditional adaptive normalization module inside a Video Diffusion Transformer, achieving superior identity consistency, natural dynamics, and high visual quality compared with existing video generation methods.

conditional adaptive normalization · diffusion transformer · identity preservation

0 likes · 16 min read

JD Tech

Nov 12, 2024 · Artificial Intelligence

Prompt Engineering: Concepts, Evolution, Techniques, and JD Logistics Application

This article explains what Prompt Engineering is, traces its development from early NLP commands to modern adaptive and multimodal prompting techniques, describes prompting strategies such as Zero‑shot, Few‑shot, Chain‑of‑Thought, and Auto‑CoT, and showcases a JD Logistics case study that uses these methods to classify product types, with code examples.

AI Prompt Design · Chain-of-Thought · Few-shot

0 likes · 27 min read

Xiaohongshu Tech REDtech

Feb 27, 2024 · Artificial Intelligence

InstantID: Zero-shot Identity-Preserving Generation in Seconds

InstantID, an open‑source tool released by Xiaohongshu in early 2024, generates multiple stylized portraits that preserve a person’s facial identity from a single reference photo in seconds, eliminating fine‑tuning, large storage needs, and multi‑image requirements while seamlessly working with popular diffusion models like Stable Diffusion 1.5 and SDXL.

AI · InstantID · diffusion model

0 likes · 6 min read

Rare Earth Juejin Tech Community

Jul 30, 2023 · Artificial Intelligence

ChatGPT Technical Analysis Series – Part 2: GPT1, GPT2, and GPT3 (Encoder vs Decoder, Zero‑Shot, and Scaling)

This article reviews the evolution of the GPT family from GPT‑1 to GPT‑3, comparing encoder and decoder architectures, explaining the shift from supervised fine‑tuning to zero‑shot and few‑shot learning, and highlighting the architectural and training innovations that enabled large‑scale language models.

GPT · LLM · Transformer

0 likes · 13 min read

Alibaba Cloud Developer

Jul 19, 2023 · Artificial Intelligence

Mastering Prompt Engineering: Techniques, Tips, and Real-World Examples

This comprehensive guide explores prompt engineering for large language models, covering its background, fundamental concepts, prompt formats, construction principles, advanced techniques like few‑shot, zero‑shot, and chain‑of‑thought prompting, as well as practical examples, evaluation metrics, and future directions.

Artificial Intelligence · Chain-of-Thought · Few-shot

0 likes · 33 min read

DataFunTalk

Jun 21, 2023 · Artificial Intelligence

Low‑Resource NLP Pretraining: Methodology, Experiments, and Zero‑Shot Applications

This article presents a low‑resource NLP pretraining approach that combines transformer‑based language modeling with contrastive vector learning, details the unsupervised sample‑pair construction, introduces a camel‑shaped masking distribution, and demonstrates through extensive experiments that the resulting model achieves strong zero‑shot NLU, NLG, and retrieval performance while requiring minimal compute and data.

Language Modeling · contrastive learning · low-resource

0 likes · 10 min read

ByteFE

Jun 15, 2023 · Artificial Intelligence

Effective Prompt Engineering: Techniques, Prompt Injection Prevention, Hallucination Mitigation, and Advanced Prompting Strategies

This article explains how to craft efficient prompts by combining clear instructions and questions, discusses prompt injection risks and mitigation with delimiters, addresses hallucinations, and introduces zero‑shot, few‑shot, and chain‑of‑thought prompting techniques for large language models.

Chain-of-Thought · Few-shot · Hallucination

0 likes · 16 min read

Python Programming Learning Circle

Jun 8, 2022 · Artificial Intelligence

Leveraging PaddleNLP UIE for Zero‑Shot Logistic Parcel Information Extraction

This article explains how PaddleNLP's Universal Information Extraction (UIE) model can dramatically reduce labeling effort and improve accuracy for logistics parcel data extraction, showcasing a five‑sample experiment that boosts F1 by 18 points to 93% and providing a zero‑shot Python example.

NLP · PaddleNLP · Python

0 likes · 5 min read

DaTaobao Tech

May 24, 2022 · Artificial Intelligence

GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection

GEN‑VLKT introduces a Guided‑Embedding Network with position‑ and instance‑guided embeddings to remove costly post‑processing and leverages CLIP‑based visual‑linguistic knowledge transfer for interaction understanding, achieving state‑of‑the‑art HOI detection performance and zero‑shot capability, now deployed in Alibaba’s Taobao services.

CLIP · HOI detection · Transformer

0 likes · 7 min read

Baobao Algorithm Notes

Mar 7, 2022 · Artificial Intelligence

How CLIP Uses Natural Language Supervision for Powerful Zero‑Shot Vision

This article explains CLIP’s multimodal contrastive pre‑training, its simple yet effective architecture, code implementation, and how its zero‑shot capability can surpass supervised ImageNet models by leveraging a 400‑million image‑text dataset and shared semantic embeddings.

AI · CLIP · Multimodal

0 likes · 7 min read

DataFunTalk

Jan 16, 2022 · Artificial Intelligence

DeltaLM: A Multilingual Pretrained Encoder‑Decoder Model for Neural Machine Translation and Zero‑Shot Transfer

DeltaLM is a new multilingual pretrained encoder‑decoder model that leverages a pretrained encoder and a novel decoder to improve multilingual neural machine translation, offering efficient training, strong cross‑language transfer, zero‑shot translation, and superior performance on various translation and summarization tasks.

DeltaLM · Machine Translation · Multilingual

0 likes · 13 min read

DataFunSummit

Jan 13, 2022 · Artificial Intelligence

DeltaLM: A Multilingual Pretrained Encoder‑Decoder Model for Neural Machine Translation

DeltaLM is a multilingual pretrained encoder‑decoder model that leverages cross‑lingual transfer from a pretrained encoder and a novel decoder architecture, employs span‑corruption and translation‑pair pretraining tasks, and uses a two‑stage fine‑tuning strategy to achieve strong zero‑shot and supervised translation performance across more than 100 languages.

Cross-Lingual Transfer · DeltaLM · Neural Machine Translation

0 likes · 12 min read

DataFunTalk

Apr 7, 2021 · Artificial Intelligence

Alibaba's Advances in Multilingual Neural Machine Translation: Research and Practice

This article presents Alibaba's comprehensive research on multilingual neural machine translation, covering motivations, model architectures, intermediate language modules, data‑augmentation strategies such as repair translation, integration of pre‑trained models with adapters, and engineering optimizations that enable a production‑ready system supporting over 200 languages.

Adapter · Alibaba · Neural Machine Translation

0 likes · 21 min read
