Tagged articles
25 articles
Page 1 of 1
Machine Heart
Machine Heart
May 15, 2026 · Artificial Intelligence

FreeOcc: The First Training‑Free Open‑Vocabulary 3D Occupancy Mapping System (RSS‑2026)

FreeOcc introduces a training‑free, open‑vocabulary 3D occupancy prediction framework that combines SLAM‑based pose estimation, 3D Gaussian Splatting, and pretrained vision‑language models to build globally consistent semantic maps, achieving over‑two‑fold IoU improvements on EmbodiedOcc‑ScanNet and strong zero‑shot generalization on the new ReplicaOcc benchmark.

3D GaussianFreeOccSLAM
0 likes · 19 min read
FreeOcc: The First Training‑Free Open‑Vocabulary 3D Occupancy Mapping System (RSS‑2026)
Machine Heart
Machine Heart
Apr 25, 2026 · Artificial Intelligence

Enabling Unseen Language QA Without Training LLMs: XBridge’s Plug‑in Multilingual Extension

XBridge combines a pre‑trained English‑centric LLM with an external multilingual NMT model via optimal‑transport alignment and a three‑stage training scheme, allowing zero‑training of the LLM while achieving high‑quality question answering and generation for low‑resource and unseen languages, narrowing the performance gap with high‑resource languages.

LLMNMTXBridge
0 likes · 8 min read
Enabling Unseen Language QA Without Training LLMs: XBridge’s Plug‑in Multilingual Extension
AI Frontier Lectures
AI Frontier Lectures
Mar 5, 2026 · Artificial Intelligence

Can Robots Navigate Unseen Spaces with Only Language? EvoNav’s Zero‑Shot Vision‑Language Breakthrough

The EvoNav framework from Nanjing University of Science and Technology tackles the last‑hundred‑meter challenge of embodied navigation by integrating a Future Chain‑of‑Thought and a Historical Experience chain, achieving significant zero‑shot performance gains on VLN‑CE benchmarks and real‑world robot tests, with code released on GitHub.

Embodied AIEvoNavFuture Chain of Thought
0 likes · 6 min read
Can Robots Navigate Unseen Spaces with Only Language? EvoNav’s Zero‑Shot Vision‑Language Breakthrough
PaperAgent
PaperAgent
Jan 30, 2026 · Artificial Intelligence

How LLM‑in‑Sandbox Turns Large Models into General‑Purpose Agents Without Extra Training

The LLM‑in‑Sandbox framework places large language models inside a virtual machine that provides external tool access, persistent storage, and code execution, yielding up to a 24.2% performance boost across six benchmark tasks without additional training, and it scales from zero‑shot to reinforcement‑learning‑enhanced agents while remaining cost‑effective.

Agentic AILLMReinforcement Learning
0 likes · 6 min read
How LLM‑in‑Sandbox Turns Large Models into General‑Purpose Agents Without Extra Training
Frontend AI Walk
Frontend AI Walk
Dec 5, 2025 · Artificial Intelligence

Master Prompt Engineering: From Random Chat to Precise Control with Zero-shot, Few-shot, and Chain‑of‑Thought

This article explains how to converse effectively with large language models by mastering three core prompting techniques—Zero‑shot, Few‑shot, and Chain‑of‑Thought—illustrated with front‑end analogies, code snippets, and a step‑by‑step DeepSeek JSON‑generation exercise that shows common pitfalls and best practices.

Chain-of-ThoughtDeepSeekFew-Shot
0 likes · 12 min read
Master Prompt Engineering: From Random Chat to Precise Control with Zero-shot, Few-shot, and Chain‑of‑Thought
Amap Tech
Amap Tech
Jul 24, 2025 · Artificial Intelligence

FingER: Fine-Grained Evaluation and Reasoning for AI-Generated Videos

The paper introduces FingER, an entity-level evaluation framework and the FingER-Instruct-60k dataset for assessing AI-generated video quality with fine-grained reasoning, and demonstrates state-of-the-art zero-shot performance on multiple benchmarks using novel training strategies.

AI-generated videoDatasetfine-grained evaluation
0 likes · 9 min read
FingER: Fine-Grained Evaluation and Reasoning for AI-Generated Videos
Bilibili Tech
Bilibili Tech
Jul 11, 2025 · Artificial Intelligence

IndexTTS2: Emotionally Expressive, Duration-Controlled Zero-Shot TTS

IndexTTS2 introduces a novel auto-regressive zero-shot text-to-speech model that achieves precise duration control and fine-grained emotional expression through a universal time‑encoding mechanism, decoupled voice‑style and emotion modeling, and a GPT‑style latent feature, outperforming state‑of‑the‑art baselines across multiple benchmarks.

duration controlemotional synthesisspeech generation
0 likes · 23 min read
IndexTTS2: Emotionally Expressive, Duration-Controlled Zero-Shot TTS
Amap Tech
Amap Tech
Jul 9, 2025 · Artificial Intelligence

VMBench: Perception-Aligned Motion Benchmark & LD‑RPS Zero‑Shot Restoration

This article introduces VMBench, the first perception‑aligned video motion generation benchmark that defines a five‑dimensional metric suite and a meta‑guided prompt generation pipeline, and presents LD‑RPS, a zero‑shot unified image restoration framework based on latent diffusion recurrent posterior sampling, together with extensive experiments validating both systems.

Diffusion ModelsImage RestorationVideo Generation
0 likes · 14 min read
VMBench: Perception-Aligned Motion Benchmark & LD‑RPS Zero‑Shot Restoration
AI Algorithm Path
AI Algorithm Path
Jul 1, 2025 · Artificial Intelligence

Beginner’s Guide to CLIP Inference: Step‑by‑Step with Hugging Face

This tutorial walks through loading the openai/clip‑vit‑base‑patch32 model with Hugging Face, preprocessing images and text, encoding them into a shared embedding space, computing cosine similarity for zero‑shot image‑text matching, and visualizing the results, all with concrete code examples.

CLIPCosine SimilarityHugging Face
0 likes · 6 min read
Beginner’s Guide to CLIP Inference: Step‑by‑Step with Hugging Face
AIWalker
AIWalker
Mar 13, 2025 · Artificial Intelligence

YOLOE: Real‑Time Open‑World Object Detection and Segmentation Unveiled

The paper introduces YOLOE, a new YOLO‑based model that supports text, visual, and no‑prompt open‑world detection and segmentation, detailing its lightweight RepRTA, SAVPE, and LRPC modules and showing benchmark gains in speed and zero‑shot performance on LVIS and COCO.

Computer VisionYOLOEbenchmark
0 likes · 9 min read
YOLOE: Real‑Time Open‑World Object Detection and Segmentation Unveiled
AIWalker
AIWalker
Feb 11, 2025 · Artificial Intelligence

LLMDet: LLM‑Powered Open‑Vocabulary Detector Beats Grounding DINO

LLMDet introduces a novel training pipeline that leverages large language models to generate detailed image‑level captions and region‑level phrases, fine‑tunes an open‑vocabulary detector with the GroundingCap‑1M dataset, and achieves state‑of‑the‑art zero‑shot performance surpassing Grounding DINO across multiple benchmarks.

GroundingCapLLMDetLarge Language Models
0 likes · 20 min read
LLMDet: LLM‑Powered Open‑Vocabulary Detector Beats Grounding DINO
AIWalker
AIWalker
Jan 15, 2025 · Artificial Intelligence

Magic Mirror: Zero‑Shot Identity‑Preserved High‑Quality Personalized Video Generation

Magic Mirror introduces a single‑stage, zero‑shot framework that fuses dual facial embeddings with a conditional adaptive normalization module inside a Video Diffusion Transformer, achieving superior identity consistency, natural dynamics, and high visual quality compared with existing video generation methods.

Diffusion TransformerVideo Generationconditional adaptive normalization
0 likes · 16 min read
Magic Mirror: Zero‑Shot Identity‑Preserved High‑Quality Personalized Video Generation
JD Tech
JD Tech
Nov 12, 2024 · Artificial Intelligence

Prompt Engineering: Concepts, Evolution, Techniques, and JD Logistics Application

This article explains what Prompt Engineering is, traces its development from early NLP commands to modern adaptive and multimodal prompting techniques, describes various prompting strategies such as Zero‑shot, Few‑shot, Chain‑of‑Thought, Auto‑CoT, and showcases a JD Logistics case study using these methods to classify product types with code examples.

AI Prompt DesignChain-of-ThoughtFew-Shot
0 likes · 27 min read
Prompt Engineering: Concepts, Evolution, Techniques, and JD Logistics Application
Xiaohongshu Tech REDtech
Xiaohongshu Tech REDtech
Feb 27, 2024 · Artificial Intelligence

InstantID: Zero-shot Identity-Preserving Generation in Seconds

InstantID, an open‑source tool released by Xiaohongshu in early 2024, generates multiple stylized portraits that preserve a person’s facial identity from a single reference photo in seconds, eliminating fine‑tuning, large storage needs, and multi‑image requirements while seamlessly working with popular diffusion models like Stable Diffusion 1.5 and SDXL.

Image GenerationInstantIDai
0 likes · 6 min read
InstantID: Zero-shot Identity-Preserving Generation in Seconds
Rare Earth Juejin Tech Community
Rare Earth Juejin Tech Community
Jul 30, 2023 · Artificial Intelligence

ChatGPT Technical Analysis Series – Part 2: GPT1, GPT2, and GPT3 (Encoder vs Decoder, Zero‑Shot, and Scaling)

This article reviews the evolution of the GPT family from GPT‑1 to GPT‑3, comparing encoder‑decoder architectures, explaining the shift from supervised fine‑tuning to zero‑shot and few‑shot learning, and highlighting the architectural and training innovations that enabled large‑scale language models.

Fine-tuningGPTLLM
0 likes · 13 min read
ChatGPT Technical Analysis Series – Part 2: GPT1, GPT2, and GPT3 (Encoder vs Decoder, Zero‑Shot, and Scaling)
Alibaba Cloud Developer
Alibaba Cloud Developer
Jul 19, 2023 · Artificial Intelligence

Mastering Prompt Engineering: Techniques, Tips, and Real-World Examples

This comprehensive guide explores prompt engineering for large language models, covering its background, fundamental concepts, prompt formats, construction principles, advanced techniques like few‑shot, zero‑shot, and chain‑of‑thought prompting, as well as practical examples, evaluation metrics, and future directions.

Artificial IntelligenceChain-of-ThoughtFew-Shot
0 likes · 33 min read
Mastering Prompt Engineering: Techniques, Tips, and Real-World Examples
DataFunTalk
DataFunTalk
Jun 21, 2023 · Artificial Intelligence

Low‑Resource NLP Pretraining: Methodology, Experiments, and Zero‑Shot Applications

This article presents a low‑resource NLP pretraining approach that combines transformer‑based language modeling with contrastive vector learning, details the unsupervised sample‑pair construction, introduces a camel‑shaped masking distribution, and demonstrates through extensive experiments that the resulting model achieves strong zero‑shot NLU, NLG, and retrieval performance while requiring minimal compute and data.

Language ModelingLow-Resourcecontrastive learning
0 likes · 10 min read
Low‑Resource NLP Pretraining: Methodology, Experiments, and Zero‑Shot Applications
ByteFE
ByteFE
Jun 15, 2023 · Artificial Intelligence

Effective Prompt Engineering: Techniques, Prompt Injection Prevention, Hallucination Mitigation, and Advanced Prompting Strategies

This article explains how to craft efficient prompts by combining clear instructions and questions, discusses prompt injection risks and mitigation with delimiters, addresses hallucinations, and introduces zero‑shot, few‑shot, and chain‑of‑thought prompting techniques for large language models.

Chain-of-ThoughtFew-ShotLLM
0 likes · 16 min read
Effective Prompt Engineering: Techniques, Prompt Injection Prevention, Hallucination Mitigation, and Advanced Prompting Strategies
Python Programming Learning Circle
Python Programming Learning Circle
Jun 8, 2022 · Artificial Intelligence

Leveraging PaddleNLP UIE for Zero‑Shot Logistic Parcel Information Extraction

This article explains how PaddleNLP's Universal Information Extraction (UIE) model can dramatically reduce labeling effort and improve accuracy for logistics parcel data extraction, showcasing a five‑sample experiment that boosts F1 by 18 points to 93% and providing a zero‑shot Python example.

Information ExtractionLogisticsNLP
0 likes · 5 min read
Leveraging PaddleNLP UIE for Zero‑Shot Logistic Parcel Information Extraction
DaTaobao Tech
DaTaobao Tech
May 24, 2022 · Artificial Intelligence

GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection

GEN‑VLKT introduces a Guided‑Embedding Network with position‑ and instance‑guided embeddings to remove costly post‑processing and leverages CLIP‑based visual‑linguistic knowledge transfer for interaction understanding, achieving state‑of‑the‑art HOI detection performance and zero‑shot capability, now deployed in Alibaba’s Taobao services.

CLIPHOI detectionTransformer
0 likes · 7 min read
GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection
DataFunTalk
DataFunTalk
Jan 16, 2022 · Artificial Intelligence

DeltaLM: A Multilingual Pretrained Encoder‑Decoder Model for Neural Machine Translation and Zero‑Shot Transfer

DeltaLM is a new multilingual pretrained encoder‑decoder model that leverages a pretrained encoder and a novel decoder to improve multilingual neural machine translation, offering efficient training, strong cross‑language transfer, zero‑shot translation, and superior performance on various translation and summarization tasks.

DeltaLMNMTmachine translation
0 likes · 13 min read
DeltaLM: A Multilingual Pretrained Encoder‑Decoder Model for Neural Machine Translation and Zero‑Shot Transfer
DataFunSummit
DataFunSummit
Jan 13, 2022 · Artificial Intelligence

DeltaLM: A Multilingual Pretrained Encoder‑Decoder Model for Neural Machine Translation

DeltaLM is a multilingual pretrained encoder‑decoder model that leverages cross‑lingual transfer from a pretrained encoder and novel decoder architecture, employs span‑corruption and translation‑pair pretraining tasks, and uses a two‑stage fine‑tuning strategy to achieve strong zero‑shot and supervised translation performance across over 100 languages.

Cross-Lingual TransferDeltaLMNeural Machine Translation
0 likes · 12 min read
DeltaLM: A Multilingual Pretrained Encoder‑Decoder Model for Neural Machine Translation
DataFunTalk
DataFunTalk
Apr 7, 2021 · Artificial Intelligence

Alibaba's Advances in Multilingual Neural Machine Translation: Research and Practice

This article presents Alibaba's comprehensive research on multilingual neural machine translation, covering motivations, model architectures, intermediate language modules, data‑augmentation strategies such as repair translation, integration of pre‑trained models with adapters, and engineering optimizations that enable a production‑ready system supporting over 200 languages.

AdapterAlibabaNeural Machine Translation
0 likes · 21 min read
Alibaba's Advances in Multilingual Neural Machine Translation: Research and Practice