Tagged articles

retrieval

98 articles · Page 1 of 1

Jul 4, 2026 · Artificial Intelligence

How Pinecone Nexus Turns Vector Search into an Agent Knowledge Engine

The article analyzes the shift to agent‑centric AI, explains why traditional retrieval creates a costly "ten blue links" loop, and details how Pinecone Nexus’s context compiler and composable retriever, together with the KnowQL language, provide structured, governed knowledge that boosts task completion rates, cuts latency, and reduces token usage by up to 90%.

AI Agents · KnowQL · Pinecone

0 likes · 14 min read

Wu Shixiong's Large Model Academy

Jun 23, 2026 · Artificial Intelligence

When RAG Returns Junk, Why an LLM Can’t Fix It – Building an Agentic RAG

The article examines why traditional single‑step Retrieval‑Augmented Generation fails when retrieved passages are irrelevant, outlines the three fundamental flaws of that pipeline, and presents the Agentic RAG paradigm—turning retrieval into a reusable tool with planning, reflection, and decision loops, illustrated with code, interview scenarios, and practical deployment tips.

AI · Agentic RAG · Knowledge Base

0 likes · 32 min read

Machine Heart

Jun 11, 2026 · Artificial Intelligence

Can Agents Search Without a Vector Database? A Simple Grep Is Enough

The paper introduces Direct Corpus Interaction (DCI), letting LLM agents bypass vector indexes and use command‑line tools like grep to directly search raw text, achieving higher accuracy and lower cost on complex multi‑hop QA and retrieval benchmarks.

Agentic Search · Direct Corpus Interaction · benchmark

0 likes · 12 min read

AgentGuide

Jun 8, 2026 · Artificial Intelligence

Agentic RAG vs Regular RAG: Key Differences, Trade‑offs, and Interview‑Ready Answer

This article explains what Agentic RAG is, contrasts it with ordinary RAG by detailing its dynamic decision‑making, multi‑step retrieval loop, higher cost and latency, and suitable scenarios, and outlines two implementation patterns—single‑agent and multi‑agent—plus a concise interview response.

AI Agents · Agentic RAG · LLM

0 likes · 5 min read

ITPUB

Jun 2, 2026 · Artificial Intelligence

Why Memory Architecture Remains Elusive: An In‑Depth Analysis of Agent Memory Systems

The article argues that memory for AI agents is not mere storage but a closed‑loop system comprising a raw ledger, derived views, and a policy layer, and examines how non‑parametric memory, time‑aware structures, and system‑2 control affect scalability, reliability, and performance.

Agent · memory · non‑parametric

0 likes · 45 min read

AI Engineering

Jun 2, 2026 · Artificial Intelligence

Why Your Enterprise AI Looks Impressive Yet Produces Garbage Results

Even with the world’s best large language models, chaotic internal notes, calls, and processes turn enterprise AI output into junk; a five‑layer architecture—capture, retrieval, source‑truth, permission, and feedback—plus a six‑question test can turn a noisy "company brain" into a useful tool, as shown by Single Grain’s dramatic time‑saving results.

AI Architecture · Automation · Enterprise AI

0 likes · 7 min read

Woodpecker Software Testing

Jun 1, 2026 · Artificial Intelligence

2026 RAG Testing Trends: From ‘Can Run’ to Trustworthy, Controllable, and Testable AI

In 2026, Retrieval‑Augmented Generation (RAG) has become a core reasoning paradigm for high‑compliance domains, prompting a shift from simple output correctness to multi‑stage falsifiable testing, dynamic adversarial knowledge graphs, LLM‑as‑Tester automation, and audit‑ready compliance reporting.

AI testing · LLM-as-Tester · RAG

0 likes · 8 min read

AI Engineer Programming

May 29, 2026 · Artificial Intelligence

How to Build a Reliable RAG Test Dataset

The article explains why a structured test set is essential for Retrieval‑Augmented Generation systems, outlines failure modes, describes layered evaluation of retrieval and generation, details infrastructure like chunk IDs and manifests, and provides a complete annotation pipeline with cold‑start and adversarial strategies.

Evaluation · LLM · RAG

0 likes · 24 min read

AI Engineer Programming

May 27, 2026 · Artificial Intelligence

MMR for RAG: Low-Cost Chunk Limits Balance Relevance and Diversity

When a long document is split into many highly similar chunks, vector‑based top‑k retrieval tends to return multiple pieces from the same source, causing document dominance; applying a per‑document chunk limit together with Maximal Marginal Relevance (MMR) re‑ranking introduces diversity while preserving relevance, offering a low‑cost way to improve RAG answer quality.

Chunking · DPP · Diversity

0 likes · 17 min read

PaperAgent

May 26, 2026 · Artificial Intelligence

Why External Retrieval in RAG Is Redundant: Insights from NVIDIA’s INTRA Paper

The INTRA paper shows that using a decoder’s cross‑attention as an internal retrieval mechanism eliminates the need for a separate retriever, achieving state‑of‑the‑art multihop QA performance with only 164 K trainable parameters and shared pre‑encoded representations.

INTRA · RAG · attention

0 likes · 8 min read

James' Growth Diary

May 23, 2026 · Artificial Intelligence

Choosing the Right Retrieval Strategy: Full‑Text vs Vector vs Graph Search

This article breaks down the underlying logic, ideal scenarios, benchmark data, decision trees, and real‑world case studies for full‑text (BM25), vector, and graph retrieval, showing why hybrid approaches dominate production while each technique has distinct strengths and trade‑offs.

Full-Text Search · Hybrid Search · RAG

0 likes · 25 min read

AI Large-Model Wave and Transformation Guide

May 22, 2026 · Artificial Intelligence

Can Agentic Search Replace Traditional RAG? A Deep Dive into Their Differences

The article explains agentic search as an LLM‑driven, multi‑step retrieval process, contrasts it with traditional RAG pipelines, provides concrete examples, discusses when each approach is appropriate, and argues that agentic search will augment rather than fully replace RAG.

AI · Agentic Search · LLM

0 likes · 7 min read

James' Growth Diary

May 20, 2026 · Artificial Intelligence

Boosting RAG Retrieval Quality with Cohere Rerank and Cross‑Encoder

After achieving high recall with hybrid Elasticsearch and vector search, the article shows how inserting a reranker—either Cohere's cloud API or a local Cross‑Encoder—compresses the top‑20 candidates to the most relevant three to five, dramatically improving answer accuracy, cutting token costs, and detailing a dual‑track implementation for production and development environments.

Cohere · Cross-Encoder · LangChain

0 likes · 22 min read

AI Architecture Hub

May 19, 2026 · Artificial Intelligence

Agent Memory: From Theory to Practical Implementation

The article explains how AI agents can acquire long‑term memory by combining three functions—coherence, context, and learning—with four memory types, describes the full retrieval‑store loop, and provides a step‑by‑step Python implementation using OpenAI embeddings, ChromaDB, and forgetting strategies.

AI Agents · ChromaDB · Memory systems

0 likes · 17 min read

Architect

May 17, 2026 · Artificial Intelligence

Agent Skills Survey: How Process Knowledge Becomes Technical Debt

The recent arXiv survey on Agent Skills maps the full lifecycle of skills—representation, acquisition, retrieval, and evolution—and warns that unchecked growth can turn a valuable process asset into technical debt, urging teams to enforce admission quality, robust routing, versioning, testing, and retirement mechanisms.

AI Engineering · Agent Skills · Process Assets

0 likes · 26 min read

ITPUB

May 13, 2026 · Databases

Is the Hype Around Vector Databases a Pseudo‑Demand in the AI Era?

The article questions whether dedicated vector databases are truly needed for AI applications, examining market hype, the rapid emergence of many vector‑DB products, real‑world examples like PostgreSQL pgvector and major vendor integrations, and the hidden costs of data fragmentation and operational complexity.

AI · PostgreSQL · RAG

0 likes · 15 min read

Wuming AI

May 10, 2026 · Artificial Intelligence

Can Large Models Really Understand 1 M Tokens? Lessons from the RULER Benchmark

The article examines why a model’s advertised context window (e.g., 128 K or 1 M tokens) does not guarantee effective long‑context reasoning, summarizing the RULER framework that breaks long‑context ability into retrieval, interference resistance, multi‑hop tracking, aggregation, and multi‑answer recall, and offering practical guidance for evaluating and using such models.

Aggregation · Evaluation · LLM

0 likes · 16 min read

Data Party THU

May 10, 2026 · Artificial Intelligence

From Theory to Production: Mastering the Full Memory Pipeline of Modern AI Agents

The article explains why stateless LLM calls require a structured memory system for AI agents, describes four memory types, a five‑stage pipeline, design patterns, common pitfalls, and provides a detailed production architecture with performance numbers and code examples.

AI Agents · Knowledge Graph · LLM

0 likes · 23 min read

inShocking

May 7, 2026 · Artificial Intelligence

What to Store and When to Skip: Lessons from Claude Code’s Memory Mechanism

The article dissects Claude Code’s memory system, showing that the real challenge is deciding what information to keep and when to discard, and it details design principles, index‑content separation, LLM‑based retrieval, expiration handling, write‑path isolation, and practical improvements applied to the author’s own agent platform.

Claude Code · LLM · Memory Management

0 likes · 16 min read

dbaplus Community

May 5, 2026 · Artificial Intelligence

The True Nature of Agent Memory: Deep Dive into Architecture and Design

The article analyses why a real agent must have memory, defining memory as an external state that feeds decision‑making, proposing a three‑part architecture (Raw Ledger, Views, Policy), contrasting parametric and non‑parametric approaches, and detailing bottlenecks, temporal handling, and procedural extensions.

Agent Memory · Memory Architecture · non‑parametric memory

0 likes · 46 min read

Linyb Geek Road

May 5, 2026 · Artificial Intelligence

How to Fully Evaluate a RAG System – Metrics for Retrieval and Generation Stages

The article explains why RAG systems require stage‑wise evaluation, detailing retrieval metrics such as Precision, Recall, F1, MRR, NDCG and Context Relevance, and generation metrics like Faithfulness, Answer Relevance and Completeness, while discussing LLM‑as‑Judge automation and a three‑layer assessment framework.

Evaluation · LLM-as-Judge · RAG

0 likes · 14 min read

AI Engineer Programming

May 2, 2026 · Artificial Intelligence

From Demo to Production: How to Evaluate RAG Effectively

This guide outlines a comprehensive RAG evaluation framework covering failure modes, multi‑layer metrics, test‑set construction, open‑source tools, CI/CD quality gates, production monitoring, and special considerations for agentic RAG to ensure reliable, trustworthy retrieval‑augmented generation systems.

AI · Evaluation · LLM

0 likes · 18 min read

AI Engineer Programming

May 1, 2026 · Artificial Intelligence

From Naive Retrieval to Knowledge Runtime: The Full Evolution of RAG

The article traces the evolution of Retrieval‑Augmented Generation from its 2020 Naive baseline through Advanced, Modular, Graph, and Agentic generations, detailing architectural shifts, optimization techniques, self‑correction mechanisms, and future challenges such as long‑context handling and multimodal retrieval.

LLM · RAG · agentic

0 likes · 14 min read

MaGe Linux Operations

Apr 22, 2026 · Artificial Intelligence

5 Essential Design Principles for Building High‑Quality RAG Systems

This article outlines five critical design principles for constructing high‑quality Retrieval‑Augmented Generation (RAG) systems, covering document chunking strategies, embedding model selection, hybrid retrieval architectures, metadata filtering with multi‑level indexes, and reranking mechanisms, and provides concrete code snippets and evaluation metrics.

Embedding · Evaluation · Hybrid Retrieval

0 likes · 17 min read

Wu Shixiong's Large Model Academy

Apr 22, 2026 · Artificial Intelligence

How to Classify and Manage Agent Memories for Better Retrieval

This article dissects Claude Code's memory system, explains why unstructured memory degrades performance, introduces four distinct memory types with concrete examples and schema, shows how to handle expiration and retrieval strategies, and provides step‑by‑step implementation code to improve agent reliability.

Agent Memory · LLM · Memory Management

0 likes · 19 min read

dbaplus Community

Apr 12, 2026 · Artificial Intelligence

Boost RAG Accuracy to 94%: 11 Proven Strategies and How to Combine Them

After struggling with naive RAG that delivered only 60% accuracy, the author outlines eleven advanced strategies—including context-aware chunking, query expansion, re‑ranking, multi‑query, knowledge graphs, and agent‑based retrieval—that together raise performance to 94%, and provides detailed implementation examples, trade‑offs, and a step‑by‑step deployment roadmap.

AI · Embedding · Knowledge Graph

0 likes · 32 min read

AI Code to Success

Apr 3, 2026 · Artificial Intelligence

Can Your AI Agent Earn a College Degree? Exploring Clawvard’s Evaluation Platform

The author explores Clawvard, an AI‑agent assessment platform that tests agents across eight dimensions, shares personal test results showing an initial A‑ rating with a critical retrieval weakness, details the customized improvement rules applied, and demonstrates a subsequent A+ rating, while also discussing the platform’s limits and practical use cases.

AI Agent · Evaluation · Prompt Engineering

0 likes · 8 min read

AgentGuide

Apr 3, 2026 · Artificial Intelligence

How to Evaluate RAG Systems: Key Metrics and the Ragas Framework

The article explains how to assess Retrieval-Augmented Generation (RAG) projects using the Ragas automated evaluation framework, detailing four key dimensions—recall quality, answer faithfulness, answer relevance, and context utilization—and describes the underlying metrics for both retrieval and generation stages.

Evaluation · LLM · Metrics

0 likes · 5 min read

AI Step-by-Step

Mar 29, 2026 · Artificial Intelligence

Engineering Long-Term Memory for Agents: Practical Architecture and Best Practices

The article explains how to engineer persistent, cross‑session memory for AI agents by persisting key user facts, task states, and decisions in a multi‑layer storage architecture, detailing retrieval before each request and update after each interaction.

AI Engineering · State Management · agent architecture

0 likes · 11 min read

Architecture and Beyond

Mar 29, 2026 · Artificial Intelligence

Designing Efficient Memory for Claude Code: Typed Storage, Indexed Management, Triggered Retrieval, and Pre‑Use Validation

This article analyzes Claude Code's memory system, explaining how typed storage separates user, feedback, project, and reference data, how an indexed MEMORY.md file keeps the index lightweight, how triggered retrieval balances relevance, freshness, and reliability, and why pre‑use validation prevents stale or incorrect facts from contaminating model responses.

AI memory · Claude · Prompt Engineering

0 likes · 17 min read

Java One

Mar 28, 2026 · Artificial Intelligence

Building a Vector‑Free RAG System with Hierarchical Page Indexing

This guide explains how to create a retrieval‑augmented generation (RAG) system that avoids embeddings by converting documents into a hierarchical tree, using an LLM to navigate, summarize, and retrieve answers, complete with a full Python implementation and a GitHub repository.

LLM · Python · RAG

0 likes · 15 min read

AgentGuide

Mar 25, 2026 · Artificial Intelligence

What Is Retrieval‑Augmented Generation (RAG) and Why Must Large Models Look Up Information First?

Retrieval‑Augmented Generation (RAG) lets large language models first fetch relevant documents and then generate answers, addressing the inability of models to answer private or domain‑specific queries by precisely feeding them the most pertinent knowledge.

Embedding · RAG · large language models

0 likes · 5 min read

SuanNi

Mar 21, 2026 · Artificial Intelligence

Mastering Context Engineering: Six Pillars, Retrieval Strategies, and Structured Output

This article explains the six pillars of context engineering, focusing on structuring techniques, advanced retrieval methods, hybrid search, reranking, query transformation, and practical pipelines that turn raw data into reliable, LLM‑ready inputs for higher quality AI responses.

Hybrid Search · LLM · RAG

0 likes · 14 min read

Wu Shixiong's Large Model Academy

Mar 17, 2026 · Artificial Intelligence

Mastering Chunk Splitting for RAG: From Fixed Length to Semantic Segmentation

Chunk splitting, a critical yet often overlooked step in RAG pipelines, dramatically impacts retrieval recall and LLM output quality; this guide walks through three evolution stages—from naive fixed‑length splits to sentence‑aware overlaps and finally semantic, structure‑driven segmentation—complete with code, experiments, and practical pitfalls.

Chunking · LLM · RAG

0 likes · 15 min read

Wu Shixiong's Large Model Academy

Mar 16, 2026 · Artificial Intelligence

Designing a Complete RAG System from Zero: A Step‑by‑Step Interview Guide

This article outlines a full‑stack RAG architecture—offline parsing, query understanding, online retrieval, and context generation—explains six critical module interactions, and provides a concise interview framework for presenting the design from start to finish.

LLM · RAG · interview preparation

0 likes · 14 min read

PaperAgent

Mar 10, 2026 · Artificial Intelligence

How MemSifter Delivers High‑Precision, Low‑Cost Long‑Term Memory for LLMs

MemSifter introduces a lightweight agent that outsources memory retrieval for large language models, using a Think‑and‑Rank pipeline and a task‑result‑oriented reinforcement‑learning training paradigm to achieve superior retrieval accuracy and efficiency across eight benchmark tasks while keeping inference overhead minimal.

Agent · Efficiency · LLM

0 likes · 13 min read

Data Party THU

Mar 8, 2026 · Artificial Intelligence

6 Practical Context‑Engineering Techniques to Tame RAG Hallucinations

This article explains why retrieval‑augmented generation (RAG) models often hallucinate, introduces the concept of context engineering, and details six practical techniques—including selective retrieval, context compression, hierarchical layout, dynamic query rewriting, memory management, and tool‑aware context—along with their trade‑offs and real‑world impact.

AI · LLM · RAG

0 likes · 23 min read

AI Tech Publishing

Mar 4, 2026 · Artificial Intelligence

AI Agent Context Management: Comparing Six Major Companies' Approaches

The article analyzes how six leading AI‑agent providers—Manus, Cursor, Anthropic, OpenAI, Google, and LangChain—tackle the fundamental problem of when and how a large language model should see information, detailing each solution, a cross‑company comparison matrix, consensus points, controversies, and open research questions.

AI Agents · Context Management · LLM

0 likes · 19 min read

DataFunSummit

Feb 24, 2026 · Artificial Intelligence

How Large Language Models Are Redefining Search Ranking at Tencent

This article details Tencent Search's exploration of large‑model‑driven ranking, covering the evolution from traditional keyword retrieval to RAG‑based AI search, the multi‑stage AI ranking architecture (L0‑L5), model training pipelines, distillation, synthetic data generation, and future research directions.

LLM · RAG · ranking architecture

0 likes · 21 min read

PaperAgent

Feb 20, 2026 · Artificial Intelligence

Why Graph-Based Memory Is the Next Frontier for AI Agents

This article surveys recent advances in graph‑structured agent memory, presenting a taxonomy, lifecycle stages from extraction to evolution, open‑source tools, and benchmark suites that together illustrate how graph memory can overcome knowledge truncation, tool incompetence, and performance saturation in LLM‑driven AI agents.

AI Agents · evolution · graph memory

0 likes · 8 min read

DaTaobao Tech

Feb 9, 2026 · Artificial Intelligence

Boosting Trustworthiness in Retrieval‑Augmented Generation: The Trustworthy Generation Design Pattern

This article presents the Trustworthy Generation design pattern for Retrieval‑Augmented Generation (RAG) systems, analyzes four root causes of low trustworthiness—retrieval errors, content reliability, pre‑retrieval reasoning mistakes, and model hallucinations—and proposes layered solutions, citation techniques, CRAG and Self‑RAG architectures, guardrails, and practical trade‑offs.

AI safety · LLM · RAG

0 likes · 16 min read

Architecture and Beyond

Feb 8, 2026 · Artificial Intelligence

Designing Scalable Long-Term Memory for AI Agents: Capture, Compress, Retrieve

This article explains how to build a controllable, editable, and cost‑effective long‑term memory system for AI agents by categorizing memory types, structuring a three‑stage pipeline of capture, AI‑driven compression, and smart retrieval, and choosing appropriate storage back‑ends such as files, knowledge bases, or databases.

Agent Design · Knowledge Base · artificial-intelligence

0 likes · 18 min read

PaperAgent

Feb 4, 2026 · Artificial Intelligence

How Agent KB Enables Cross‑Framework Knowledge Sharing for Smarter AI Agents

The article presents Agent KB, a universal memory infrastructure that lets heterogeneous AI agents share experiences through a Reason‑Retrieve‑Refine pipeline and a teacher‑student dual‑agent architecture, showing significant performance gains across benchmarks like GAIA, SWE‑bench, and various LLM families.

AI Agents · Knowledge Base · cross‑framework

0 likes · 10 min read

Architecture and Beyond

Feb 1, 2026 · Artificial Intelligence

5 High‑ROI Strategies to Supercharge RAG Retrieval Performance

This article outlines five practical engineering strategies—multi‑vector retrieval, manual splitting and labeling, scalar enhancement, context augmentation, and dense‑sparse vector integration—that together address common RAG retrieval bottlenecks and dramatically improve recall stability and answer quality.

BM25 · LLM · RAG

0 likes · 17 min read

Wu Shixiong's Large Model Academy

Nov 12, 2025 · Artificial Intelligence

Agent Memory Modules Explained: Short‑Term vs Long‑Term Strategies for LLM Agents

This article breaks down the memory systems behind LLM‑based agents, explaining why persistent memory is needed, the differences between short‑term context buffers and long‑term vector stores, practical implementation choices, maintenance strategies, and how to articulate these concepts effectively in technical interviews.

Agent · LLM · retrieval

0 likes · 14 min read

Data Party THU

Nov 9, 2025 · Artificial Intelligence

Mastering Chunking Strategies for Effective RAG: Fixed, Recursive, Semantic, Structured, and Delayed

This article walks through the core RAG pipeline, explains why chunking is the linchpin of retrieval quality, and provides detailed definitions, trade‑offs, and implementation examples for five chunking techniques—fixed, recursive, semantic, structure‑aware, and delayed—so you can choose the right approach for any document‑heavy AI application.

AI · Chunking · LLM

0 likes · 10 min read

Wu Shixiong's Large Model Academy

Nov 1, 2025 · Artificial Intelligence

Turn a Basic RAG Demo into a High‑Impact Interview Project

This guide shows how to evolve a simple Retrieval‑Augmented Generation prototype into a production‑grade system by strengthening data ingestion, optimizing retrieval with hybrid and reranking techniques, adding query rewriting, long‑context handling, reinforcement learning, and multimodal support, so candidates can demonstrate real engineering depth in interviews.

AI · LLM · RAG

0 likes · 7 min read

DeWu Technology

Oct 29, 2025 · Artificial Intelligence

Why Chunking Can Make or Break Your RAG System – Practical Strategies & Code

This article explains how proper document chunking—choosing the right chunk size, overlap, and structure‑aware boundaries—directly impacts the relevance, factuality, and efficiency of Retrieval‑Augmented Generation pipelines, and provides multiple Python implementations ranging from simple fixed‑length splits to semantic and hybrid approaches.

Chunking · Embedding · LLM

0 likes · 29 min read

Amap Tech

Oct 17, 2025 · Artificial Intelligence

How Ranking Improves In-Context Example Retrieval: Insights from NeurIPS ’25

This article explains the limitations of current pointwise in‑context learning methods, introduces a novel ranking‑based approach called SeDPO that learns preference orders among examples, and demonstrates its superior performance across multiple NLP tasks through extensive experiments and ablation studies.

In-Context Learning · NeurIPS · Ranking

0 likes · 10 min read

DataFunTalk

Oct 6, 2025 · Artificial Intelligence

Mastering Context Engineering: 5 Proven Strategies to Boost AI Agent Performance

This article explores the emerging concept of context engineering for AI agents, explains why managing long‑range context is critical, and details five practical strategies—Offload, Reduce, Retrieve, Isolate, and Cache—backed by insights from leading industry teams and the "Bitter Lesson" philosophy.

AI Agents · LLM Optimization · Prompt Engineering

0 likes · 30 min read

Data STUDIO

Sep 28, 2025 · Artificial Intelligence

Top Reranker Models for RAG in 2025: A Comparative Review

This article explains why initial retrieval in Retrieval‑Augmented Generation often yields noisy results, describes how rerankers act as quality filters to improve relevance, compares the leading 2025 reranker models—including Cohere, bge‑reranker, Voyage, Jina, FlashRank, and MixedBread—and provides code snippets, evaluation metrics, and guidance for selecting the right model for specific use cases.

AI · Cross-Encoder · LLM

0 likes · 31 min read

Data Thinking Notes

Sep 7, 2025 · Artificial Intelligence

Unlocking AI Agent Memory: How LLMs Use Retrieval and Planning to Stay Smart

This article explains the core architecture of AI agents powered by large language models, detailing how planning, short‑term and long‑term memory, and tool integration work together through vector databases, retrieval‑augmented generation, and summarization to enable stateful, intelligent interactions across multiple sessions.

AI Agent · LLM · memory

0 likes · 10 min read

Amap Tech

Sep 2, 2025 · Artificial Intelligence

How Pos2Distill Eliminates Positional Bias in Large Language Models

This article introduces Pos2Distill, a novel knowledge‑distillation framework that transfers capabilities from advantageous to disadvantaged positions in large language models, effectively mitigating positional bias and improving performance on long‑text retrieval and in‑context reasoning tasks.

in-context reasoning · knowledge distillation · large language models

0 likes · 10 min read

Instant Consumer Technology Team

Sep 2, 2025 · Artificial Intelligence

Why RAG Is Dead: Jeff Huber’s 5 Retrieval Secrets and Context Engineering

Jeff Huber, founder of Chroma, argues that traditional RAG is obsolete, introduces context engineering as the new paradigm, and shares five practical retrieval strategies, a complete pipeline, and insights on handling context rot, memory, and generative benchmarking to build production‑grade AI applications.

AI · Generative Benchmarking · RAG

0 likes · 11 min read

Instant Consumer Technology Team

Aug 19, 2025 · Artificial Intelligence

Mastering Document Chunking for RAG: Strategies, Code & Best Practices

This article explores why proper document chunking is crucial for Retrieval‑Augmented Generation, explains core concepts like context windows and signal‑to‑noise, compares various chunking strategies—from simple fixed‑size splits to semantic and hybrid approaches—and provides practical Python code examples to help you build more effective RAG pipelines.

LLM · RAG · Text Splitting

0 likes · 24 min read

Alimama Tech

Jul 9, 2025 · Artificial Intelligence

How to Make LLMs Recognize and Resolve Their Own Uncertainty

This article introduces ConfuseBench, a benchmark that classifies LLM uncertainty into document‑missing, ability‑limited, and ambiguous types, and presents methods—including retrieval, chain‑of‑thought, and clarification—to detect and actively resolve uncertainty, improving answer quality across diverse tasks.

Chain-of-Thought · Clarification · Inquiry

0 likes · 17 min read

Tencent Technical Engineering

Jun 16, 2025 · Artificial Intelligence

Mastering RAG and AI Agents: Practical Tips, Code Samples, and Evaluation Strategies

This comprehensive guide walks you through the fundamentals of Retrieval‑Augmented Generation (RAG) and AI agents, explains their inner workings, shares optimization tricks, provides ready‑to‑run code snippets, and demonstrates how to evaluate performance with metrics such as recall, faithfulness, and answer relevance.

AI Agents · Evaluation · LLM

0 likes · 36 min read

ITPUB

Jun 15, 2025 · Artificial Intelligence

How to Build a High‑Performance Enterprise RAG System with Model Context Protocol (MCP)

This article presents a step‑by‑step guide for constructing a scalable enterprise Retrieval‑Augmented Generation (RAG) solution using the Model Context Protocol (MCP), covering architecture comparison, system design, Milvus‑backed knowledge store, Python client implementation, deployment scripts, code examples, and best‑practice recommendations.

KnowledgeBase · LLM · MCP

0 likes · 22 min read

DataFunSummit

May 9, 2025 · Artificial Intelligence

Practical Experience Building Zhihu Direct Answer: An AI‑Powered Search Product

This article presents a comprehensive overview of Zhihu Direct Answer, describing its AI‑driven search architecture, RAG framework, query understanding, retrieval, chunking, reranking, generation, evaluation mechanisms, engineering optimizations, and the professional edition, while sharing concrete performance‑boosting practices and future development plans.

AI · Evaluation · Product Development

0 likes · 14 min read

AsiaInfo Technology: New Tech Exploration

Apr 25, 2025 · Artificial Intelligence

How Evidence Generation Boosts Document-Grounded Dialogue with LLMs

This study introduces DGDE, a document‑grounded dialogue framework that leverages large language model‑generated evidence, combining retrieval, reranking, fine‑tuning, and iterative question correction to markedly improve accuracy, comprehensiveness, coherence, and completeness on the Doc2dial benchmark.

document-grounded dialogue · evidence generation · fine-tuning

0 likes · 21 min read

Fun with Large Models

Apr 25, 2025 · Artificial Intelligence

Why Your RAG System Underperforms and How to Boost Its Effectiveness by 20%

This article analyzes common shortcomings of RAG pipelines—data preparation, retrieval, and LLM generation—and provides concrete optimization techniques such as advanced chunking, embedding model selection, retrieval parameter tuning, rerank models, and prompt engineering, promising up to a 20% performance gain.

Chunking · Embedding · Prompt Engineering

0 likes · 17 min read

Tencent Technical Engineering

Apr 22, 2025 · Artificial Intelligence

Conan-Embedding-V2: A 1.4B LLM‑Based Multilingual Embedding Model Achieving SOTA on MTEB

Conan‑Embedding‑V2, a newly trained 1.4B‑parameter LLM with a custom tokenizer, a 32k‑token context window, SoftMask, cross‑lingual retrieval data, and dynamic hard‑negative mining, delivers state‑of‑the‑art multilingual embeddings that surpass larger models on both the English and Chinese MTEB benchmarks while remaining compact and fast.

Embedding · Large Language Model · MTEB

0 likes · 14 min read

Fun with Large Models

Apr 18, 2025 · Artificial Intelligence

How RAG Works: From Data Prep to LLM Generation Explained

This article breaks down Retrieval‑Augmented Generation (RAG) into its three core stages—data preparation, data retrieval, and LLM generation—showing how document chunking, embedding, vector databases, similarity search, and optional re‑ranking combine to let large language models produce more accurate, knowledge‑grounded answers.

Embedding · LLM · RAG

0 likes · 9 min read

Ma Wei Says

Mar 24, 2025 · Artificial Intelligence

Master BGE Multilingual Embeddings: Models, Installation, and Quick Usage

Explore the BGE (BAAI General Embedding) family—including v1, v1.5, M3, Multilingual Gemma2, and EN‑ICL—detailing their multilingual capabilities, model variants, token limits, optimal use cases, and step‑by‑step installation and Python usage instructions with code examples for embedding generation and similarity scoring.

Embedding · LLM · Python

0 likes · 8 min read

Alibaba Cloud Developer

Mar 12, 2025 · Artificial Intelligence

How to Optimize DB‑GPT RAG Pipelines for Better Retrieval and Knowledge Processing

This article explains how to use the DB‑GPT application framework to improve Retrieval‑Augmented Generation (RAG) by detailing knowledge loading, chunking, embedding, graph extraction, storage options, retrieval strategies, workflow customization, evaluation metrics, and real‑world case studies.

AI · DB-GPT · RAG

0 likes · 27 min read

AI Algorithm Path

Mar 5, 2025 · Artificial Intelligence

Understanding NV-Embed: How NVIDIA’s Decoder‑Only Model Achieves State‑of‑the‑Art Embeddings

This article dissects NVIDIA’s open‑source NV‑Embed model, explaining its decoder‑only architecture, latent attention layer, two‑stage contrastive training, data curation strategies, and experimental results that together push embedding performance to the top of the MTEB benchmark.

Embedding · Mistral · NV-Embed

0 likes · 9 min read

Architecture and Beyond

Feb 22, 2025 · Artificial Intelligence

Understanding Retrieval‑Augmented Generation (RAG) and Its Role in Enhancing Large Language Models

The article explains how the inherent knowledge‑staleness, hallucination, lack of private data, non‑traceable output, limited long‑text handling, and data‑security concerns of large language models can be mitigated by Retrieval‑Augmented Generation, which combines external retrieval, augmentation, and generation to provide up‑to‑date, reliable, and secure AI responses.

AI · Knowledge augmentation · LLM

0 likes · 15 min read

iKang Technology Team

Feb 7, 2025 · Artificial Intelligence

Retrieval‑Augmented Generation (RAG) with LangChain: Concepts and Python Implementation

Retrieval‑Augmented Generation (RAG) using LangChain lets developers enhance large language models by embedding user queries, fetching relevant documents from a vector store, inserting the context into a prompt template, and generating concise, source‑grounded answers, offering low‑cost, up‑to‑date knowledge while reducing hallucinations and fine‑tuning expenses.

LLM · LangChain · RAG

0 likes · 10 min read

Zhihu Tech Column

Jan 17, 2025 · Artificial Intelligence

Zhihu Direct Answer: Product Overview and Technical Practices

This article summarizes the key technical insights from Zhihu Direct Answer, an AI-powered search product, covering its product overview, RAG framework, query understanding, retrieval strategies, chunking, reranking, generation techniques, evaluation methods, and engineering optimizations for cost and performance.

AI Search · Chunking · Engineering Optimization

0 likes · 13 min read

JD Tech Talk

Nov 26, 2024 · Artificial Intelligence

Design and Implementation of an Automated Logistics QA Bot Using Retrieval, Rerank, and Data Augmentation Techniques

This article describes a low‑cost, privacy‑preserving chatbot for logistics that combines data cleaning, large‑model‑based data augmentation, BM25 and vector retrieval, a DNN rerank model, and LLM‑driven answer rewriting to deliver accurate, compliant automated responses.

AI · BM25 · Data Augmentation

0 likes · 11 min read

DataFunSummit

Nov 8, 2024 · Artificial Intelligence

ChatDBA: An AI‑Powered Database Fault Diagnosis Assistant Using Retrieval‑Augmented Generation

ChatDBA, developed by Shanghai Aikesheng, is an AI-driven database operation assistant that leverages large language models and Retrieval‑Augmented Generation to provide fault diagnosis, knowledge learning, SQL generation and optimization, addressing challenges such as vague outputs, complex troubleshooting logic, and memory management through a structured architecture and multi‑modal retrieval strategies.

AI · Fault diagnosis · Large Language Model

0 likes · 10 min read

DevOps

Oct 27, 2024 · Artificial Intelligence

Best Practices for Building Efficient Retrieval‑Augmented Generation (RAG) Systems

This article reviews Wang et al.'s 2024 research on Retrieval‑Augmented Generation, outlining optimal practices such as query classification, chunk sizing, hybrid metadata search, embedding selection, vector databases, query transformation, reranking, document repacking, summarization, fine‑tuning, and multimodal retrieval to guide developers in constructing high‑performance RAG pipelines.

LLM · Query Classification · RAG

0 likes · 11 min read

AntTech

Sep 12, 2024 · Artificial Intelligence

Knowledge‑Enhanced Large Model Service Framework (KAG): Integrating Knowledge Graphs with LLMs for Vertical Domain Applications

The KAG framework combines knowledge‑graph‑driven symbolic reasoning with large language model generation to improve accuracy, reduce hallucinations, and enable controllable, domain‑specific AI services such as government and medical Q&A, with open‑source support via OpenSPG and TuGraph‑DB.

AI · Knowledge Graph · framework

0 likes · 13 min read

AntData

Jul 12, 2024 · Databases

Recent Advances in Vector Databases Presented at SIGMOD 2024

This article reviews the latest vector database research showcased at SIGMOD 2024, covering system designs such as Starling, Vexless, RaBitQ, and ACORN, and discusses current academic hotspots including query processing, index structures, optimization techniques, and hardware acceleration for large‑scale similarity search.

AI · Indexing · SIGMOD 2024

0 likes · 20 min read

Baidu Intelligent Cloud Tech Hub

May 27, 2024 · Databases

Baidu’s Enterprise Vector Database: Architecture, Performance, and RAG Secrets

An exclusive interview with Baidu’s senior database architects reveals the motivations behind building a dedicated enterprise vector database, details its novel column‑store engine, C++‑based retrieval stack, performance gains over open‑source solutions, multi‑modal support, RAG integration, and future research directions.

AI · RAG · Storage Engine

0 likes · 28 min read

AI Large Model Application Practice

May 27, 2024 · Artificial Intelligence

Building Agentic RAG with LlamaIndex: From Tool Agents to a Top Agent

This article walks through the design and implementation of an Agentic Retrieval‑Augmented Generation system using LlamaIndex, showing how to wrap multiple RAG engines as tools, orchestrate them with hierarchical AI agents, and scale the solution with tool retrieval for large document collections.

AI Agent · LlamaIndex · Python

0 likes · 14 min read

AI Large Model Application Practice

Mar 29, 2024 · Artificial Intelligence

How RAG Architecture Evolves: From Simple Chains to Flexible RAG Flows

This article examines the evolution of Retrieval‑Augmented Generation (RAG) architectures for large language models, outlines the challenges they face, introduces the modular RAG Flow concept with four workflow paradigms, and provides a step‑by‑step implementation using LangChain and LlamaIndex with code examples.

LLM · LangChain · RAG

0 likes · 15 min read

Alibaba Cloud Big Data AI Platform

Nov 27, 2023 · Artificial Intelligence

How OpenSearch Supercharges Vector Search for Large‑Model Applications

This article explains how Alibaba Cloud OpenSearch leverages vector retrieval, engineering and algorithmic optimizations, heterogeneous CPU‑GPU computing, and dense‑sparse hybrid memory to deliver billion‑scale, high‑throughput search performance and enable conversational AI use cases such as intelligent Q&A and SmartArXiv.

AI · OpenSearch · retrieval

0 likes · 16 min read

DataFunSummit

Nov 24, 2023 · Artificial Intelligence

Cold-Start Content Recommendation Practices at Kuaishou

This article describes Kuaishou's approach to cold-start content recommendation, outlining the problems addressed, challenges in modeling sparse new videos, and solutions including graph neural networks, I2U retrieval, TDM hierarchical retrieval, bias correction, and future research directions.

Bias Correction · Graph Neural Network · Kuaishou

0 likes · 19 min read

HomeTech

Sep 26, 2023 · Artificial Intelligence

Integrating Large Language Models with Search for Automotive Knowledge Retrieval

This article explores how combining traditional keyword search with large language models (LLMs) enhances understanding of user intent, builds a robust automotive knowledge base, and delivers more accurate, context‑aware answers through a multi‑stage retrieval and generation pipeline.

AI · Automotive · Knowledge Base

0 likes · 17 min read

Volcano Engine Developer Services

Sep 19, 2023 · Databases

Unlocking AI with Vector Databases: Architecture, Optimization, and Real-World Cases

This article explores how vector databases serve as the memory layer for large AI models, detailing their distributed, storage‑compute separated architecture, performance optimizations, hybrid vector‑scalar retrieval, and practical deployments across TikTok’s ecosystem such as image search, intelligent Q&A, and multimodal AI services.

AI · Knowledge Base · distributed architecture

0 likes · 11 min read

21CTO

Jun 16, 2023 · Artificial Intelligence

Why Are LLM Stacks Becoming Essential for Modern Companies?

A comprehensive look at how companies are rapidly adopting large language model APIs, retrieval techniques, and custom model strategies, revealing key statistics, emerging toolchains, and the shifting balance between closed‑source LLM services and open‑source custom stacks.

AI adoption · Custom Models · LLM

0 likes · 8 min read

Python Programming Learning Circle

Mar 27, 2023 · Artificial Intelligence

OpenAI Launches ChatGPT Plugins: Browser, Code Interpreter, Retrieval and Third‑Party Extensions

OpenAI has unveiled a suite of ChatGPT plugins—including a web‑browser, a code interpreter, a retrieval tool, and support for third‑party services—enabling the model to access up‑to‑date information, run Python code, query vector databases, and integrate external APIs, dramatically expanding its practical capabilities.

ChatGPT · Code interpreter · Plugins

0 likes · 8 min read

DataFunSummit

Mar 24, 2023 · Artificial Intelligence

OpenAI Launches ChatGPT Plugin System: Features, Examples, and Safety Discussion

OpenAI announced a safety‑focused ChatGPT plugin system that connects the model to third‑party APIs for real‑time information retrieval, knowledge‑base access, and task execution, showcasing first‑party browser and code‑interpreter plugins, third‑party extensions, an open‑source retrieval plugin, and a detailed debate on security implications.

AI safety · ChatGPT · Code interpreter

0 likes · 9 min read

DataFunTalk

Jan 28, 2023 · Artificial Intelligence

Industry Search: Background, Technologies, and Real‑World Applications

This article presents a comprehensive overview of industry search, covering its background, core retrieval and ranking technologies—including sparse and dense retrieval, pre‑trained language models, tokenization, NER, adaptive multi‑task training, and re‑ranking models—followed by detailed case studies such as address analysis, family‑ID unification, emergency call handling, education photo‑search, and power‑knowledge‑base integration.

NLP · Pretrained Models · address analysis

0 likes · 13 min read

DataFunTalk

Nov 8, 2022 · Artificial Intelligence

Retrieval-Based Dialogue System Framework for Customer Service: Architecture, Retrieval, Ranking, and Practical Applications

This article presents a comprehensive retrieval‑based dialogue system designed to assist customer‑service agents by recommending candidate replies, detailing its five‑layer architecture, metric suite, text and vector retrieval modules, ranking strategies, and real‑world deployment results across multiple business scenarios.

AI · Ranking · customer service

0 likes · 34 min read

Zhuanzhuan Tech

Sep 29, 2022 · Artificial Intelligence

Design and Implementation of Zhuanzhuan's Low-Result Search Module with Hybrid Hard and Soft Retrieval

The article details the architecture and techniques of Zhuanzhuan's low-result search module, explaining how it combines Elasticsearch hard matching with SBERT semantic-vector soft matching and carefully designed negative-sampling strategies to improve recommendation coverage and user experience.

FAISS · Search · recommendation

0 likes · 17 min read

DataFunSummit

Feb 21, 2022 · Artificial Intelligence

Advances in E‑commerce Search: Embedding, Knowledge Graphs, and Retrieval Models

This article reviews recent research on e‑commerce search, covering transformer‑based complementary rankings, Alibaba's cognitive concept net and its extension, joint deep retrieval with product quantization, personalized semantic retrieval, multi‑granularity deep semantic retrieval, and graph‑attention networks for long‑tail shop search.

AI · Embedding · Graph Neural Network

0 likes · 12 min read

DataFunTalk

Dec 13, 2021 · Artificial Intelligence

Dual Vector Foil (DVF): Decoupled Index and Model for Large‑Scale Retrieval

The article introduces the Dual Vector Foil (DVF) algorithm system, which decouples index construction from model training to enable lightweight, high‑precision large‑scale recall using arbitrary complex models, and details its two‑stage and one‑stage solutions, graph‑based retrieval implementation, performance optimizations, and experimental results.

Large Scale · Recommendation Systems · algorithm

0 likes · 28 min read

DataFunTalk

Feb 15, 2021 · Artificial Intelligence

Deep Tree Matching (TDM): Evolution and Practice in Large-Scale Retrieval at Alibaba

This article explains Alibaba's Deep Tree Matching (TDM) technology, covering the challenges of large‑scale match retrieval, the progression from classic two‑stage recall to tree‑based indexing, max‑heap tree modeling, beam‑search retrieval, and the joint model‑index learning across TDM 1.0, 2.0, and 3.0, highlighting significant offline and online performance gains and future research directions.

Alibaba · Beam Search · deep learning

0 likes · 15 min read

DataFunTalk

Jul 8, 2020 · Artificial Intelligence

Multi‑Level Multi‑Modal Search Engine and Graph Engine for Video Content at Youku

The article presents a detailed technical overview of Youku's video search system, covering multi‑modal inputs, multi‑level element indexing, face search, cross‑level and cross‑modal retrieval, and the design and applications of a multimodal graph engine with knowledge‑graph integration.

AI · Knowledge Graph · Multimodal

0 likes · 12 min read

DataFunTalk

Aug 16, 2019 · Artificial Intelligence

Tree‑based Deep Match (TDM): Design, Implementation, and Applications in Large‑Scale Retrieval

This article presents a comprehensive overview of the Tree‑based Deep Match (TDM) algorithm, describing the evolution of retrieval technology, the limitations of traditional Match‑Rank pipelines, the design of a one‑stage tree‑indexed deep matching model, its training methodology, performance gains on public datasets, and its deployment in Alibaba’s advertising and e‑commerce platforms.

Large Scale · Recommendation Systems · TDM

0 likes · 23 min read

360 Tech Engineering

Jul 31, 2019 · Backend Development

Design and Key Technologies of the 360 Search Engine for Billion‑Scale Web Retrieval

This article explains how 360 Search processes billions of web pages daily, detailing its backend architecture, offline indexing, online retrieval, index organization, and relevance models that enable efficient search over a hundred‑billion‑scale web corpus.

Big Data · HBase · Indexing

0 likes · 21 min read

Qunar Tech Salon

Mar 1, 2018 · Artificial Intelligence

Open-Domain Chatbot Implementation: Retrieval and Generative Approaches

This article explains the implementation of open-domain chatbots for customer service, comparing retrieval‑based and generative seq2seq approaches, describing hybrid methods that first attempt constrained retrieval before falling back to generation, and discusses training data, keyword extraction, and performance observations.

AI · Chatbot · Seq2Seq

0 likes · 6 min read

Baidu Intelligent Testing

Jun 8, 2016 · Fundamentals

Evaluation Framework for Search Retrieval Systems: Speed, Relevance, Recall, and Freshness

The article introduces a four‑dimensional evaluation framework for retrieval systems—Fast, Accurate, Complete, and New—explaining how each metric is measured, why it matters to users, and how crowdsourced testing across devices and networks can provide objective quality assessments.

Evaluation · Information Retrieval · Metrics

0 likes · 6 min read

Baidu Intelligent Testing

Apr 28, 2016 · Operations

Testing and Evaluation Practices for Baidu Doctor Platform

This article details Baidu Doctor’s comprehensive testing and monitoring strategies, covering user experience data analysis, source data trust, online monitoring systems, log‑based automated checks, retrieval backend testing, evaluation metrics, bad‑case mining, and user search habit analysis to ensure high‑quality medical O2O services.

Monitoring · data analysis · medical platform

0 likes · 14 min read

Qunar Tech Salon

Feb 20, 2016 · Artificial Intelligence

Mobile Image Search: Algorithm Framework and Implementation at Pailitao

Mobile image search has become a critical user demand. Since its launch in 2014, Alibaba’s Pailitao has evolved through multiple iterations into a robust AI-driven pipeline comprising category prediction, object detection, deep and local image feature extraction, scalable retrieval indexing, and relevance-based ranking.

deep learning · image search · mobile AI

0 likes · 6 min read

21CTO

Jan 29, 2016 · Artificial Intelligence

How Mobile Image Search Powers Real-Time Shopping: Inside Pailitao’s AI Algorithm

Mobile visual search, a long‑standing dream, has evolved from early research to a production‑grade system at Pailitao, where a five‑module AI pipeline—category prediction, object detection, feature extraction, indexing, and ranking—enables billions of images to be searched instantly on mobile devices.

computer vision · deep learning · image search

0 likes · 8 min read