Tagged articles
82 articles
AI Architecture Hub
May 19, 2026 · Artificial Intelligence

Agent Memory: From Theory to Practical Implementation

The article explains how AI agents can acquire long‑term memory by combining three functions—coherence, context, and learning—with four memory types, describes the full retrieval‑store loop, and provides a step‑by‑step Python implementation using OpenAI embeddings, ChromaDB, and forgetting strategies.

AI agents · ChromaDB · Python
0 likes · 17 min read
Architect
May 17, 2026 · Artificial Intelligence

Agent Skills Survey: How Process Knowledge Becomes Technical Debt

The recent arXiv survey on Agent Skills maps the full lifecycle of skills—representation, acquisition, retrieval, and evolution—and warns that unchecked growth can turn a valuable process asset into technical debt, urging teams to enforce admission quality, robust routing, versioning, testing, and retirement mechanisms.

AI Engineering · Agent Skills · Process Assets
0 likes · 26 min read
ITPUB
May 13, 2026 · Databases

Is the Hype Around Vector Databases a Pseudo‑Demand in the AI Era?

The article questions whether dedicated vector databases are truly needed for AI applications, examining market hype, the rapid emergence of many vector‑DB products, real‑world examples like PostgreSQL pgvector and major vendor integrations, and the hidden costs of data fragmentation and operational complexity.

AI · PostgreSQL · RAG
0 likes · 15 min read
dbaplus Community
May 5, 2026 · Artificial Intelligence

The True Nature of Agent Memory: Deep Dive into Architecture and Design

The article analyses why a real agent must have memory, defining memory as an external state that feeds decision‑making, proposing a three‑part architecture (Raw Ledger, Views, Policy), contrasting parametric and non‑parametric approaches, and detailing bottlenecks, temporal handling, and procedural extensions.

Agent Memory · Memory Architecture · non‑parametric memory
0 likes · 46 min read
AI Engineer Programming
May 2, 2026 · Artificial Intelligence

From Demo to Production: How to Evaluate RAG Effectively

This guide outlines a comprehensive RAG evaluation framework covering failure modes, multi‑layer metrics, test‑set construction, open‑source tools, CI/CD quality gates, production monitoring, and special considerations for agentic RAG to ensure reliable, trustworthy retrieval‑augmented generation systems.

AI · Generation · LLM
0 likes · 18 min read
AI Engineer Programming
May 1, 2026 · Artificial Intelligence

From Naive Retrieval to Knowledge Runtime: The Full Evolution of RAG

The article traces the evolution of Retrieval‑Augmented Generation from its 2020 Naive baseline through Advanced, Modular, Graph, and Agentic generations, detailing architectural shifts, optimization techniques, self‑correction mechanisms, and future challenges such as long‑context handling and multimodal retrieval.

Agentic · LLM · RAG
0 likes · 14 min read
MaGe Linux Operations
Apr 22, 2026 · Artificial Intelligence

5 Essential Design Principles for Building High‑Quality RAG Systems

This article outlines five critical design principles for constructing high‑quality Retrieval‑Augmented Generation (RAG) systems, covering document chunking strategies, embedding model selection, hybrid retrieval architectures, metadata filtering with multi‑level indexes, and reranking mechanisms, and provides concrete code snippets and evaluation metrics.

Embedding · Hybrid Retrieval · RAG
0 likes · 17 min read
Wu Shixiong's Large Model Academy
Apr 22, 2026 · Artificial Intelligence

How to Classify and Manage Agent Memories for Better Retrieval

This article dissects Claude Code's memory system, explains why unstructured memory degrades performance, introduces four distinct memory types with concrete examples and schema, shows how to handle expiration and retrieval strategies, and provides step‑by‑step implementation code to improve agent reliability.

Agent Memory · LLM · Memory Management
0 likes · 19 min read
dbaplus Community
Apr 12, 2026 · Artificial Intelligence

Boost RAG Accuracy to 94%: 11 Proven Strategies and How to Combine Them

After struggling with naive RAG that delivered only 60% accuracy, the author outlines eleven advanced strategies—including context-aware chunking, query expansion, re‑ranking, multi‑query, knowledge graphs, and agent‑based retrieval—that together raise performance to 94%, and provides detailed implementation examples, trade‑offs, and a step‑by‑step deployment roadmap.

AI · Embedding · Knowledge Graph
0 likes · 32 min read
AI Code to Success
Apr 3, 2026 · Artificial Intelligence

Can Your AI Agent Earn a College Degree? Exploring Clawvard’s Evaluation Platform

The author explores Clawvard, an AI‑agent assessment platform that tests agents across eight dimensions, shares personal test results showing an initial A‑ rating with a critical retrieval weakness, details the customized improvement rules applied, and demonstrates a subsequent A+ rating, while also discussing the platform’s limits and practical use cases.

AI Agent · Prompt engineering · artificial intelligence
0 likes · 8 min read
AgentGuide
Apr 3, 2026 · Artificial Intelligence

How to Evaluate RAG Systems: Key Metrics and the Ragas Framework

The article explains how to assess Retrieval-Augmented Generation (RAG) projects using the Ragas automated evaluation framework, detailing four key dimensions—recall quality, answer faithfulness, answer relevance, and context utilization—and describes the underlying metrics for both retrieval and generation stages.

LLM · RAG · RAGAS
0 likes · 5 min read
Architecture and Beyond
Mar 29, 2026 · Artificial Intelligence

Designing Efficient Memory for Claude Code: Typed Storage, Indexed Management, Triggered Retrieval, and Pre‑Use Validation

This article analyzes Claude Code's memory system, explaining how typed storage separates user, feedback, project, and reference data, how an indexed MEMORY.md file keeps the index lightweight, how triggered retrieval balances relevance, freshness, and reliability, and why pre‑use validation prevents stale or incorrect facts from contaminating model responses.

AI memory · Claude · Prompt engineering
0 likes · 17 min read
Java One
Mar 28, 2026 · Artificial Intelligence

Building a Vector‑Free RAG System with Hierarchical Page Indexing

This guide explains how to create a retrieval‑augmented generation (RAG) system that avoids embeddings by converting documents into a hierarchical tree, using an LLM to navigate, summarize, and retrieve answers, complete with a full Python implementation and a GitHub repository.

Hierarchical Indexing · LLM · Python
0 likes · 15 min read
Wu Shixiong's Large Model Academy
Mar 17, 2026 · Artificial Intelligence

Mastering Chunk Splitting for RAG: From Fixed Length to Semantic Segmentation

Chunk splitting, a critical yet often overlooked step in RAG pipelines, dramatically impacts retrieval recall and LLM output quality; this guide walks through three evolution stages—from naive fixed‑length splits to sentence‑aware overlaps and finally semantic, structure‑driven segmentation—complete with code, experiments, and practical pitfalls.

LLM · RAG · chunking
0 likes · 15 min read
PaperAgent
Mar 10, 2026 · Artificial Intelligence

How MemSifter Delivers High‑Precision, Low‑Cost Long‑Term Memory for LLMs

MemSifter introduces a lightweight agent that outsources memory retrieval for large language models, using a Think‑and‑Rank pipeline and a task‑result‑oriented reinforcement‑learning training paradigm to achieve superior retrieval accuracy and efficiency across eight benchmark tasks while keeping inference overhead minimal.

Agent · Benchmark · LLM
0 likes · 13 min read
Data Party THU
Mar 8, 2026 · Artificial Intelligence

6 Practical Context‑Engineering Techniques to Tame RAG Hallucinations

This article explains why retrieval‑augmented generation (RAG) models often hallucinate, introduces the concept of context engineering, and details six practical techniques—including selective retrieval, context compression, hierarchical layout, dynamic query rewriting, memory management, and tool‑aware context—along with their trade‑offs and real‑world impact.

AI · Context Engineering · LLM
0 likes · 23 min read
AI Tech Publishing
Mar 4, 2026 · Artificial Intelligence

AI Agent Context Management: Comparing Six Major Companies' Approaches

The article analyzes how six leading AI‑agent providers—Manus, Cursor, Anthropic, OpenAI, Google, and LangChain—tackle the fundamental problem of when and how a large language model should see information, detailing each solution, a cross‑company comparison matrix, consensus points, controversies, and open research questions.

AI agents · Context management · LLM
0 likes · 19 min read
DataFunSummit
Feb 24, 2026 · Artificial Intelligence

How Large Language Models Are Redefining Search Ranking at Tencent

This article details Tencent Search's exploration of large‑model‑driven ranking, covering the evolution from traditional keyword retrieval to RAG‑based AI search, the multi‑stage AI ranking architecture (L0‑L5), model training pipelines, distillation, synthetic data generation, and future research directions.

LLM · RAG · ranking architecture
0 likes · 21 min read
PaperAgent
Feb 20, 2026 · Artificial Intelligence

Why Graph-Based Memory Is the Next Frontier for AI Agents

This article surveys recent advances in graph‑structured agent memory, presenting a taxonomy, lifecycle stages from extraction to evolution, open‑source tools, and benchmark suites that together illustrate how graph memory can overcome knowledge truncation, tool incompetence, and performance saturation in LLM‑driven AI agents.

AI agents · evolution · graph memory
0 likes · 8 min read
DaTaobao Tech
Feb 9, 2026 · Artificial Intelligence

Boosting Trustworthiness in Retrieval‑Augmented Generation: The Trustworthy Generation Design Pattern

This article presents the Trustworthy Generation design pattern for Retrieval‑Augmented Generation (RAG) systems, analyzes four root causes of low trustworthiness—retrieval errors, content reliability, pre‑retrieval reasoning mistakes, and model hallucinations—and proposes layered solutions, citation techniques, CRAG and Self‑RAG architectures, guardrails, and practical trade‑offs.

AI Safety · Generation · LLM
0 likes · 16 min read
Architecture and Beyond
Feb 8, 2026 · Artificial Intelligence

Designing Scalable Long-Term Memory for AI Agents: Capture, Compress, Retrieve

This article explains how to build a controllable, editable, and cost‑effective long‑term memory system for AI agents by categorizing memory types, structuring a three‑stage pipeline of capture, AI‑driven compression, and smart retrieval, and choosing appropriate storage back‑ends such as files, knowledge bases, or databases.

Agent Design · Knowledge Base · Long-term Memory
0 likes · 18 min read
PaperAgent
Feb 4, 2026 · Artificial Intelligence

How Agent KB Enables Cross‑Framework Knowledge Sharing for Smarter AI Agents

The article presents Agent KB, a universal memory infrastructure that lets heterogeneous AI agents share experiences through a Reason‑Retrieve‑Refine pipeline and a teacher‑student dual‑agent architecture, showing significant performance gains across benchmarks like GAIA, SWE‑bench, and various LLM families.

AI agents · Knowledge Base · cross‑framework
0 likes · 10 min read
Architecture and Beyond
Feb 1, 2026 · Artificial Intelligence

5 High‑ROI Strategies to Supercharge RAG Retrieval Performance

This article outlines five practical engineering strategies—multi‑vector retrieval, manual splitting and labeling, scalar enhancement, context augmentation, and dense‑sparse vector integration—that together address common RAG retrieval bottlenecks and dramatically improve recall stability and answer quality.

BM25 · Engineering · LLM
0 likes · 17 min read
Wu Shixiong's Large Model Academy
Nov 12, 2025 · Artificial Intelligence

Agent Memory Modules Explained: Short‑Term vs Long‑Term Strategies for LLM Agents

This article breaks down the memory systems behind LLM‑based agents, explaining why persistent memory is needed, the differences between short‑term context buffers and long‑term vector stores, practical implementation choices, maintenance strategies, and how to articulate these concepts effectively in technical interviews.

Agent · LLM · retrieval
0 likes · 14 min read
Data Party THU
Nov 9, 2025 · Artificial Intelligence

Mastering Chunking Strategies for Effective RAG: Fixed, Recursive, Semantic, Structured, and Delayed

This article walks through the core RAG pipeline, explains why chunking is the linchpin of retrieval quality, and provides detailed definitions, trade‑offs, and implementation examples for five chunking techniques—fixed, recursive, semantic, structure‑aware, and delayed—so you can choose the right approach for any document‑heavy AI application.

AI · LLM · RAG
0 likes · 10 min read
Wu Shixiong's Large Model Academy
Nov 1, 2025 · Artificial Intelligence

Turn a Basic RAG Demo into a High‑Impact Interview Project

This guide shows how to evolve a simple Retrieval‑Augmented Generation prototype into a production‑grade system by strengthening data ingestion, optimizing retrieval with hybrid and reranking techniques, adding query rewriting, long‑context handling, reinforcement learning, and multimodal support, so candidates can demonstrate real engineering depth in interviews.

AI · LLM · RAG
0 likes · 7 min read
DeWu Technology
Oct 29, 2025 · Artificial Intelligence

Why Chunking Can Make or Break Your RAG System – Practical Strategies & Code

This article explains how proper document chunking—choosing the right chunk size, overlap, and structure‑aware boundaries—directly impacts the relevance, factuality, and efficiency of Retrieval‑Augmented Generation pipelines, and provides multiple Python implementations ranging from simple fixed‑length splits to semantic and hybrid approaches.

Embedding · LLM · RAG
0 likes · 29 min read
Amap Tech
Oct 17, 2025 · Artificial Intelligence

How Ranking Improves In-Context Example Retrieval: Insights from NeurIPS ’25

This article explains the limitations of current pointwise in‑context learning methods, introduces a novel ranking‑based approach called SeDPO that learns preference orders among examples, and demonstrates its superior performance across multiple NLP tasks through extensive experiments and ablation studies.

In-Context Learning · NeurIPS · SeDPO
0 likes · 10 min read
DataFunTalk
Oct 6, 2025 · Artificial Intelligence

Mastering Context Engineering: 5 Proven Strategies to Boost AI Agent Performance

This article explores the emerging concept of context engineering for AI agents, explains why managing long‑range context is critical, and details five practical strategies—Offload, Reduce, Retrieve, Isolate, and Cache—backed by insights from leading industry teams and the "Bitter Lesson" philosophy.

AI agents · Context Engineering · LLM optimization
0 likes · 30 min read
Data STUDIO
Sep 28, 2025 · Artificial Intelligence

Top Reranker Models for RAG in 2025: A Comparative Review

This article explains why initial retrieval in Retrieval‑Augmented Generation often yields noisy results, describes how rerankers act as quality filters to improve relevance, compares the leading 2025 reranker models—including Cohere, bge‑reranker, Voyage, Jina, FlashRank, and MixedBread—and provides code snippets, evaluation metrics, and guidance for selecting the right model for specific use cases.

AI · Cross-Encoder · LLM
0 likes · 31 min read
Data Thinking Notes
Sep 7, 2025 · Artificial Intelligence

Unlocking AI Agent Memory: How LLMs Use Retrieval and Planning to Stay Smart

This article explains the core architecture of AI agents powered by large language models, detailing how planning, short‑term and long‑term memory, and tool integration work together through vector databases, retrieval‑augmented generation, and summarization to enable stateful, intelligent interactions across multiple sessions.

AI Agent · LLM · Memory
0 likes · 10 min read
Amap Tech
Sep 2, 2025 · Artificial Intelligence

How Pos2Distill Eliminates Positional Bias in Large Language Models

This article introduces Pos2Distill, a novel knowledge‑distillation framework that transfers capabilities from advantageous to disadvantaged positions in large language models, effectively mitigating positional bias and improving performance on long‑text retrieval and in‑context reasoning tasks.

in-context reasoning · knowledge distillation · large language models
0 likes · 10 min read
Instant Consumer Technology Team
Sep 2, 2025 · Artificial Intelligence

Why RAG Is Dead: Jeff Huber’s 5 Retrieval Secrets and Context Engineering

Jeff Huber, founder of Chroma, argues that traditional RAG is obsolete, introduces context engineering as the new paradigm, and shares five practical retrieval strategies, a complete pipeline, and insights on handling context rot, memory, and generative benchmarking to build production‑grade AI applications.

AI · Context Engineering · Generative Benchmarking
0 likes · 11 min read
Instant Consumer Technology Team
Aug 19, 2025 · Artificial Intelligence

Mastering Document Chunking for RAG: Strategies, Code & Best Practices

This article explores why proper document chunking is crucial for Retrieval‑Augmented Generation, explains core concepts like context windows and signal‑to‑noise, compares various chunking strategies—from simple fixed‑size splits to semantic and hybrid approaches—and provides practical Python code examples to help you build more effective RAG pipelines.

LLM · RAG · Text Splitting
0 likes · 24 min read
Alimama Tech
Jul 9, 2025 · Artificial Intelligence

How to Make LLMs Recognize and Resolve Their Own Uncertainty

This article introduces ConfuseBench, a benchmark that classifies LLM uncertainty into document‑missing, ability‑limited, and ambiguous types, and presents methods—including retrieval, chain‑of‑thought, and clarification—to detect and actively resolve uncertainty, improving answer quality across diverse tasks.

Benchmark · Clarification · Inquiry
0 likes · 17 min read
Tencent Technical Engineering
Jun 16, 2025 · Artificial Intelligence

Mastering RAG and AI Agents: Practical Tips, Code Samples, and Evaluation Strategies

This comprehensive guide walks you through the fundamentals of Retrieval‑Augmented Generation (RAG) and AI agents, explains their inner workings, shares optimization tricks, provides ready‑to‑run code snippets, and demonstrates how to evaluate performance with metrics such as recall, faithfulness, and answer relevance.

AI agents · LLM · Prompt engineering
0 likes · 36 min read
ITPUB
Jun 15, 2025 · Artificial Intelligence

How to Build a High‑Performance Enterprise RAG System with Model Context Protocol (MCP)

This article presents a step‑by‑step guide for constructing a scalable enterprise Retrieval‑Augmented Generation (RAG) solution using the Model Context Protocol (MCP), covering architecture comparison, system design, Milvus‑backed knowledge store, Python client implementation, deployment scripts, code examples, and best‑practice recommendations.

KnowledgeBase · LLM · MCP
0 likes · 22 min read
DataFunSummit
May 9, 2025 · Artificial Intelligence

Practical Experience Building Zhihu Direct Answer: An AI‑Powered Search Product

This article presents a comprehensive overview of Zhihu Direct Answer, describing its AI‑driven search architecture, RAG framework, query understanding, retrieval, chunking, reranking, generation, evaluation mechanisms, engineering optimizations, and the professional edition, while sharing concrete performance‑boosting practices and future development plans.

AI · Generation · Product Development
0 likes · 14 min read
AsiaInfo Technology: New Tech Exploration
Apr 25, 2025 · Artificial Intelligence

How Evidence Generation Boosts Document-Grounded Dialogue with LLMs

This study introduces DGDE, a document‑grounded dialogue framework that leverages large language model‑generated evidence, combining retrieval, reranking, fine‑tuning, and iterative question correction to markedly improve accuracy, comprehensiveness, coherence, and completeness on the Doc2dial benchmark.

Fine-tuning · document-grounded dialogue · evidence generation
0 likes · 21 min read
Fun with Large Models
Apr 25, 2025 · Artificial Intelligence

Why Your RAG System Underperforms and How to Boost Its Effectiveness by 20%

This article analyzes common shortcomings of RAG pipelines—data preparation, retrieval, and LLM generation—and provides concrete optimization techniques such as advanced chunking, embedding model selection, retrieval parameter tuning, rerank models, and prompt engineering, promising up to a 20% performance gain.

Embedding · Prompt engineering · RAG
0 likes · 17 min read
Tencent Technical Engineering
Apr 22, 2025 · Artificial Intelligence

Conan-Embedding-V2: A 1.4B LLM‑Based Multilingual Embedding Model Achieving SOTA on MTEB

Conan‑Embedding‑V2, a newly trained 1.4 B‑parameter LLM with a custom tokenizer, 32 k token context, SoftMask, cross‑lingual retrieval data and dynamic hard‑negative mining, delivers state‑of‑the‑art multilingual embeddings that surpass larger models on both English and Chinese MTEB benchmarks while remaining compact and fast.

Embedding · MTEB · cross-lingual retrieval
0 likes · 14 min read
Fun with Large Models
Apr 18, 2025 · Artificial Intelligence

How RAG Works: From Data Prep to LLM Generation Explained

This article breaks down Retrieval‑Augmented Generation (RAG) into its three core stages—data preparation, data retrieval, and LLM generation—showing how document chunking, embedding, vector databases, similarity search, and optional re‑ranking combine to let large language models produce more accurate, knowledge‑grounded answers.

Embedding · LLM · RAG
0 likes · 9 min read
Ma Wei Says
Mar 24, 2025 · Artificial Intelligence

Master BGE Multilingual Embeddings: Models, Installation, and Quick Usage

Explore the BGE (BAAI General Embedding) family—including v1, v1.5, M3, Multilingual Gemma2, and EN‑ICL—detailing their multilingual capabilities, model variants, token limits, optimal use cases, and step‑by‑step installation and Python usage instructions with code examples for embedding generation and similarity scoring.

Embedding · LLM · Python
0 likes · 8 min read
Architecture and Beyond
Feb 22, 2025 · Artificial Intelligence

Understanding Retrieval‑Augmented Generation (RAG) and Its Role in Enhancing Large Language Models

The article explains how the inherent knowledge‑staleness, hallucination, lack of private data, non‑traceable output, limited long‑text handling, and data‑security concerns of large language models can be mitigated by Retrieval‑Augmented Generation, which combines external retrieval, augmentation, and generation to provide up‑to‑date, reliable, and secure AI responses.

AI · Knowledge augmentation · LLM
0 likes · 15 min read
iKang Technology Team
Feb 7, 2025 · Artificial Intelligence

Retrieval‑Augmented Generation (RAG) with LangChain: Concepts and Python Implementation

Retrieval‑Augmented Generation (RAG) using LangChain lets developers enhance large language models by embedding user queries, fetching relevant documents from a vector store, inserting the context into a prompt template, and generating concise, source‑grounded answers, offering low‑cost, up‑to‑date knowledge while reducing hallucinations and fine‑tuning expenses.

LLM · LangChain · RAG
0 likes · 10 min read
Zhihu Tech Column
Jan 17, 2025 · Artificial Intelligence

Zhihu Direct Answer: Product Overview and Technical Practices

This article summarizes the key technical insights from Zhihu Direct Answer, an AI-powered search product, covering its product overview, RAG framework, query understanding, retrieval strategies, chunking, reranking, generation techniques, evaluation methods, and engineering optimizations for cost and performance.

AI search · Engineering Optimization · Generation
0 likes · 13 min read
DataFunSummit
Nov 8, 2024 · Artificial Intelligence

ChatDBA: An AI‑Powered Database Fault Diagnosis Assistant Using Retrieval‑Augmented Generation

ChatDBA, developed by Shanghai Aikesheng, is an AI-driven database operation assistant that leverages large language models and Retrieval‑Augmented Generation to provide fault diagnosis, knowledge learning, SQL generation and optimization, addressing challenges such as vague outputs, complex troubleshooting logic, and memory management through a structured architecture and multi‑modal retrieval strategies.

AI · Fault Diagnosis · RAG
0 likes · 10 min read
DevOps
Oct 27, 2024 · Artificial Intelligence

Best Practices for Building Efficient Retrieval‑Augmented Generation (RAG) Systems

This article reviews Wang et al.'s 2024 research on Retrieval‑Augmented Generation, outlining optimal practices such as query classification, chunk sizing, hybrid metadata search, embedding selection, vector databases, query transformation, reranking, document repacking, summarization, fine‑tuning, and multimodal retrieval to guide developers in constructing high‑performance RAG pipelines.

LLM · Query Classification · RAG
0 likes · 11 min read
AntTech
Sep 12, 2024 · Artificial Intelligence

Knowledge‑Enhanced Large Model Service Framework (KAG): Integrating Knowledge Graphs with LLMs for Vertical Domain Applications

The KAG framework combines knowledge‑graph‑driven symbolic reasoning with large language model generation to improve accuracy, reduce hallucinations, and enable controllable, domain‑specific AI services such as government and medical Q&A, with open‑source support via OpenSPG and TuGraph‑DB.

AI · Framework · Knowledge Graph
0 likes · 13 min read
AntData
Jul 12, 2024 · Databases

Recent Advances in Vector Databases Presented at SIGMOD 2024

This article reviews the latest vector database research showcased at SIGMOD 2024, covering system designs such as Starling, Vexless, RaBitQ, and ACORN, and discusses current academic hotspots including query processing, index structures, optimization techniques, and hardware acceleration for large‑scale similarity search.

AI · SIGMOD 2024 · indexing
0 likes · 20 min read
Baidu Intelligent Cloud Tech Hub
May 27, 2024 · Databases

Baidu’s Enterprise Vector Database: Architecture, Performance, and RAG Secrets

An exclusive interview with Baidu’s senior database architects reveals the motivations behind building a dedicated enterprise vector database, details its novel column‑store engine, C++‑based retrieval stack, performance gains over open‑source solutions, multi‑modal support, RAG integration, and future research directions.

AI · RAG · Storage Engine
0 likes · 28 min read
AI Large Model Application Practice
Mar 29, 2024 · Artificial Intelligence

How RAG Architecture Evolves: From Simple Chains to Flexible RAG Flows

This article examines the evolution of Retrieval‑Augmented Generation (RAG) architectures for large language models, outlines the challenges they face, introduces the modular RAG Flow concept with four workflow paradigms, and provides a step‑by‑step implementation using LangChain and LlamaIndex with code examples.

LLM · LangChain · RAG
0 likes · 15 min read
Alibaba Cloud Big Data AI Platform
Nov 27, 2023 · Artificial Intelligence

How OpenSearch Supercharges Vector Search for Large‑Model Applications

This article explains how Alibaba Cloud OpenSearch leverages vector retrieval, engineering and algorithmic optimizations, heterogeneous CPU‑GPU computing, and dense‑sparse hybrid memory to deliver billion‑scale, high‑throughput search performance and enable conversational AI use cases such as intelligent Q&A and SmartArXiv.

AI · OpenSearch · retrieval
0 likes · 16 min read
DataFunSummit
Nov 24, 2023 · Artificial Intelligence

Cold-Start Content Recommendation Practices at Kuaishou

This article describes Kuaishou's approach to cold-start content recommendation, outlining the problems addressed, challenges in modeling sparse new videos, and solutions including graph neural networks, I2U retrieval, TDM hierarchical retrieval, bias correction, and future research directions.

Bias Correction · Graph Neural Network · Kuaishou
0 likes · 19 min read
HomeTech
Sep 26, 2023 · Artificial Intelligence

Integrating Large Language Models with Search for Automotive Knowledge Retrieval

This article explores how combining traditional keyword search with large language models (LLMs) enhances understanding of user intent, builds a robust automotive knowledge base, and delivers more accurate, context‑aware answers through a multi‑stage retrieval and generation pipeline.

AI · Knowledge Base · LLM
0 likes · 17 min read
Volcano Engine Developer Services
Sep 19, 2023 · Databases

Unlocking AI with Vector Databases: Architecture, Optimization, and Real-World Cases

This article explores how vector databases serve as the memory layer for large AI models, detailing their distributed architecture with storage‑compute separation, performance optimizations, hybrid vector‑scalar retrieval, and practical deployments across TikTok’s ecosystem such as image search, intelligent Q&A, and multimodal AI services.

AI · Knowledge Base · distributed architecture
0 likes · 11 min read
21CTO
Jun 16, 2023 · Artificial Intelligence

Why Are LLM Stacks Becoming Essential for Modern Companies?

A comprehensive look at how companies are rapidly adopting large language model APIs, retrieval techniques, and custom model strategies, revealing key statistics, emerging toolchains, and the shifting balance between closed‑source LLM services and open‑source custom stacks.

AI adoption · Custom Models · LLM
0 likes · 8 min read
Python Programming Learning Circle
Mar 27, 2023 · Artificial Intelligence

OpenAI Launches ChatGPT Plugins: Browser, Code Interpreter, Retrieval and Third‑Party Extensions

OpenAI has unveiled a suite of ChatGPT plugins—including a web‑browser, a code interpreter, a retrieval tool, and support for third‑party services—enabling the model to access up‑to‑date information, run Python code, query vector databases, and integrate external APIs, dramatically expanding its practical capabilities.

ChatGPT · Code Interpreter · Plugins
0 likes · 8 min read
DataFunSummit
Mar 24, 2023 · Artificial Intelligence

OpenAI Launches ChatGPT Plugin System: Features, Examples, and Safety Discussion

OpenAI announced a safety‑focused ChatGPT plugin system that connects the model to third‑party APIs for real‑time information retrieval, knowledge‑base access, and task execution, showcasing first‑party browser and code‑interpreter plugins, third‑party extensions, an open‑source retrieval plugin, and a detailed debate on security implications.

AI Safety · ChatGPT · Code Interpreter
0 likes · 9 min read
DataFunTalk
Jan 28, 2023 · Artificial Intelligence

Industry Search: Background, Technologies, and Real‑World Applications

This article presents a comprehensive overview of industry search, covering its background, core retrieval and ranking technologies—including sparse and dense retrieval, pre‑trained language models, tokenization, NER, adaptive multi‑task training, and re‑ranking models—followed by detailed case studies such as address analysis, family‑ID unification, emergency call handling, education photo‑search, and power‑knowledge‑base integration.

NLP · address analysis · industry search
0 likes · 13 min read
DataFunTalk
Nov 8, 2022 · Artificial Intelligence

Retrieval-Based Dialogue System Framework for Customer Service: Architecture, Retrieval, Ranking, and Practical Applications

This article presents a comprehensive retrieval‑based dialogue system designed to assist customer‑service agents by recommending candidate replies, detailing its five‑layer architecture, metric suite, text and vector retrieval modules, ranking strategies, and real‑world deployment results across multiple business scenarios.

AI · customer-service · dialogue system
0 likes · 34 min read
DataFunSummit
Feb 21, 2022 · Artificial Intelligence

Advances in E‑commerce Search: Embedding, Knowledge Graphs, and Retrieval Models

This article reviews recent research on e‑commerce search, covering transformer‑based complementary rankings, Alibaba's cognitive concept net and its extension, joint deep retrieval with product quantization, personalized semantic retrieval, multi‑granularity deep semantic retrieval, and graph‑attention networks for long‑tail shop search.

AI · Embedding · Graph Neural Network
0 likes · 12 min read
DataFunTalk
Dec 13, 2021 · Artificial Intelligence

Dual Vector Foil (DVF): Decoupled Index and Model for Large‑Scale Retrieval

The article introduces the Dual Vector Foil (DVF) algorithm system, which decouples index construction from model training to enable lightweight, high‑precision large‑scale recall using arbitrary complex models, and details its two‑stage and one‑stage solutions, graph‑based retrieval implementation, performance optimizations, and experimental results.

Deep Learning · Recommendation Systems · algorithm
0 likes · 28 min read
DataFunTalk
Feb 15, 2021 · Artificial Intelligence

Deep Tree Matching (TDM): Evolution and Practice in Large-Scale Retrieval at Alibaba

This article explains Alibaba's Deep Tree Matching (TDM) technology, covering the challenges of large‑scale match retrieval, the progression from classic two‑stage recall to tree‑based indexing, max‑heap tree modeling, beam‑search retrieval, and the joint model‑index learning across TDM 1.0, 2.0, and 3.0, highlighting significant offline and online performance gains and future research directions.

Alibaba · Beam Search · Deep Learning
0 likes · 15 min read
DataFunTalk
Jul 8, 2020 · Artificial Intelligence

Multi‑Level Multi‑Modal Search Engine and Graph Engine for Video Content at Youku

The article presents a detailed technical overview of Youku's video search system, covering multi‑modal inputs, multi‑level element indexing, face search, cross‑level and cross‑modal retrieval, and the design and applications of a multimodal graph engine with knowledge‑graph integration.

AI · Knowledge Graph · face search
0 likes · 12 min read
DataFunTalk
Aug 16, 2019 · Artificial Intelligence

Tree‑based Deep Match (TDM): Design, Implementation, and Applications in Large‑Scale Retrieval

This article presents a comprehensive overview of the Tree‑based Deep Match (TDM) algorithm, describing the evolution of retrieval technology, the limitations of traditional Match‑Rank pipelines, the design of a one‑stage tree‑indexed deep matching model, its training methodology, performance gains on public datasets, and its deployment in Alibaba’s advertising and e‑commerce platforms.

Recommendation Systems · TDM · large scale
0 likes · 23 min read
Qunar Tech Salon
Mar 1, 2018 · Artificial Intelligence

Open-Domain Chatbot Implementation: Retrieval and Generative Approaches

This article explains the implementation of open-domain chatbots for customer service, comparing retrieval‑based and generative seq2seq approaches, describing hybrid methods that first attempt constrained retrieval before falling back to generation, and discusses training data, keyword extraction, and performance observations.

AI · Chatbot · Generation
0 likes · 6 min read
Baidu Intelligent Testing
Apr 28, 2016 · Operations

Testing and Evaluation Practices for Baidu Doctor Platform

This article details Baidu Doctor’s comprehensive testing and monitoring strategies, covering user experience data analysis, source data trust, online monitoring systems, log‑based automated checks, retrieval backend testing, evaluation metrics, bad‑case mining, and user search habit analysis to ensure high‑quality medical O2O services.

User experience · data analysis · medical platform
0 likes · 14 min read
Qunar Tech Salon
Feb 20, 2016 · Artificial Intelligence

Mobile Image Search: Algorithm Framework and Implementation at Pailitao

Mobile image search has become a critical user demand. Since its launch in 2014, Alibaba’s Pailitao has evolved through multiple iterations into a robust AI-driven pipeline comprising category prediction, object detection, deep and local image feature extraction, scalable retrieval indexing, and relevance-based ranking.

Deep Learning · Mobile AI · image search
0 likes · 6 min read
21CTO
Jan 29, 2016 · Artificial Intelligence

How Mobile Image Search Powers Real-Time Shopping: Inside Pailitao’s AI Algorithm

Mobile visual search, a long‑standing dream, has evolved from early research to a production‑grade system at Pailitao, where a five‑module AI pipeline—category prediction, object detection, feature extraction, indexing, and ranking—enables billions of images to be searched instantly on mobile devices.

Computer Vision · Deep Learning · Mobile AI
0 likes · 8 min read