Tagged articles

Embedding

255 articles · Page 1 of 3

Jun 25, 2026 · Artificial Intelligence

Why Rerank Is Essential: From 100 Retrieved Docs to the 5 Correct Answers in RAG

Even with a perfectly populated vector database, a RAG pipeline often returns irrelevant answers because the initial bi‑encoder retrieval only narrows the pool to about 100 candidates. Without a cross‑encoder rerank step, the truly correct document, often buried around rank 37, never reaches the LLM.

Bi-Encoder · Cross-Encoder · Embedding

0 likes · 9 min read

DataFunSummit

Jun 25, 2026 · Cloud Computing

From Text to Images: Building Multi‑Modal Product Search with Elasticsearch Serverless

The article walks through the evolution of e‑commerce search from simple keyword matching to multi‑modal retrieval, explains a generic architecture that fuses text and image embeddings, details core techniques such as dense, sparse and hybrid models, vector similarity metrics, quantization methods like SQ and BBQ, and demonstrates how Elasticsearch Serverless provides a serverless, cost‑effective platform to implement the end‑to‑end solution.

AI · Elasticsearch · Embedding

0 likes · 21 min read

Lisa Notes

Jun 25, 2026 · Artificial Intelligence

NLP Study Notes: How Word Vectors Capture Meaning

This article explains the evolution of natural language processing, introduces transformer‑based large models such as BERT, GPT and T5, and details how words are represented through one‑hot vectors and dense word embeddings, illustrating their training and analogy capabilities.

CBOW · Embedding · NLP

0 likes · 7 min read

IT Services Circle

Jun 20, 2026 · Artificial Intelligence

How I Doubled RAG Accuracy with These Optimizations

This article walks through a complete RAG pipeline, identifying common pitfalls from document preprocessing to prompt construction, and provides concrete Python and Java examples, chunking strategies, embedding tweaks, hybrid retrieval, reranking, advanced techniques, and evaluation methods to reliably double retrieval accuracy.

Embedding · Java · Prompt Engineering

0 likes · 35 min read

DataFunSummit

Jun 18, 2026 · Artificial Intelligence

From Text to Images: Building Multimodal Product Search with Elasticsearch Serverless

This article examines the evolution of e‑commerce search from simple keyword matching to multimodal, cross‑modal retrieval, explains the core embedding and vector‑search technologies, compares dense, sparse and hybrid models, and demonstrates how Elasticsearch Serverless and Alibaba Cloud AI Search Platform enable a low‑cost, serverless, high‑performance end‑to‑end multimodal product search solution.

AI search platform · BBQ · Elasticsearch Serverless

0 likes · 21 min read

Su San Talks Tech

Jun 15, 2026 · Artificial Intelligence

How I Doubled RAG Accuracy with Targeted Optimizations

This article walks through a comprehensive, step‑by‑step analysis of why RAG pipelines often underperform and presents concrete optimizations—including OCR preprocessing, table extraction, metadata enrichment, recursive chunking, embedding fine‑tuning, hybrid vector‑keyword retrieval, reranking, prompt templates, and a production‑grade Java implementation—backed by code snippets, benchmark figures, and evaluation metrics.

Chunking · Embedding · Hybrid Retrieval

0 likes · 36 min read

Java Architect Handbook

Jun 5, 2026 · Artificial Intelligence

What Is Embedding in RAG and Why Does It Use 1536 Dimensions?

The article explains that embedding converts text into a 1536‑dimensional floating‑point vector that serves as a semantic fingerprint, describes how the vector is generated, why 1536 dimensions are chosen, how similarity is measured, and provides Java Spring AI code examples along with model‑selection guidance and common interview pitfalls.

Dimension · Embedding · OpenAI

0 likes · 16 min read

Old Zhang's AI Learning

May 30, 2026 · Artificial Intelligence

vLLM Semantic Router Deep Dive: Engineering Multimodal Routing and Bug Fixes

The article details the vLLM Semantic Router's Signal-Decision architecture, explores multimodal routing challenges, uncovers an 82% visual signal reversal issue, and walks through three layered bug fixes that restore cosine similarity above 0.999 across extensive tests.

Bug Fix · Embedding · Multimodal

0 likes · 13 min read

The Dominant Programmer

May 28, 2026 · Artificial Intelligence

Spring AI RAG: Concepts, Hands‑On Implementation, and Full Code

This article explains the limitations of large language models, introduces Retrieval‑Augmented Generation (RAG) and its four‑step workflow, details Spring AI's RAG components and vector‑store options, and provides complete, runnable Java code—including Maven, configuration, and service classes—to build a local knowledge‑base Q&A system.

Embedding · Java · Ollama

0 likes · 18 min read

DataFunSummit

May 27, 2026 · Artificial Intelligence

From Text to Images: Building Multi‑Modal Product Search with Elasticsearch Serverless

This article walks through a complete multi‑modal product search solution that transforms textual and visual product data into embeddings, leverages dense, sparse and hybrid models, applies vector similarity and quantization techniques such as SQ and BBQ, and demonstrates how Elasticsearch Serverless provides a serverless, cost‑effective, auto‑scaling backbone for end‑to‑end retrieval.

AI Search Open Platform · Elasticsearch Serverless · Embedding

0 likes · 22 min read

JD Retail Technology

May 25, 2026 · Artificial Intelligence

How Adaptive Semantic IDs Enable Precise and Generalizable Generative Retrieval

The article introduces the SA²CRQ framework, which adaptively allocates semantic ID length and transfers residual knowledge to resolve head‑item ID collisions and tail‑item generalization gaps in large‑scale e‑commerce generative retrieval, achieving stable gains on both industrial and public datasets.

Adaptive Quantization · E-commerce Search · Embedding

0 likes · 19 min read

SuanNi

May 23, 2026 · Artificial Intelligence

Deploy the Open-Source ChatLaw Legal LLM on the SuanWang Platform

This article introduces ChatLaw, an open‑source legal large language model trained on 936,727 real cases, explains its high‑dimensional embedding ChatLaw‑Text2Vec for fast knowledge alignment, and provides a step‑by‑step guide to deploy it on the SuanWang cloud platform using Python and MLU resources.

ChatLaw · Embedding · LLM

0 likes · 3 min read

Architect's Ambition

May 18, 2026 · Artificial Intelligence

Building Enterprise Private Knowledge Bases: End-to-End Crawl, Clean, and RAG Pipeline

The article outlines a complete six‑stage workflow for constructing enterprise‑grade private knowledge bases—starting with targeted web‑crawling and API ingestion, through data cleaning, chunking, embedding generation, vector storage, and finally multi‑stage RAG retrieval optimization—highlighting why early stages set the performance ceiling and offering practical tips from real‑world projects.

AI Agent · Chunking · Embedding

0 likes · 10 min read

DataFunSummit

May 15, 2026 · Artificial Intelligence

From Text to Images: Building Multimodal Product Search with Elasticsearch Serverless

The article analyzes the shift from keyword‑based to multimodal e‑commerce search, outlines a generic architecture that combines text and image embedding with vector retrieval, and demonstrates how Elasticsearch Serverless and Alibaba Cloud AI Search platform enable a low‑cost, scalable, and high‑performance product search solution.

AI Search · Elasticsearch · Embedding

0 likes · 20 min read

DataFunSummit

May 12, 2026 · Artificial Intelligence

From Text to Images: Building Multimodal Product Search with Elasticsearch Serverless

This article presents a comprehensive, end‑to‑end solution for multimodal product search, detailing how embedding, vector retrieval, and Elasticsearch Serverless combine to enable text, image, and natural‑language queries with high relevance and low operational overhead.

Elasticsearch · Embedding · Quantization

0 likes · 19 min read

AI Architect Hub

May 10, 2026 · Artificial Intelligence

RAG Series Recap: From Chunking to Prompt – A Complete Technical Roadmap

This article systematically reviews the nine‑stage RAG pipeline—from data cleaning and text chunking through embedding, vector indexing, retrieval, reranking, and finally prompt assembly—highlighting core concepts, practical code snippets, common pitfalls, and optimization tips for building production‑grade systems.

AI · Embedding · LLM

0 likes · 22 min read

Old Zhang's AI Learning

May 9, 2026 · Artificial Intelligence

Why Gemini’s Multimodal RAG with File Search Is So Compelling

The article analyzes Google Gemini’s File Search tool as a fully managed multimodal RAG solution, detailing its architecture, key features, pricing model, step‑by‑step usage, strengths, limitations, and how it compares with OpenAI Assistants File Search and Vertex AI Search.

AI Retrieval · Embedding · File Search

0 likes · 14 min read

DataFunSummit

May 7, 2026 · Artificial Intelligence

From Text to Images: Building Multimodal Product Search with Elasticsearch Serverless

This article walks through a complete multimodal product search solution, explaining how embedding and vector retrieval technologies—combined with Elasticsearch Serverless and Alibaba Cloud AI Search—enable image‑based and semantic queries, detailing the architecture, key algorithms, quantization tricks, and practical deployment steps.

AI Search · Elasticsearch · Embedding

0 likes · 22 min read

Alibaba Cloud Big Data AI Platform

May 1, 2026 · Artificial Intelligence

Zero Deployment, Zero Ops: Alibaba Cloud Milvus Embedding Service Makes Vectorization Plug‑and‑Play

The article explains how Alibaba Cloud's Milvus Embedding Service eliminates the need for self‑hosted embedding models by integrating model inference, vector generation and Milvus indexing into a managed pipeline, dramatically reducing deployment complexity, operational overhead, and time‑to‑value for semantic search, RAG and multimodal retrieval use cases.

Alibaba Cloud · Embedding · Milvus

0 likes · 19 min read

DeepHub IMBA

Apr 30, 2026 · Artificial Intelligence

Why Real RAG Systems Need Both BM25 and Vector Search

The article analyzes how BM25 excels at exact token matching while vector embeddings capture semantic intent, explains their distinct failure modes, and shows that a hybrid retriever—combined with metadata filtering, proper chunking, and reciprocal rank fusion—delivers the most reliable results for RAG pipelines.

BM25 · Embedding · Hybrid Retrieval

0 likes · 17 min read

MaGe Linux Operations

Apr 28, 2026 · Artificial Intelligence

Why Your RAG Performance Is Poor: Common Issues and Optimization Strategies

This article systematically analyzes why Retrieval‑Augmented Generation pipelines often underperform—covering embedding model selection, chunking strategies, hybrid retrieval, reranking, context window waste, evaluation metrics, and a detailed troubleshooting checklist—while providing concrete code examples and best‑practice recommendations for engineers.

Chunking · Embedding · Evaluation

0 likes · 19 min read

AI Illustrated Series

Apr 27, 2026 · Artificial Intelligence

Comprehensive RAG Interview Q&A: 22 In-Depth Questions and Answers

This extensive interview guide covers 22 core RAG questions, detailing the definition, workflow, embedding selection, vector database choices, retrieval optimization, multi‑turn handling, context compression, evaluation metrics, knowledge‑graph integration, operational challenges, Agentic and hybrid RAG, document update strategies, similarity algorithms, and hallucination mitigation, providing concrete examples and practical advice for AI interview preparation.

AI interview · Embedding · RAG

0 likes · 29 min read

Wu Shixiong's Large Model Academy

Apr 27, 2026 · Artificial Intelligence

Can Your RAG Pass the Demo? Scaling to 5,000 Docs for Reliable Answers

The article walks through the practical challenges of turning a RAG demo into a production system for 5,000 insurance documents, covering knowledge‑base chunking, embedding model selection, recall‑threshold tuning, hybrid vector‑BM25 retrieval, intent‑aware query routing, prompt constraints, confidence scoring, and operational scaling, with concrete metrics and code examples.

Embedding · Hybrid Retrieval · Intent Recognition

0 likes · 16 min read

The Dominant Programmer

Apr 27, 2026 · Artificial Intelligence

Building a Private Document Vector Search with SpringBoot, LangChain4j, and Ollama RAG

This guide walks through why Retrieval‑Augmented Generation (RAG) is needed for large language models, explains the three‑step indexing and query workflow, details LangChain4j’s core components, and provides a complete SpringBoot example—including Maven setup, configuration, service code, and troubleshooting—to create a private document‑vector search system powered by Ollama.

Embedding · LangChain4j · Ollama

0 likes · 13 min read

AI Engineer Programming

Apr 26, 2026 · Artificial Intelligence

From Bag‑of‑Words to Semantics: How Embeddings Turn Meaning into Numbers (Part 2)

The article explains how embedding techniques encode semantic information into numeric vectors, covering Word2Vec and GloVe fundamentals, BERT anisotropy, SimCSE contrastive learning, alignment and uniformity metrics, ANN index structures such as HNSW, IVF and PQ, Matryoshka representation learning, practical deployment challenges, and evaluation best practices.

ANN · BERT · Embedding

0 likes · 23 min read

AI Architect Hub

Apr 26, 2026 · Artificial Intelligence

Embedding Explained: How Vectorization Turns Text into Numbers for RAG

This article walks through why traditional keyword matching fails for RAG, explains the evolution from one‑hot encoding to Word2Vec and BERT, details sentence‑level embeddings and similarity metrics, compares leading Chinese and multilingual embedding models using the C‑MTEB benchmark, and provides practical LangChain code, deployment tips, and common pitfalls.

Chinese NLP · Embedding · LangChain

0 likes · 18 min read

AI Illustrated Series

Apr 25, 2026 · Artificial Intelligence

How AI Agents Remember Everything: A Deep Dive into Memory System Design

The article explains why large language models lack persistent memory, introduces a three‑layer memory architecture for AI agents—sensory, working, and long‑term memory—and details how vector databases, embedding models, and retrieval strategies enable cross‑session knowledge retention and personalized assistance.

AI Agent · Embedding · Memory Architecture

0 likes · 24 min read

AI Architect Hub

Apr 24, 2026 · Artificial Intelligence

RAG Level 1: Avoid Dirty Data Poisoning Your AI – A Data Cleaning Guide

This article explains why noisy documents cripple Retrieval‑Augmented Generation, enumerates common garbage data types, describes three typical data‑quality problems, warns against over‑cleaning, encoding, and regex pitfalls, and provides a configurable LangChain pipeline with deduplication and validation best practices.

AI · Deduplication · Embedding

0 likes · 21 min read

MaGe Linux Operations

Apr 22, 2026 · Artificial Intelligence

5 Essential Design Principles for Building High‑Quality RAG Systems

This article outlines five critical design principles for constructing high‑quality Retrieval‑Augmented Generation (RAG) systems, covering document chunking strategies, embedding model selection, hybrid retrieval architectures, metadata filtering with multi‑level indexes, and reranking mechanisms, and provides concrete code snippets and evaluation metrics.

Embedding · Evaluation · Hybrid Retrieval

0 likes · 17 min read

Architecture Digest

Apr 22, 2026 · Artificial Intelligence

Why RAG Is Anything But Simple: A Full Production‑Level Technical Breakdown

The article dissects every stage of a production‑grade Retrieval‑Augmented Generation pipeline—from document parsing and chunking, through embedding selection and vector indexing, to query rewriting, multi‑retrieval fusion, re‑ranking, context optimization, hallucination control, evaluation metrics, and the decision between RAG and fine‑tuning—showing why each link is a critical engineering challenge.

Embedding · HallucinationMitigation · LLM

0 likes · 14 min read

Programmer XiaoFu

Apr 20, 2026 · Artificial Intelligence

How Java + LangChain4j Can Eliminate Messy Chunking for High‑Quality RAG Document Splitting

The article explains why fixed‑size chunking harms RAG recall, demonstrates three semantic‑chunking strategies—including recursive punctuation splitting, overlapping windows, and parent‑child document mapping—and provides complete Java/LangChain4j code that integrates tokenizers, Redis, and Qdrant to boost retrieval performance.

Embedding · Java · LangChain4j

0 likes · 10 min read

Linyb Geek Road

Apr 20, 2026 · Artificial Intelligence

How to Choose the Right Embedding Model for RAG Architectures

This article explains why embedding models are the foundation of Retrieval‑Augmented Generation, outlines five evaluation dimensions, compares leading open‑source and commercial models, provides a decision tree, practical validation steps, common pitfalls, and future trends to help developers select the most suitable embedding model for their RAG system.

Embedding · Hybrid Search · MTEB

0 likes · 10 min read

DataFunSummit

Apr 19, 2026 · Artificial Intelligence

How to Build a Multimodal Product Search Engine with Embedding and Vector Retrieval on Elasticsearch Serverless

This article explains a complete multimodal product search solution that combines text and image embeddings, dense, sparse, and hybrid models, vector similarity metrics, and Elasticsearch Serverless features such as dense_vector, sparse_vector, hybrid search, quantization, and RRF ranking to achieve fast, accurate, and cost‑effective retrieval.

AI · Elasticsearch · Embedding

0 likes · 20 min read

Big Data and Microservices

Apr 17, 2026 · Industry Insights

What Is a Vector Database? Features, Indexing, and Top Open‑Source Options

This article explains what a vector database is, how it stores and retrieves high‑dimensional vector data, outlines its key characteristics and indexing mechanisms, compares it with traditional databases, and reviews common open‑source vector database solutions such as Milvus, Faiss, Weaviate, PgVector, Chroma, LanceDB, Elasticsearch and Qdrant.

AI · Embedding · Indexing

0 likes · 14 min read

Zhuanzhuan Tech

Apr 15, 2026 · Artificial Intelligence

Boosting Bag Item Identification with Metric Learning: A ZhiZhuan Case Study

ZhiZhuan’s in‑house “photo‑to‑SKU” system tackles large‑scale bag identification by combining dual‑stage object detection, metric‑learning‑based embedding training, and a hybrid vector‑plus‑scalar retrieval pipeline, achieving superior top‑K accuracy over third‑party solutions while addressing fine‑grained visual nuances and long‑tail SKU coverage.

Embedding · bag identification · deep learning

0 likes · 16 min read

IT Services Circle

Apr 14, 2026 · Artificial Intelligence

What Is RAG? A Complete Guide to Retrieval‑Augmented Generation for AI Engineers

This article explains Retrieval‑Augmented Generation (RAG), covering why large language models need external knowledge, the full offline‑and‑online workflow, document chunking, embedding evolution, vector database choices, multi‑path retrieval, evaluation metrics, hallucination types, and practical strategies to mitigate them.

AI evaluation · Embedding · RAG

0 likes · 55 min read

James' Growth Diary

Apr 12, 2026 · Artificial Intelligence

Build a Complete Private Knowledge Base with RAG: A Hands‑On Guide

This article walks through a complete, production‑ready Retrieval‑Augmented Generation pipeline that lets AI answer a company’s private documents, covering chunking strategies, embedding model choices, vector‑database selection, retrieval methods, full LangChain chain assembly, and common pitfalls to avoid.

Embedding · LangChain · PromptEngineering

0 likes · 18 min read

dbaplus Community

Apr 12, 2026 · Artificial Intelligence

Boost RAG Accuracy to 94%: 11 Proven Strategies and How to Combine Them

After struggling with naive RAG that delivered only 60% accuracy, the author outlines eleven advanced strategies—including context-aware chunking, query expansion, re‑ranking, multi‑query, knowledge graphs, and agent‑based retrieval—that together raise performance to 94%, and provides detailed implementation examples, trade‑offs, and a step‑by‑step deployment roadmap.

AI · Embedding · Knowledge Graph

0 likes · 32 min read

Mingyi World Elasticsearch

Apr 10, 2026 · Artificial Intelligence

Easysearch vs Elasticsearch Vector Search: Compatibility Explained in One Guide

The article compares Easysearch and Elasticsearch vector‑search capabilities, showing that both support vector queries but use different field types and DSL structures, and it outlines migration pitfalls and practical advice for choosing the right system.

API compatibility · Easysearch · Elasticsearch

0 likes · 7 min read

DeepHub IMBA

Apr 8, 2026 · Artificial Intelligence

Choosing a Vector Database: Pinecone for Production, Chroma for Prototyping, Weaviate for Hybrid Search

This article compares three popular vector databases—Pinecone, Chroma, and Weaviate—explaining how they store embeddings for RAG systems, showing Python setup code, and outlining each solution's architecture, scaling limits, cost considerations, and ideal use cases.

Chroma · Embedding · Hybrid Search

0 likes · 7 min read

DeepHub IMBA

Apr 3, 2026 · Artificial Intelligence

Multi‑Aspect Embedding: Integrating Context Signals into Vector Similarity Search

The article analyzes how traditional vector database pipelines use external filters for context constraints and proposes the Aspect Database’s multi‑aspect embedding approach, which encodes contextual attributes directly into similarity vectors to enable unified, context‑aware retrieval for AI systems.

AI Systems · ANN search · Embedding

0 likes · 9 min read

AndroidPub

Apr 2, 2026 · Artificial Intelligence

How to Build Offline, Privacy‑First AI with On‑Device Retrieval‑Augmented Generation

This article explains how to implement on‑device Retrieval‑Augmented Generation (RAG) for large language models, covering embedding, vector indexing, model selection, quantization, data chunking, incremental updates, hybrid search, and agentic RAG to deliver fast, private, and personalized AI experiences on mobile devices.

Embedding · LLM · RAG

0 likes · 18 min read

DataFunSummit

Mar 29, 2026 · Artificial Intelligence

How to Build a Multimodal Product Search Engine with Embedding and Vector Retrieval on Elasticsearch Serverless

This article explores the evolution of e‑commerce search toward multimodal and cross‑modal capabilities, outlines a generic architecture that combines text and image processing via embedding and vector retrieval, and demonstrates how to implement the solution using Alibaba Cloud's AI Search Open Platform and Elasticsearch Serverless with detailed guidance on models, similarity metrics, quantization, and performance optimization.

AI · Elasticsearch · Embedding

0 likes · 22 min read

AgentGuide

Mar 25, 2026 · Artificial Intelligence

What Is Retrieval‑Augmented Generation (RAG) and Why Must Large Models Look Up Information First?

Retrieval‑Augmented Generation (RAG) lets large language models first fetch relevant documents and then generate answers, addressing the inability of models to answer private or domain‑specific queries by precisely feeding them the most pertinent knowledge.

Embedding · RAG · large language models

0 likes · 5 min read

DataFunSummit

Mar 24, 2026 · Artificial Intelligence

How to Build a Multimodal Product Search System with Embedding and Vector Retrieval

This article presents a comprehensive, end‑to‑end solution for multimodal product search, detailing the evolution from keyword to image‑based queries, the core embedding and vector retrieval technologies, practical Elasticsearch Serverless integration, quantization methods, and a complete demo workflow for building a high‑performance, low‑cost search platform.

AI search platform · Elasticsearch · Embedding

0 likes · 21 min read

Full-Stack Cultivation Path

Mar 23, 2026 · Artificial Intelligence

What Exactly Is a Token in LLMs? A First‑Principles Explanation

The article explains that a token is the smallest discrete text unit a large language model processes, detailing why tokenization is essential, how tokenizers work, how tokens flow through the transformer, and how token counts affect context windows, cost, latency, and overall model behavior.

Embedding · LLM · Tokenization

0 likes · 20 min read

Wu Shixiong's Large Model Academy

Mar 15, 2026 · Artificial Intelligence

Choosing the Right Embedding and Rerank Models for RAG (Interview‑Ready Guide)

This article explains the role of embedding models in Retrieval‑Augmented Generation, compares the most popular 2024‑2025 open‑source embeddings and rerankers, offers concrete selection rules, shows how to read the MTEB leaderboard, and provides a structured answer framework for interviewers.

AI · Embedding · MTEB

0 likes · 13 min read

Data STUDIO

Mar 9, 2026 · Artificial Intelligence

Boost RAG Accuracy from 60% to 94% with 11 Proven Strategies

This article dissects why naive Retrieval‑Augmented Generation (RAG) often yields only 60% accuracy, then presents eleven concrete ingestion, query, and hybrid techniques—complete with code samples, performance trade‑offs, and real‑world case studies—that together can raise RAG accuracy to 94% while outlining practical implementation roadmaps and common pitfalls.

Embedding · Knowledge Graph · LLM

0 likes · 31 min read

Open Source Tech Hub

Mar 4, 2026 · Artificial Intelligence

Boost PHP AI Performance with MemVector: In‑Process Vector Search & Reranking

MemVector is a high‑performance PHP extension that brings native AI capabilities—embedding, vector similarity search, and cross‑encoder reranking—directly into the PHP process, eliminating external services and delivering sub‑10 ms latency for full RAG pipelines on commodity CPUs.

AI · Embedding · Extension

0 likes · 13 min read

Data STUDIO

Feb 22, 2026 · Artificial Intelligence

Building AI Agents with LangGraph: Implementing RAG and Long‑Term Memory

This tutorial walks through adding Retrieval‑Augmented Generation (RAG) and persistent long‑term memory to a LangGraph AI agent, covering concepts, step‑by‑step code for document loading, vector store creation, prompt engineering, memory management, and best‑practice pitfalls.

AI Agent · Embedding · LangChain

0 likes · 16 min read

AI Tech Publishing

Feb 19, 2026 · Artificial Intelligence

Add Long-Term Memory to Your Agent with Lightweight RAG (Lesson 5)

This tutorial shows how to equip an AI agent with long‑term memory using Retrieval‑Augmented Generation (RAG), covering the concepts of vector embeddings, FAISS indexing, building and querying a knowledge base, and providing complete Python code examples.

Agent · Embedding · FAISS

0 likes · 13 min read

Tech Musings

Feb 10, 2026 · Backend Development

How to Build a Hybrid Vector‑+‑Text Search with Redis 8 (No GPU Required)

This article walks through the complete setup of a hybrid retrieval pipeline on two CPU‑only Linux servers using Redis 8, Qwen‑3‑Embedding vectors, and RediSearch to combine BM25 keyword scores with cosine‑based vector similarity, showing environment details, index creation, data ingestion, the hybrid_search function implementation, result normalization, and a common pitfall of forgetting to set the query language to Chinese.

Embedding · Hybrid Search · Python

0 likes · 23 min read

Tech Musings

Jan 29, 2026 · Artificial Intelligence

Running Qwen3‑Embedding on CPU‑Only Machines and Storing Vectors in Redis 8

This guide explains how to run the Qwen3‑Embedding‑0.6B model on a CPU‑only server, configure key parameters, optionally use Intel Extension for PyTorch, and efficiently store the resulting vectors in Redis 8 with proper serialization and indexing.

CPU · Embedding · Python

0 likes · 8 min read

Tech Musings

Jan 28, 2026 · Databases

Building a CPU‑Only Poetry Retrieval Engine with Qwen Embeddings and Redis Vector Search

This article details a lightweight, CPU‑only knowledge‑base retrieval experiment that uses Qwen3‑Embedding‑0.6B to vectorize Chinese poetry, stores vectors in Redis with HNSW indexing, and implements a hybrid keyword‑plus‑vector search pipeline with configurable weighting and performance optimizations.

CPU · Embedding · Knowledge Base

0 likes · 11 min read

Data Party THU

Jan 27, 2026 · Artificial Intelligence

Which Retrieval Embedding Loss Works Best? Comparing Pairwise Cosine, Triplet Margin, and InfoNCE

Even as Agentic RAG evolves, the quality of the underlying retrieval embedding model remains crucial, and this article compares three training losses—pairwise cosine embedding, triplet margin, and InfoNCE—detailing their inputs, formulas, and practical trade‑offs.

AI · Embedding · InfoNCE

0 likes · 4 min read

PaperAgent

Jan 17, 2026 · Artificial Intelligence

How Qwen3‑VL Embedding and Reranker Set New SOTA in Multimodal Retrieval

The article analyzes the Qwen3‑VL‑Embedding and Qwen3‑VL‑Reranker models, detailing their unified vector space, multi‑stage training pipeline, Matryoshka representation learning, quantization techniques, massive synthetic data generation, and benchmark results that push multimodal retrieval performance to a new state‑of‑the‑art.

Embedding · Large Language Model · Multimodal AI

0 likes · 7 min read

Architect

Dec 25, 2025 · Artificial Intelligence

How GraphRAG Boosts Retrieval Accuracy with Knowledge Graphs – A Complete Guide

This article explains why traditional RAG suffers from hallucinations, introduces GraphRAG’s knowledge‑graph‑based approach, walks through its indexing and query pipelines—including text splitting, entity‑relation extraction, graph construction, community detection, and local vs. global retrieval—provides practical setup commands, Neo4j visualization steps, and compares its performance with classic RAG.

Embedding · GraphRAG · Knowledge Graph

0 likes · 27 min read

AI Architecture Hub

Dec 24, 2025 · Artificial Intelligence

From LLMs to Autonomous Agents: The Three Evolution Stages of AI

This article explains the three evolutionary stages of AI—from large language models that generate text, through workflow‑enhanced systems using retrieval‑augmented generation, to fully autonomous agents capable of self‑directed decision‑making—while detailing the four core technologies that power each stage.

AI evolution · Agent · Embedding

0 likes · 9 min read

JakartaEE China Community

Dec 16, 2025 · Artificial Intelligence

Build a Retrieval‑Augmented Generation (RAG) System with Langchain4j and Ollama 3

This guide walks through the importance of Retrieval‑Augmented Generation, outlines the core Langchain4j and Ollama 3 components, and provides a complete Java example—including Maven setup, document ingestion, embedding creation, similarity search, prompt construction, and response generation—to demonstrate a functional RAG pipeline.

Embedding · Java · LLM

0 likes · 9 min read

Architect

Dec 15, 2025 · Artificial Intelligence

Demystifying LLM Architecture: From Transformers to Modern MoE Designs

This comprehensive guide explains the fundamentals of large language model (LLM) architectures, covering the original Transformer, tokenization, embeddings, positional encoding, attention mechanisms, feed‑forward networks, layer stacking, a step‑by‑step translation example, and the latest open‑source and hybrid LLM designs shaping the field.

Embedding · LLM · MoE

0 likes · 41 min read

Architect

Dec 14, 2025 · Artificial Intelligence

Mastering RAG with Spring AI: Build a Retrieval‑Augmented Generation System from Scratch

This article explains the background, principles, and step‑by‑step implementation of Retrieval‑Augmented Generation (RAG) using Spring AI, covering embedding models, vector databases, chunking strategies, indexing algorithms, similarity metrics, re‑ranking, prompt templates, and a complete Java code example.

Embedding · Java · RAG

0 likes · 32 min read

Tencent Technical Engineering

Dec 3, 2025 · Artificial Intelligence

Why Transformers Power Modern LLMs: A Deep Dive into Architecture and Mechanics

This article provides a comprehensive, step‑by‑step explanation of the Transformer architecture that underpins large language models, covering tokenization, embeddings, positional encoding, attention mechanisms, feed‑forward networks, layer stacking, a detailed translation example, visualized attention weights, and a survey of recent open‑source LLM designs such as DeepSeek V3, OLMo 2, and Gemma 3.

Embedding · LLM · Neural Network

0 likes · 38 min read

JakartaEE China Community

Nov 18, 2025 · Artificial Intelligence

How to Build a Retrieval‑Augmented Generation (RAG) System with Langchain4j and Ollama 3

This article explains why Retrieval‑Augmented Generation improves LLM accuracy, outlines the key Langchain4j and Ollama 3 components, and provides a step‑by‑step Java example (Maven setup, document ingestion, embedding, similarity search, prompt creation, and response generation) to demonstrate a functional RAG pipeline.

Embedding · Java · LLM

0 likes · 8 min read

Wu Shixiong's Large Model Academy

Nov 16, 2025 · Artificial Intelligence

How to Slash RAG First‑Token Latency: Practical Engineering Strategies

This guide breaks down the three layers of a RAG pipeline—embedding, vector retrieval, and system architecture—and provides concrete engineering tactics such as batch embedding, async concurrency, caching, ANN indexing, partitioning, connection pooling, and async pipelines to dramatically reduce Time‑to‑First‑Token latency.

Async Pipeline · Embedding · RAG

0 likes · 10 min read

Zhihu Tech Column

Nov 4, 2025 · Artificial Intelligence

How Multimodal Large Models Transform Recommendation Systems: From Tags to Embeddings

This article explores how multimodal large models like Qwen2.5‑VL enable high‑dimensional tag generation and universal embeddings for recommendation systems, detailing data synthesis, model training, quantization, fine‑tuning, and the resulting improvements in click‑through rate and exposure interaction.

Embedding · Multimodal AI · Recommendation Systems

0 likes · 17 min read

DeWu Technology

Oct 29, 2025 · Artificial Intelligence

Why Chunking Can Make or Break Your RAG System – Practical Strategies & Code

This article explains how proper document chunking—choosing the right chunk size, overlap, and structure‑aware boundaries—directly impacts the relevance, factuality, and efficiency of Retrieval‑Augmented Generation pipelines, and provides multiple Python implementations ranging from simple fixed‑length splits to semantic and hybrid approaches.

Chunking · Embedding · LLM

0 likes · 29 min read

Alibaba Cloud Observability

Oct 20, 2025 · Artificial Intelligence

How We Boosted Embedding Throughput 16× and Cut Vector Index Costs in a Cloud‑Native Setup

This article examines the high cost and low throughput of embedding vectors in log‑processing scenarios, analyzes the performance bottlenecks of inference frameworks, and details a series of cloud‑native optimizations—including switching to vLLM, deploying multiple model replicas with Triton, decoupling tokenization, and priority queuing—that together raise throughput by 16× and reduce per‑token pricing by two orders of magnitude.

Embedding · GPU inference · Performance Optimization

0 likes · 9 min read

Alibaba Cloud Native

Oct 17, 2025 · Artificial Intelligence

How We Boosted Embedding Service Throughput 16× with Cloud‑Native Optimizations

This article details the cost and speed challenges of embedding vectors in large‑scale log scenarios, analyzes inference framework choices, describes GPU utilization, priority queuing, and pipeline redesigns, and reports a 16‑fold throughput increase and dramatically lower per‑request costs.

Embedding · GPU Optimization · Throughput

0 likes · 8 min read

BirdNest Tech Talk

Oct 16, 2025 · Artificial Intelligence

Mastering Text Splitting in LangChain: From Theory to Code

This guide explains why large documents must be broken into semantic chunks for LLMs, introduces core parameters like chunk_size and chunk_overlap, compares LangChain's various splitters, and walks through a complete Python example that loads a long text, configures a RecursiveCharacterTextSplitter, and inspects the resulting chunks.

Embedding · LangChain · RAG

0 likes · 9 min read

JD Tech Talk

Oct 14, 2025 · Frontend Development

Cross‑Platform CEF Integration: Windows & macOS Setup Guide

This article explains how to integrate the Chromium Embedded Framework (CEF) on both Windows and macOS, covering required libraries, resource paths, main‑process initialization, render‑process creation, message‑loop handling, window adaptation, and version management to ensure a seamless cross‑platform deployment.

CEF · Cross-Platform · Embedding

0 likes · 13 min read

BirdNest Tech Talk

Oct 6, 2025 · Artificial Intelligence

How to Master Few-Shot Prompting with LangChain’s Example Selectors

The article explains why few-shot prompting benefits from dynamically selecting a small set of relevant examples, introduces LangChain’s ExampleSelector component, compares three selector strategies—LengthBased, SemanticSimilarity, and MaxMarginalRelevance—detailing their algorithms, advantages, drawbacks, and provides step-by-step Python code demonstrations for each.

AI · Embedding · Example selector

0 likes · 9 min read

JD Tech Talk

Sep 28, 2025 · Artificial Intelligence

What Is Retrieval‑Augmented Generation (RAG) and How Does It Power Modern AI?

This article explains Retrieval‑Augmented Generation (RAG), an AI framework that combines traditional information retrieval with large language models, detailing its core workflow—from knowledge preparation, chunking, and embedding to vector database storage and the question‑answering stage—while highlighting key challenges, tools, and optimization strategies.

AI · Chunking · Embedding

0 likes · 15 min read

Data Party THU

Sep 25, 2025 · Artificial Intelligence

Mastering Triplet Loss in Sentence‑Transformers: A Step‑by‑Step Guide

This article explains the concept of triplet loss, its mathematical formulation, the different batch‑wise implementations in the sentence_transformers library, their advantages and drawbacks, and provides a complete Python example for training a text‑embedding model with Triplet Loss.

Embedding · PyTorch · Python

0 likes · 12 min read

Tech Freedom Circle

Sep 25, 2025 · Artificial Intelligence

RAGFlow Deep Dive: Data Parsing and Knowledge Graph Construction

This article examines RAGFlow's end‑to‑end pipeline for turning diverse documents into structured knowledge, detailing the TaskExecutor factory, the DeepDoc layout‑aware parser, chunking strategies, embedding and storage mechanisms, and the GraphRAG‑based knowledge‑graph extraction that together enable high‑precision retrieval and reasoning.

Chunking · Data Parsing · DeepDoc

0 likes · 15 min read

Alibaba Cloud Developer

Sep 1, 2025 · Artificial Intelligence

Mastering RAG: From Chunking to Hybrid Search for Better AI Retrieval

This article delves into the implementation details and optimization strategies of Retrieval‑Augmented Generation (RAG), covering document chunking, index enhancement, embedding, hybrid search, and re‑ranking, and provides practical code examples to help developers move from quick deployment to deep performance tuning.

AI · Chunking · Embedding

0 likes · 19 min read

Data Thinking Notes

Aug 31, 2025 · Artificial Intelligence

Embedding's Role in Retrieval‑Augmented Generation: Basics, Challenges & Future

This article explains how embedding technology converts unstructured data into vector representations, powers precise retrieval in Retrieval‑Augmented Generation (RAG), outlines the evolution of embedding models, discusses current challenges such as long‑text handling and domain adaptation, and highlights emerging solutions.

AI · Embedding · RAG

0 likes · 12 min read

DaTaobao Tech

Aug 25, 2025 · Artificial Intelligence

Mastering RAG: From Quick Start to Deep Optimization Strategies

This article dives into the practical implementation of Retrieval‑Augmented Generation (RAG), covering document chunking, semantic and reverse HyDE indexing, embedding, hybrid search, and re‑ranking techniques, and provides concrete code examples and optimization tips for building high‑performance AI applications.

Chunking · Embedding · Hybrid Search

0 likes · 18 min read

Qborfy AI

Aug 12, 2025 · Artificial Intelligence

What Powers Large Language Models? A Deep Dive into LLM Architecture and Scaling

This article explains how massive Transformer‑based large language models compress text data into mathematical representations, why scale, self‑attention, and training paradigms enable emergent general intelligence, and walks through tokenization, embedding, multi‑layer attention, architecture choices, energy costs, and hallucination mitigation.

AI · Embedding · LLM

0 likes · 6 min read

Alibaba Cloud Big Data AI Platform

Aug 11, 2025 · Artificial Intelligence

How Multimodal Product Search Transforms E‑Commerce with Embedding and Vector Retrieval

This article explores the evolution from keyword‑based to multimodal e‑commerce search, detailing a universal solution that combines text and image processing through embedding and vector retrieval, and demonstrates how Alibaba Cloud's AI Search Open Platform and Elasticsearch Serverless enable fast, low‑cost, and scalable multimodal product search deployments.

Embedding · Quantization · Vector Retrieval

0 likes · 17 min read

Ops Development & AI Practice

Jul 29, 2025 · Artificial Intelligence

Building a Retrieval‑Augmented Generation QA Bot to Keep LLMs Up‑to‑Date

This article explains how to create a RAG‑based intelligent QA system that fetches the latest documentation (e.g., PlantUML) before querying Gemini, detailing knowledge‑base creation, embedding, vector store management, LangChain integration, and deployment tips.

AI assistant · Embedding · Gemini

0 likes · 8 min read

Alibaba Cloud Developer

Jul 16, 2025 · Artificial Intelligence

What Are the Core Concepts Behind AI? From Data to Models Explained

This article walks readers through the fundamentals of artificial intelligence, covering AI, machine learning, deep learning, data types, linear regression, supervised and unsupervised learning, reinforcement learning, feature engineering, tokenization, vectorization, embeddings, and includes a practical Word2Vec code example.

AI · Embedding · data science

0 likes · 21 min read

macrozheng

Jul 4, 2025 · Artificial Intelligence

Build Java LLM Applications with LangChain4j: A Hands‑On Guide

This tutorial walks through the fundamentals of large language models, prompt engineering, and word embeddings, then shows how to use the LangChain framework (including its Java implementation, LangChain4j) to build AI‑driven applications with memory management, retrieval, and chaining, illustrated with practical code examples.

AI · Embedding · Java

0 likes · 17 min read

Mingyi World Elasticsearch

Jul 3, 2025 · Backend Development

Deep Dive into Elasticsearch semantic_text, dense_vector, and sparse_vector

This article explains how Elasticsearch supports vector search through three field types—semantic_text, dense_vector, and sparse_vector—detailing their definitions, ideal use cases, query syntax, advantages, limitations, and guidance for selecting the right type in real‑world search applications.

Elasticsearch · Embedding · dense_vector

0 likes · 14 min read

Ops Development Stories

Jun 30, 2025 · Artificial Intelligence

Build a Private AI Knowledge Assistant with n8n: Zero‑Code RAG in 30 Minutes

This guide shows how to create a fully local Retrieval‑Augmented Generation (RAG) system using n8n, Docker, Ollama and the free Qwen3 embedding model, enabling secure, up‑to‑date AI assistants that answer enterprise questions without exposing any proprietary data.

AI assistant · Docker · Embedding

0 likes · 17 min read

Instant Consumer Technology Team

Jun 12, 2025 · Artificial Intelligence

How to Build a Production-Ready RAG System with Qwen3 Embedding and Reranker Models

This guide walks through using Alibaba's new Qwen3-Embedding and Qwen3-Reranker models to build a two‑stage Retrieval‑Augmented Generation pipeline with Milvus, covering environment setup, data ingestion, vector indexing, reranking, and LLM‑driven answer generation, demonstrating production‑grade performance across multilingual queries.

Embedding · LLM · Milvus

0 likes · 19 min read

Java Architecture Diary

Jun 9, 2025 · Artificial Intelligence

How Qwen3 Embedding Redefines Multilingual Vector Search Performance

This article examines the Qwen3 Embedding series released by Alibaba's Qwen team, detailing its architecture, multilingual capabilities, benchmark superiority across MTEB and C‑MTEB tests, and provides practical deployment guidance via Ollama and API integration.

AI · Embedding · Ollama

0 likes · 8 min read

JavaEdge

Jun 6, 2025 · Artificial Intelligence

Why Qwen3 Embedding Models Are Setting New Benchmarks in Text Representation

The article introduces the Qwen3 Embedding series, detailing its model variants, architecture, training methodology, multilingual support, performance metrics across several benchmarks, and future development plans, highlighting its superior generalization and flexibility for diverse AI applications.

AI · Embedding · Qwen3

0 likes · 9 min read

ITFLY8 Architecture Home

Jun 5, 2025 · Artificial Intelligence

Why Large Models Are Redefining Software: The Four AI Tech Drivers

The article explains how rapid AI advances and the AI Agent architecture are reshaping software development, outlines four key technical drivers (embedding, Transformer scaling laws, scenario Moore's law, and LLM OS), and discusses the security, professionalism, and responsibility challenges enterprises face when deploying AI‑native applications.

AI Architecture · Embedding · Enterprise AI

0 likes · 6 min read

Satori Komeiji's Programming Classroom

Jun 3, 2025 · Artificial Intelligence

Everything You Need to Know About Retrieval‑Augmented Generation (RAG)

The article explains Retrieval‑Augmented Generation (RAG) by describing how a programmer, frustrated with oversized prompts for a large language model, discovers that retrieving relevant document fragments, embedding them, and feeding the augmented context to the model yields accurate, fact‑based answers.

AI · Chunking · Embedding

0 likes · 6 min read

Fun with Large Models

Apr 25, 2025 · Artificial Intelligence

Why Your RAG System Underperforms and How to Boost Its Effectiveness by 20%

This article analyzes common shortcomings of RAG pipelines—data preparation, retrieval, and LLM generation—and provides concrete optimization techniques such as advanced chunking, embedding model selection, retrieval parameter tuning, rerank models, and prompt engineering, promising up to a 20% performance gain.

Chunking · Embedding · Prompt Engineering

0 likes · 17 min read

Tencent Technical Engineering

Apr 22, 2025 · Artificial Intelligence

Conan-Embedding-V2: A 1.4B LLM‑Based Multilingual Embedding Model Achieving SOTA on MTEB

Conan‑Embedding‑V2, a newly trained 1.4 B‑parameter LLM with a custom tokenizer, 32 k token context, SoftMask, cross‑lingual retrieval data and dynamic hard‑negative mining, delivers state‑of‑the‑art multilingual embeddings that surpass larger models on both English and Chinese MTEB benchmarks while remaining compact and fast.

Embedding · Large Language Model · MTEB

0 likes · 14 min read

Big Data Technology & Architecture

Apr 22, 2025 · Artificial Intelligence

Introduction to Retrieval‑Augmented Generation (RAG) and Vector Indexing with StarRocks and DeepSeek

This article explains the fundamentals of Retrieval‑Augmented Generation, demonstrates how to create and query vector indexes using StarRocks, shows how DeepSeek provides embeddings and answer generation, and walks through a complete end‑to‑end RAG pipeline with code examples and a web UI.

AI · DeepSeek · Embedding

0 likes · 20 min read

Fun with Large Models

Apr 18, 2025 · Artificial Intelligence

How RAG Works: From Data Prep to LLM Generation Explained

This article breaks down Retrieval‑Augmented Generation (RAG) into its three core stages—data preparation, data retrieval, and LLM generation—showing how document chunking, embedding, vector databases, similarity search, and optional re‑ranking combine to let large language models produce more accurate, knowledge‑grounded answers.

Embedding · LLM · RAG

0 likes · 9 min read

AI Algorithm Path

Apr 10, 2025 · Artificial Intelligence

Beginner-Friendly Guide to Understanding Large Language Models

This article walks readers through the fundamentals of large language models, covering what tokens are, how tokenization works, the conversion of tokens to numeric IDs, the transformer architecture—including positional encoding, self‑attention, feed‑forward networks and softmax—and explains how these components enable next‑token prediction.

Embedding · LLM · Self-Attention

0 likes · 9 min read

Spring Full-Stack Practical Cases

Apr 10, 2025 · Artificial Intelligence

Build a RAG-Powered Knowledge Base with Spring Boot, Milvus, and Ollama

This guide walks through creating a Retrieval‑Augmented Generation (RAG) system using Spring Boot 3.4.2, Milvus vector database, and the bge‑m3 embedding model via Ollama, covering environment setup, dependency configuration, vector store operations, and integration with a large language model to deliver refined, similarity‑based answers.

Embedding · LLM · Milvus

0 likes · 11 min read

Architect

Mar 29, 2025 · Artificial Intelligence

How Non‑AI Developers Can Build Powerful LLM Apps: Prompt Engineering, RAG, and AI Agents Explained

This article guides developers without an AI background through the fundamentals of building large‑language‑model applications, covering prompt engineering, multi‑turn interaction, function calling, retrieval‑augmented generation, vector databases, code assistants, and the MCP protocol for AI agents.

AI Agent · Embedding · Function Calling

0 likes · 51 min read

Architect's Alchemy Furnace

Mar 26, 2025 · Artificial Intelligence

Mastering AI Knowledge Bases with Dify: From Creation to Advanced Retrieval

This guide explains how Dify visualizes RAG pipelines, lets developers upload and structure documents, choose segmentation modes, configure indexing and retrieval settings, and leverage embeddings and vector search to build fast, accurate, and up‑to‑date AI knowledge bases.

AI · Dify · Embedding

0 likes · 26 min read

Ma Wei Says

Mar 24, 2025 · Artificial Intelligence

Master BGE Multilingual Embeddings: Models, Installation, and Quick Usage

Explore the BGE (BAAI General Embedding) family—including v1, v1.5, M3, Multilingual Gemma2, and EN‑ICL—detailing their multilingual capabilities, model variants, token limits, optimal use cases, and step‑by‑step installation and Python usage instructions with code examples for embedding generation and similarity scoring.

Embedding · LLM · Python

0 likes · 8 min read

Architect

Mar 19, 2025 · Artificial Intelligence

Choosing the Best Embedding Model for RAG: A Practical Guide Using MTEB Rankings

This guide explains how to leverage the Massive Text Embedding Benchmark (MTEB) to identify high‑performing embedding models for Retrieval‑Augmented Generation (RAG) and outlines key factors such as model size, dimension, language support, resource requirements, inference speed, domain suitability, long‑text handling, scalability, and cost.

AI · Embedding · MTEB

0 likes · 12 min read

Architect's Alchemy Furnace

Mar 18, 2025 · Artificial Intelligence

How to Build an AI Agent with Ollama: From Model Setup to Knowledge Base

This step‑by‑step guide shows how to create an AI Agent by configuring a local Ollama model, selecting an embedding model, building a knowledge base, uploading documents, and testing the agent's retrieval capabilities, providing a practical RAG workflow for developers.

AI Agent · Embedding · Ollama

0 likes · 8 min read
