Tagged articles

2016 articles

Jan 7, 2025 · Artificial Intelligence

Tencent OlaChat: Intelligent Data Analysis Platform – Research, Architecture, and Capabilities

This article presents the Tencent PCG OlaChat team's research and practice in intelligent data analysis, covering the DIKW model, evolution of BI platforms, the impact of large language models, challenges of third‑generation data products, detailed product features, agent architecture, system design, and related academic publications.

Agent · Intelligent BI · LLM

0 likes · 19 min read

DevOps

Jan 6, 2025 · Artificial Intelligence

Ten Popular Large Language Model Deployment Engines and Tools: Features, Advantages, and Limitations

This article reviews ten mainstream LLM deployment solutions (WebLLM, LM Studio, Ollama, vLLM, LightLLM, OpenLLM, HuggingFace TGI, GPT4All, llama.cpp, and Triton Inference Server), detailing their technical characteristics, strengths, drawbacks, and example deployment workflows for both personal and enterprise environments.

AI inference · GPU Acceleration · LLM

0 likes · 16 min read

DeWu Technology

Jan 6, 2025 · Artificial Intelligence

Design and Implementation of a Retrieval‑Augmented Generation (RAG) Answering Assistant for the Dewu Open Platform

The paper describes building a Retrieval‑Augmented Generation assistant for the Dewu Open Platform that leverages GPT‑4o‑mini, OpenAI embeddings, Milvus vector store, and LangChain.js to semantically retrieve API documentation, structure user queries, and generate accurate, JSON‑formatted answers, thereby reducing manual support and hallucinations.

AI · LLM · LangChain

0 likes · 28 min read

AI Large Model Application Practice

Jan 6, 2025 · Artificial Intelligence

Boost LLM Agent Performance with the Evaluator‑Optimizer Reflection Loop

This article explains the Evaluator‑Optimizer reflection pattern for LLM agents, shows how it can improve output quality in single‑ or multi‑agent tasks, and provides a step‑by‑step PydanticAI implementation with code examples and practical usage tips.

LLM · PydanticAI · Reflection

0 likes · 9 min read

Fighter's World

Jan 4, 2025 · Industry Insights

Is Unlimited Digital Labor Arriving? A Deep Dive into Salesforce’s Agentforce 2.0

Salesforce’s Agentforce 2.0 positions AI agents as a limitless digital labor platform, reshaping enterprise software with a new agent‑first model, consumption‑based pricing, and real‑world case studies that illustrate productivity gains, cost reductions, and strategic advantages in today’s AI‑driven market.

AI agents · Agentforce · Digital Labor

0 likes · 19 min read

Alibaba Cloud Infrastructure

Jan 3, 2025 · Cloud Native

How to Enable LLM Traffic Observability with Alibaba Cloud Service Mesh (ASM)

This guide explains how to use Alibaba Cloud Service Mesh (ASM) to add infrastructure‑level observability for large language model (LLM) traffic, covering custom access‑log fields, new Prometheus metrics for token usage, and adding model dimensions to native Istio metrics, with step‑by‑step commands and configuration examples.

ASM · Kubernetes · LLM

0 likes · 14 min read

AI Large Model Application Practice

Jan 3, 2025 · Artificial Intelligence

How to Build an Orchestrator‑Workers AI Agent Workflow with Pydantic AI

This article explains the Orchestrator‑Workers pattern from Anthropic’s “Building effective agents”, compares it with routing and parallel modes, distinguishes it from Supervisor agents, and provides a step‑by‑step Python implementation using Pydantic AI, including model definitions, prompts, orchestration logic, worker execution, and a test example.

AI agents · LLM · Orchestrator-Workers

0 likes · 9 min read

Alibaba Cloud Big Data AI Platform

Jan 3, 2025 · Artificial Intelligence

Build an Education‑Focused Retrieval‑Augmented Generation (RAG) Solution with Alibaba PAI

This guide walks you through creating a RAG‑enhanced AI solution for education using Alibaba PAI, covering prerequisite setup, knowledge‑base construction with PAI‑Designer, model deployment, connection configuration, workflow assembly, and a side‑by‑side comparison of RAG versus non‑RAG answers.

AI Platform · LLM · Milvus

0 likes · 16 min read

Infra Learning Club

Jan 2, 2025 · Artificial Intelligence

Three Major LLM Trends in 2025: Ubiquitous Agents, Rising Small Models, and Multimodal Fusion

In 2025, large language models will see three key trends—agents becoming pervasive in daily life and industry, the emergence of efficient small models for edge and specialized tasks, and the integration of multimodal capabilities that combine text, images, and audio to enable more natural human‑machine interaction.

AI trends · LLM · Multimodal

0 likes · 4 min read

DataFunSummit

Jan 1, 2025 · Artificial Intelligence

Challenges and Evaluation Strategies for LLM Agents in 2024

The article outlines the rapid progress of LLM agents in 2024 while highlighting key difficulties in planning capabilities, evaluation methods, dataset generation, and metric design, and suggests practical combinations and product‑level enhancements to improve efficiency, accuracy, and usability.

AI · Agent · Dataset

0 likes · 3 min read

ByteFE

Dec 31, 2024 · Artificial Intelligence

In‑Depth Review of Cursor: AI‑Powered Coding Assistant, Capabilities, Use Cases, and Limitations

This article evaluates the Cursor AI coding assistant, describing its context‑aware indexing, Composer panel, and code‑generation features, while outlining practical scenarios such as Q&A, test creation, language conversion, and prototype development, and discussing its inherent randomness, domain‑knowledge gaps, and best‑practice recommendations for developers.

AI coding assistant · LLM · code-generation

0 likes · 27 min read

Alibaba Cloud Observability

Dec 30, 2024 · Operations

How to Quickly Diagnose Error and Performance Issues in Cloud‑Native Applications

This article outlines a comprehensive approach to identifying and resolving both error‑related and slow‑request problems in online systems by leveraging trace data, log correlation, method‑stack analysis, unified entity models, and large‑language‑model assistance to accelerate root‑cause diagnosis.

APM · LLM · Performance debugging

0 likes · 12 min read

AI Large Model Application Practice

Dec 30, 2024 · Artificial Intelligence

Implementing LLM Routing and Parallel Agent Workflows with PydanticAI

This tutorial walks through building semantic routing and parallel execution patterns for LLM agents using the lightweight PydanticAI framework, providing step‑by‑step code, example configurations, and practical observations to help developers create flexible AI‑driven workflows.

LLM · Parallelism · PydanticAI

0 likes · 11 min read

ZhongAn Tech Team

Dec 28, 2024 · Artificial Intelligence

Weekly AI Digest Issue 8: OpenAI Robotics, ModernBERT Upgrade, Spatial Cognition, LLM Agent Evolution, and GNN‑LLM Fusion

This issue surveys recent AI developments, covering OpenAI's renewed robot program, the ModernBERT encoder upgrade, spatial reasoning advances in multimodal models, automated environment generation for LLM agents, and a novel GNN‑LLM approach for label‑free node classification.

Artificial Intelligence · BERT · LLM

0 likes · 10 min read

DataFunTalk

Dec 28, 2024 · Big Data

Next‑Generation Data Analysis Platform: Integrating Chat BI and Headless BI

This article examines the current challenges of enterprise data analysis platforms, outlines three traditional analysis modes, and presents a next‑generation solution that combines Headless BI’s semantic modeling with Chat BI’s large‑language‑model interaction to deliver a more efficient, secure, and user‑friendly analytics experience.

ChatBI · DataGovernance · HeadlessBI

0 likes · 15 min read

Volcano Engine Developer Services

Dec 26, 2024 · Artificial Intelligence

How LLMs Can Auto-Generate Unit Tests: Insights from ByteDance’s QCon Talk

This article summarizes ByteDance’s quality‑efficiency expert Zhao Liang’s QCon presentation on using large language models to automatically generate unit tests, covering pain points, goals, data‑quality engineering, model‑analysis fusion, architecture, evaluation metrics, and future plans for a production‑grade testing tool.

AI · LLM · Test Generation

0 likes · 26 min read

DevOps

Dec 25, 2024 · Artificial Intelligence

Anthropic’s Agent Development: The Counter‑Intuitive “Less Is More” Principle

Anthropic argues that building effective AI agents should start with simple, enhanced LLMs and only add workflow or autonomous agent complexity when necessary, emphasizing a “Less is More” approach to reduce latency, cost, and debugging difficulty.

Anthropic · LLM · Less is More

0 likes · 13 min read

Alibaba Cloud Native

Dec 24, 2024 · Operations

How to Quickly Diagnose Error and Latency Issues in Cloud‑Native Applications

This article outlines a practical, end‑to‑end approach for identifying and resolving both error‑related and slow‑request problems in online systems by leveraging trace links, correlated logs, entity relationships, and large‑language‑model‑driven analysis to achieve rapid root‑cause isolation.

APM · Cloud Native · LLM

0 likes · 12 min read

Alibaba Cloud Big Data AI Platform

Dec 24, 2024 · Artificial Intelligence

Build a Medical RAG Solution with Alibaba PAI: Step-by-Step Guide

Learn how to create a Retrieval‑Augmented Generation (RAG) system for medical applications using Alibaba's PAI platform, covering knowledge‑base construction with PAI‑Designer, template setup in PAI‑LangStudio, deployment of LLM and embedding models, vector database integration, and end‑to‑end workflow configuration.

Embedding · LLM · Milvus

0 likes · 18 min read

NewBeeNLP

Dec 23, 2024 · Artificial Intelligence

What’s New in Qwen2.5? A Deep Dive into the Latest LLM Advances

The Qwen2.5 Technical Report introduces a new series of large language models with up to 72 B parameters, expanded pre‑training data to 18 trillion tokens, advanced supervised fine‑tuning and reinforcement learning pipelines, and demonstrates strong performance across comprehension, reasoning, coding, and long‑context tasks.

Fine-tuning · LLM · Qwen2.5

0 likes · 5 min read

DataFunSummit

Dec 22, 2024 · Artificial Intelligence

From Concept to Deployment: The Evolution of 1688’s AI Purchasing Assistant “Yuanbao”

This article chronicles the development of 1688’s AI buyer assistant “Yuanbao”, detailing why an e‑commerce AI assistant is needed, its functional design, MVP constraints, the shift to a data‑driven 2.0 version, future prospects, and a Q&A, providing practical insights for AI product rollout in B‑to‑C platforms.

AI · Agent · Data-driven

0 likes · 24 min read

Baobao Algorithm Notes

Dec 18, 2024 · Artificial Intelligence

How STAR Enables Training‑Free Recommendations with Large Language Models

The article reviews the STAR framework, a training‑free recommendation approach that leverages large language model embeddings and collaborative co‑occurrence scores to retrieve and rank items, and evaluates its performance, hyper‑parameter effects, and ablation studies against existing LLM‑based recommender methods.

Artificial Intelligence · LLM · collaborative filtering

0 likes · 10 min read

Full-Stack Cultivation Path

Dec 18, 2024 · Frontend Development

Midscene.js: An AI‑Powered UI Automation Framework for Web Testing

Midscene.js leverages multimodal AI to simplify web UI automation by providing .ai, .aiQuery and .aiAssert methods, supporting JavaScript and YAML integrations, a Chrome extension, and detailed cost analysis while acknowledging latency, interaction limits, and prompt‑engineering challenges.

Chrome Extension · JavaScript · LLM

0 likes · 9 min read

Alibaba Cloud Developer

Dec 17, 2024 · Frontend Development

Choosing the Best LangChain Text Splitter for Frontend LLM Apps

This article compares five LangChain text splitters—CharacterTextSplitter, RecursiveCharacterTextSplitter, TokenTextSplitter, MarkdownTextSplitter, and LatexTextSplitter—by examining their principles, pros and cons, and ideal use cases, helping developers select the most suitable splitter for their frontend large‑model applications.

JavaScript · LLM · LangChain

0 likes · 10 min read

Huolala Tech

Dec 17, 2024 · Artificial Intelligence

How to Secure AI Agents: Privacy Risks, Threats, and Governance Strategies

This article examines the rapid growth of AI agents, outlines typical privacy and security challenges such as data leakage, model attacks, and prompt injection, and proposes comprehensive governance and technical measures to mitigate these risks in enterprise deployments.

AI agents · LLM · governance

0 likes · 22 min read

Huolala Safety Emergency Response Center

Dec 17, 2024 · Information Security

How Secure Are AI Agents? Risks, Attacks, and Governance Strategies

This article examines the rapid growth of AI agents, outlines their core components and classifications, analyzes a wide range of privacy and security threats—including data leakage, prompt injection, jailbreak, backdoor, hallucination, and memory attacks—and proposes practical governance measures to mitigate these risks.

AI agents · LLM · governance

0 likes · 25 min read

Baobao Algorithm Notes

Dec 16, 2024 · Artificial Intelligence

What Do Leading Open‑Source LLMs Do After Pretraining? A Deep Dive into Post‑Training Strategies

This article surveys the post‑training pipelines of major open‑source large language models released this year, detailing their alignment algorithms, data synthesis, reward modeling, DPO/GRPO variants, long‑context handling, tool use, and model‑averaging techniques, and highlights emerging trends such as data‑centric pipelines and iterative weak‑to‑strong alignment.

AI research · Alignment · LLM

0 likes · 99 min read

Alibaba Cloud Big Data AI Platform

Dec 16, 2024 · Artificial Intelligence

Build a RAG-Powered Q&A App with Alibaba Cloud Milvus, DashScope & PAI

This guide walks you through creating a Retrieval‑Augmented Generation (RAG) question‑answering application by integrating Alibaba Cloud Milvus vector search, DashScope embedding models, and PAI EAS LLM services, covering prerequisites, service deployment, configuration, Python code setup, and execution steps.

LLM · LangChain · Milvus

0 likes · 12 min read

ZhongAn Tech Team

Dec 15, 2024 · Artificial Intelligence

AI Weekly Digest Issue 6: OpenAI’s AI Christmas Season, LeCun’s AGI Forecast, Chinese Text‑to‑Image Breakthrough, and EchoMimic V2

This issue reviews OpenAI’s twelve‑day product launch, LeCun’s surprising AGI timeline, a new Chinese text‑to‑image capability from ByteDance’s Doubao, and the open‑source EchoMimic V2 digital‑human system, highlighting trends, technical details, and industry reactions across the AI landscape.

Artificial Intelligence · Chinese Text Generation · EchoMimic

0 likes · 13 min read

Baobao Algorithm Notes

Dec 15, 2024 · Artificial Intelligence

What Are the Best Practices for Retrieval‑Augmented Generation (RAG)?

This comprehensive study evaluates various components of Retrieval‑Augmented Generation pipelines—including query classification, chunking, embedding models, vector databases, retrieval, re‑ranking, summarization, and generator fine‑tuning—identifies optimal configurations, and proposes best‑practice guidelines for both performance‑maximizing and efficiency‑balanced RAG systems.

Fine-tuning · LLM · RAG

0 likes · 17 min read

Fighter's World

Dec 14, 2024 · Industry Insights

Sequoia’s 2025 AI Outlook: From Hype to Real‑World Value

Sequoia Capital’s 2025 AI outlook argues that the industry is shifting from early excitement and massive spending to a phase focused on differentiated large‑model providers, AI‑search as a killer app, and a more disciplined, ROI‑driven investment climate.

2025 predictions · AI · AI investment

0 likes · 16 min read

DevOps

Dec 12, 2024 · Artificial Intelligence

The Future of Large Language Models: From Consumer Q&A to Agentic Workflows

Andrew Ng highlights that large language models are shifting from optimizing simple question‑answering for consumers to supporting complex agentic workflows, including tool usage, computer interaction, and multi‑agent collaboration, signaling a major evolution in AI capabilities.

AI agents · AI trends · Agentic AI

0 likes · 8 min read

AI Large Model Application Practice

Dec 12, 2024 · Artificial Intelligence

Mastering AutoGen: Build Multi‑Agent LLM Applications in Minutes

AutoGen, Microsoft’s advanced multi‑agent framework, lets developers quickly assemble collaborative LLM agents—supporting chat, tool use, and hierarchical group chats—through concise Python code, with examples ranging from simple two‑agent dialogues to complex three‑agent reporting pipelines, while outlining its strengths, limitations, and upcoming v0.4 enhancements.

AI · AutoGen · Framework

0 likes · 9 min read

Airbnb Technology Team

Dec 12, 2024 · Artificial Intelligence

Airbnb Automation Platform v2: Enabling LLM‑Driven Conversational AI

Airbnb’s Automation Platform v2 replaces the rigid, workflow‑driven architecture of v1 with an LLM‑centric design that orchestrates context gathering, chain‑of‑thought reasoning, tool execution, and guardrails, enabling more natural, scalable, and safe conversational AI while preserving the reliability of traditional workflows.

AI Architecture · Airbnb · Conversational AI

0 likes · 11 min read

DaTaobao Tech

Dec 9, 2024 · Artificial Intelligence

Analyzing LLM Failure Cases: Tokenization, Next‑Token Prediction, and Chain‑of‑Thought Prompting

The article explains how tokenization mismatches and biased next‑token prediction cause LLMs to miscount letters in “Strawberry” and incorrectly compare 9.9 versus 9.11, and shows that step‑by‑step Chain‑of‑Thought prompting with reason‑first output dramatically improves accuracy.

AI · Chain-of-Thought · LLM

0 likes · 13 min read

37 Interactive Technology Team

Dec 9, 2024 · Artificial Intelligence

Optimizing Request Concurrency for LLM Workflows: Rationale, Implementation, and Results

By breaking iterable inputs into parallel LLM calls and batching 20 items across three languages within Dify’s platform limits, the workflow achieves 43‑64% average runtime reductions and markedly higher success rates, demonstrating that request‑level concurrency dramatically improves throughput for large‑scale translation tasks.

Coze · Dify · LLM

0 likes · 6 min read

DataFunSummit

Dec 4, 2024 · Artificial Intelligence

Accelerating Large Language Model Inference with the YiNian LLM Framework

This article presents the YiNian LLM framework, detailing how KVCache, prefill/decoding separation, continuous batching, PagedAttention, and multi‑hardware scheduling are used to speed up large language model inference while managing GPU memory and latency.

AI acceleration · Continuous Batching · GPU

0 likes · 20 min read

DaTaobao Tech

Dec 4, 2024 · Artificial Intelligence

LLM‑Powered Live Stream Analysis and Automation for E‑commerce

Taobao’s self‑operated live‑stream team built an end‑to‑end pipeline that downloads benchmark videos, transcribes audio, and uses GPT‑4o prompts to automatically summarize sales highlights, visual cues, and comments, delivering actionable insights that match manual notes, free operators for core tasks, and enable features like coupon pushes and intelligent product recommendations.

LLM · automation · e‑commerce

0 likes · 15 min read

Rare Earth Juejin Tech Community

Dec 2, 2024 · Artificial Intelligence

Building a Simple Chatbot with Alibaba Tongyi Large Language Model: Fundamentals and Implementation

This article introduces the basic concepts of supervised and unsupervised machine learning, explains the core mechanisms of large language models such as Transformers, and provides a step‑by‑step guide with code to build a simple chatbot using Alibaba's Tongyi LLM via Spring Boot.

Alibaba · Chatbot · LLM

0 likes · 11 min read

AI Large Model Application Practice

Dec 2, 2024 · Artificial Intelligence

Master CrewAI: Build Multi‑Agent Systems Quickly with Flows and a Full Demo

This article introduces CrewAI, a high‑level Python framework for constructing multi‑agent systems, explains its core concepts such as Crew, Agent, Tool, Task and Process, walks through a complete demo with code, evaluates its strengths and limitations, and showcases the new Flows feature for more flexible workflow orchestration.

AI Framework · CrewAI · Flows

0 likes · 15 min read

JavaEdge

Dec 1, 2024 · Artificial Intelligence

Exploring the Limits and Benchmarks of Qwen’s QwQ‑32B‑Preview AI Model

QwQ‑32B‑Preview, an experimental AI model from the Qwen team, showcases strong reasoning in math and programming while facing challenges like language switching, inference loops, safety concerns, and variable capabilities across domains, with benchmark scores ranging from 50% to over 90% on tests such as GPQA, AIME, MATH‑500, and LiveCodeBench.

AI Benchmark · LLM · Model Evaluation

0 likes · 7 min read

AsiaInfo Technology: New Tech Exploration

Nov 29, 2024 · Artificial Intelligence

How GraphRAG Transforms Global QA with Structured Retrieval

This article examines GraphRAG—a graph‑enhanced Retrieval‑Augmented Generation approach—detailing its core concepts, the practical challenges of deploying it in enterprise settings, and the engineering solutions and future directions that enable more accurate, efficient, and explainable global question‑answering systems.

Global QA · GraphRAG · LLM

0 likes · 16 min read

Alibaba Cloud Developer

Nov 28, 2024 · Artificial Intelligence

Understanding Tokenizers and Embeddings in Large Language Models

This article introduces the core concepts of tokenizers and embeddings in large language models, explains how they convert text into numeric IDs and dense vectors, compares different tokenization strategies, and provides practical JavaScript and TensorFlow.js code examples for beginners.

AI fundamentals · JavaScript · LLM

0 likes · 10 min read

Sohu Tech Products

Nov 27, 2024 · Artificial Intelligence

RAG Technology and Practical Application in Multi-Modal Query: Using Chinese-CLIP and Redis Search

The article explains how Retrieval‑Augmented Generation (RAG) outperforms direct LLM inference by enabling real‑time knowledge updates and lower costs, and demonstrates a practical multi‑modal RAG pipeline that uses Chinese‑CLIP for vector encoding, various chunking strategies, and Redis Search for fast vector storage and retrieval.

Chinese-CLIP · LLM · RAG

0 likes · 17 min read

NewBeeNLP

Nov 27, 2024 · Artificial Intelligence

How Can Large Language Models Extend Their Context Window? A Deep Dive into Position Encoding

This article reviews the principles of absolute and relative positional encodings, explains why window extrapolation is crucial for large language models, analyzes current extrapolation methods, evaluates their performance, and answers common questions about extending LLM context windows.

LLM · Positional Encoding · RoPE

0 likes · 14 min read

Alibaba Cloud Big Data AI Platform

Nov 27, 2024 · Artificial Intelligence

How to Train, Evaluate, and Deploy Qwen2.5-Coder on Alibaba Cloud PAI‑QuickStart

This guide walks developers through the entire lifecycle of Qwen2.5‑Coder—covering model sizes, training token expansion, resource requirements, fine‑tuning with SFT/DPO, evaluation on custom and public datasets, and one‑click deployment and compression on Alibaba Cloud's PAI‑QuickStart platform.

Deployment · LLM · Model Training

0 likes · 15 min read

AI Large Model Application Practice

Nov 25, 2024 · Artificial Intelligence

Building Multi‑Agent Systems with LangGraph: A Step‑by‑Step Guide

This article walks through implementing a multi‑agent workflow using LangGraph, compares it with the lightweight Swarm framework, and details the code for defining models, tools, agents, and graph structures, then tests the result and evaluates the framework's strengths, limitations, and suitable use cases.

AI · LLM · LangGraph

0 likes · 10 min read

Baobao Algorithm Notes

Nov 24, 2024 · Artificial Intelligence

How Marco‑o1 Merges Chain‑of‑Thought Fine‑Tuning with Monte‑Carlo Tree Search for Superior Reasoning

The article introduces Marco‑o1, an open‑source LLM that enhances complex reasoning by fine‑tuning on Chain‑of‑Thought data, integrating Monte‑Carlo Tree Search, introducing mini‑step actions and a reflection mechanism, and evaluates its performance on multilingual math and translation benchmarks.

Artificial Intelligence · Chain-of-Thought · LLM

0 likes · 15 min read

System Architect Go

Nov 24, 2024 · Artificial Intelligence

Building a Web Voice Chatbot with Whisper, llama.cpp, and LLM

This article demonstrates how to build a web‑based voice chatbot by integrating Whisper speech‑to‑text, llama.cpp LLM inference, and WebSocket communication, detailing both the frontend JavaScript implementation and the Python FastAPI backend, along with Docker deployment and example code.

FastAPI · JavaScript · LLM

0 likes · 10 min read

ZhongAn Tech Team

Nov 24, 2024 · Artificial Intelligence

Weekly AI Digest – Issue 3: Agentic AI, Riemann Hypothesis Rumors, AI Search Trends, and Real‑time Voice Interaction

This issue reviews the rise of Agentic AI and upcoming computer agents, debunks a viral claim about Grok‑3 proving the Riemann hypothesis, analyzes Gartner’s AI search forecasts, and highlights OpenAI’s Realtime API for ultra‑low‑latency voice interactions.

AI search · Agentic AI · LLM

0 likes · 10 min read

DaTaobao Tech

Nov 20, 2024 · Mobile Development

MNN-Transformer: Efficient On‑Device Large Language and Diffusion Model Deployment

MNN‑Transformer provides an end‑to‑end framework that enables large language and diffusion models to run efficiently on modern smartphones by exporting, quantizing (including dynamic int4/int8 and KV cache compression) and executing via a plugin‑engine runtime, achieving up to 35 tokens/s decoding and 2‑3× faster image generation compared with existing on‑device solutions.

LLM · MNN · Mobile AI

0 likes · 15 min read

System Architect Go

Nov 19, 2024 · Artificial Intelligence

Retrieval Augmented Generation (RAG) System Overview and Implementation with LangChain, Redis, and llama.cpp

This article explains the concept, architecture, and step‑by‑step implementation of Retrieval Augmented Generation (RAG), covering indexing, retrieval & generation processes, a practical LangChain‑Redis‑llama.cpp example on Kubernetes, code snippets, test results, challenges, and references.

AI · Embedding · LLM

0 likes · 6 min read

dbaplus Community

Nov 16, 2024 · Artificial Intelligence

Are LLM Frameworks Overhyped? A Critical Look at RAG and Reusability

The article critiques LLM frameworks, comparing them to early ORM tools, explains how Retrieval Augmented Generation works, warns against premature optimization, and advises developers to favor simple, visible practices over complex, abstracted frameworks for better control and understanding.

AI · LLM · ModelEvaluation

0 likes · 7 min read

Alibaba Cloud Developer

Nov 15, 2024 · Frontend Development

How to Build Real‑Time LLM Streaming in the Browser with Fetch

This article explains the mechanism of HTTP API streaming for large language models and shows step‑by‑step how front‑end developers can use the Fetch API, readable streams, and incremental UI updates to deliver real‑time, progressive results while handling errors and connection interruptions.

Front-end · HTTP streaming · JavaScript

0 likes · 9 min read

Linux Kernel Journey

Nov 14, 2024 · Artificial Intelligence

Deep Dive: How DeepFlow Collects Business Metrics for Large‑Model Services

This article explains how China Mobile built a hybrid‑cloud production environment for its customer‑service LLM, using eBPF and WebAssembly plugins from DeepFlow to achieve zero‑intrusion observability, automatically capture full‑stack topology, application/network metrics, and key LLM business indicators such as TTFT, TPOT, and token throughput.

DeepFlow, Grafana, LLM

0 likes · 19 min read

Baobao Algorithm Notes

Nov 14, 2024 · Artificial Intelligence

How I Built a 1B‑Parameter Chinese LLM on a Single A100: Lessons Learned

This article details the end‑to‑end process of pre‑training, fine‑tuning, and evaluating a 1‑billion‑parameter Chinese LLM named Steel‑LLM on limited hardware, covering data collection, pipeline design, training framework choices, architectural tweaks, performance results, and practical lessons for resource‑constrained developers.

LLM, Model architecture, Training Optimization

0 likes · 18 min read

Alibaba Cloud Developer

Nov 14, 2024 · Artificial Intelligence

Building a High‑Accuracy Automotive Maintenance Q&A System with Multi‑Agent LLMs

This article details how to design, implement, and evaluate a complex‑table intelligent Q&A solution for automotive maintenance using large language models, RAG pipelines, multi‑agent architectures, prompt engineering, and Alibaba Cloud services, achieving up to 93.8% accuracy.

LLM, Multi-Agent, RAG

0 likes · 31 min read

AI Large Model Application Practice

Nov 13, 2024 · Artificial Intelligence

Exploring OpenAI Swarm: A Minimalist Multi‑Agent Orchestration Framework

This article introduces the concept of multi‑agent systems, compares five popular orchestration frameworks, and provides a step‑by‑step tutorial for building and testing a simple supervision‑based workflow using OpenAI's experimental Swarm library, complete with code snippets and performance observations.

LLM, OpenAI, Python

0 likes · 12 min read

Aikesheng Open Source Community

Nov 12, 2024 · Artificial Intelligence

ChatDBA: An AI‑Powered Database Fault Diagnosis Assistant Using Large Language Models

ChatDBA is a conversational AI system built by Shanghai Aikesheng that employs large language models and Retrieval‑Augmented Generation to help database administrators diagnose faults, learn domain knowledge, and generate or optimize SQL, with a redesigned architecture that addresses early‑stage shortcomings and outlines future enhancements.

ChatDBA, Fault Diagnosis, Knowledge Base

0 likes · 10 min read

Alibaba Cloud Developer

Nov 12, 2024 · Artificial Intelligence

How Multi‑Agent LLMs Can Auto‑Optimize E‑Commerce Product Titles

This article explains how large language models and rule‑based multi‑agent pipelines are used to automatically generate and select high‑impact keywords for e‑commerce product titles, improving search exposure without extra advertising costs.

AI, LLM, e‑commerce

0 likes · 19 min read

Aikesheng Open Source Community

Nov 11, 2024 · Databases

ChatDBA: An AI‑Powered Intelligent Assistant for Database Fault Diagnosis and Management

ChatDBA is an AI‑driven conversational system developed by Shanghai Aikesheng that assists DBAs with fault diagnosis, knowledge learning, SQL generation and optimization by leveraging large language models, RAG architecture, and advanced retrieval and document‑processing techniques.

ChatDBA, Fault Diagnosis, LLM

0 likes · 10 min read

JD Cloud Developers

Nov 11, 2024 · Artificial Intelligence

Mastering Prompt Engineering: History, Techniques, and Real-World Applications

This article explains what Prompt Engineering is, traces its evolution from early NLP commands to modern adaptive and multimodal prompting, details core techniques such as Zero‑shot, Chain‑of‑Thought, Auto‑CoT, and reduction of hallucinations, and showcases a logistics case study using various prompting strategies.

AI, Chain-of-Thought, LLM

0 likes · 26 min read

Fighter's World

Nov 11, 2024 · Artificial Intelligence

How CoCounsel’s $650M Acquisition Reveals Key Design Principles for LLM‑Powered Legal Tools

The article examines how Casetext’s CoCounsel, an AI‑driven legal assistant acquired by Thomson Reuters for $650 million, achieved rapid growth by prioritizing accuracy, workflow integration, user‑centered design, security, and continuous improvement, and distills the critical challenges and success factors for building LLM‑native products in low‑tolerance B2B environments.

AI ethics, B2B SaaS, LLM

0 likes · 11 min read

Alibaba Cloud Observability

Nov 8, 2024 · Cloud Native

Enable Python Probe for LLM Observability on Alibaba Cloud ACK

This guide explains how to integrate Alibaba Cloud's Python probe into a Kubernetes (ACK) environment to monitor large language model (LLM) applications, covering prerequisites, installation steps, Dockerfile modifications, resource permissions, and sample Python code for both server and client components.

ARMS, Cloud Native, Docker

0 likes · 16 min read

AI Large Model Application Practice

Nov 8, 2024 · Artificial Intelligence

How to Build a Multimodal Embedding RAG with Cohere and LlamaIndex

This guide explains how to overcome the limitations of text‑only embeddings for enterprise AI search by using a multimodal embedding model to index and retrieve both text and images, detailing the full workflow, code examples, and performance benefits.

Cohere, LLM, LlamaIndex

0 likes · 13 min read

CSS Magic

Nov 8, 2024 · Artificial Intelligence

LLM Application Development Tips (3): Exploring LLM API Inputs and Outputs

This article explains how to configure key OpenAI chat completion parameters—such as temperature, top_p, streaming, response format, and tool selection—and walks through the structure of the API's JSON response, highlighting fields like id, model, choices, finish_reason, and usage for better control and cost estimation.

AI agents, API parameters, JSON response

0 likes · 8 min read

Alimama Tech

Nov 6, 2024 · Artificial Intelligence

How AI Generates Synchronized Video Narrations for E‑Commerce

This article presents the research behind Synchronized Video Storytelling, introducing the E‑SyncVidStory dataset, the VideoNarrator multimodal architecture, and extensive experiments that demonstrate high‑quality, product‑aware video narration generation for e‑commerce applications.

Dataset, LLM, Multimodal AI

0 likes · 12 min read

37 Interactive Technology Team

Nov 4, 2024 · Artificial Intelligence

Developing RAG and Agent Applications with LangChain: A Case Study of an AI Assistant for Activity Components

The article outlines a step‑by‑step methodology for creating Retrieval‑Augmented Generation and custom Agent applications with LangChain, illustrated by an AI assistant for activity components that evolves from a rapid Dify prototype to a LangChain‑based RAG system and finally a hand‑crafted ReAct‑style agent, detailing LCEL chain composition, vector‑search integration, model performance trade‑offs, and a unified routing layer.

AI Assistant, Agent, Cloud-native

0 likes · 6 min read

Baobao Algorithm Notes

Nov 4, 2024 · Artificial Intelligence

Uncovering 16 Limits of AI Search Engines and 16 Design Recommendations

A user study with 21 participants reveals sixteen critical limitations of generative AI search engines, maps them to eight quantitative metrics, proposes sixteen design recommendations, and evaluates You.com, Perplexity, and Bing Chat against this framework to highlight current performance gaps.

AI search, Generative Search, LLM

0 likes · 12 min read

CSS Magic

Nov 1, 2024 · Artificial Intelligence

Refining System Prompts for LLMs: Practical Tips for Batch Automation

This article explains how to automate batch document processing with LLM APIs by mastering the messages parameter, defining system, user, and assistant roles, and iteratively polishing system prompts through scripts or OpenAI's GPTs editor and Playground interfaces.

ChatGPT, LLM, OpenAI API

0 likes · 7 min read

DataFunTalk

Oct 31, 2024 · Artificial Intelligence

Tencent OlaChat: An LLM‑Powered Intelligent Business Intelligence Platform – Architecture, Capabilities, and Practice

This article presents the evolution from traditional to intelligent BI, explores how large language models enable natural‑language data analysis, details the OlaChat platform’s architecture, metadata‑enhanced retrieval methods, Text2SQL pipeline, multi‑turn dialogue system, and shares practical deployment insights and Q&A.

Business Intelligence, Intelligent Analytics, LLM

0 likes · 20 min read

NewBeeNLP

Oct 31, 2024 · Artificial Intelligence

How o1 Is Redefining LLM Engineering and What It Means for AI Professionals

The article examines OpenAI's o1 model, highlighting its unprecedented scientific capabilities, its shift from a chat toy to a high‑value tool, the potential impact on algorithm engineers, and the technical directions (RLHF, MCTS, PPO, PRM) that practitioners should master to stay relevant.

AI, LLM, model analysis

0 likes · 8 min read

Alibaba Cloud Developer

Oct 31, 2024 · Artificial Intelligence

How to Guarantee 100% Structured JSON Output from Large Language Models

This article explains why LLMs often fail to produce strict JSON, reviews existing solutions, and presents a three‑stage strategy—prompt engineering, dynamic constrained decoding, and post‑processing—to achieve reliable structured JSON output for automated pipelines.

AI, JSON, LLM

0 likes · 11 min read

DaTaobao Tech

Oct 30, 2024 · Artificial Intelligence

Understanding OpenAI o1: Chain‑of‑Thought, Scaling Laws, and Training Strategies

The article explains how OpenAI’s o1 model leverages chain‑of‑thought prompting, dual‑system cognitive theory, and new scaling laws—pre‑training on code/math and post‑training reinforcement with step‑wise reward models—to achieve superior reasoning, safety, and performance over GPT‑4, heralding a shift toward models that learn to think.

Chain-of-Thought, LLM, Reinforcement Learning

0 likes · 42 min read

Baobao Algorithm Notes

Oct 29, 2024 · Artificial Intelligence

Reproducing OpenAI o1: Steiner Model’s Reasoning, Training, and Evaluation

This report details the design, data synthesis, three‑stage training pipeline, and benchmark evaluation of the open‑source Steiner reasoning model, which aims to emulate OpenAI o1’s inference‑time scaling while highlighting current performance gaps and future research challenges.

Inference Scaling, LLM, Reasoning Models

0 likes · 14 min read

Baobao Algorithm Notes

Oct 29, 2024 · Industry Insights

Inside Perplexity AI: How RAG Powers the Next‑Gen Search Engine

In this interview, Perplexity AI CEO Aravind Srinivas explains the company’s retrieval‑augmented generation architecture, multi‑model strategy, vector‑database use, competitive positioning against Google, monetization plans, and future product roadmap, offering a deep industry perspective on AI‑driven search.

AI startup, Industry Analysis, LLM

0 likes · 38 min read

CSS Magic

Oct 29, 2024 · Artificial Intelligence

LLM Application Development Tips (1): How to Choose the Right Model

With a growing array of overseas and domestic LLM APIs in 2024, this guide explains how to pick the right model—starting with a top‑tier option like GPT‑4o for feasibility testing, then moving to cost‑effective or Chinese alternatives, while weighing price, inference speed, context window, API compatibility, and rate limits.

API compatibility, Chinese LLM, GPT-4o

0 likes · 8 min read

DevOps

Oct 27, 2024 · Artificial Intelligence

Best Practices for Building Efficient Retrieval‑Augmented Generation (RAG) Systems

This article reviews Wang et al.'s 2024 research on Retrieval‑Augmented Generation, outlining optimal practices such as query classification, chunk sizing, hybrid metadata search, embedding selection, vector databases, query transformation, reranking, document repacking, summarization, fine‑tuning, and multimodal retrieval to guide developers in constructing high‑performance RAG pipelines.

LLM, Query Classification, RAG

0 likes · 11 min read

Alibaba Cloud Native

Oct 26, 2024 · Artificial Intelligence

Build a Real‑Time Semantic Search with EventBridge, DashVector, and FunctionCompute

This tutorial walks through constructing a zero‑to‑one RAG pipeline that ingests OSS text files via EventBridge, transforms them into embeddings with DashScope, stores vectors in DashVector, and performs semantic search using FunctionCompute and a Qwen‑Turbo LLM, complete with code samples and configuration steps.

DashVector, Embedding, EventBridge

0 likes · 10 min read

System Architect Go

Oct 25, 2024 · Artificial Intelligence

Designing and Extending a Self‑Built ChatGPT System: Architecture, Session Management, and Scaling Strategies

This article explains how to construct a ChatGPT‑like conversational system by detailing the core dialogue flow, adding session and history management with a database, defining REST APIs, and exploring extensions such as caching, elastic scaling, and production‑ready deployment considerations.

ChatGPT, LLM, Scalability

0 likes · 7 min read

Baobao Algorithm Notes

Oct 25, 2024 · Artificial Intelligence

How Simhash and Minhash Power LLM Data Deduplication: Theory and Spark Code

This article explains document‑level, paragraph‑level, and sentence‑level deduplication for large‑scale LLM pre‑training, introduces the Simhash and Minhash algorithms with step‑by‑step Python examples, and shows how to implement efficient LSH‑based deduplication using Spark.

LLM, Minhash, Python

0 likes · 29 min read

Baobao Algorithm Notes

Oct 25, 2024 · Artificial Intelligence

How to Use Importance Sampling for Effective Continued Pretraining of LLMs

Continued pretraining (CP) sits between pretraining and SFT and injects domain knowledge, but it is prone to catastrophic forgetting. This article explores using importance sampling to balance general and domain data, and discusses data selection, annealing strategies, and practical tips for mitigating forgetting while strengthening specialized capabilities.

Catastrophic Forgetting, Continue Pretraining, Importance Sampling

0 likes · 8 min read

System Architect Go

Oct 24, 2024 · Artificial Intelligence

How to Fine‑Tune Translation Models on Kubernetes Docs with LoRA

This article walks through the complete process of fine‑tuning both domain‑specific and large‑language translation models on Kubernetes documentation, covering data preparation, model selection, training configurations, the differences between Seq2Seq and CausalLM, and how LoRA can dramatically reduce resource usage while improving performance.

AI, Fine-tuning, LLM

0 likes · 7 min read

21CTO

Oct 23, 2024 · Artificial Intelligence

IBM Unveils Granite 3.0 LLMs: Open‑Source, Secure, and Cost‑Effective AI Models

IBM introduced the Granite 3.0 series, an open‑source family of large language models that combine cutting‑edge performance with enhanced security, multi‑language support, and cost‑efficiency, while offering a variety of base, instruct, and specialist variants for enterprise use.

AI models, Granite, IBM

0 likes · 4 min read

DaTaobao Tech

Oct 23, 2024 · Artificial Intelligence

Retrieval-Augmented Generation (RAG): Principles, Applications, Limitations and Challenges

Retrieval-Augmented Generation (RAG) combines a retriever that fetches relevant external documents and a generator that uses them, improving LLM accuracy, relevance, privacy, and up-to-date information, but faces challenges such as retrieval latency, computational cost, chunking strategies, embedding selection, and system integration complexity.

AI, Knowledge Retrieval, LLM

0 likes · 13 min read

Baidu Geek Talk

Oct 23, 2024 · Artificial Intelligence

Integrating Yuan 2.0 Large Model with PaddleNLP: Overview, Usage Steps, and Interaction Examples

The open‑source Yuan 2.0 large model is fully integrated into Baidu’s PaddleNLP, offering quick inference for tasks like code generation, translation, and reasoning, along with efficient distributed training and fine‑tuning features such as Zero Padding optimization, enabling developers to easily deploy and customize the model via simple setup steps and example interactions.

AI, LLM, PaddleNLP

0 likes · 10 min read

Alibaba Cloud Big Data AI Platform

Oct 22, 2024 · Artificial Intelligence

How Alibaba Cloud Optimizes Enterprise RAG: Key Techniques for AI Search

At the 2024 Alibaba Cloud Apsara (Yunqi) Conference, senior AI Search expert Xing Shaomin detailed the enterprise‑grade Retrieval‑Augmented Generation (RAG) pipeline, covering its key pipeline architecture, optimizations for effectiveness, performance, and cost, as well as practical applications, vector store enhancements, LLM agents, and deployment strategies.

AI search, Cost Optimization, Enterprise AI

0 likes · 16 min read

AI Large Model Application Practice

Oct 21, 2024 · Artificial Intelligence

Building Personalized Long‑Term Memory for AI Agents with Mem0 and LangGraph

This tutorial explains why AI agents need durable, personalized long‑term memory, introduces the open‑source Mem0 solution, shows how Mem0 works with LLMs and vector stores, and provides step‑by‑step code to integrate Mem0 into a LangGraph workflow for adaptive, user‑specific interactions.

AI memory, LLM, LangGraph

0 likes · 11 min read

DataFunSummit

Oct 18, 2024 · Artificial Intelligence

Building Efficient RAG Applications with a Small Team: Insights from PingCAP AI Lab

This article details how PingCAP's three‑person AI Lab leveraged Retrieval‑Augmented Generation (RAG) techniques—including basic RAG, fine‑tuned embeddings, re‑ranking, graph RAG, and agent‑based RAG—to create scalable, multilingual document‑question answering services while addressing large‑scale documentation challenges, model limitations, and user feedback loops.

Agent, Embedding, Fine-tuning

0 likes · 14 min read

JavaEdge

Oct 18, 2024 · Artificial Intelligence

Designing Scalable Multi‑Agent Systems with LangGraph: Architectures, Communication, and Code Samples

This article explains why large‑language‑model agents become hard to manage, outlines the benefits of modular multi‑agent designs, compares several connection architectures, and provides concrete LangGraph code for supervisor‑based, tool‑calling, and custom workflow patterns.

LLM, LangGraph, Multi-Agent

0 likes · 12 min read

System Architect Go

Oct 17, 2024 · Artificial Intelligence

Running and Fine‑Tuning Large Language Models Locally with Ollama, Docker, and Cloud Resources

The author chronicles the challenges and solutions of running large language models locally using Ollama, experimenting with cloud GPUs on Google Colab, managing Python dependencies through Docker, and ultimately fine‑tuning a small Qwen model, providing a practical guide for AI enthusiasts.

Docker, Fine-tuning, Google Colab

0 likes · 6 min read

NewBeeNLP

Oct 16, 2024 · Artificial Intelligence

Unlocking Long-Sequence LLMs: Position Embeddings, Scaling, and Efficient Attention

This article reviews recent advances in training and inference for long‑sequence large language models, comparing ALIBI and RoPE position embeddings, exploring RoPE scaling techniques, analyzing attention optimizations, and outlining practical data, evaluation, and system frameworks for scalable LLM deployment.

Flash Attention, LLM, RoPE

0 likes · 14 min read

Baobao Algorithm Notes

Oct 16, 2024 · Artificial Intelligence

How the DB3 Team Won the Meta CRAG RAG Challenge: Prompts, Retrieval, and LoRA Fine‑Tuning

This article analyzes the Meta Comprehensive RAG (CRAG) benchmark, detailing its three tasks, evaluation metrics, and the champion DB3 team's end‑to‑end solution that combines data preprocessing, dual‑stage retrieval, prompt engineering, LoRA‑based fine‑tuning, and public data augmentation to achieve top scores across all tasks.

LLM, LoRA, RAG

0 likes · 17 min read

System Architect Go

Oct 15, 2024 · Artificial Intelligence

Overview of Ollama: Architecture, Storage Structure, and Dialogue Process

This article provides a comprehensive overview of Ollama, a lightweight tool for running large language models, detailing its client‑server architecture, local storage layout, and the step‑by‑step workflow of user interactions with the model.

AI tools, LLM, Ollama

0 likes · 7 min read

CSS Magic

Oct 14, 2024 · Artificial Intelligence

How OpenAI’s o1 Models Impact Developers: Performance, Limits, Cost, and Prompting

The article evaluates OpenAI’s o1 series—o1‑preview, o1‑mini and the upcoming full model—by comparing their complex reasoning strength, slower inference speed, higher pricing, API restrictions, and prompting best practices, helping developers decide when to adopt them.

API, LLM, OpenAI

0 likes · 13 min read

Baobao Algorithm Notes

Oct 13, 2024 · Artificial Intelligence

Can Hierarchical LLMs Transform Sequential Recommendation? A Deep Dive

This article provides a comprehensive analysis of the HLLM paper, detailing its hierarchical LLM architecture for item and user modeling, the training objectives, fusion strategies, extensive offline and online experiments, scaling behavior, ablation studies, and practical deployment insights in large‑scale recommendation systems.

Industrial Deployment, LLM, Sequential Modeling

0 likes · 12 min read

JD Tech

Oct 13, 2024 · Artificial Intelligence

Building a Simple Local AI Question‑Answer System with Java, LangChain4J, Ollama, and ChromaDB

This article guides readers through the concepts of large language models, embeddings, vector databases, and Retrieval‑Augmented Generation, then demonstrates step‑by‑step how to set up Ollama, install a local Chroma vector store, configure Maven dependencies, and write Java code using LangChain4J to build and test a functional AI Q&A application.

AI, LLM, LangChain4j

0 likes · 22 min read

AntTech

Oct 12, 2024 · Artificial Intelligence

Observations from ISSTA 2024: Conference Highlights, Awarded Papers, Keynotes, and In‑Depth Reviews

The article reports on the 33rd ISSTA 2024 conference in Vienna, summarizing its acceptance statistics, highlighting the Impact Paper Award and Distinguished Papers, detailing keynotes on large‑language‑model‑driven software quality, and providing extensive reviews of selected research works ranging from fuzzing and program repair to database query simplification and AI‑oriented code generation.

ISSTA2024, LLM, ProgramRepair

0 likes · 29 min read

21CTO

Oct 10, 2024 · Artificial Intelligence

5 Practical AI Projects to Build Your Skills with Python

This article presents five hands‑on AI project ideas—from resume optimization to multimodal search—complete with step‑by‑step instructions, required Python libraries, and code snippets, helping beginners and intermediate developers quickly build valuable AI applications.

AI, LLM, Python

0 likes · 12 min read

JD Tech Talk

Oct 8, 2024 · Artificial Intelligence

Building a Retrieval‑Augmented Generation (RAG) System with Rust and Qdrant

This article explains how to construct a Retrieval‑Augmented Generation pipeline in Rust, covering knowledge‑base creation with Qdrant, model loading and embedding using the candle library, data ingestion, and integration of a Rust‑based inference service based on mistral.rs, while also discussing resource usage and common pitfalls.

AI, Embedding, LLM

0 likes · 16 min read
