Tagged articles
2016 articles
DataFunSummit
Jan 7, 2025 · Artificial Intelligence

Tencent OlaChat: Intelligent Data Analysis Platform – Research, Architecture, and Capabilities

This article presents the Tencent PCG OlaChat team's research and practice in intelligent data analysis, covering the DIKW model, evolution of BI platforms, the impact of large language models, challenges of third‑generation data products, detailed product features, agent architecture, system design, and related academic publications.

Agent · Intelligent BI · LLM
0 likes · 19 min read
DevOps
Jan 6, 2025 · Artificial Intelligence

Ten Popular Large Language Model Deployment Engines and Tools: Features, Advantages, and Limitations

This article reviews ten mainstream LLM deployment solutions—including WebLLM, LM Studio, Ollama, vLLM, LightLLM, OpenLLM, HuggingFace TGI, GPT4ALL, llama.cpp, and Triton Inference Server—detailing their technical characteristics, strengths, drawbacks, and example deployment workflows for both personal and enterprise environments.

AI inference · GPU Acceleration · LLM
0 likes · 16 min read
DeWu Technology
Jan 6, 2025 · Artificial Intelligence

Design and Implementation of a Retrieval‑Augmented Generation (RAG) Answering Assistant for the Dewu Open Platform

The paper describes building a Retrieval‑Augmented Generation assistant for the Dewu Open Platform that leverages GPT‑4o‑mini, OpenAI embeddings, Milvus vector store, and LangChain.js to semantically retrieve API documentation, structure user queries, and generate accurate, JSON‑formatted answers, thereby reducing manual support and hallucinations.

AI · LLM · LangChain
0 likes · 28 min read
Fighter's World
Jan 4, 2025 · Industry Insights

Is Unlimited Digital Labor Arriving? A Deep Dive into Salesforce’s Agentforce 2.0

Salesforce’s Agentforce 2.0 positions AI agents as a limitless digital labor platform, reshaping enterprise software with a new agent‑first model, consumption‑based pricing, and real‑world case studies that illustrate productivity gains, cost reductions, and strategic advantages in today’s AI‑driven market.

AI agents · Agentforce · Digital Labor
0 likes · 19 min read
Alibaba Cloud Infrastructure
Jan 3, 2025 · Cloud Native

How to Enable LLM Traffic Observability with Alibaba Cloud Service Mesh (ASM)

This guide explains how to use Alibaba Cloud Service Mesh (ASM) to add infrastructure‑level observability for large language model (LLM) traffic, covering custom access‑log fields, new Prometheus metrics for token usage, and adding model dimensions to native Istio metrics, with step‑by‑step commands and configuration examples.

ASM · Kubernetes · LLM
0 likes · 14 min read
AI Large Model Application Practice
Jan 3, 2025 · Artificial Intelligence

How to Build an Orchestrator‑Workers AI Agent Workflow with Pydantic AI

This article explains the Orchestrator‑Workers pattern from Anthropic’s “Building effective agents”, compares it with routing and parallel modes, distinguishes it from Supervisor agents, and provides a step‑by‑step Python implementation using Pydantic AI, including model definitions, prompts, orchestration logic, worker execution, and a test example.

AI agents · LLM · Orchestrator-Workers
0 likes · 9 min read
Alibaba Cloud Big Data AI Platform
Jan 3, 2025 · Artificial Intelligence

Build an Education‑Focused Retrieval‑Augmented Generation (RAG) Solution with Alibaba PAI

This guide walks you through creating a RAG‑enhanced AI solution for education using Alibaba PAI, covering prerequisite setup, knowledge‑base construction with PAI‑Designer, model deployment, connection configuration, workflow assembly, and a side‑by‑side comparison of RAG versus non‑RAG answers.

AI Platform · LLM · Milvus
0 likes · 16 min read
Infra Learning Club
Jan 2, 2025 · Artificial Intelligence

Three Major LLM Trends in 2025: Ubiquitous Agents, Rising Small Models, and Multimodal Fusion

In 2025, large language models will see three key trends—agents becoming pervasive in daily life and industry, the emergence of efficient small models for edge and specialized tasks, and the integration of multimodal capabilities that combine text, images, and audio to enable more natural human‑machine interaction.

AI trends · LLM · Multimodal
0 likes · 4 min read
DataFunSummit
Jan 1, 2025 · Artificial Intelligence

Challenges and Evaluation Strategies for LLM Agents in 2024

The article outlines the rapid progress of LLM agents in 2024 while highlighting key difficulties in planning capabilities, evaluation methods, dataset generation, and metric design, and suggests practical combinations and product‑level enhancements to improve efficiency, accuracy, and usability.

AI · Agent · Dataset
0 likes · 3 min read
ByteFE
Dec 31, 2024 · Artificial Intelligence

In‑Depth Review of Cursor: AI‑Powered Coding Assistant, Capabilities, Use Cases, and Limitations

This article evaluates the Cursor AI coding assistant, describing its context‑aware indexing, Composer panel, and code‑generation features, while outlining practical scenarios such as Q&A, test creation, language conversion, and prototype development, and discussing its inherent randomness, domain‑knowledge gaps, and best‑practice recommendations for developers.

AI coding assistant · LLM · code-generation
0 likes · 27 min read
ZhongAn Tech Team
Dec 28, 2024 · Artificial Intelligence

Weekly AI Digest Issue 8: OpenAI Robotics, ModernBERT Upgrade, Spatial Cognition, LLM Agent Evolution, and GNN‑LLM Fusion

This issue surveys recent AI developments, covering OpenAI's renewed robot program, the ModernBERT encoder upgrade, spatial reasoning advances in multimodal models, automated environment generation for LLM agents, and a novel GNN‑LLM approach for label‑free node classification.

Artificial Intelligence · BERT · LLM
0 likes · 10 min read
DataFunTalk
Dec 28, 2024 · Big Data

Next‑Generation Data Analysis Platform: Integrating Chat BI and Headless BI

This article examines the current challenges of enterprise data analysis platforms, outlines three traditional analysis modes, and presents a next‑generation solution that combines Headless BI’s semantic modeling with Chat BI’s large‑language‑model interaction to deliver a more efficient, secure, and user‑friendly analytics experience.

ChatBI · DataGovernance · HeadlessBI
0 likes · 15 min read
Volcano Engine Developer Services
Dec 26, 2024 · Artificial Intelligence

How LLMs Can Auto-Generate Unit Tests: Insights from ByteDance’s QCon Talk

This article summarizes ByteDance’s quality‑efficiency expert Zhao Liang’s QCon presentation on using large language models to automatically generate unit tests, covering pain points, goals, data‑quality engineering, model‑analysis fusion, architecture, evaluation metrics, and future plans for a production‑grade testing tool.

AI · LLM · Test Generation
0 likes · 26 min read
Alibaba Cloud Big Data AI Platform
Dec 24, 2024 · Artificial Intelligence

Build a Medical RAG Solution with Alibaba PAI: Step-by-Step Guide

Learn how to create a Retrieval‑Augmented Generation (RAG) system for medical applications using Alibaba's PAI platform, covering knowledge‑base construction with PAI‑Designer, template setup in PAI‑LangStudio, deployment of LLM and embedding models, vector database integration, and end‑to‑end workflow configuration.

Embedding · LLM · Milvus
0 likes · 18 min read
NewBeeNLP
Dec 23, 2024 · Artificial Intelligence

What’s New in Qwen2.5? A Deep Dive into the Latest LLM Advances

The Qwen2.5 Technical Report introduces a new series of large language models with up to 72 B parameters, expanded pre‑training data to 18 trillion tokens, advanced supervised fine‑tuning and reinforcement learning pipelines, and demonstrates strong performance across comprehension, reasoning, coding, and long‑context tasks.

Fine-tuning · LLM · Qwen2.5
0 likes · 5 min read
DataFunSummit
Dec 22, 2024 · Artificial Intelligence

From Concept to Deployment: The Evolution of 1688’s AI Purchasing Assistant “Yuanbao”

This article chronicles the development of 1688’s AI buyer assistant “Yuanbao”, detailing why an e‑commerce AI assistant is needed, its functional design, MVP constraints, the shift to a data‑driven 2.0 version, future prospects, and a Q&A, providing practical insights for AI product rollout in B‑to‑C platforms.

AI · Agent · Data-driven
0 likes · 24 min read
Baobao Algorithm Notes
Dec 18, 2024 · Artificial Intelligence

How STAR Enables Training‑Free Recommendations with Large Language Models

The article reviews the STAR framework, a training‑free recommendation approach that leverages large language model embeddings and collaborative co‑occurrence scores to retrieve and rank items, and evaluates its performance, hyper‑parameter effects, and ablation studies against existing LLM‑based recommender methods.

Artificial Intelligence · LLM · collaborative filtering
0 likes · 10 min read
Alibaba Cloud Developer
Dec 17, 2024 · Frontend Development

Choosing the Best LangChain Text Splitter for Frontend LLM Apps

This article compares five LangChain text splitters—CharacterTextSplitter, RecursiveCharacterTextSplitter, TokenTextSplitter, MarkdownTextSplitter, and LatexTextSplitter—by examining their principles, pros and cons, and ideal use cases, helping developers select the most suitable splitter for their frontend large‑model applications.

JavaScript · LLM · LangChain
0 likes · 10 min read
Huolala Tech
Dec 17, 2024 · Artificial Intelligence

How to Secure AI Agents: Privacy Risks, Threats, and Governance Strategies

This article examines the rapid growth of AI agents, outlines typical privacy and security challenges such as data leakage, model attacks, and prompt injection, and proposes comprehensive governance and technical measures to mitigate these risks in enterprise deployments.

AI agents · LLM · governance
0 likes · 22 min read
Huolala Safety Emergency Response Center
Dec 17, 2024 · Information Security

How Secure Are AI Agents? Risks, Attacks, and Governance Strategies

This article examines the rapid growth of AI agents, outlines their core components and classifications, analyzes a wide range of privacy and security threats—including data leakage, prompt injection, jailbreak, backdoor, hallucination, and memory attacks—and proposes practical governance measures to mitigate these risks.

AI agents · LLM · governance
0 likes · 25 min read
Baobao Algorithm Notes
Dec 16, 2024 · Artificial Intelligence

What Do Leading Open‑Source LLMs Do After Pretraining? A Deep Dive into Post‑Training Strategies

This article surveys the post‑training pipelines of major open‑source large language models released this year, detailing their alignment algorithms, data synthesis, reward modeling, DPO/GRPO variants, long‑context handling, tool use, and model‑averaging techniques, and highlights emerging trends such as data‑centric pipelines and iterative weak‑to‑strong alignment.

AI research · Alignment · LLM
0 likes · 99 min read
ZhongAn Tech Team
Dec 15, 2024 · Artificial Intelligence

AI Weekly Digest Issue 6: OpenAI’s AI Christmas Season, LeCun’s AGI Forecast, Chinese Text‑to‑Image Breakthrough, and EchoMimic V2

This issue reviews OpenAI’s twelve‑day product launch, LeCun’s surprising AGI timeline, a new Chinese text‑to‑image capability from ByteDance’s Doubao, and the open‑source EchoMimic V2 digital‑human system, highlighting trends, technical details, and industry reactions across the AI landscape.

Artificial Intelligence · Chinese Text Generation · EchoMimic
0 likes · 13 min read
Baobao Algorithm Notes
Dec 15, 2024 · Artificial Intelligence

What Are the Best Practices for Retrieval‑Augmented Generation (RAG)?

This comprehensive study evaluates various components of Retrieval‑Augmented Generation pipelines—including query classification, chunking, embedding models, vector databases, retrieval, re‑ranking, summarization, and generator fine‑tuning—identifies optimal configurations, and proposes best‑practice guidelines for both performance‑maximizing and efficiency‑balanced RAG systems.

Fine-tuning · LLM · RAG
0 likes · 17 min read
Fighter's World
Dec 14, 2024 · Industry Insights

Sequoia’s 2025 AI Outlook: From Hype to Real‑World Value

Sequoia Capital’s 2025 AI outlook argues that the industry is shifting from early excitement and massive spending to a phase focused on differentiated large‑model providers, AI‑search as a killer app, and a more disciplined, ROI‑driven investment climate.

2025 predictions · AI · AI investment
0 likes · 16 min read
DevOps
Dec 12, 2024 · Artificial Intelligence

The Future of Large Language Models: From Consumer Q&A to Agentic Workflows

Andrew Ng highlights that large language models are shifting from optimizing simple question‑answering for consumers to supporting complex agentic workflows, including tool usage, computer interaction, and multi‑agent collaboration, signaling a major evolution in AI capabilities.

AI agents · AI trends · Agentic AI
0 likes · 8 min read
AI Large Model Application Practice
Dec 12, 2024 · Artificial Intelligence

Mastering AutoGen: Build Multi‑Agent LLM Applications in Minutes

AutoGen, Microsoft’s advanced multi‑agent framework, lets developers quickly assemble collaborative LLM agents—supporting chat, tool use, and hierarchical group chats—through concise Python code, with examples ranging from simple two‑agent dialogues to complex three‑agent reporting pipelines, while outlining its strengths, limitations, and upcoming v0.4 enhancements.

AI · AutoGen · Framework
0 likes · 9 min read
Airbnb Technology Team
Dec 12, 2024 · Artificial Intelligence

Airbnb Automation Platform v2: Enabling LLM‑Driven Conversational AI

Airbnb’s Automation Platform v2 replaces the rigid, workflow‑driven architecture of v1 with an LLM‑centric design that orchestrates context gathering, chain‑of‑thought reasoning, tool execution, and guardrails, enabling more natural, scalable, and safe conversational AI while preserving the reliability of traditional workflows.

AI Architecture · Airbnb · Conversational AI
0 likes · 11 min read
37 Interactive Technology Team
Dec 9, 2024 · Artificial Intelligence

Optimizing Request Concurrency for LLM Workflows: Rationale, Implementation, and Results

By breaking iterable inputs into parallel LLM calls and batching 20 items across three languages within Dify’s platform limits, the workflow achieves 43‑64% average runtime reductions and markedly higher success rates, demonstrating that request‑level concurrency dramatically improves throughput for large‑scale translation tasks.

Coze · Dify · LLM
0 likes · 6 min read
DataFunSummit
Dec 4, 2024 · Artificial Intelligence

Accelerating Large Language Model Inference with the YiNian LLM Framework

This article presents the YiNian LLM framework, detailing how KV cache, prefill/decoding separation, continuous batching, PagedAttention, and multi‑hardware scheduling are used to speed up large language model inference while managing GPU memory and latency.

AI acceleration · Continuous Batching · GPU
0 likes · 20 min read
DaTaobao Tech
Dec 4, 2024 · Artificial Intelligence

LLM‑Powered Live Stream Analysis and Automation for E‑commerce

Taobao’s self‑operated live‑stream team built an end‑to‑end pipeline that downloads benchmark videos, transcribes audio, and uses GPT‑4o prompts to automatically summarize sales highlights, visual cues, and comments, delivering actionable insights that match manual notes, free operators for core tasks, and enable features like coupon pushes and intelligent product recommendations.

LLM · automation · e‑commerce
0 likes · 15 min read
AI Large Model Application Practice
Dec 2, 2024 · Artificial Intelligence

Master CrewAI: Build Multi‑Agent Systems Quickly with Flows and a Full Demo

This article introduces CrewAI, a high‑level Python framework for constructing multi‑agent systems, explains its core concepts such as Crew, Agent, Tool, Task and Process, walks through a complete demo with code, evaluates its strengths and limitations, and showcases the new Flows feature for more flexible workflow orchestration.

AI Framework · CrewAI · Flows
0 likes · 15 min read
JavaEdge
Dec 1, 2024 · Artificial Intelligence

Exploring the Limits and Benchmarks of Qwen’s QwQ‑32B‑Preview AI Model

QwQ‑32B‑Preview, an experimental AI model from the Qwen team, showcases strong reasoning in math and programming while facing challenges like language switching, inference loops, safety concerns, and variable capabilities across domains, with benchmark scores ranging from 50% to over 90% on tests such as GPQA, AIME, MATH‑500, and LiveCodeBench.

AI Benchmark · LLM · Model Evaluation
0 likes · 7 min read
AsiaInfo Technology: New Tech Exploration
Nov 29, 2024 · Artificial Intelligence

How GraphRAG Transforms Global QA with Structured Retrieval

This article examines GraphRAG—a graph‑enhanced Retrieval‑Augmented Generation approach—detailing its core concepts, the practical challenges of deploying it in enterprise settings, and the engineering solutions and future directions that enable more accurate, efficient, and explainable global question‑answering systems.

Global QA · GraphRAG · LLM
0 likes · 16 min read
Alibaba Cloud Developer
Nov 28, 2024 · Artificial Intelligence

Understanding Tokenizers and Embeddings in Large Language Models

This article introduces the core concepts of tokenizers and embeddings in large language models, explains how they convert text into numeric IDs and dense vectors, compares different tokenization strategies, and provides practical JavaScript and TensorFlow.js code examples for beginners.

AI fundamentals · JavaScript · LLM
0 likes · 10 min read
Sohu Tech Products
Nov 27, 2024 · Artificial Intelligence

RAG Technology and Practical Application in Multi-Modal Query: Using Chinese-CLIP and Redis Search

The article explains how Retrieval‑Augmented Generation (RAG) outperforms direct LLM inference by enabling real‑time knowledge updates and lower costs, and demonstrates a practical multi‑modal RAG pipeline that uses Chinese‑CLIP for vector encoding, various chunking strategies, and Redis Search for fast vector storage and retrieval.

Chinese-CLIP · LLM · RAG
0 likes · 17 min read
Alibaba Cloud Big Data AI Platform
Nov 27, 2024 · Artificial Intelligence

How to Train, Evaluate, and Deploy Qwen2.5-Coder on Alibaba Cloud PAI‑QuickStart

This guide walks developers through the entire lifecycle of Qwen2.5‑Coder—covering model sizes, training token expansion, resource requirements, fine‑tuning with SFT/DPO, evaluation on custom and public datasets, and one‑click deployment and compression on Alibaba Cloud's PAI‑QuickStart platform.

Deployment · LLM · Model Training
0 likes · 15 min read
Baobao Algorithm Notes
Nov 24, 2024 · Artificial Intelligence

How Marco‑o1 Merges Chain‑of‑Thought Fine‑Tuning with Monte‑Carlo Tree Search for Superior Reasoning

The article introduces Marco‑o1, an open‑source LLM that enhances complex reasoning by fine‑tuning on Chain‑of‑Thought data, integrating Monte‑Carlo Tree Search, introducing mini‑step actions and a reflection mechanism, and evaluates its performance on multilingual math and translation benchmarks.

Artificial Intelligence · Chain-of-Thought · LLM
0 likes · 15 min read
System Architect Go
Nov 24, 2024 · Artificial Intelligence

Building a Web Voice Chatbot with Whisper, llama.cpp, and LLM

This article demonstrates how to build a web‑based voice chatbot by integrating Whisper speech‑to‑text, llama.cpp LLM inference, and WebSocket communication, detailing both the frontend JavaScript implementation and the Python FastAPI backend, along with Docker deployment and example code.

FastAPI · JavaScript · LLM
0 likes · 10 min read
DaTaobao Tech
Nov 20, 2024 · Mobile Development

MNN-Transformer: Efficient On‑Device Large Language and Diffusion Model Deployment

MNN‑Transformer provides an end‑to‑end framework that enables large language and diffusion models to run efficiently on modern smartphones by exporting, quantizing (including dynamic int4/int8 and KV cache compression) and executing via a plugin‑engine runtime, achieving up to 35 tokens/s decoding and 2‑3× faster image generation compared with existing on‑device solutions.

LLM · MNN · Mobile AI
0 likes · 15 min read
System Architect Go
Nov 19, 2024 · Artificial Intelligence

Retrieval Augmented Generation (RAG) System Overview and Implementation with LangChain, Redis, and llama.cpp

This article explains the concept, architecture, and step‑by‑step implementation of Retrieval Augmented Generation (RAG), covering indexing, retrieval & generation processes, a practical LangChain‑Redis‑llama.cpp example on Kubernetes, code snippets, test results, challenges, and references.

AI · Embedding · LLM
0 likes · 6 min read
dbaplus Community
Nov 16, 2024 · Artificial Intelligence

Are LLM Frameworks Overhyped? A Critical Look at RAG and Reusability

The article critiques LLM frameworks, comparing them to early ORM tools, explains how Retrieval Augmented Generation works, warns against premature optimization, and advises developers to favor simple, visible practices over complex, abstracted frameworks for better control and understanding.

AI · LLM · ModelEvaluation
0 likes · 7 min read
Alibaba Cloud Developer
Nov 15, 2024 · Frontend Development

How to Build Real‑Time LLM Streaming in the Browser with Fetch

This article explains the mechanism of HTTP API streaming for large language models and shows step‑by‑step how front‑end developers can use the Fetch API, readable streams, and incremental UI updates to deliver real‑time, progressive results while handling errors and connection interruptions.

Front-end · HTTP streaming · JavaScript
0 likes · 9 min read
Linux Kernel Journey
Nov 14, 2024 · Artificial Intelligence

Deep Dive: How DeepFlow Collects Business Metrics for Large‑Model Services

This article explains how China Mobile built a hybrid‑cloud production environment for its customer‑service LLM, using eBPF and WebAssembly plugins from DeepFlow to achieve zero‑intrusion observability, automatically capture full‑stack topology, application/network metrics, and key LLM business indicators such as TTFT, TPOT, and token throughput.

DeepFlow · Grafana · LLM
0 likes · 19 min read
Baobao Algorithm Notes
Nov 14, 2024 · Artificial Intelligence

How I Built a 1B‑Parameter Chinese LLM on a Single A100: Lessons Learned

This article details the end‑to‑end process of pre‑training, fine‑tuning, and evaluating a 1‑billion‑parameter Chinese LLM named Steel‑LLM on limited hardware, covering data collection, pipeline design, training framework choices, architectural tweaks, performance results, and practical lessons for resource‑constrained developers.

LLM · Model architecture · Training Optimization
0 likes · 18 min read
Aikesheng Open Source Community
Nov 12, 2024 · Artificial Intelligence

ChatDBA: An AI‑Powered Database Fault Diagnosis Assistant Using Large Language Models

ChatDBA is a conversational AI system built by Shanghai Aikesheng that employs large language models and Retrieval‑Augmented Generation to help database administrators diagnose faults, learn domain knowledge, and generate or optimize SQL, with a redesigned architecture that addresses early‑stage shortcomings and outlines future enhancements.

ChatDBA · Fault Diagnosis · Knowledge Base
0 likes · 10 min read
JD Cloud Developers
Nov 11, 2024 · Artificial Intelligence

Mastering Prompt Engineering: History, Techniques, and Real-World Applications

This article explains what Prompt Engineering is, traces its evolution from early NLP commands to modern adaptive and multimodal prompting, details core techniques such as Zero‑shot, Chain‑of‑Thought, Auto‑CoT, and hallucination reduction, and showcases a logistics case study using various prompting strategies.

AI · Chain-of-Thought · LLM
0 likes · 26 min read
Fighter's World
Nov 11, 2024 · Artificial Intelligence

How CoCounsel’s $650M Acquisition Reveals Key Design Principles for LLM‑Powered Legal Tools

The article examines how Casetext’s CoCounsel, an AI‑driven legal assistant acquired by Thomson Reuters for $650 million, achieved rapid growth by prioritizing accuracy, workflow integration, user‑centered design, security, and continuous improvement, and distills the critical challenges and success factors for building LLM‑native products in low‑tolerance B2B environments.

AI ethics · B2B SaaS · LLM
0 likes · 11 min read
Alibaba Cloud Observability
Nov 8, 2024 · Cloud Native

Enable Python Probe for LLM Observability on Alibaba Cloud ACK

This guide explains how to integrate Alibaba Cloud's Python probe into a Kubernetes (ACK) environment to monitor large language model (LLM) applications, covering prerequisites, installation steps, Dockerfile modifications, resource permissions, and sample Python code for both server and client components.

ARMS · Cloud Native · Docker
0 likes · 16 min read
CSS Magic
Nov 8, 2024 · Artificial Intelligence

LLM Application Development Tips (3): Exploring LLM API Inputs and Outputs

This article explains how to configure key OpenAI chat completion parameters—such as temperature, top_p, streaming, response format, and tool selection—and walks through the structure of the API's JSON response, highlighting fields like id, model, choices, finish_reason, and usage for better control and cost estimation.

AI agents · API parameters · JSON response
0 likes · 8 min read
Alimama Tech
Nov 6, 2024 · Artificial Intelligence

How AI Generates Synchronized Video Narrations for E‑Commerce

This article presents the research behind Synchronized Video Storytelling, introducing the E‑SyncVidStory dataset, the VideoNarrator multimodal architecture, and extensive experiments that demonstrate high‑quality, product‑aware video narration generation for e‑commerce applications.

Dataset · LLM · Multimodal AI
0 likes · 12 min read
37 Interactive Technology Team
Nov 4, 2024 · Artificial Intelligence

Developing RAG and Agent Applications with LangChain: A Case Study of an AI Assistant for Activity Components

The article outlines a step‑by‑step methodology for creating Retrieval‑Augmented Generation and custom Agent applications with LangChain, illustrated by an AI assistant for activity components that evolves from a rapid Dify prototype to a LangChain‑based RAG system and finally a hand‑crafted ReAct‑style agent, detailing LCEL chain composition, vector‑search integration, model performance trade‑offs, and a unified routing layer.

AI Assistant · Agent · Cloud-native
0 likes · 6 min read
Baobao Algorithm Notes
Nov 4, 2024 · Artificial Intelligence

Uncovering 16 Limits of AI Search Engines and 16 Design Recommendations

A user study with 21 participants reveals sixteen critical limitations of generative AI search engines, maps them to eight quantitative metrics, proposes sixteen design recommendations, and evaluates You.com, Perplexity, and Bing Chat against this framework to highlight current performance gaps.

AI search · Generative Search · LLM
0 likes · 12 min read
CSS Magic
Nov 1, 2024 · Artificial Intelligence

Refining System Prompts for LLMs: Practical Tips for Batch Automation

This article explains how to automate batch document processing with LLM APIs by mastering the messages parameter, defining system, user, and assistant roles, and iteratively polishing system prompts through scripts or OpenAI's GPTs editor and Playground interfaces.

ChatGPT · LLM · OpenAI API
0 likes · 7 min read
DataFunTalk
Oct 31, 2024 · Artificial Intelligence

Tencent OlaChat: An LLM‑Powered Intelligent Business Intelligence Platform – Architecture, Capabilities, and Practice

This article presents the evolution from traditional to intelligent BI, explores how large language models enable natural‑language data analysis, details the OlaChat platform’s architecture, metadata‑enhanced retrieval methods, Text2SQL pipeline, multi‑turn dialogue system, and shares practical deployment insights and Q&A.

Business Intelligence · Intelligent Analytics · LLM
0 likes · 20 min read
NewBeeNLP
Oct 31, 2024 · Artificial Intelligence

How o1 Is Redefining LLM Engineering and What It Means for AI Professionals

The article examines OpenAI's o1 model, highlighting its unprecedented scientific capabilities, its shift from a chat toy to a high‑value tool, the potential impact on algorithm engineers, and the technical directions (RLHF, MCTS, PPO, PRM) that practitioners should master to stay relevant.

AI · LLM · model analysis
0 likes · 8 min read
DaTaobao Tech
Oct 30, 2024 · Artificial Intelligence

Understanding OpenAI o1: Chain‑of‑Thought, Scaling Laws, and Training Strategies

The article explains how OpenAI’s o1 model leverages chain‑of‑thought prompting, dual‑system cognitive theory, and new scaling laws—pre‑training on code/math and post‑training reinforcement with step‑wise reward models—to achieve superior reasoning, safety, and performance over GPT‑4, heralding a shift toward models that learn to think.

Chain-of-Thought · LLM · Reinforcement Learning
0 likes · 42 min read
Baobao Algorithm Notes
Oct 29, 2024 · Industry Insights

Inside Perplexity AI: How RAG Powers the Next‑Gen Search Engine

In this interview, Perplexity AI CEO Aravind Srinivas explains the company’s retrieval‑augmented generation architecture, multi‑model strategy, vector‑database use, competitive positioning against Google, monetization plans, and future product roadmap, offering a deep industry perspective on AI‑driven search.

AI startup · Industry Analysis · LLM
0 likes · 38 min read
CSS Magic
Oct 29, 2024 · Artificial Intelligence

LLM Application Development Tips (1): How to Choose the Right Model

With a growing array of overseas and domestic LLM APIs in 2024, this guide explains how to pick the right model—starting with a top‑tier option like GPT‑4o for feasibility testing, then moving to cost‑effective or Chinese alternatives, while weighing price, inference speed, context window, API compatibility, and rate limits.

API compatibility · Chinese LLM · GPT-4o
0 likes · 8 min read
DevOps
Oct 27, 2024 · Artificial Intelligence

Best Practices for Building Efficient Retrieval‑Augmented Generation (RAG) Systems

This article reviews Wang et al.'s 2024 research on Retrieval‑Augmented Generation, outlining optimal practices such as query classification, chunk sizing, hybrid metadata search, embedding selection, vector databases, query transformation, reranking, document repacking, summarization, fine‑tuning, and multimodal retrieval to guide developers in constructing high‑performance RAG pipelines.

LLM · Query Classification · RAG
0 likes · 11 min read
Alibaba Cloud Native
Oct 26, 2024 · Artificial Intelligence

Build a Real‑Time Semantic Search with EventBridge, DashVector, and FunctionCompute

This tutorial walks through constructing a zero‑to‑one RAG pipeline that ingests OSS text files via EventBridge, transforms them into embeddings with DashScope, stores vectors in DashVector, and performs semantic search using FunctionCompute and a Qwen‑Turbo LLM, complete with code samples and configuration steps.

DashVector · Embedding · EventBridge
0 likes · 10 min read
System Architect Go
Oct 25, 2024 · Artificial Intelligence

Designing and Extending a Self‑Built ChatGPT System: Architecture, Session Management, and Scaling Strategies

This article explains how to construct a ChatGPT‑like conversational system by detailing the core dialogue flow, adding session and history management with a database, defining REST APIs, and exploring extensions such as caching, elastic scaling, and production‑ready deployment considerations.

ChatGPT · LLM · Scalability
0 likes · 7 min read
Baobao Algorithm Notes
Oct 25, 2024 · Artificial Intelligence

How to Use Importance Sampling for Effective Continue Pretraining of LLMs

Continuing pretraining (CP) bridges pretraining and SFT to inject domain knowledge, but faces catastrophic forgetting; this article explores leveraging importance sampling to balance common and domain data, discusses data selection, annealing strategies, and practical tips for mitigating forgetting while enhancing specialized capabilities.

Catastrophic Forgetting · Continue Pretraining · Importance Sampling
0 likes · 8 min read
System Architect Go
Oct 24, 2024 · Artificial Intelligence

How to Fine‑Tune Translation Models on Kubernetes Docs with LoRA

This article walks through the complete process of fine‑tuning both domain‑specific and large‑language translation models on Kubernetes documentation, covering data preparation, model selection, training configurations, the differences between Seq2Seq and CausalLM, and how LoRA can dramatically reduce resource usage while improving performance.

AI · Fine-tuning · LLM
0 likes · 7 min read
21CTO
Oct 23, 2024 · Artificial Intelligence

IBM Unveils Granite 3.0 LLMs: Open‑Source, Secure, and Cost‑Effective AI Models

IBM introduced the Granite 3.0 series, an open‑source family of large language models that combine cutting‑edge performance with enhanced security, multi‑language support, and cost‑efficiency, while offering a variety of base, instruct, and specialist variants for enterprise use.

AI models · Granite · IBM
0 likes · 4 min read
DaTaobao Tech
Oct 23, 2024 · Artificial Intelligence

Retrieval-Augmented Generation (RAG): Principles, Applications, Limitations and Challenges

Retrieval-Augmented Generation (RAG) combines a retriever that fetches relevant external documents and a generator that uses them, improving LLM accuracy, relevance, privacy, and up-to-date information, but faces challenges such as retrieval latency, computational cost, chunking strategies, embedding selection, and system integration complexity.

AI · Knowledge Retrieval · LLM
0 likes · 13 min read
Baidu Geek Talk
Oct 23, 2024 · Artificial Intelligence

Integrating Yuan 2.0 Large Model with PaddleNLP: Overview, Usage Steps, and Interaction Examples

The open‑source Yuan 2.0 large model is fully integrated into Baidu’s PaddleNLP, offering quick inference for tasks like code generation, translation, and reasoning, along with efficient distributed training and fine‑tuning features such as Zero Padding optimization, enabling developers to easily deploy and customize the model via simple setup steps and example interactions.

AI · LLM · PaddleNLP
0 likes · 10 min read
Alibaba Cloud Big Data AI Platform
Oct 22, 2024 · Artificial Intelligence

How Alibaba Cloud Optimizes Enterprise RAG: Key Techniques for AI Search

At the 2024 Alibaba Cloud Yunqi (Apsara) Conference, senior AI Search expert Xing Shaomin detailed the enterprise‑grade Retrieval‑Augmented Generation (RAG) pipeline, covering its key pipeline architecture; optimizations for effectiveness, performance, and cost; and practical applications, vector store enhancements, LLM agents, and deployment strategies.

AI search · Cost Optimization · Enterprise AI
0 likes · 16 min read
DataFunSummit
Oct 18, 2024 · Artificial Intelligence

Building Efficient RAG Applications with a Small Team: Insights from PingCAP AI Lab

This article details how PingCAP's three‑person AI Lab leveraged Retrieval‑Augmented Generation (RAG) techniques—including basic RAG, fine‑tuned embeddings, re‑ranking, graph RAG, and agent‑based RAG—to create scalable, multilingual document‑question answering services while addressing large‑scale documentation challenges, model limitations, and user feedback loops.

Agent · Embedding · Fine-tuning
0 likes · 14 min read
NewBeeNLP
Oct 16, 2024 · Artificial Intelligence

Unlocking Long-Sequence LLMs: Position Embeddings, Scaling, and Efficient Attention

This article reviews recent advances in training and inference for long‑sequence large language models, comparing ALiBi and RoPE position embeddings, exploring RoPE scaling techniques, analyzing attention optimizations, and outlining practical data, evaluation, and system frameworks for scalable LLM deployment.

Flash Attention · LLM · RoPE
0 likes · 14 min read
Baobao Algorithm Notes
Oct 16, 2024 · Artificial Intelligence

How the DB3 Team Won the Meta CRAG RAG Challenge: Prompts, Retrieval, and LoRA Fine‑Tuning

This article analyzes the Meta Comprehensive RAG (CRAG) benchmark, detailing its three tasks, evaluation metrics, and the champion DB3 team's end‑to‑end solution that combines data preprocessing, dual‑stage retrieval, prompt engineering, LoRA‑based fine‑tuning, and public data augmentation to achieve top scores across all tasks.

LLM · LoRA · RAG
0 likes · 17 min read
Baobao Algorithm Notes
Oct 13, 2024 · Artificial Intelligence

Can Hierarchical LLMs Transform Sequential Recommendation? A Deep Dive

This article provides a comprehensive analysis of the HLLM paper, detailing its hierarchical LLM architecture for item and user modeling, the training objectives, fusion strategies, extensive offline and online experiments, scaling behavior, ablation studies, and practical deployment insights in large‑scale recommendation systems.

Industrial Deployment · LLM · Sequential Modeling
0 likes · 12 min read
JD Tech
Oct 13, 2024 · Artificial Intelligence

Building a Simple Local AI Question‑Answer System with Java, LangChain4J, Ollama, and ChromaDB

This article guides readers through the concepts of large language models, embeddings, vector databases, and Retrieval‑Augmented Generation, then demonstrates step‑by‑step how to set up Ollama, install a local Chroma vector store, configure Maven dependencies, and write Java code using LangChain4J to build and test a functional AI Q&A application.

AI · LLM · LangChain4j
0 likes · 22 min read
AntTech
Oct 12, 2024 · Artificial Intelligence

Observations from ISSTA 2024: Conference Highlights, Awarded Papers, Keynotes, and In‑Depth Reviews

The article reports on the 33rd ISSTA 2024 conference in Vienna, summarizing its acceptance statistics, highlighting the Impact Paper Award and Distinguished Papers, detailing keynotes on large‑language‑model‑driven software quality, and providing extensive reviews of selected research works ranging from fuzzing and program repair to database query simplification and AI‑oriented code generation.

ISSTA2024 · LLM · ProgramRepair
0 likes · 29 min read
21CTO
Oct 10, 2024 · Artificial Intelligence

5 Practical AI Projects to Build Your Skills with Python

This article presents five hands‑on AI project ideas—from resume optimization to multimodal search—complete with step‑by‑step instructions, required Python libraries, and code snippets, helping beginners and intermediate developers quickly build valuable AI applications.

AI · LLM · Python
0 likes · 12 min read
JD Tech Talk
Oct 8, 2024 · Artificial Intelligence

Building a Retrieval‑Augmented Generation (RAG) System with Rust and Qdrant

This article explains how to construct a Retrieval‑Augmented Generation pipeline in Rust, covering knowledge‑base creation with Qdrant, model loading and embedding with the candle library, data ingestion, and integration of an inference service built on mistral.rs, while also discussing resource usage and common pitfalls.

AI · Embedding · LLM
0 likes · 16 min read