Tagged articles
2016 articles
AI Large Model Application Practice
Mar 3, 2025 · Artificial Intelligence

Can DeepSeek‑R1 Unlock True “Deep Thinking” for Enterprise RAG?

This article examines how swapping in DeepSeek‑R1 enhances Retrieval‑Augmented Generation with deeper reasoning, outlines its benefits and pitfalls—including slower inference, higher compute costs, and hallucinations—provides a simple hallucination test, and proposes an Agentic RAG research assistant to balance accuracy and creativity.

AI reasoning · Agentic · DeepSeek
10 min read
JD Retail Technology
Mar 1, 2025 · Industry Insights

How JD Retail’s AI Assistant Uses Multimodal LLMs to Boost E‑Commerce

JD Retail’s AI assistant combines a Master‑Sub agent framework, ReAct paradigm, multimodal integration and MoE architecture to improve sales forecasting, pricing, and recommendation accuracy, while the team’s collaborative culture and open talent pathways illustrate how cutting‑edge AI is applied in real‑world e‑commerce.

AI · JD Retail · LLM
8 min read
Code Mala Tang
Mar 1, 2025 · Artificial Intelligence

Why Do Large Language Models Hallucinate and How Can We Fix It?

This article explains why large language models produce plausible‑looking but false information, traces the problem to the supervised fine‑tuning stage, and outlines mitigation techniques such as knowledge interrogation, RLHF, and tool‑augmented search to reduce hallucinations.

LLM · RLHF · Training
12 min read
AntTech
Mar 1, 2025 · Artificial Intelligence

ScaleOT: Privacy‑Utility‑Scalable Offsite‑Tuning with Dynamic LayerReplace and Selective Rank Compression

The ScaleOT framework introduces a privacy‑preserving offsite‑tuning pipeline for large language models that combines importance‑aware dynamic layer replacement with selective rank compression, enabling flexible model compression, near‑lossless fine‑tuning, and strong privacy guarantees across diverse downstream tasks.

Adapter · LLM · model compression
16 min read
Cognitive Technology Team
Feb 28, 2025 · Artificial Intelligence

Design and High‑Availability Architecture of Alibaba LangEngine AI Application Framework

This article introduces Alibaba's LangEngine, a pure Java AI application framework, detailing its high‑availability gateway architecture, communication protocols, streaming and non‑streaming output, multi‑level metadata caching, asynchronous and serverless designs, and future open‑source roadmap, offering practical guidance for building robust AI services.

AI Framework · LLM · LangEngine
11 min read
AI Large Model Application Practice
Feb 28, 2025 · Artificial Intelligence

How Self-Attention Powers LLMs: A Step‑by‑Step Deep Dive

This article explains the self‑attention mechanism behind large language models, detailing why static word importance fails, how queries, keys, and values are generated, how attention scores are computed, scaled, softmaxed, and used to produce context‑aware word vectors, while noting computational costs.

AI · LLM · Self-Attention
9 min read
JavaEdge
Feb 27, 2025 · Artificial Intelligence

How to Quickly Build a DeepSeek‑Powered Knowledge Base on Tencent Cloud

This guide walks through deploying the full‑feature DeepSeek V3+R1 model on Tencent Cloud, configuring a smart knowledge‑base application, importing documentation, enabling internet search, tuning retrieval parameters, and publishing the app for public use, all without writing code.

AI · DeepSeek · Knowledge Base
6 min read
Baidu Tech Salon
Feb 26, 2025 · Artificial Intelligence

Graph‑Engine‑Driven Workflow for Building Intelligent Agents

The article presents a graph‑engine‑driven workflow platform that lets developers assemble, orchestrate, and execute intelligent LLM‑based agents with low‑code visual design, fine‑grained path control, hierarchical sub‑flows, and event‑driven hooks, addressing perception, reasoning, planning, and scalability challenges while surpassing existing frameworks.

Data Decoupling · Intelligent agents · LLM
19 min read
Alibaba Cloud Big Data AI Platform
Feb 25, 2025 · Artificial Intelligence

Build a RAG‑Powered Smart Q&A Assistant with Milvus, DeepSeek, and PAI LangStudio

This step‑by‑step guide shows how to assemble a Retrieval‑Augmented Generation (RAG) system using Alibaba Cloud Milvus vector search, the DeepSeek large language model, and PAI LangStudio, covering instance creation, data upload, model deployment, connection setup, flow design, and service invocation.

AI Tutorial · DeepSeek · LLM
9 min read
Alibaba Cloud Big Data AI Platform
Feb 25, 2025 · Artificial Intelligence

How DistilQwen2.5 Boosts LLM Efficiency with Dual‑Stage Knowledge Distillation

This article introduces DistilQwen2.5, a lightweight LLM series built on Qwen2.5 that uses a novel two‑layer distillation framework, instruction‑data optimization, and parameter‑fusion techniques to achieve higher performance while drastically reducing computational cost and deployment overhead.

Efficient Inference · LLM · knowledge distillation
26 min read
DataFunSummit
Feb 25, 2025 · Artificial Intelligence

Collecting High-Quality LLM Training Data and Custom Model Training Guide

This article explains what constitutes high‑quality LLM training data, why large datasets are essential, outlines the step‑by‑step process for collecting, preprocessing, and fine‑tuning models, and highlights the best data sources—including web content, books, code repositories, and news—while noting available free datasets.

AI · LLM · Web Scraping
9 min read
Code Mala Tang
Feb 25, 2025 · Artificial Intelligence

How Resources, Tools, and Prompts Power LLM Super‑Agents

This article explains how the Resources data hub, Tools capability engine, and Prompts interaction templates work together to create a secure, extensible workflow that enables large language models to ingest data, execute tasks, and generate structured outputs.

AI workflow · Artificial Intelligence · LLM
5 min read
CSS Magic
Feb 25, 2025 · Artificial Intelligence

Two Simple Ways to Access DeepSeek API for Free

This guide shows how to obtain free DeepSeek API access through GitHub Models and SiliconFlow, detailing the required API base URL, key, and model name, how to register, create keys, verify usage with a web chat tool, and compare model choices and platform limits.

API · DeepSeek · Free access
7 min read
Baidu Geek Talk
Feb 24, 2025 · Artificial Intelligence

Using a Graph Engine to Drive Workflow for Intelligent Agents

By leveraging mature graph‑engine technology, the article shows how visual, low‑code workflow orchestration can give intelligent LLM‑based agents fine‑grained path control, reusable functions, hierarchical sub‑flows, and robust error handling, turning complex business tasks into modular, scalable processes adopted by hundreds of thousands of developers.

AI agents · LLM · graph engine
18 min read
Architecture Digest
Feb 24, 2025 · Artificial Intelligence

MoBA: Mixture of Block Attention for Long‑Context Large Language Models

The article introduces MoBA, a Mixture‑of‑Block‑Attention mechanism that applies Mixture‑of‑Experts principles to transformer attention, enabling efficient long‑context processing for large language models while maintaining performance comparable to full attention through sparse, trainable block selection and seamless switching.

Attention Mechanism · LLM · Mixture of Experts
12 min read
AI Large Model Application Practice
Feb 24, 2025 · Artificial Intelligence

How Web Agents Combine LLMs and Browser Automation to Perform Real‑World Tasks

This article explains what Web Agents are, their ReAct‑style reasoning loop, key implementation technologies such as observation parsing, multimodal models, and browser control tools like Selenium and Playwright, and demonstrates building a DeepSeek‑powered Web Agent with the Browser‑use framework, including code samples and performance insights.

Browser Automation · DeepSeek · LLM
11 min read
Java Web Project
Feb 23, 2025 · Artificial Intelligence

Build Your First AI Chatbot with Spring Boot and DeepSeek LLM

This guide walks you through creating a Spring Boot project, configuring DeepSeek's large language model via SiliconFlow, setting up OpenAI‑compatible parameters, and implementing a REST controller that returns weather forecasts using the model, complete with step‑by‑step code snippets, configuration files, and deployment instructions.

AI · Chatbot · DeepSeek
7 min read
Ma Wei Says
Feb 23, 2025 · Artificial Intelligence

How Microsoft’s PIKE‑RAG Builds Knowledge‑Driven AI Across Four Stages

The article explains Microsoft’s open‑source PIKE‑RAG system, detailing its four progressive stages—from knowledge‑base construction to creative multi‑agent reasoning—while describing the underlying modules, chunking strategies, multi‑granularity retrieval, and code snippets that enable specialized domain understanding and inference.

AI Retrieval · LLM · PIKE-RAG
11 min read
Architecture and Beyond
Feb 22, 2025 · Artificial Intelligence

Understanding Retrieval‑Augmented Generation (RAG) and Its Role in Enhancing Large Language Models

The article explains how the inherent knowledge‑staleness, hallucination, lack of private data, non‑traceable output, limited long‑text handling, and data‑security concerns of large language models can be mitigated by Retrieval‑Augmented Generation, which combines external retrieval, augmentation, and generation to provide up‑to‑date, reliable, and secure AI responses.

AI · Knowledge augmentation · LLM
15 min read
Infra Learning Club
Feb 21, 2025 · Artificial Intelligence

5 Must‑Try Open‑Source AI Projects You Can Start Using Today

This article introduces five open‑source AI tools—a PPT generator, an LLM app development platform, a cloud‑agnostic AI runner, a curated collection of LLM applications, and a one‑click HD video creator—detailing their key features, usage links, and sample configurations.

AI · Dify · LLM
8 min read
Ma Wei Says
Feb 21, 2025 · Artificial Intelligence

How PIKE‑RAG Boosts Retrieval‑Augmented Generation for Industrial AI

PIKE‑RAG, a Retrieval‑Augmented Generation framework from Microsoft Research, tackles knowledge source diversity, one‑size‑fits‑all limitations, and LLMs' lack of domain expertise by building multi‑layer heterogeneous graphs, task‑driven modular pipelines, and a staged L0‑L4 system for more accurate industrial AI responses.

AI · KnowledgeGraph · LLM
5 min read
Architect
Feb 20, 2025 · Artificial Intelligence

Why Long CoT and In‑Context RL Are the Next Frontier for LLMs

The article analyses recent breakthroughs such as OpenAI's o1, Long CoT, and test‑time search, arguing that enabling LLMs to perform self‑critique and reinforcement learning with long output sequences is essential for future AI performance, while warning against overly structured workflows.

AI research · In‑Context RL · LLM
12 min read
JD Tech Talk
Feb 20, 2025 · Artificial Intelligence

Multi‑Agent Architecture for an E‑Commerce Business Assistant: Design, Planning, Evaluation, and Sample Generation

The document describes the evolution, design principles, key technologies, online inference workflow, evaluation methods, and sample‑generation techniques of a large‑language‑model‑based multi‑agent system that powers a 24/7 e‑commerce merchant assistant, highlighting its benefits, challenges, and future work.

AI Planning · LLM · Multi-Agent
21 min read
JD Cloud Developers
Feb 20, 2025 · Artificial Intelligence

How Multi‑Agent ReAct Architecture Boosts E‑Commerce AI Assistants

This article explains the evolution of multi‑agent systems for e‑commerce assistants, detailing the ReAct‑based planning framework, hierarchical master‑sub agent collaboration, evaluation methods, and sample‑generation techniques that together improve accuracy, efficiency, and scalability of AI‑driven merchant services.

AI Planning · Agent Architecture · LLM
23 min read
Alibaba Cloud Developer
Feb 20, 2025 · Artificial Intelligence

How LLMs Power Real-Time Interactive 3D Worlds in Unreal Engine

This article explains how large language models are integrated with Unreal Engine to enable natural‑language‑driven 3D model search, manipulation, and scene understanding, detailing metadata extraction, vision‑language labeling, RAG‑based retrieval, and function‑call translation for interactive virtual environments.

3D interaction · LLM · RAG
21 min read
Architects' Tech Alliance
Feb 18, 2025 · Artificial Intelligence

How to Distill DeepSeek LLMs into Lightweight Models for Local Deployment

This article explains DeepSeek's knowledge‑distillation approach for compressing large language models into small, efficient student models, details step‑by‑step local deployment requirements, performance optimizations, and highlights the cost, privacy, and application benefits of running the distilled model on‑premise.

AI inference · DeepSeek · LLM
10 min read
Big Data Tech Team
Feb 18, 2025 · Artificial Intelligence

How DeepSeek Trains and Optimizes Its LLMs: From Pre‑training to Reasoning Models

This article breaks down DeepSeek's LLM training pipeline, explaining the massive pre‑training phase, instruction fine‑tuning, reinforcement‑learning‑from‑human‑feedback, and the distinct roles of its V3 instruction model and R1 reasoning model, while also highlighting performance metrics and current limitations.

DeepSeek · LLM · Model Training
8 min read
Java One
Feb 17, 2025 · Artificial Intelligence

How to Get Free Access to DeepSeek R1 Across Major Cloud Platforms

This guide walks you through using DeepSeek R1 via the official website or popular third‑party cloud services, compares free token quotas, explains token accounting, and provides step‑by‑step instructions for configuring API access and AI clients such as Chatbox, Cherry Studio, and Dify.

AI client · API · DeepSeek
11 min read
AI Large Model Application Practice
Feb 17, 2025 · Artificial Intelligence

Mastering Structured Output for DeepSeek‑R1 with LangChain, LangGraph, and ReAct Agents

DeepSeek‑R1 excels at deep reasoning but lacks native structured output; this guide explains why structured output matters, outlines common API‑level techniques, and provides three practical solutions—using an auxiliary model with a LangChain chain, a LangGraph workflow, and a ReAct agent—complete with code snippets and JSON‑mode tips.

DeepSeek · LLM · LangChain
12 min read
Code Mala Tang
Feb 16, 2025 · Artificial Intelligence

17 Proven Prompt Engineering Techniques to Master LLM Interactions

This article presents 17 practical prompt‑engineering strategies—ranging from zero‑shot and few‑shot prompting to role, style, and chain‑of‑thought methods—explaining their usage, ideal scenarios, and concrete examples to help you obtain higher‑quality responses from large language models.

Artificial Intelligence · ChatGPT · LLM
14 min read
Bighead's Algorithm Notes
Feb 15, 2025 · Artificial Intelligence

FinRL‑DeepSeek: How Integrating DeepSeek with RL Improves Portfolio Returns (Code Open‑Source)

This article reviews a new risk‑sensitive trading agent that combines reinforcement learning with large language models to extract stock recommendations and news‑based risk scores, describes the extended CVaR‑PPO algorithm, presents extensive experiments on the FNSPID dataset, and discusses the resulting performance gains and future work.

Algorithmic Trading · CVaR · DeepSeek
10 min read
Alibaba Cloud Developer
Feb 14, 2025 · Artificial Intelligence

Unlock Faster LLM Inference: Full Stack of Chips, Frameworks & Services

The article examines the end‑to‑end architecture for large‑model inference, detailing seven layers—from chip hardware and programming toolkits to deep‑learning frameworks, inference accelerators, model providers, compute platforms, application orchestration, and traffic management—highlighting key vendors, open‑source projects, and performance‑optimizing techniques.

AI hardware · Inference · LLM
12 min read
JD Tech
Feb 14, 2025 · Artificial Intelligence

JD Merchant Intelligent Assistant – Multi‑Agent System Architecture, Planning, and Evaluation

JD’s Merchant Intelligent Assistant leverages a large‑language‑model‑based multi‑agent architecture to provide 24/7 e‑commerce support, detailing its evolution, planning techniques, online inference, evaluation methods, sample generation, and practical insights for scalable AI‑driven operations.

E-commerce AI · LLM · Multi-Agent
22 min read
Architect
Feb 13, 2025 · Artificial Intelligence

How to Build a Mini ChatGPT on a Single GPU with MiniMind

This article provides a comprehensive, step‑by‑step guide to training and fine‑tuning a miniature large‑language model called MiniMind, covering lightweight model design, open‑source training pipelines, required datasets, tokenizer options, and deployment via a web UI, all using PyTorch on modest hardware.

AI · LLM · MiniMind
11 min read
Alibaba Cloud Infrastructure
Feb 13, 2025 · Cloud Computing

Deploy DeepSeek‑R1 LLM on Alibaba Cloud ACK One with ACS GPU in Minutes

This guide walks you through deploying the DeepSeek‑R1 large‑language‑model inference service on Alibaba Cloud ACK One registered clusters using ACS GPU compute, covering model preparation, OSS storage setup, PersistentVolume configuration, arena‑based service deployment, and verification steps with concrete commands and parameters.

ACK One · ACS GPU · DeepSeek
14 min read
Alibaba Cloud Infrastructure
Feb 13, 2025 · Artificial Intelligence

Deploying DeepSeek‑R1 671B Distributed Inference Service on Alibaba Cloud ACK with vLLM and Dify

This article explains how to quickly deploy the full‑parameter DeepSeek‑R1 671B model in a multi‑node GPU‑enabled Kubernetes cluster on Alibaba Cloud ACK, covering prerequisites, model parallelism, vLLM‑Ray distributed deployment, service verification, and integration with Dify to build a private AI Q&A assistant.

DeepSeek · Dify · Distributed Deployment
12 min read
JD Tech Talk
Feb 13, 2025 · Artificial Intelligence

DeepSeek R1: Concept Overview, Training Principles, and Practical Implementations

This article introduces the DeepSeek family of models, explains the concepts of online search and deep reasoning, details the two‑phase training pipeline with data augmentation and reinforcement learning, and showcases practical experiments and deployment examples for the R1 and distilled variants.

DeepSeek · LLM · Model Training
10 min read
Baobao Algorithm Notes
Feb 13, 2025 · Artificial Intelligence

How to Build and Improve Reasoning LLMs: Methods, Trade‑offs, and DeepSeek Insights

This article explains what reasoning language models are and when they are needed, and reviews four main techniques (inference‑time scaling, pure reinforcement learning, combined SFT + RL, and distillation), illustrated with DeepSeek‑R1’s development, cost analysis, and low‑budget alternatives.

AI research · DeepSeek · Inference Scaling
27 min read
Baobao Algorithm Notes
Feb 12, 2025 · Artificial Intelligence

How X‑R1 Triggers Aha Moments in Low‑Cost RL Training of 0.5B LLMs

The X‑R1 open‑source framework demonstrates that a 0.5B language model can achieve rapid reasoning improvements and observable "Aha Moments" using reinforcement learning on a modest 4‑GPU setup, detailing its design, performance metrics, installation steps, and future roadmap.

AI · LLM · Reinforcement Learning
6 min read
vivo Internet Technology
Feb 12, 2025 · Artificial Intelligence

Bidirectional Optimization of NLLB-200 and ChatGPT for Low-Resource Language Translation

The paper proposes a bidirectional optimization framework that fine‑tunes the low‑resource NLLB‑200 translation model with LoRA using data generated by ChatGPT, while also translating low‑resource prompts with NLLB before feeding them to LLMs, thereby improving multilingual translation quality yet requiring careful validation of noisy synthetic data.

Fine-tuning · LLM · LoRA
28 min read
JD Retail Technology
Feb 12, 2025 · Artificial Intelligence

Accelerating Generative Recommendation with NVIDIA TensorRT‑LLM in JD Advertising

JD Advertising accelerates its generative‑recall recommendation system by integrating NVIDIA TensorRT‑LLM, which simplifies the pipeline, injects LLM knowledge, scales to billions of parameters, and delivers over five‑fold throughput gains, one‑fifth the cost, and significant CTR improvements in both recommendation and search.

Inference Optimization · LLM · TensorRT-LLM
13 min read
Architect
Feb 10, 2025 · Artificial Intelligence

Evolution of DeepSeek Mixture‑of‑Experts (MoE) Architecture from V1 to V3

This article reviews the development of DeepSeek's Mixture-of-Experts (MoE) models, tracing their evolution from the original DeepSeekMoE V1 through V2 to V3, detailing architectural innovations such as fine‑grained expert segmentation, shared‑expert isolation, load‑balancing losses, device‑limited routing, and the shift from softmax to sigmoid gating.

DeepSeek · LLM · Mixture of Experts
21 min read
Alibaba Cloud Infrastructure
Feb 10, 2025 · Artificial Intelligence

Hybrid Cloud Elastic LLM Inference Solution with ACK Edge and KServe

This article presents a hybrid‑cloud solution that uses ACK Edge and KServe to dynamically allocate on‑premise and cloud GPU resources for large‑language‑model inference, addressing tidal traffic patterns, reducing costs, and ensuring high availability through elastic scaling and custom scheduling policies.

ACK@Edge · Auto Scaling · KServe
13 min read
JD Retail Technology
Feb 10, 2025 · Artificial Intelligence

JD Merchant Intelligent Assistant: Multi‑Agent Architecture and Technical Exploration

The JD Merchant Intelligent Assistant employs a large‑language‑model‑driven multi‑agent architecture with dynamic ReAct planning, enabling merchants to query and execute store operations in under a second with over 90 % decision accuracy, while reducing inference cost, hallucinations, and engineering effort across diverse e‑commerce tasks.

AI · LLM · Multi-Agent
25 min read
Top Architect
Feb 9, 2025 · Artificial Intelligence

DeepSeek‑R1: Training Pipeline, Reinforcement‑Learning Techniques, and Experimental Results

The article reviews DeepSeek‑R1’s training methodology—including cold‑start data collection, multi‑stage RL fine‑tuning, SFT data generation, and model distillation—highlights its performance comparable to OpenAI‑o1‑1217, and discusses key contributions, reward design, successful experiments, and failed attempts.

AI research · DeepSeek · LLM
12 min read
Infra Learning Club
Feb 8, 2025 · Artificial Intelligence

Multi-Agent LLMs Explained: Benefits, Workflows, and Leading Frameworks

The article surveys the rise of multi‑agent LLM systems, detailing how specialized agents collaborate on tasks such as travel planning, outlining their workflow, comparing them with single‑agent models, listing prominent frameworks, and discussing current challenges and research citations.

AI · Agent Collaboration · AutoGen
13 min read
MaGe Linux Operations
Feb 7, 2025 · Artificial Intelligence

How to Deploy DeepSeek R1 Locally: A Step‑by‑Step AI Model Guide

This article walks you through everything you need to know about DeepSeek R1—including its different model sizes, hardware requirements, installation tools like Ollama, LM Studio and Docker, and how to set up a visual interface with Open‑WebUI or Dify—for offline, private, and cost‑effective AI inference.

AI · DeepSeek · Docker
15 min read
iKang Technology Team
Feb 7, 2025 · Artificial Intelligence

Retrieval‑Augmented Generation (RAG) with LangChain: Concepts and Python Implementation

Retrieval‑Augmented Generation (RAG) using LangChain lets developers enhance large language models by embedding user queries, fetching relevant documents from a vector store, inserting the context into a prompt template, and generating concise, source‑grounded answers, offering low‑cost, up‑to‑date knowledge while reducing hallucinations and fine‑tuning expenses.

LLM · LangChain · RAG
10 min read
Top Architect
Feb 6, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama: Quantization, Hardware Requirements, and Step‑by‑Step Guide

This article provides a comprehensive tutorial on locally deploying the full‑size DeepSeek R1 671B model using Ollama, covering dynamic quantization options, hardware specifications, detailed installation commands, configuration files, performance observations, and practical recommendations for consumer‑grade systems.

AI · DeepSeek · GPU
14 min read
Alibaba Cloud Developer
Feb 5, 2025 · Artificial Intelligence

10 Common Prompt Engineering Mistakes and How to Overcome Them

This article lists ten common misconceptions about prompt engineering, explains why each is flawed, and offers practical insights and strategies—such as using the CO‑STAR framework, tailoring prompts to specific models, keeping prompts concise, and continuously testing and refining—to help readers communicate effectively with large language models.

AI misconceptions · LLM · Prompt Design
10 min read
21CTO
Feb 4, 2025 · Artificial Intelligence

Is DeepSeek the Next Challenger to ChatGPT? A Deep Dive into Its AI Edge

This article explains what DeepSeek is, how its open‑source large language model works, its unique multilingual training, free access, the DeepSeek‑Coder variant, and compares its capabilities and goals with ChatGPT, highlighting strengths, limitations, and market impact.

AI models · ChatGPT comparison · DeepSeek
7 min read
Alibaba Cloud Big Data AI Platform
Feb 1, 2025 · Artificial Intelligence

Deploy DeepSeek-V3 and R1 Models with One-Click on Alibaba Cloud PAI Model Gallery

This article introduces Alibaba Cloud's PAI Model Gallery, detailing the DeepSeek-V3 and DeepSeek‑R1 large language models, their architectures and parameters, and provides a step‑by‑step guide for one‑click deployment of these models and their distilled variants using vLLM or BladeLLM.

AI inference · Alibaba Cloud · DeepSeek
6 min read
CSS Magic
Jan 31, 2025 · Artificial Intelligence

Cursor vs. Windsurf vs. GitHub Copilot: Hands‑On Comparison of Three AI Code Editors

The article conducts a practical, step‑by‑step evaluation of Cursor, Windsurf, and GitHub Copilot’s multi‑file editing capabilities using a simple web‑chat bot, revealing that Cursor completes all required UI, storage, and application changes in a single interaction, while the others need two rounds, with Copilot showing notable improvement on a retest.

AI code editor · Cursor · GitHub Copilot
9 min read
DataFunSummit
Jan 30, 2025 · Databases

Mature Practices for Building Risk‑Control Knowledge Graphs on NebulaGraph and Leveraging Large Language Models

This article explains how NebulaGraph’s large‑scale graph database can be used to construct real‑time risk‑control knowledge graphs, describes practical applications such as community detection and path analysis, and explores how large language models enhance graph queries through Text‑to‑GQL, agents, exploration chains, and semi‑structured knowledge extraction.

AI · Graph Database · LLM
0 likes · 11 min read
DataFunSummit
Jan 29, 2025 · Artificial Intelligence

Tencent OlaChat: An LLM‑Powered Intelligent Business Intelligence Platform – Architecture, Capabilities, and Practice

This article presents Tencent's OlaChat intelligent BI platform, detailing its evolution from traditional to intelligent BI, the impact of large language models on data analytics, the system's multi‑task dialogue, metadata retrieval enhancements, Text2SQL solutions, and real‑world deployment insights.

AI · Business Intelligence · Data Platform
0 likes · 21 min read
Architect
Jan 27, 2025 · Artificial Intelligence

How to Build a Retrieval‑Augmented Generation QA Assistant for an Open Platform

This article details a step‑by‑step design of a RAG‑based intelligent Q&A assistant for the DeWu Open Platform, covering background, RAG fundamentals, system architecture, technology selection, prompt engineering with CO‑STAR, data preprocessing, vector store setup, LangChain.js implementation, similarity search, runnable chaining, debugging, and future prospects.

AI · LLM · LangChain
0 likes · 28 min read
DataFunTalk
Jan 26, 2025 · Artificial Intelligence

58.com’s LingXi Large Language Model Platform: Development, Deployment, and Performance Optimizations

Since the launch of ChatGPT, 58.com has built a Model‑as‑a‑Service platform called LingXi that trains and serves domain‑specific large language models, supports over a hundred internal scenarios with daily inference exceeding ten million calls, and continuously improves performance through quantization, GPU optimization, model miniaturization, and advanced AI applications such as interview assistants, voice agents, and RAG‑enabled agents.

AI Platform · AI applications · Inference Optimization
0 likes · 9 min read
DataFunSummit
Jan 24, 2025 · Artificial Intelligence

Exploring LLM‑Based Generative Business Intelligence (GenBI): Architecture, Implementation, and Lessons Learned

With the rise of LLM‑based generative AI, this article examines the emerging GenBI (Generative Business Intelligence) paradigm, detailing why self‑service analytics is needed, the progress of Text‑to‑SQL, an LLM‑driven agent architecture, practical AWS Bedrock implementation, technical choices, lessons learned, and future outlook.

AWS Bedrock · Agentic AI · Business Intelligence
0 likes · 18 min read
DataFunSummit
Jan 21, 2025 · Artificial Intelligence

NVIDIA NeMo Full Stack: End‑to‑End Large Language Model Training, Alignment, and RLHF

This article presents NVIDIA's NeMo technology stack for end‑to‑end large language model (LLM) training, covering the full software pipeline, model alignment with reinforcement learning from human feedback (RLHF), performance optimizations such as model parallelism, FP8, TensorRT‑LLM inference, dynamic load balancing, and future research directions.

Distributed Training · GPU Optimization · LLM
0 likes · 24 min read
ByteFE
Jan 20, 2025 · Artificial Intelligence

Eino: An Open‑Source Golang Framework for Large‑Model Application Development

Eino is a Golang‑based, open‑source framework that streamlines the full DevOps lifecycle of large‑model applications by providing stable, strongly‑typed components, graph‑based orchestration, built‑in tooling, and an extensible architecture to help developers quickly build reliable AI services.

AI · Framework · Golang
0 likes · 13 min read
AI Large Model Application Practice
Jan 20, 2025 · Artificial Intelligence

How Embeddings Transform Simple Character Codes into Powerful Vectors for LLMs

This article explains how embeddings convert basic character indices into high‑dimensional vectors, describes their training via gradient descent, introduces the embedding matrix, and shows how these vectors enable modern language models to capture semantic relationships and be reused across tasks.

LLM · Neural Networks · embeddings
0 likes · 8 min read
DataFunTalk
Jan 18, 2025 · Artificial Intelligence

Understanding Xiaohongshu’s Content Recommendation Mechanisms: NoteLLM and SSD

This article analyzes Xiaohongshu’s content recommendation system by reviewing two official papers, detailing the NoteLLM framework for interest discovery and the Sliding Spectrum Decomposition (SSD) method for diversified recommendations, and explaining their underlying models, loss functions, and experimental results.

Diversity · LLM · collaborative filtering
0 likes · 13 min read
Alibaba Cloud Infrastructure
Jan 17, 2025 · Artificial Intelligence

Elastic Scaling of Large Language Model Inference on Alibaba Cloud ACK with Knative, ResourcePolicy, and Fluid

This article explains how to reduce inference cost and improve performance for large language models on Alibaba Cloud ACK by using Knative's request‑based autoscaling, custom ResourcePolicy priority scheduling, and Fluid data‑caching to achieve elastic scaling, resource pre‑emption, and faster model loading.

Fluid · Inference · Knative
0 likes · 22 min read
Baobao Algorithm Notes
Jan 15, 2025 · Artificial Intelligence

How Multi-Token Prediction Boosts LLM Training and Inference Efficiency

This article reviews the evolution of Multi‑Token Prediction (MTP) techniques—from early blockwise parallel decoding to Meta's and DeepSeek's implementations—explaining their architectures, training and inference workflows, and the speed‑up gains they offer for large language models.

DeepSeek · Inference Acceleration · LLM
0 likes · 20 min read
Alibaba Cloud Big Data AI Platform
Jan 15, 2025 · Artificial Intelligence

Build an Education‑Focused RAG Solution Using Alibaba PAI

This guide explains how to create a Retrieval‑Augmented Generation (RAG) solution for education on Alibaba PAI, covering knowledge‑base construction with PAI‑Designer, model deployment, connection setup in LangStudio, workflow configuration, online deployment, and a legal‑domain case comparison that highlights RAG's accuracy benefits.

Alibaba PAI · Embedding · Knowledge Base
0 likes · 14 min read
Bilibili Tech
Jan 14, 2025 · Artificial Intelligence

Technical Practices and Productization of Intelligent Advertising Title Generation for Bilibili

We built an LLM‑powered system for Bilibili that automatically creates ad titles from user keywords, employing fluency, style, and quality classifiers, mixed domain data cleaning, and alignment methods such as SFT, DPO and KTO, resulting in a product that now generates about ten percent of daily titles and drives significant ad spend.

AI Alignment · Ad Title Generation · Bilibili
0 likes · 24 min read
JD Tech Talk
Jan 14, 2025 · Artificial Intelligence

Advantages and Engineering Implementation of Generative Recommendation Systems Using Large Language Models

This article explains how generative recommendation systems powered by large language models simplify the recommendation pipeline, integrate world knowledge, benefit from scaling laws, and require specialized engineering optimizations such as TensorRT‑LLM deployment, inference acceleration, and hybrid model strategies to achieve low latency and high throughput in real‑world e‑commerce scenarios.

AI · Inference Optimization · LLM
0 likes · 10 min read
JD Cloud Developers
Jan 14, 2025 · Artificial Intelligence

How Generative Recommendation Systems Transform E‑Commerce with LLMs

This article explains how large language models reshape recommendation systems by simplifying pipelines, integrating world knowledge, and leveraging scaling laws, and details the engineering steps for deploying generative recall models—including product encoding, user prompting, model training, TensorRT‑LLM optimization, and continuous performance improvements.

AI Optimization · Generative Recommendation · LLM
0 likes · 13 min read
AI Large Model Application Practice
Jan 14, 2025 · Artificial Intelligence

Turning Classification Nets into Language Generators: A Step‑by‑Step Guide

This article explains how a simple neural network trained for classification can be adapted to generate natural language by expanding its output layer, encoding characters as numbers, using a sliding‑window context, and recursively predicting the next token, illustrating each step with diagrams and concrete examples.

AI · LLM · Neural Networks
0 likes · 10 min read
Java Architecture Diary
Jan 10, 2025 · Artificial Intelligence

Generate Structured JSON with Ollama LLM Using Java

This guide explains why structured JSON output from LLMs is essential, walks through installing and running Ollama, and provides a complete Java Spring Boot implementation—including POJOs, service code, and best‑practice tips—to retrieve AI‑generated data in a reliable, parsable format.

AI · JSON · LLM
0 likes · 7 min read
Tencent Advertising Technology
Jan 9, 2025 · Artificial Intelligence

Applying Large Language Models to Search Advertising: End‑to‑End Generative Recall and System Optimizations

This report details how large language models (LLMs) were integrated into Tencent's search advertising pipeline—from early extraction‑distillation experiments in 2023 to a 2024 end‑to‑end generative recall architecture—showing significant improvements in relevance, diversity, and revenue through knowledge injection, supervised fine‑tuning, constrained beam‑search decoding, and high‑performance inference services.

AI · Beam Search · LLM
0 likes · 11 min read
Data Thinking Notes
Jan 7, 2025 · Databases

Unlocking LLM-Powered Text-to-SQL: From Basics to Cutting-Edge Techniques

This article provides a comprehensive overview of LLM-based Text-to-SQL technology, covering its background, evolution, challenges, various LLM-driven methods, benchmark datasets, evaluation metrics, and future research directions to guide researchers and practitioners in advancing natural language interfaces for databases.

LLM · Text-to-SQL · database
0 likes · 18 min read
Infra Learning Club
Jan 7, 2025 · Artificial Intelligence

How GitHub Copilot Workspace Made Me Fear Unemployment

The author experiments with GitHub Copilot Workspace to automatically generate a WeChat mini‑program for family library management, documents the prompting process, code generation, bug fixes, UI tweaks, and reflects on the broader impact of AI‑driven development on programmers' future jobs.

AI code generation · GitHub Copilot · LLM
0 likes · 5 min read