Tagged articles

2016 articles

Mar 3, 2025 · Artificial Intelligence

Can DeepSeek‑R1 Unlock True “Deep Thinking” for Enterprise RAG?

This article examines how swapping in DeepSeek‑R1 enhances Retrieval‑Augmented Generation with deeper reasoning, outlines its benefits and pitfalls—including slower inference, higher compute costs, and hallucinations—provides a simple hallucination test, and proposes an Agentic RAG research assistant to balance accuracy and creativity.

AI reasoning · Agentic · DeepSeek

0 likes · 10 min read

Java Architect Essentials

Mar 2, 2025 · Artificial Intelligence

Zero‑Code Local Deployment of DeepSeek LLM on Consumer GPUs Using Ollama

This guide explains why DeepSeek is a compelling GPT‑4‑level alternative, provides hardware recommendations for various model sizes, and walks through a three‑step Windows deployment using Ollama, including installation, environment configuration, model download, performance tuning, and common troubleshooting tips.

AI · DeepSeek · GPU

0 likes · 8 min read

JD Retail Technology

Mar 1, 2025 · Industry Insights

How JD Retail’s AI Assistant Uses Multimodal LLMs to Boost E‑Commerce

JD Retail’s AI assistant combines a Master‑Sub agent framework, ReAct paradigm, multimodal integration and MoE architecture to improve sales forecasting, pricing, and recommendation accuracy, while the team’s collaborative culture and open talent pathways illustrate how cutting‑edge AI is applied in real‑world e‑commerce.

AI · JD Retail · LLM

0 likes · 8 min read

Code Mala Tang

Mar 1, 2025 · Artificial Intelligence

Why Do Large Language Models Hallucinate and How Can We Fix It?

This article explains why large language models produce plausible‑looking but false information, traces the problem to the supervised fine‑tuning stage, and outlines mitigation techniques such as knowledge interrogation, RLHF, and tool‑augmented search to reduce hallucinations.

LLM · RLHF · Training

0 likes · 12 min read

AntTech

Mar 1, 2025 · Artificial Intelligence

ScaleOT: Privacy‑Utility‑Scalable Offsite‑Tuning with Dynamic LayerReplace and Selective Rank Compression

The ScaleOT framework introduces a privacy‑preserving offsite‑tuning pipeline for large language models that combines importance‑aware dynamic layer replacement with selective rank compression, enabling flexible model compression, near‑lossless fine‑tuning, and strong privacy guarantees across diverse downstream tasks.

Adapter · LLM · model compression

0 likes · 16 min read

Cognitive Technology Team

Feb 28, 2025 · Artificial Intelligence

Design and High‑Availability Architecture of Alibaba LangEngine AI Application Framework

This article introduces Alibaba's LangEngine, a pure Java AI application framework, detailing its high‑availability gateway architecture, communication protocols, streaming and non‑streaming output, multi‑level metadata caching, asynchronous and serverless designs, and future open‑source roadmap, offering practical guidance for building robust AI services.

AI Framework · LLM · LangEngine

0 likes · 11 min read

AI Large Model Application Practice

Feb 28, 2025 · Artificial Intelligence

How Self-Attention Powers LLMs: A Step‑by‑Step Deep Dive

This article explains the self‑attention mechanism behind large language models, detailing why static word importance fails, how queries, keys, and values are generated, how attention scores are computed, scaled, softmaxed, and used to produce context‑aware word vectors, while noting computational costs.

AI · LLM · Self-Attention

0 likes · 9 min read

JavaEdge

Feb 27, 2025 · Artificial Intelligence

How to Quickly Build a DeepSeek‑Powered Knowledge Base on Tencent Cloud

This guide walks through deploying the full‑feature DeepSeek V3+R1 model on Tencent Cloud, configuring a smart knowledge‑base application, importing documentation, enabling internet search, tuning retrieval parameters, and publishing the app for public use, all without writing code.

AI · DeepSeek · Knowledge Base

0 likes · 6 min read

AI Algorithm Path

Feb 26, 2025 · Artificial Intelligence

Anthropic Unveils Claude 3.7 Sonnet: The World’s First Hybrid Reasoning Model

Anthropic’s Claude 3.7 Sonnet is a hybrid reasoning LLM with an extended thinking mode, output of up to 128K tokens, improved coding abilities, lower refusal rates, and strong benchmark results, and it is available via web, mobile apps, and the API under tiered pricing.

AI Coding · Anthropic · Claude 3.7 Sonnet

0 likes · 10 min read

Baidu Tech Salon

Feb 26, 2025 · Artificial Intelligence

Graph‑Engine‑Driven Workflow for Building Intelligent Agents

The article presents a graph‑engine‑driven workflow platform that lets developers assemble, orchestrate, and execute intelligent LLM‑based agents with low‑code visual design, fine‑grained path control, hierarchical sub‑flows, and event‑driven hooks, addressing perception, reasoning, planning, and scalability challenges while surpassing existing frameworks.

Data Decoupling · Intelligent agents · LLM

0 likes · 19 min read

Alibaba Cloud Big Data AI Platform

Feb 25, 2025 · Artificial Intelligence

Build a RAG‑Powered Smart Q&A Assistant with Milvus, DeepSeek, and PAI LangStudio

This step‑by‑step guide shows how to assemble a Retrieval‑Augmented Generation (RAG) system using Alibaba Cloud Milvus vector search, the DeepSeek large language model, and PAI LangStudio, covering instance creation, data upload, model deployment, connection setup, flow design, and service invocation.

AI Tutorial · DeepSeek · LLM

0 likes · 9 min read

Alibaba Cloud Big Data AI Platform

Feb 25, 2025 · Artificial Intelligence

How DistilQwen2.5 Boosts LLM Efficiency with Dual‑Stage Knowledge Distillation

This article introduces DistilQwen2.5, a lightweight LLM series built on Qwen2.5 that uses a novel two‑layer distillation framework, instruction‑data optimization, and parameter‑fusion techniques to achieve higher performance while drastically reducing computational cost and deployment overhead.

Efficient Inference · LLM · knowledge distillation

0 likes · 26 min read

DataFunSummit

Feb 25, 2025 · Artificial Intelligence

Collecting High-Quality LLM Training Data and Custom Model Training Guide

This article explains what constitutes high‑quality LLM training data, why large datasets are essential, outlines the step‑by‑step process for collecting, preprocessing, and fine‑tuning models, and highlights the best data sources—including web content, books, code repositories, and news—while noting available free datasets.

AI · LLM · Web Scraping

0 likes · 9 min read

Code Mala Tang

Feb 25, 2025 · Artificial Intelligence

How Resources, Tools, and Prompts Power LLM Super‑Agents

This article explains how the Resources data hub, Tools capability engine, and Prompts interaction templates work together to create a secure, extensible workflow that enables large language models to ingest data, execute tasks, and generate structured outputs.

AI workflow · Artificial Intelligence · LLM

0 likes · 5 min read

CSS Magic

Feb 25, 2025 · Artificial Intelligence

Two Simple Ways to Access DeepSeek API for Free

This guide shows how to obtain free DeepSeek API access through GitHub Models and SiliconFlow, detailing the required API base URL, key, and model name, how to register, create keys, verify usage with a web chat tool, and compare model choices and platform limits.

API · DeepSeek · Free access

0 likes · 7 min read

Alibaba Cloud Native

Feb 24, 2025 · Cloud Native

Build a Real‑Time AI Search‑Enabled Q&A System with Higress and DeepSeek

This guide shows how open‑source LLMs like DeepSeek can power cost‑effective intelligent Q&A services, and how the cloud‑native Higress API gateway adds real‑time web search, routing, security, and observability to create a production‑grade solution in just a few steps.

DeepSeek · Higress · LLM

0 likes · 6 min read

Baidu Geek Talk

Feb 24, 2025 · Artificial Intelligence

Using a Graph Engine to Drive Workflow for Intelligent Agents

By leveraging mature graph‑engine technology, the article shows how visual, low‑code workflow orchestration can give intelligent LLM‑based agents fine‑grained path control, reusable functions, hierarchical sub‑flows, and robust error handling, turning complex business tasks into modular, scalable processes adopted by hundreds of thousands of developers.

AI agents · LLM · graph engine

0 likes · 18 min read

Alibaba Cloud Observability

Feb 24, 2025 · Backend Development

Build a Cloud‑Native AI Chatbot with Spring AI Alibaba and ARMS Observability

This tutorial walks you through creating a Java‑based AI chat agent using Spring AI Alibaba, integrating Alibaba Cloud's large language model, adding function‑calling for weather queries, and enabling full observability with ARMS in a cloud‑native deployment.

ARMS · Cloud Native · LLM

0 likes · 10 min read

Selected Java Interview Questions

Feb 24, 2025 · Artificial Intelligence

Deploying Ollama on Windows and Linux and Integrating with Spring Boot

This guide explains how to download, install, and configure Ollama on Windows and Linux, set up environment variables, select a DeepSeek model, and call the Ollama API from a Spring Boot application with example code snippets.

API · DeepSeek · Deployment

0 likes · 6 min read

Architecture Digest

Feb 24, 2025 · Artificial Intelligence

MoBA: Mixture of Block Attention for Long‑Context Large Language Models

The article introduces MoBA, a Mixture‑of‑Block‑Attention mechanism that applies Mixture‑of‑Experts principles to transformer attention, enabling efficient long‑context processing for large language models while maintaining performance comparable to full attention through sparse, trainable block selection and seamless switching.

Attention Mechanism · LLM · Mixture of Experts

0 likes · 12 min read

AI Large Model Application Practice

Feb 24, 2025 · Artificial Intelligence

How Web Agents Combine LLMs and Browser Automation to Perform Real‑World Tasks

This article explains what Web Agents are, their ReAct‑style reasoning loop, key implementation technologies such as observation parsing, multimodal models, and browser control tools like Selenium and Playwright, and demonstrates building a DeepSeek‑powered Web Agent with the Browser‑use framework, including code samples and performance insights.

Browser Automation · DeepSeek · LLM

0 likes · 11 min read

Java Architecture Diary

Feb 24, 2025 · Artificial Intelligence

Run Large Language Models Directly in Java with Jlama – Quick Start Guide

This article introduces Jlama, an open‑source Java LLM inference engine, outlines its key features, provides step‑by‑step CLI and Maven integration instructions, shows code examples, run logs, and special setup notes for using large language models efficiently within Java applications.

AI · Inference · Jlama

0 likes · 6 min read

Alibaba Cloud Developer

Feb 24, 2025 · Artificial Intelligence

How to Build a Local Chatbot with Web Search Using DeepSeek, Ollama, and Dify

Learn how to create a locally hosted chatbot powered by DeepSeek R1 32B, using Ollama and Docker, integrate Dify for model management, and add web‑search capability through SearXNG, covering environment setup, search logic, content extraction, testing, and optimization tips.

Chatbot · DeepSeek · Dify

0 likes · 10 min read

Java Web Project

Feb 23, 2025 · Artificial Intelligence

Build Your First AI Chatbot with Spring Boot and DeepSeek LLM

This guide walks you through creating a Spring Boot project, configuring DeepSeek's large language model via SiliconFlow, setting up OpenAI‑compatible parameters, and implementing a REST controller that returns weather forecasts using the model, complete with step‑by‑step code snippets, configuration files, and deployment instructions.

AI · Chatbot · DeepSeek

0 likes · 7 min read

Ma Wei Says

Feb 23, 2025 · Artificial Intelligence

How Microsoft’s PIKE‑RAG Builds Knowledge‑Driven AI Across Four Stages

The article explains Microsoft’s open‑source PIKE‑RAG system, detailing its four progressive stages—from knowledge‑base construction to creative multi‑agent reasoning—while describing the underlying modules, chunking strategies, multi‑granularity retrieval, and code snippets that enable specialized domain understanding and inference.

AI Retrieval · LLM · PIKE-RAG

0 likes · 11 min read

Architecture and Beyond

Feb 22, 2025 · Artificial Intelligence

Understanding Retrieval‑Augmented Generation (RAG) and Its Role in Enhancing Large Language Models

The article explains how the inherent knowledge‑staleness, hallucination, lack of private data, non‑traceable output, limited long‑text handling, and data‑security concerns of large language models can be mitigated by Retrieval‑Augmented Generation, which combines external retrieval, augmentation, and generation to provide up‑to‑date, reliable, and secure AI responses.

AI · Knowledge augmentation · LLM

0 likes · 15 min read

Infra Learning Club

Feb 21, 2025 · Artificial Intelligence

5 Must‑Try Open‑Source AI Projects You Can Start Using Today

This article introduces five open‑source AI tools—a PPT generator, an LLM app development platform, a cloud‑agnostic AI runner, a curated collection of LLM applications, and a one‑click HD video creator—detailing their key features, usage links, and sample configurations.

AI · Dify · LLM

0 likes · 8 min read

Ma Wei Says

Feb 21, 2025 · Artificial Intelligence

How PIKE‑RAG Boosts Retrieval‑Augmented Generation for Industrial AI

PIKE‑RAG, a Retrieval‑Augmented Generation framework from Microsoft Research, tackles knowledge source diversity, one‑size‑fits‑all limitations, and LLMs' lack of domain expertise by building multi‑layer heterogeneous graphs, task‑driven modular pipelines, and a staged L0‑L4 system for more accurate industrial AI responses.

AI · KnowledgeGraph · LLM

0 likes · 5 min read

Architect

Feb 20, 2025 · Artificial Intelligence

Why Long CoT and In‑Context RL Are the Next Frontier for LLMs

The article analyses recent breakthroughs such as OpenAI's o1, Long CoT, and test‑time search, arguing that enabling LLMs to perform self‑critique and reinforcement learning with long output sequences is essential for future AI performance, while warning against overly structured workflows.

AI research · In‑Context RL · LLM

0 likes · 12 min read

Alibaba Cloud Infrastructure

Feb 20, 2025 · Artificial Intelligence

Deploying DeepSeek‑R1 Large Language Model on Knative with GPU A10

This guide explains how to deploy the DeepSeek‑R1 large language model on a Knative platform using an A10 GPU, covering preparation, service creation with appropriate annotations, YAML configuration, verification via curl, custom domain setup, and optional personal AI assistant deployment.

AI · DeepSeek · Deployment

0 likes · 8 min read

JD Tech Talk

Feb 20, 2025 · Artificial Intelligence

Multi‑Agent Architecture for an E‑Commerce Business Assistant: Design, Planning, Evaluation, and Sample Generation

The document describes the evolution, design principles, key technologies, online inference workflow, evaluation methods, and sample‑generation techniques of a large‑language‑model‑based multi‑agent system that powers a 24/7 e‑commerce merchant assistant, highlighting its benefits, challenges, and future work.

AI Planning · LLM · Multi-Agent

0 likes · 21 min read

JD Cloud Developers

Feb 20, 2025 · Artificial Intelligence

How Multi‑Agent ReAct Architecture Boosts E‑Commerce AI Assistants

This article explains the evolution of multi‑agent systems for e‑commerce assistants, detailing the ReAct‑based planning framework, hierarchical master‑sub agent collaboration, evaluation methods, and sample‑generation techniques that together improve accuracy, efficiency, and scalability of AI‑driven merchant services.

AI Planning · Agent Architecture · LLM

0 likes · 23 min read

Alibaba Cloud Developer

Feb 20, 2025 · Artificial Intelligence

How LLMs Power Real-Time Interactive 3D Worlds in Unreal Engine

This article explains how large language models are integrated with Unreal Engine to enable natural‑language‑driven 3D model search, manipulation, and scene understanding, detailing metadata extraction, vision‑language labeling, RAG‑based retrieval, and function‑call translation for interactive virtual environments.

3D interaction · LLM · RAG

0 likes · 21 min read

Architects' Tech Alliance

Feb 18, 2025 · Artificial Intelligence

How to Distill DeepSeek LLMs into Lightweight Models for Local Deployment

This article explains DeepSeek's knowledge‑distillation approach for compressing large language models into small, efficient student models, details step‑by‑step local deployment requirements, performance optimizations, and highlights the cost, privacy, and application benefits of running the distilled model on‑premise.

AI inference · DeepSeek · LLM

0 likes · 10 min read

Java Backend Technology

Feb 18, 2025 · Artificial Intelligence

Boost Java AI Apps with DeepSeek4j: Full Chain Support & Reactive Streaming

DeepSeek4j 1.4 brings a Java‑native integration framework that fully preserves DeepSeek's chain‑of‑thought capabilities, adds reactive streaming, and offers a one‑line Spring Boot starter, enabling developers to quickly embed the model with simple configuration and rich debugging tools.

AI integration · DeepSeek · LLM

0 likes · 5 min read

Big Data Tech Team

Feb 18, 2025 · Artificial Intelligence

How DeepSeek Trains and Optimizes Its LLMs: From Pre‑training to Reasoning Models

This article breaks down DeepSeek's LLM training pipeline, explaining the massive pre‑training phase, instruction fine‑tuning, reinforcement learning from human feedback (RLHF), and the distinct roles of its V3 instruction model and R1 reasoning model, while also highlighting performance metrics and current limitations.

DeepSeek · LLM · Model Training

0 likes · 8 min read

Java One

Feb 17, 2025 · Artificial Intelligence

How to Get Free Access to DeepSeek R1 Across Major Cloud Platforms

This guide walks you through using DeepSeek R1 via the official website or popular third‑party cloud services, compares free token quotas, explains token accounting, and provides step‑by‑step instructions for configuring API access and AI clients such as Chatbox, Cherry Studio, and Dify.

AI client · API · DeepSeek

0 likes · 11 min read

Java Architecture Stack

Feb 17, 2025 · Artificial Intelligence

How to Deploy the Full-Feature DeepSeek LLM Locally and on Alibaba Cloud

This guide walks you through preparing the environment, installing Docker, cloning the DeepSeek repository, running the model with Docker or Ollama for a quick start, using the enterprise API, and deploying the same model on Alibaba Cloud's free Bailian service within minutes.

AI · Alibaba Cloud · DeepSeek

0 likes · 6 min read

AI Large Model Application Practice

Feb 17, 2025 · Artificial Intelligence

Mastering Structured Output for DeepSeek‑R1 with LangChain, LangGraph, and ReAct Agents

DeepSeek‑R1 excels at deep reasoning but lacks native structured output; this guide explains why structured output matters, outlines common API‑level techniques, and provides three practical solutions—using an auxiliary model with a LangChain chain, a LangGraph workflow, and a ReAct agent—complete with code snippets and JSON‑mode tips.

DeepSeek · LLM · LangChain

0 likes · 12 min read

Code Mala Tang

Feb 16, 2025 · Artificial Intelligence

17 Proven Prompt Engineering Techniques to Master LLM Interactions

This article presents 17 practical prompt‑engineering strategies—ranging from zero‑shot and few‑shot prompting to role, style, and chain‑of‑thought methods—explaining their usage, ideal scenarios, and concrete examples to help you obtain higher‑quality responses from large language models.

Artificial Intelligence · ChatGPT · LLM

0 likes · 14 min read

Bighead's Algorithm Notes

Feb 15, 2025 · Artificial Intelligence

FinRL‑DeepSeek: How Integrating DeepSeek with RL Improves Portfolio Returns (Code Open‑Source)

This article reviews a new risk‑sensitive trading agent that combines reinforcement learning with large language models to extract stock recommendations and news‑based risk scores, describes the extended CVaR‑PPO algorithm, presents extensive experiments on the FNSPID dataset, and discusses the resulting performance gains and future work.

Algorithmic Trading · CVaR · DeepSeek

0 likes · 10 min read

Alibaba Cloud Developer

Feb 14, 2025 · Artificial Intelligence

Unlock Faster LLM Inference: Full Stack of Chips, Frameworks & Services

The article examines the end‑to‑end architecture for large‑model inference, detailing seven layers—from chip hardware and programming toolkits to deep‑learning frameworks, inference accelerators, model providers, compute platforms, application orchestration, and traffic management—highlighting key vendors, open‑source projects, and performance‑optimizing techniques.

AI hardware · Inference · LLM

0 likes · 12 min read

AI Large Model Application Practice

Feb 14, 2025 · Artificial Intelligence

Why Sub‑word Tokenizers Power Modern LLMs: From Characters to Tokens

This article explains how language models evolved from character‑level embeddings to word‑level and finally to sub‑word tokenizers, highlighting the efficiency, vocabulary coverage, and practical engineering challenges of sub‑word segmentation in modern AI systems.

AI fundamentals · LLM · Subword Tokenization

0 likes · 8 min read

JD Tech

Feb 14, 2025 · Artificial Intelligence

JD Merchant Intelligent Assistant – Multi‑Agent System Architecture, Planning, and Evaluation

JD’s Merchant Intelligent Assistant leverages a large‑language‑model‑based multi‑agent architecture to provide 24/7 e‑commerce support, detailing its evolution, planning techniques, online inference, evaluation methods, sample generation, and practical insights for scalable AI‑driven operations.

E-commerce AI · LLM · Multi-Agent

0 likes · 22 min read

Architect

Feb 13, 2025 · Artificial Intelligence

How to Build a Mini ChatGPT on a Single GPU with MiniMind

This article provides a comprehensive, step‑by‑step guide to training and fine‑tuning a miniature large‑language model called MiniMind, covering lightweight model design, open‑source training pipelines, required datasets, tokenizer options, and deployment via a web UI, all using PyTorch on modest hardware.

AI · LLM · MiniMind

0 likes · 11 min read

Alibaba Cloud Infrastructure

Feb 13, 2025 · Cloud Computing

Deploy DeepSeek‑R1 LLM on Alibaba Cloud ACK One with ACS GPU in Minutes

This guide walks you through deploying the DeepSeek‑R1 large‑language‑model inference service on Alibaba Cloud ACK One registered clusters using ACS GPU compute, covering model preparation, OSS storage setup, PersistentVolume configuration, arena‑based service deployment, and verification steps with concrete commands and parameters.

ACK One · ACS GPU · DeepSeek

0 likes · 14 min read

Alibaba Cloud Infrastructure

Feb 13, 2025 · Artificial Intelligence

Deploying DeepSeek‑R1 671B Distributed Inference Service on Alibaba Cloud ACK with vLLM and Dify

This article explains how to quickly deploy the full‑parameter DeepSeek‑R1 671B model in a multi‑node GPU‑enabled Kubernetes cluster on Alibaba Cloud ACK, covering prerequisites, model parallelism, vLLM‑Ray distributed deployment, service verification, and integration with Dify to build a private AI Q&A assistant.

DeepSeek · Dify · Distributed Deployment

0 likes · 12 min read

JD Tech Talk

Feb 13, 2025 · Artificial Intelligence

DeepSeek R1: Concept Overview, Training Principles, and Practical Implementations

This article introduces the DeepSeek family of models, explains the concepts of online search and deep reasoning, details the two‑phase training pipeline with data augmentation and reinforcement learning, and showcases practical experiments and deployment examples for the R1 and distilled variants.

DeepSeek · LLM · Model Training

0 likes · 10 min read

Baobao Algorithm Notes

Feb 13, 2025 · Artificial Intelligence

How to Build and Improve Reasoning LLMs: Methods, Trade‑offs, and DeepSeek Insights

This article explains what reasoning language models are and when they are needed, then reviews four main techniques (inference‑time scaling, pure reinforcement learning, combined SFT + RL, and distillation), illustrated with DeepSeek‑R1’s development, cost analysis, and low‑budget alternatives.

AI research · DeepSeek · Inference Scaling

0 likes · 27 min read

Baobao Algorithm Notes

Feb 12, 2025 · Artificial Intelligence

How X‑R1 Triggers Aha Moments in Low‑Cost RL Training of 0.5B LLMs

The X‑R1 open‑source framework demonstrates that a 0.5B language model can achieve rapid reasoning improvements and observable "Aha Moments" using reinforcement learning on a modest 4‑GPU setup, detailing its design, performance metrics, installation steps, and future roadmap.

AI · LLM · Reinforcement Learning

0 likes · 6 min read

vivo Internet Technology

Feb 12, 2025 · Artificial Intelligence

Bidirectional Optimization of NLLB-200 and ChatGPT for Low-Resource Language Translation

The paper proposes a bidirectional optimization framework that fine‑tunes the low‑resource NLLB‑200 translation model with LoRA using data generated by ChatGPT, while also translating low‑resource prompts with NLLB before feeding them to LLMs, thereby improving multilingual translation quality yet requiring careful validation of noisy synthetic data.

Fine-tuning · LLM · LoRA

0 likes · 28 min read

Alibaba Cloud Infrastructure

Feb 12, 2025 · Artificial Intelligence

Deploying DeepSeek‑R1 Distilled Qwen‑32B‑FP8 Model on Alibaba Cloud GPU Instances with Docker and OpenWebUI

This guide explains how to prepare an Alibaba Cloud GPU instance, install Docker and NVIDIA tools, pull or build a container image, and run the FP8‑quantized DeepSeek‑R1‑Distill‑Qwen‑32B model using vLLM and OpenWebUI for both offline and online inference.

DeepSeek · FP8 quantization · GPU

0 likes · 18 min read

DataFunSummit

Feb 12, 2025 · Artificial Intelligence

Didi's ChatBI: Evolution, Exploration, and Future of AI‑Powered Business Intelligence

This article details Didi's journey since early 2023 in building ChatBI, covering the evolution of BI platforms, the technical advances behind intelligent BI such as LLM‑driven NL2SQL, two main product paths, practical implementations, key challenges, and future directions for AI‑enhanced data analysis.

AI · Business Intelligence · ChatBI

0 likes · 12 min read

JD Retail Technology

Feb 12, 2025 · Artificial Intelligence

Accelerating Generative Recommendation with NVIDIA TensorRT‑LLM in JD Advertising

JD Advertising accelerates its generative‑recall recommendation system by integrating NVIDIA TensorRT‑LLM, which simplifies the pipeline, injects LLM knowledge, scales to billions of parameters, and delivers over five‑fold throughput gains, one‑fifth the cost, and significant CTR improvements in both recommendation and search.

Inference Optimization · LLM · TensorRT-LLM

0 likes · 13 min read

Architect's Alchemy Furnace

Feb 11, 2025 · Artificial Intelligence

How to Build a High‑Performance Local Enterprise Knowledge Base with AI

This article explains how to design and implement an on‑premise enterprise knowledge base by covering data preprocessing, vector database selection, LLM integration, system architecture, security, deployment, testing, and cost‑control, providing practical code snippets and best‑practice recommendations.

AI · LLM · data-processing

0 likes · 22 min read

Infra Learning Club

Feb 11, 2025 · Artificial Intelligence

How to Run DeepSeek R1 Locally and Build a RAG System with Ollama and LangChain

This guide walks you through installing Ollama, pulling the open‑source DeepSeek R1 model, and using LangChain and Streamlit to create a locally hosted Retrieval‑Augmented Generation (RAG) system that can answer questions from uploaded PDFs without any cloud API.

DeepSeek · LLM · LangChain

0 likes · 6 min read

Ops Development & AI Practice

Feb 10, 2025 · Artificial Intelligence

Mastering LLM Output: How Temperature, Top‑K, Top‑P & Max Tokens Shape AI Text

This article explains how the key LLM parameters—Temperature, Top‑K, Top‑P, and MaxOutputTokens—affect randomness, creativity, candidate selection, and output length, and provides practical guidance on tuning them for different AI text generation tasks.

AI Generation · LLM · Temperature

0 likes · 7 min read

Architect

Feb 10, 2025 · Artificial Intelligence

Evolution of DeepSeek Mixture‑of‑Experts (MoE) Architecture from V1 to V3

This article reviews the development of DeepSeek's Mixture-of-Experts (MoE) models, tracing their evolution from the original DeepSeekMoE V1 through V2 to V3, detailing architectural innovations such as fine‑grained expert segmentation, shared‑expert isolation, load‑balancing losses, device‑limited routing, and the shift from softmax to sigmoid gating.

DeepSeek · LLM · Mixture of Experts

0 likes · 21 min read

Alibaba Cloud Infrastructure

Feb 10, 2025 · Artificial Intelligence

Hybrid Cloud Elastic LLM Inference Solution with ACK Edge and KServe

This article presents a hybrid‑cloud solution that uses ACK Edge and KServe to dynamically allocate on‑premise and cloud GPU resources for large‑language‑model inference, addressing tidal traffic patterns, reducing costs, and ensuring high availability through elastic scaling and custom scheduling policies.

ACK@Edge · Auto Scaling · KServe

0 likes · 13 min read

JD Retail Technology

Feb 10, 2025 · Artificial Intelligence

JD Merchant Intelligent Assistant: Multi‑Agent Architecture and Technical Exploration

The JD Merchant Intelligent Assistant employs a large‑language‑model‑driven multi‑agent architecture with dynamic ReAct planning, enabling merchants to query and execute store operations in under a second with over 90 % decision accuracy, while reducing inference cost, hallucinations, and engineering effort across diverse e‑commerce tasks.

AI · LLM · Multi-Agent

0 likes · 25 min read

JD Cloud Developers

Feb 10, 2025 · Artificial Intelligence

How to Deploy DeepSeek LLM Locally on JD Cloud GPU with Ollama and Chatbox

Learn step‑by‑step how to prepare a JD Cloud GPU instance, install GPU drivers, deploy Ollama, run DeepSeek‑R1 models, configure graphical clients like Chatbox on Windows and macOS, and optionally feed local data using AnythingLLM to build an offline knowledge base.

AnythingLLM · Chatbox · DeepSeek

0 likes · 19 min read

Big Data Technology Architecture

Feb 9, 2025 · Artificial Intelligence

Reproducing DeepSeek R1 Reasoning Ability with GRPO on Qwen2.5‑7B in Colab

This article explains how to replicate DeepSeek R1's slow‑thinking inference using the GRPO reinforcement‑learning algorithm on the Qwen2.5‑7B model in a free Colab notebook, covering the underlying chain‑of‑thought (CoT) concept, reward‑function design, data preparation, training configuration, and observed results.

Fine-tuning · GRPO · LLM

0 likes · 14 min read

Top Architect

Feb 9, 2025 · Artificial Intelligence

DeepSeek‑R1: Training Pipeline, Reinforcement‑Learning Techniques, and Experimental Results

The article reviews DeepSeek‑R1’s training methodology—including cold‑start data collection, multi‑stage RL fine‑tuning, SFT data generation, and model distillation—highlights its performance comparable to OpenAI‑o1‑1217, and discusses key contributions, reward design, successful experiments, and failed attempts.

AI research · DeepSeek · LLM

0 likes · 12 min read

Infra Learning Club

Feb 8, 2025 · Artificial Intelligence

Why People Pay for DeepSeek Installation Packages (and How to Install It Yourself)

The article explains that DeepSeek is an open‑source LLM that many sellers monetize by offering paid installation packages, outlines the model lineup and size options, and provides a step‑by‑step guide to install and run DeepSeek locally with Ollama and Open WebUI.

AI models · DeepSeek · LLM

0 likes · 7 min read

Open Source Tech Hub

Feb 8, 2025 · Artificial Intelligence

How to Integrate DeepSeek’s Open‑Source LLM into Your Application

This guide introduces DeepSeek, outlines its cutting‑edge open‑source LLMs, and provides step‑by‑step instructions for accessing the admin backend, adding and configuring DeepSeek models, setting API endpoints and keys, and enabling frontend access.

API · DeepSeek · LLM

0 likes · 3 min read

Infra Learning Club

Feb 8, 2025 · Artificial Intelligence

Multi-Agent LLMs Explained: Benefits, Workflows, and Leading Frameworks

The article surveys the rise of multi‑agent LLM systems, detailing how specialized agents collaborate on tasks such as travel planning, outlining their workflow, comparing them with single‑agent models, listing prominent frameworks, and discussing current challenges and research citations.

AI · Agent Collaboration · AutoGen

0 likes · 13 min read

Alibaba Cloud Infrastructure

Feb 8, 2025 · Artificial Intelligence

Deploying a Production‑Ready DeepSeek‑R1 Inference Service on Alibaba Cloud ACK with KServe

This guide explains how to deploy a production‑ready DeepSeek‑R1 inference service on Alibaba Cloud ACK using KServe, covering model preparation, storage configuration, service deployment, observability, autoscaling, model acceleration, gray‑release and GPU‑shared inference.

DeepSeek · GPU · Inference

0 likes · 13 min read

Full-Stack DevOps & Kubernetes

Feb 8, 2025 · Artificial Intelligence

Deploy DeepSeek‑R1 on Tencent Cloud with Ollama: A Complete Step‑by‑Step Guide

This guide walks you through preparing a Tencent Cloud account, creating a Cloud Studio workspace, installing Ollama, downloading and running the DeepSeek‑R1 large language model, interacting via terminal or API, and managing resources and model versions.

AI Model Deployment · API · DeepSeek

0 likes · 8 min read

MaGe Linux Operations

Feb 7, 2025 · Artificial Intelligence

How to Deploy DeepSeek R1 Locally: A Step‑by‑Step AI Model Guide

This article walks you through everything you need to know about DeepSeek R1—including its different model sizes, hardware requirements, installation tools like Ollama, LM Studio and Docker, and how to set up a visual interface with Open‑WebUI or Dify—for offline, private, and cost‑effective AI inference.

AI · DeepSeek · Docker

0 likes · 15 min read

iKang Technology Team

Feb 7, 2025 · Artificial Intelligence

Retrieval‑Augmented Generation (RAG) with LangChain: Concepts and Python Implementation

Retrieval‑Augmented Generation (RAG) using LangChain lets developers enhance large language models by embedding user queries, fetching relevant documents from a vector store, inserting the context into a prompt template, and generating concise, source‑grounded answers, offering low‑cost, up‑to‑date knowledge while reducing hallucinations and fine‑tuning expenses.

LLM · LangChain · RAG

0 likes · 10 min read

Top Architect

Feb 6, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama: Quantization, Hardware Requirements, and Step‑by‑Step Guide

This article provides a comprehensive tutorial on locally deploying the full‑size DeepSeek R1 671B model using Ollama, covering dynamic quantization options, hardware specifications, detailed installation commands, configuration files, performance observations, and practical recommendations for consumer‑grade systems.

AI · DeepSeek · GPU

0 likes · 14 min read

Alibaba Cloud Developer

Feb 5, 2025 · Artificial Intelligence

10 Common Prompt Engineering Mistakes and How to Overcome Them

This article lists ten common misconceptions about prompt engineering, explains why each is flawed, and offers practical insights and strategies—such as using the CO‑STAR framework, tailoring prompts to specific models, keeping prompts concise, and continuously testing and refining—to help readers communicate effectively with large language models.

AI misconceptions · LLM · Prompt Design

0 likes · 10 min read

21CTO

Feb 4, 2025 · Artificial Intelligence

Is DeepSeek the Next Challenger to ChatGPT? A Deep Dive into Its AI Edge

This article explains what DeepSeek is, how its open‑source large language model works, its unique multilingual training, free access, the DeepSeek‑Coder variant, and compares its capabilities and goals with ChatGPT, highlighting strengths, limitations, and market impact.

AI models · ChatGPT comparison · DeepSeek

0 likes · 7 min read

AIWalker

Feb 4, 2025 · Artificial Intelligence

Meta’s Open‑Source MILS Enables LLMs to See and Hear Without Training – SOTA on Images, Video, and Audio

The paper introduces MILS, a training‑free multimodal iterative LLM solver that lets large language models perceive and generate across image, video, and audio domains, achieving new state‑of‑the‑art results without any task‑specific data or fine‑tuning.

AI research · LLM · MILS

0 likes · 18 min read

Alibaba Cloud Big Data AI Platform

Feb 1, 2025 · Artificial Intelligence

Deploy DeepSeek-V3 and R1 Models with One-Click on Alibaba Cloud PAI Model Gallery

This article introduces Alibaba Cloud's PAI Model Gallery, detailing the DeepSeek-V3 and DeepSeek‑R1 large language models, their architectures and parameters, and provides a step‑by‑step guide for one‑click deployment of these models and their distilled variants using vLLM or BladeLLM.

AI inference · Alibaba Cloud · DeepSeek

0 likes · 6 min read

CSS Magic

Jan 31, 2025 · Artificial Intelligence

Cursor vs. Windsurf vs. GitHub Copilot: Hands‑On Comparison of Three AI Code Editors

The article conducts a practical, step‑by‑step evaluation of Cursor, Windsurf, and GitHub Copilot’s multi‑file editing capabilities using a simple web‑chat bot, revealing that Cursor completes all required UI, storage, and application changes in a single interaction, while the others need two rounds, with Copilot showing notable improvement on a retest.

AI code editor · Cursor · GitHub Copilot

0 likes · 9 min read

DataFunSummit

Jan 30, 2025 · Databases

Mature Practices for Building Risk‑Control Knowledge Graphs on NebulaGraph and Leveraging Large Language Models

This article explains how NebulaGraph’s large‑scale graph database can be used to construct real‑time risk‑control knowledge graphs, describes practical applications such as community detection and path analysis, and explores how large language models enhance graph queries through Text‑to‑GQL, agents, exploration chains, and semi‑structured knowledge extraction.

AI · Graph Database · LLM

0 likes · 11 min read

DataFunSummit

Jan 29, 2025 · Artificial Intelligence

Tencent OlaChat: An LLM‑Powered Intelligent Business Intelligence Platform – Architecture, Capabilities, and Practice

This article presents Tencent's OlaChat intelligent BI platform, detailing its evolution from traditional to intelligent BI, the impact of large language models on data analytics, the system's multi‑task dialogue, metadata retrieval enhancements, Text2SQL solutions, and real‑world deployment insights.

AI · Business Intelligence · Data Platform

0 likes · 21 min read

Architect

Jan 27, 2025 · Artificial Intelligence

How to Build a Retrieval‑Augmented Generation QA Assistant for an Open Platform

This article details a step‑by‑step design of a RAG‑based intelligent Q&A assistant for the DeWu Open Platform, covering background, RAG fundamentals, system architecture, technology selection, prompt engineering with CO‑STAR, data preprocessing, vector store setup, LangChain.js implementation, similarity search, runnable chaining, debugging, and future prospects.

AI · LLM · LangChain

0 likes · 28 min read

DataFunTalk

Jan 26, 2025 · Artificial Intelligence

58.com’s LingXi Large Language Model Platform: Development, Deployment, and Performance Optimizations

Since the launch of ChatGPT, 58.com has built a Model‑as‑a‑Service platform called LingXi that trains and serves domain‑specific large language models, supports over a hundred internal scenarios with daily inference exceeding ten million calls, and continuously improves performance through quantization, GPU optimization, model miniaturization, and advanced AI applications such as interview assistants, voice agents, and RAG‑enabled agents.

AI Platform · AI applications · Inference Optimization

0 likes · 9 min read

DataFunSummit

Jan 24, 2025 · Artificial Intelligence

Exploring LLM‑Based Generative Business Intelligence (GenBI): Architecture, Implementation, and Lessons Learned

With the rise of LLM‑based generative AI, this article examines the emerging GenBI (Generative Business Intelligence) paradigm, detailing why self‑service analytics is needed, the progress of Text‑to‑SQL, an LLM‑driven agent architecture, practical AWS Bedrock implementation, technical choices, lessons learned, and future outlook.

AWS Bedrock · Agentic AI · Business Intelligence

0 likes · 18 min read

AI Large Model Application Practice

Jan 23, 2025 · Artificial Intelligence

Mastering Microsoft AutoGen 0.4: Build Async Multi‑Agent Apps from Scratch

This article provides a comprehensive, step‑by‑step guide to Microsoft AutoGen 0.4, explaining its layered architecture, core concepts such as Agent, Runtime, and Agent ID, and demonstrating both a simple Hello‑World multi‑agent example and an AI‑enabled agent with full Python code snippets.

Async · AutoGen · Framework

0 likes · 13 min read

DataFunSummit

Jan 21, 2025 · Artificial Intelligence

NVIDIA NeMo Full Stack: End‑to‑End Large Language Model Training, Alignment, and RLHF

This article presents NVIDIA's NeMo technology stack for end‑to‑end large language model (LLM) training, covering the full software pipeline, model alignment with reinforcement learning from human feedback (RLHF), performance optimizations such as model parallelism, FP8, TensorRT‑LLM inference, dynamic load balancing, and future research directions.

Distributed Training · GPU Optimization · LLM

0 likes · 24 min read

ByteFE

Jan 20, 2025 · Artificial Intelligence

Eino: An Open‑Source Golang Framework for Large‑Model Application Development

Eino is a Golang‑based, open‑source framework that streamlines the full DevOps lifecycle of large‑model applications by providing stable, strongly‑typed components, graph‑based orchestration, built‑in tooling, and extensible architecture to help developers quickly build reliable AI services.

AI · Framework · Golang

0 likes · 13 min read

AI Large Model Application Practice

Jan 20, 2025 · Artificial Intelligence

How Embeddings Transform Simple Character Codes into Powerful Vectors for LLMs

This article explains how embeddings convert basic character indices into high‑dimensional vectors, describes their training via gradient descent, introduces the embedding matrix, and shows how these vectors enable modern language models to capture semantic relationships and be reused across tasks.

LLM · Neural Networks · embeddings

0 likes · 8 min read

DataFunTalk

Jan 18, 2025 · Artificial Intelligence

Understanding Xiaohongshu’s Content Recommendation Mechanisms: NoteLLM and SSD

This article analyzes Xiaohongshu’s content recommendation system by reviewing two official papers, detailing the NoteLLM framework for interest discovery and the Sliding Spectrum Decomposition (SSD) method for diversified recommendations, and explaining their underlying models, loss functions, and experimental results.

Diversity · LLM · collaborative filtering

0 likes · 13 min read

Alibaba Cloud Infrastructure

Jan 17, 2025 · Artificial Intelligence

Elastic Scaling of Large Language Model Inference on Alibaba Cloud ACK with Knative, ResourcePolicy, and Fluid

This article explains how to reduce inference cost and improve performance for large language models on Alibaba Cloud ACK by using Knative's request‑based autoscaling, custom ResourcePolicy priority scheduling, and Fluid data‑caching to achieve elastic scaling, resource pre‑emption, and faster model loading.

Fluid · Inference · Knative

0 likes · 22 min read

AI Large Model Application Practice

Jan 16, 2025 · Artificial Intelligence

Boosting AI Agent Accuracy with External Validation and Multi‑Path Optimization

The article explains how AI agents can move beyond single‑turn responses by using two enhanced reflection strategies—external tool validation and multi‑path optimization (LATS)—to iteratively improve output quality, reliability, and applicability in complex, high‑stakes tasks.

AI · External Validation · LATS

0 likes · 10 min read

Baobao Algorithm Notes

Jan 15, 2025 · Artificial Intelligence

How Multi-Token Prediction Boosts LLM Training and Inference Efficiency

This article reviews the evolution of Multi‑Token Prediction (MTP) techniques—from early blockwise parallel decoding to Meta's and DeepSeek's implementations—explaining their architectures, training and inference workflows, and the speed‑up gains they offer for large language models.

DeepSeek · Inference Acceleration · LLM

0 likes · 20 min read

Alibaba Cloud Big Data AI Platform

Jan 15, 2025 · Artificial Intelligence

Build an Education‑Focused RAG Solution Using Alibaba PAI

This guide explains how to create a Retrieval‑Augmented Generation (RAG) solution for education on Alibaba PAI, covering knowledge‑base construction with PAI‑Designer, model deployment, connection setup in LangStudio, workflow configuration, online deployment, and a legal‑domain case comparison that highlights RAG's accuracy benefits.

Alibaba PAI · Embedding · Knowledge Base

0 likes · 14 min read

Bilibili Tech

Jan 14, 2025 · Artificial Intelligence

Technical Practices and Productization of Intelligent Advertising Title Generation for Bilibili

We built an LLM‑powered system for Bilibili that automatically creates ad titles from user keywords, employing fluency, style, and quality classifiers, mixed domain data cleaning, and alignment methods such as SFT, DPO and KTO, resulting in a product that now generates about ten percent of daily titles and drives significant ad spend.

AI Alignment · Ad Title Generation · Bilibili

0 likes · 24 min read

JD Tech Talk

Jan 14, 2025 · Artificial Intelligence

Advantages and Engineering Implementation of Generative Recommendation Systems Using Large Language Models

This article explains how generative recommendation systems powered by large language models simplify the recommendation pipeline, integrate world knowledge, benefit from scaling laws, and require specialized engineering optimizations such as TensorRT‑LLM deployment, inference acceleration, and hybrid model strategies to achieve low latency and high throughput in real‑world e‑commerce scenarios.

AI · Inference Optimization · LLM

0 likes · 10 min read

JD Cloud Developers

Jan 14, 2025 · Artificial Intelligence

How Generative Recommendation Systems Transform E‑Commerce with LLMs

This article explains how large language models reshape recommendation systems by simplifying pipelines, integrating world knowledge, and leveraging scaling laws, and details the engineering steps for deploying generative recall models—including product encoding, user prompting, model training, TensorRT‑LLM optimization, and continuous performance improvements.

AI Optimization · Generative Recommendation · LLM

0 likes · 13 min read

AI Large Model Application Practice

Jan 14, 2025 · Artificial Intelligence

Turning Classification Nets into Language Generators: A Step‑by‑Step Guide

This article explains how a simple neural network trained for classification can be adapted to generate natural language by expanding its output layer, encoding characters as numbers, using a sliding‑window context, and recursively predicting the next token, illustrating each step with diagrams and concrete examples.

AI · LLM · Neural Networks

0 likes · 10 min read

Baobao Algorithm Notes

Jan 10, 2025 · Artificial Intelligence

Unlocking Text Classification with Qwen2: Experiments, Tips, and LoRA Fine‑Tuning

This article shares practical experiments and insights on using Qwen2ForSequenceClassification for short‑ and long‑text sentiment tasks, compares it with BERT, outlines improvement strategies such as generative fine‑tuning and LoRA, and provides end‑to‑end code, training details, and evaluation results.

FineTuning · LLM · LoRA

0 likes · 25 min read

Java Architecture Diary

Jan 10, 2025 · Artificial Intelligence

Generate Structured JSON with Ollama LLM Using Java

This guide explains why structured JSON output from LLMs is essential, walks through installing and running Ollama, and provides a complete Java Spring Boot implementation—including POJOs, service code, and best‑practice tips—to retrieve AI‑generated data in a reliable, parsable format.

AI · JSON · LLM

0 likes · 7 min read

Tencent Advertising Technology

Jan 9, 2025 · Artificial Intelligence

Applying Large Language Models to Search Advertising: End‑to‑End Generative Recall and System Optimizations

This report details how large language models (LLMs) were integrated into Tencent's search advertising pipeline—from early extraction‑distillation experiments in 2023 to a 2024 end‑to‑end generative recall architecture—showing significant improvements in relevance, diversity, and revenue through knowledge injection, supervised fine‑tuning, constrained beam‑search decoding, and high‑performance inference services.

AI · Beam Search · LLM

0 likes · 11 min read

Baobao Algorithm Notes

Jan 9, 2025 · Artificial Intelligence

How to Efficiently Deploy and Manage 100 LoRA‑Enhanced LLMs with vLLM

A technical walkthrough shows how to use vLLM to load multiple LoRA adapters for role‑playing LLMs, analyzes the massive GPU and labor costs of naïve deployment, and presents a hosted multi‑LoRA platform as a cost‑effective solution.

AI inference · LLM · LoRA

0 likes · 11 min read

Data Thinking Notes

Jan 7, 2025 · Databases

Unlocking LLM-Powered Text-to-SQL: From Basics to Cutting-Edge Techniques

This article provides a comprehensive overview of LLM-based Text-to-SQL technology, covering its background, evolution, challenges, various LLM-driven methods, benchmark datasets, evaluation metrics, and future research directions to guide researchers and practitioners in advancing natural language interfaces for databases.

LLM · Text-to-SQL · database

0 likes · 18 min read

Infra Learning Club

Jan 7, 2025 · Artificial Intelligence

How GitHub Copilot Workspace Made Me Fear Unemployment

The author experiments with GitHub Copilot Workspace to automatically generate a WeChat mini‑program for family library management, documents the prompting process, code generation, bug fixes, UI tweaks, and reflects on the broader impact of AI‑driven development on programmers' future jobs.

AI code generation · GitHub Copilot · LLM

0 likes · 5 min read
