Tagged articles
151 articles
Old Zhang's AI Learning
May 5, 2026 · Artificial Intelligence

vLLM 0.20.1 Fixes Instability and Speed Issues for DeepSeek V4

The vLLM 0.20.1 patch, released shortly after 0.20.0, consolidates stability fixes and performance optimizations for DeepSeek V4, adds several bug fixes, updates installation instructions, and provides targeted upgrade recommendations for different user scenarios.

DeepSeek-V4 · GPU inference · Model Deployment
0 likes · 9 min read
Architect
May 3, 2026 · Artificial Intelligence

Why the Same Model Feels Different in Coding Agents: Model Sets the Capability Ceiling, Harness Sets the Production Floor

The article examines how a model defines an agent’s ultimate capabilities while the harness determines its production reliability, detailing continuous evaluation, context‑budgeting, tool‑error classification, multi‑model migration, and SRE‑style engineering practices needed to keep AI coding agents stable and performant.

AI agents · Agent Harness · Continuous Evaluation
0 likes · 31 min read
Lao Guo's Learning Space
May 3, 2026 · Artificial Intelligence

2026 Enterprise Guide to Large Model Fine‑Tuning: Choosing, Training, and Deploying

This comprehensive guide explains why enterprises should fine‑tune large language models instead of using raw APIs or RAG, compares six fine‑tuning techniques (Full, LoRA, QLoRA, AdaLoRA, DoRA, Prompt‑Tuning), evaluates popular toolchains, outlines a step‑by‑step workflow, presents cost analyses, real‑world case studies, and practical best‑practice recommendations for 2026.

Cost Optimization · Enterprise AI · Fine-tuning
0 likes · 18 min read
PMTalk Product Manager Community
Apr 30, 2026 · Artificial Intelligence

How a Large AI Model Is Trained: Insights from a High‑Earning AI Product Manager

The article walks through model training, validation, ensemble learning, and deployment from an AI product manager’s viewpoint, using a churn‑prediction case to illustrate decision boundaries, metric choices, industry‑specific algorithm trade‑offs, cost considerations, and practical serving options.

AI product management · Large Model · Model Deployment
0 likes · 6 min read
AI Explorer
Apr 29, 2026 · Artificial Intelligence

Open-Source ML Intern: One-Click Paper Reading, Training & Deployment – Hype or Real Deal?

ml‑intern, an open‑source AI agent from Hugging Face, automates the full ML workflow—reading papers, generating code, training and deploying models—using an asynchronous event‑driven loop with submission and event queues, supporting interactive and headless modes, Slack notifications, and multiple LLM back‑ends.

AI Agent · Hugging Face · LLM
0 likes · 5 min read
Sohu Tech Products
Apr 15, 2026 · Artificial Intelligence

Why Harness Engineering Is the Next Evolution in AI System Design

This tutorial explains the three-stage evolution from Prompt Engineering to Context Engineering and finally Harness Engineering, detailing their motivations, core components, practical implementations, and why stable, end‑to‑end AI agents require a full harness to manage tasks, context, tools, execution, state, and error recovery.

AI systems · Agent Design · Context Engineering
0 likes · 31 min read
SuanNi
Apr 13, 2026 · Artificial Intelligence

Deploy Qwen3 8B Model with vLLM: Step‑by‑Step Guide for Remote Inference

This guide walks you through deploying Alibaba’s open‑source Qwen‑3 8B model on the SumW platform using vLLM, covering environment activation, server launch with OpenAI‑compatible parameters, SSH tunneling for remote access, and Python client calls, while highlighting key configuration tips and common pitfalls.

Model Deployment · OpenAI API · Python SDK
0 likes · 6 min read
AI Large-Model Wave and Transformation Guide
Apr 11, 2026 · Artificial Intelligence

How to Engineer Reliable AI Models: From Infrastructure to Deployment

This article presents a comprehensive, step‑by‑step framework for turning laboratory AI models into production‑ready systems, covering capability mapping, technology stack choices, model selection, prompt engineering, data pipelines, training strategies, and cross‑team collaboration to ensure stability, observability, and trustworthiness.

AI model engineering · Model Deployment · Model Monitoring
0 likes · 14 min read
Baidu Intelligent Cloud Tech Hub
Apr 8, 2026 · Artificial Intelligence

Unlocking 8‑Hour Autonomous Coding: GLM‑5.1’s Leap with Kunlun XPU

The open‑source GLM‑5.1 model, adapted to Baidu Baige's Kunlun XPU via the vLLM‑Kunlun Plugin, delivers record‑breaking SWE‑bench scores, eight‑hour autonomous coding, long‑context handling up to 64K tokens, and scalable deployment across tens of thousands of chips, showcasing end‑to‑end AI acceleration.

GLM-5.1 · Kunlun XPU · Model Deployment
0 likes · 8 min read
IT Services Circle
Apr 5, 2026 · Artificial Intelligence

Why Harness Engineering Is the Next Frontier in AI System Design

This article explains how AI engineering has evolved from Prompt Engineering to Context Engineering and now Harness Engineering, detailing each stage's challenges, core techniques, and real‑world practices that turn large language models into reliable, long‑running production systems.

Context Engineering · Harness Engineering · LLM operations
0 likes · 32 min read
Advanced AI Application Practice
Mar 24, 2026 · Artificial Intelligence

Connecting OpenClaw to Ollama: Step‑by‑Step Guide and Common Pitfalls

This article explains why Ollama has become popular for local LLM deployment, outlines its core features, and provides a detailed, step‑by‑step tutorial for integrating OpenClaw with Ollama—including model selection, configuration, troubleshooting common errors, and advanced tips for customization and multi‑model switching.

AI · Local-LLM · Model Deployment
0 likes · 9 min read
Old Zhang's AI Learning
Mar 5, 2026 · Artificial Intelligence

Timber: The “Ollama” for Traditional Machine Learning Models

Timber is a multi‑pass compiler that transforms classic ML models such as XGBoost and LightGBM into zero‑dependency C99 binaries, offering microsecond‑level inference latency, HTTP‑compatible serving, and substantial performance gains over Python runtimes, making it ideal for high‑throughput, low‑latency production scenarios.

LightGBM · ML compiler · Model Deployment
0 likes · 8 min read
AIWalker
Feb 27, 2026 · Artificial Intelligence

YOLO26 Review: End-to-End, NMS‑Free Edge AI Boosts CPU Inference by 43%

This article analyzes YOLO26’s architecture redesign that eliminates NMS, removes DFL, introduces progressive loss balancing, STAL, and the MuSGD optimizer, achieving up to 43% faster CPU inference and simplifying deployment for edge vision tasks across detection, segmentation, classification, pose estimation, and OBB.

CPU inference · Model Deployment · NMS-free
0 likes · 13 min read
Baobao Algorithm Notes
Feb 25, 2026 · Artificial Intelligence

Exploring Qwen 3.5: Small‑Scale MoE Models, Architecture, and Deployment Guides

This article reviews the three open‑source Qwen 3.5 models (a 35B MoE, a 122B MoE, and a 27B dense version), detailing their parameter layouts, core attention designs, context length, inference performance, and hardware requirements, and provides step‑by‑step code examples for loading them with Hugging Face Transformers and vLLM.

AI · MoE · Model Deployment
0 likes · 10 min read
Baidu Intelligent Cloud Tech Hub
Feb 12, 2026 · Artificial Intelligence

Deploying GLM-5 on Baidu Kunlun P800 XPU with vLLM‑Kunlun Plugin

This article explains how the new GLM-5 large model is adapted to Baidu's Kunlun P800 XPU, detailing the asynchronous reinforcement learning framework Slime, optimization techniques such as INT8 quantization and tensor parallelism, and step‑by‑step deployment commands using the open‑source vLLM‑Kunlun plugin.

AI acceleration · GLM-5 · INT8 Quantization
0 likes · 6 min read
Baidu Geek Talk
Dec 17, 2025 · Artificial Intelligence

Accelerate LLM Deployment on Baidu Kunlun XPU with the Open‑Source vLLM‑Kunlun Plugin

The vLLM‑Kunlun Plugin, jointly released by Baidu Baige and Kunlun Chip, provides a high‑performance, zero‑intrusion solution for deploying open‑source large language models on domestic Kunlun XPU hardware, includes fused operators, precision‑validation and profiling tools, and supports over twenty mainstream and multimodal models.

Kunlun XPU · Model Deployment · open‑source
0 likes · 7 min read
21CTO
Oct 6, 2025 · Artificial Intelligence

How to Become an AI Engineer: Skills, Workflow, and Career Path

This guide explains what AI engineering entails, outlines the end‑to‑end workflow from problem definition and data preparation through model development, deployment, and monitoring, and highlights the essential programming, cloud, and MLOps skills, career tracks, emerging trends, and salary outlook for aspiring AI engineers.

AI Engineering · MLOps · Model Deployment
0 likes · 11 min read
DevOps Cloud Academy
Sep 25, 2025 · Artificial Intelligence

How to Build Scalable MLOps Infrastructure for Enterprise AI Success

This article explains what MLOps is, why a robust MLOps framework is essential for businesses, outlines its core components, compares MLOps with AIOps, details the benefits of investing in MLOps, and provides a step‑by‑step guide to designing enterprise‑grade AI MLOps infrastructure.

AI Infrastructure · MLOps · Machine Learning Operations
0 likes · 17 min read
DevOps Cloud Academy
Sep 21, 2025 · Artificial Intelligence

How to Deploy Machine Learning Models Efficiently: A Complete Guide

This guide explains what model deployment is, why it matters, the various deployment types, readiness criteria, best practices, common challenges, real‑world case studies, and the most popular tools and platforms for deploying machine learning models in production.

AI · MLOps · Model Deployment
0 likes · 20 min read
DaTaobao Tech
Sep 17, 2025 · Artificial Intelligence

Boosting ID Card Photo Quality with Multimodal AI: A Practical Deployment Guide

This article details how a multimodal AI model was integrated to detect and improve ID card photo quality, covering common image issues, differences between OCR and multimodal extraction, deployment strategies, performance metrics, cost estimation, and the resulting business and technical benefits.

ID verification · Model Deployment · Multimodal AI
0 likes · 13 min read
Dunmao Tech Hub
Sep 1, 2025 · Artificial Intelligence

Deploy DeepSeek‑r1 Locally with a One‑Click Ollama Script

This guide walks you through a Bash script that automatically checks for Ollama, installs it if missing, lets you choose a DeepSeek‑r1 model size, starts the Ollama service, and runs the selected model locally, complete with usage examples and a token‑cost note.

AI · DeepSeek · Model Deployment
0 likes · 7 min read
Alibaba Cloud Developer
Jul 17, 2025 · Artificial Intelligence

How to Build a House Price Prediction Model with Python: A Step‑by‑Step Guide

This tutorial walks developers through the complete workflow of building a house‑price regression model—from problem definition, data collection and preprocessing, feature engineering, and model selection, to training, hyper‑parameter tuning, evaluation, optimization, deployment as a Flask service, and ongoing monitoring—using Python, pandas, scikit‑learn, and visualisation libraries.

Model Deployment · Python · feature engineering
0 likes · 29 min read
DataFunSummit
Jun 18, 2025 · Artificial Intelligence

How to Upload, Test, and Deploy MiniLM on Modelers.cn: A Step‑by‑Step Guide

This article walks through uploading a MiniLM model to the Modelers.cn community, explains why testing is essential, demonstrates both usability and local tests with openMind, and provides complete Python code for classification and simple question‑answering, enabling developers to quickly deploy and evaluate MiniLM in practice.

MiniLM · Model Deployment · NLP
0 likes · 9 min read
DaTaobao Tech
Jun 4, 2025 · Artificial Intelligence

Understanding Large Language Model Architecture, Parameters, Memory, Storage, and Fine‑Tuning Techniques

This article provides a comprehensive overview of large language models (LLMs), covering their transformer architecture, parameter counts, GPU memory and storage requirements, and detailed fine‑tuning methods such as prompt engineering, data construction, LoRA, PEFT, RLHF, and DPO, along with practical deployment and inference acceleration strategies.

DPO · Fine-tuning · LLM
0 likes · 17 min read
Architect
May 31, 2025 · Artificial Intelligence

Edge Intelligence Implementation in the Vivo Official App: Architecture, Feature Engineering, and Model Deployment

The article details how edge intelligence is applied to the Vivo official app to improve product recommendation on the smart‑hardware floor by abstracting the problem, designing feature engineering pipelines, training TensorFlow models, converting them to TFLite, and deploying inference on mobile devices, while also covering monitoring and performance considerations.

Model Deployment · TensorFlow Lite · edge AI
0 likes · 19 min read
Alibaba Cloud Developer
May 28, 2025 · Artificial Intelligence

Unlocking LLM Fine‑Tuning: From Architecture to LoRA, DPO and Deployment

This article provides a comprehensive guide to large language model fine‑tuning, covering model architecture, parameter and memory calculations, prompt engineering, data construction, LoRA and PEFT techniques, reinforcement learning methods such as DPO, and practical deployment workflows on internal platforms.

Fine‑Tuning · LLM · LoRA
0 likes · 21 min read
vivo Internet Technology
May 21, 2025 · Artificial Intelligence

How Vivo’s App Leverages Edge AI to Personalize Product Recommendations

This article details how Vivo’s official app implements edge intelligence to dynamically rank and recommend hardware products on its homepage, covering problem abstraction, data collection, feature engineering, model design, TensorFlow‑Lite conversion, on‑device inference, and monitoring for a personalized user experience.

Android · Model Deployment · TensorFlow Lite
0 likes · 19 min read
Baidu Geek Talk
May 12, 2025 · Artificial Intelligence

One‑Click Deployment of Qwen3 Large Models on Baidu Baige AI Platform

This guide explains how to use Baidu Baige's AI heterogeneous computing platform to deploy the eight‑model Qwen3 family—including dense and MoE variants—via a one‑click process, covering resource configuration, inference acceleration options, and post‑deployment service access.

AI · Baidu Baige · Cloud AI
0 likes · 4 min read
Top Architect
Mar 22, 2025 · Artificial Intelligence

Spring AI: Intelligent Development Trend for Java Developers

The article introduces Spring AI as an emerging tool for Java developers, explains its background, goals, and core components such as data processing, model training, deployment and monitoring, showcases application scenarios like NLP, image processing, recommendation systems and predictive analytics, and also includes promotional offers for AI resources and community groups.

Artificial Intelligence · Model Deployment · java
0 likes · 17 min read
Top Architecture Tech Stack
Mar 22, 2025 · Artificial Intelligence

Spring AI: An Overview of Intelligent Development Trends

This article introduces Spring AI, a Spring ecosystem module that simplifies building, training, and deploying AI applications for Java developers, covering its background, goals, core components such as data processing, model training, deployment, practical code examples, use cases, advantages, challenges, and future outlook.

Artificial Intelligence · Model Deployment · Spring Boot
0 likes · 12 min read
Architecture Digest
Mar 21, 2025 · Artificial Intelligence

Spring AI: Emerging Trends in Intelligent Development

This article introduces Spring AI, explains its background, goals, core components such as data processing, model training, deployment and monitoring, showcases practical use cases like NLP, image processing and recommendation systems, and discusses its advantages, challenges, and future outlook for Java developers.

Artificial Intelligence · Model Deployment · data-processing
0 likes · 16 min read
Efficient Ops
Mar 9, 2025 · Artificial Intelligence

Essential LLMOps Tools: Build, Deploy, Monitor, and Manage Large Language Models

LLMOps, the end-to-end methodology for managing large language models, encompasses a curated set of development, deployment, monitoring, and local management tools—such as LangChain, vLLM, LangSmith, and Ollama—enabling practitioners to efficiently build, scale, and maintain AI applications.

AI Development · LLMOps · Model Deployment
0 likes · 6 min read
Programmer DD
Mar 6, 2025 · Artificial Intelligence

Discover QwQ-32B: A 32B LLM Matching 671B DeepSeek‑R1 Performance

The QwQ-32B model, released by Alibaba Cloud, delivers DeepSeek‑R1‑level results with only 32 billion parameters, offers integrated agent capabilities, is open‑source under Apache 2.0, and can be quickly deployed locally via Ollama or integrated into Java applications using Spring AI.

AI inference · Model Deployment · Ollama
0 likes · 4 min read
Alibaba Cloud Native
Feb 19, 2025 · Cloud Native

Engineering Traffic Management for DeepSeek: Cloud‑Native Deployment Strategies

This article outlines practical cloud‑native deployment options for DeepSeek models, explains common engineering challenges such as traffic spikes, latency, security, quota control, and provides detailed AI‑gateway solutions—including fallback, content safety, API key management, gray‑release routing, caching, and observability—to ensure reliable large‑model applications.

DeepSeek · Model Deployment · traffic management
0 likes · 9 min read
Alibaba Cloud Native
Feb 18, 2025 · Cloud Native

Deploy DeepSeek‑R1 on Alibaba Cloud ACK One Using ACS GPU in Minutes

This guide shows how to overcome on‑premise compute limits by registering a local Kubernetes cluster to Alibaba Cloud ACK One, provisioning ACS GPU resources, and deploying the DeepSeek‑R1 inference model with the vLLM framework through a series of concrete commands and YAML configurations.

ACK One · ACS GPU · DeepSeek
0 likes · 15 min read
Architecture & Thinking
Feb 18, 2025 · Artificial Intelligence

Why Is DeepSeek Server Overloaded? Causes and Practical Workarounds

The article investigates why DeepSeek frequently returns a “server busy” message, analyzing factors such as sudden traffic spikes, compute and bandwidth limitations, security attacks, and maintenance policies, and then offers actionable solutions including query optimization, off‑peak usage, third‑party cloud platforms, and local deployment.

AI · DeepSeek · Model Deployment
0 likes · 10 min read
Architect
Feb 17, 2025 · Artificial Intelligence

Deploying DeepSeek R1 on Huawei Ascend 910B: Weight Conversion and Troubleshooting

This article details a step‑by‑step deployment of the DeepSeek R1 model on Huawei Ascend 910B NPUs, covering FP8‑to‑BF16 weight conversion, custom container image preparation, configuration of MindIE services, common pitfalls, and practical troubleshooting tips for large‑scale inference.

DeepSeek · Huawei Ascend · MindIE
0 likes · 8 min read
ByteDance Cloud Native
Feb 13, 2025 · Cloud Computing

Deploy the Full‑Size DeepSeek‑R1 Model on Volcengine Cloud with Terraform and Kubernetes

This guide walks you through two practical solutions for deploying the massive DeepSeek‑R1 model on Volcengine Cloud—one using Terraform for a quick two‑node GPU setup and another leveraging cloud‑native multi‑node distributed inference with Kubernetes, covering resource sizing, environment preparation, model download, monitoring, autoscaling, and storage acceleration.

AI · Kubernetes · Model Deployment
0 likes · 22 min read
Alibaba Cloud Infrastructure
Feb 13, 2025 · Cloud Computing

Deploy DeepSeek‑R1 LLM on Alibaba Cloud ACK One with ACS GPU in Minutes

This guide walks you through deploying the DeepSeek‑R1 large‑language‑model inference service on Alibaba Cloud ACK One registered clusters using ACS GPU compute, covering model preparation, OSS storage setup, PersistentVolume configuration, arena‑based service deployment, and verification steps with concrete commands and parameters.

ACK One · ACS GPU · DeepSeek
0 likes · 14 min read
DeWu Technology
Feb 12, 2025 · Artificial Intelligence

Edge Intelligence for Intelligent Video Cover Recommendation

The article describes an edge‑based video‑cover recommendation system for DeWu that leverages the MNN SDK and a lightweight MobileNetV3 model, performing on‑device inference with quantization and parallel processing to automatically select high‑quality covers, achieving sub‑second latency and boosting click‑through rates by up to 18%.

Inference Optimization · Model Deployment · Video Cover
0 likes · 12 min read
Tencent Cloud Developer
Feb 7, 2025 · Artificial Intelligence

Launch DeepSeek Models in Seconds with One‑Click Cloud Development

This guide shows how to start DeepSeek large‑language models on cnb.cool in just 5‑10 seconds without downloading, using a simple three‑step process that includes forking the repository, selecting a model branch, and running Ollama or Docker commands, plus options for long‑term cloud deployment.

AI · Cloud Native · DeepSeek
0 likes · 3 min read
Huawei Cloud Developer Alliance
Feb 5, 2025 · Artificial Intelligence

Deploy DeepSeek‑V3 on Ascend: Step‑by‑Step Guide for Fast AI Inference

This guide walks developers through obtaining the DeepSeek‑V3 model on the Ascend community, converting weights for GPU and NPU, loading the appropriate MindIE Docker image, launching the container, and configuring service‑level parameters to achieve efficient, out‑of‑the‑box AI inference on Ascend hardware.

AI inference · Ascend · DeepSeek
0 likes · 4 min read
21CTO
Feb 4, 2025 · Artificial Intelligence

Run DeepSeek Locally with Ollama: A Complete Step‑by‑Step Guide

This guide walks you through installing Ollama, selecting the appropriate DeepSeek model, running it locally, and exploring integration options, highlighting the benefits of offline AI such as data privacy, faster performance, and zero subscription costs.

AI Tutorial · Artificial Intelligence · DeepSeek
0 likes · 7 min read
Tencent Tech
Feb 4, 2025 · Artificial Intelligence

Deploy and Test DeepSeek Large Language Models on Tencent Cloud TI in Minutes

This guide walks you through quickly deploying DeepSeek series models on the Tencent Cloud TI platform, covering model selection, resource planning, step‑by‑step service creation, free online trial, API testing via built‑in tools or curl, and managing inference services for both large and compact models.

AI inference · DeepSeek · Model Deployment
0 likes · 13 min read
JavaEdge
Feb 2, 2025 · Artificial Intelligence

Mastering LLMOps: From Model Deployment to Scalable AI Operations

This article explains LLMOps—its goals, core activities, benefits, best practices, and how using an LLMOps platform like Dify can dramatically cut development time, simplify prompt engineering, data preparation, monitoring, and deployment of large language models.

AI Operations · Data Management · LLMOps
0 likes · 13 min read
Alibaba Cloud Big Data AI Platform
Feb 1, 2025 · Artificial Intelligence

Deploy DeepSeek-V3 and R1 Models with One-Click on Alibaba Cloud PAI Model Gallery

This article introduces Alibaba Cloud's PAI Model Gallery, detailing the DeepSeek-V3 and DeepSeek‑R1 large language models, their architectures and parameters, and provides a step‑by‑step guide for one‑click deployment of these models and their distilled variants using vLLM or BladeLLM.

AI inference · Alibaba Cloud · DeepSeek
0 likes · 6 min read
DevOps
Jan 6, 2025 · Artificial Intelligence

Ten Popular Large Language Model Deployment Engines and Tools: Features, Advantages, and Limitations

This article reviews ten mainstream LLM deployment solutions—including WebLLM, LM Studio, Ollama, vLLM, LightLLM, OpenLLM, HuggingFace TGI, GPT4ALL, llama.cpp, and Triton Inference Server—detailing their technical characteristics, strengths, drawbacks, and example deployment workflows for both personal and enterprise environments.

AI inference · GPU Acceleration · LLM
0 likes · 16 min read
DeWu Technology
Dec 11, 2024 · Artificial Intelligence

MLOps Practices for Improving Order Fulfillment Timeliness

The supply‑chain team leveraged core MLOps practices—versioning, testing, automated reproducible pipelines, deployment monitoring, and documentation—to eliminate data leakage, ensure online consistency, and accelerate model upgrades, using traffic‑replay, FAAS‑based decoupling, and approval workflows, ultimately cutting order‑fulfillment times, reducing costs, and enabling business teams to adopt reliable AI models at scale.

MLOps · Model Deployment · automation
0 likes · 18 min read
Test Development Learning Exchange
Dec 5, 2024 · Artificial Intelligence

End-to-End House Prices Prediction Project: Data Collection, Preprocessing, Modeling, Evaluation, and Deployment with Python

This tutorial walks through a complete house price prediction project, covering data collection from Kaggle, preprocessing with pandas and scikit‑learn, model training using RandomForestRegressor, evaluation, and deployment of a Flask API for real‑time predictions, providing full code examples.

Flask · Model Deployment · Python
0 likes · 9 min read
Baidu Geek Talk
Nov 25, 2024 · Artificial Intelligence

PP-ShiTuV2: A General Image Recognition Pipeline in PaddleX

PP‑ShiTuV2, a PaddleX pipeline that integrates subject detection, deep feature encoding, and vector retrieval, delivers 91% recall@1 on AliProducts, surpasses earlier models by over 20 points, runs efficiently on GPU and CPU, and offers simple installation, quick‑start code, and full fine‑tuning support.

Computer Vision · Deep Learning · Model Deployment
0 likes · 8 min read
Huolala Tech
Oct 24, 2024 · Artificial Intelligence

How Huolala’s Dolphin Platform Accelerates AI Model Delivery with Cloud‑Native Automation

This article describes how Huolala built a cloud‑native AI development platform called Dolphin to overcome low model delivery efficiency and poor compute‑resource utilization, detailing its architecture, one‑stop workflow, resource‑pooling, observability, and future roadmap for scaling AI across the company.

Cloud Native · Kubernetes · Model Deployment
0 likes · 10 min read
Baidu Geek Talk
Sep 23, 2024 · Artificial Intelligence

Intelligent Early Screening System for Malignant Skin Tumors Based on PaddleX Low‑Code AI

The Meikel Studio team created an intelligent early‑screening system for malignant skin tumors on the PaddleX low‑code AI platform, which automatically captures dermatoscopic images, segments lesions with the PP‑LiteSeg model, achieves high accuracy (mIoU 0.868) and rapid inference, and offers one‑click deployment via RESTful API to improve diagnosis efficiency and support future medical‑imaging applications.

AI segmentation · Model Deployment · PaddleX
0 likes · 9 min read
58 Tech
Aug 7, 2024 · Artificial Intelligence

Bridging Compute and Applications: 58.com AI Lab’s Large‑Model Platform and AI Agent Solutions

In this article, 58.com AI Lab senior director Zhan Kunlin explains how the company built a multi‑layer AI platform, created a vertical large‑language model called LingXi, and developed an AI Agent system with RAG capabilities to accelerate practical AI applications across various business scenarios.

AI Platform · AI agents · Model Deployment
0 likes · 10 min read
DataFunTalk
Jun 21, 2024 · Artificial Intelligence

Fine‑tuning Large Language Models with Alibaba Cloud PAI: Practices, Techniques, and Deployment

This article introduces the Alibaba Cloud PAI platform for large language model (LLM) fine‑tuning, covering model‑training pipelines, performance‑cost trade‑offs, retrieval‑augmented generation, fine‑tuning methods such as full‑parameter, LoRA and QLoRA, model selection, data preparation, evaluation, and real‑world deployment examples.

AI Platform · Fine-tuning · LLM
0 likes · 20 min read
Alibaba Cloud Infrastructure
Jun 12, 2024 · Artificial Intelligence

Deploy Llama‑2 on ACK with KServe, Triton, and TensorRT‑LLM – Step‑by‑Step Guide

This tutorial walks through deploying the Llama‑2‑7b‑hf model on Alibaba Cloud Kubernetes (ACK) using KServe, Triton Inference Server with the TensorRT‑LLM backend, covering prerequisites, model preparation, YAML configuration, PV/PVC setup, runtime creation, and troubleshooting steps.

AI inference · KServe · Kubernetes
0 likes · 13 min read
Baidu Tech Salon
Jun 7, 2024 · Artificial Intelligence

How AI Transforms Financial Report Extraction: From Layout Analysis to Table Recognition

This article examines the challenges of extracting data from complex financial reports and presents an AI‑driven solution that combines advanced layout analysis, table recognition, OCR, and large‑language‑model integration using Baidu’s PaddlePaddle low‑code platform, detailing model selection, training, performance tuning, and deployment.

AI · Document Extraction · Layout Analysis
0 likes · 11 min read
Sohu Tech Products
Jun 5, 2024 · Artificial Intelligence

How Treelite Supercharges Tree Model Inference by Up to 6×

This article introduces Treelite, an open‑source library that compiles XGBoost, LightGBM, and scikit‑learn tree models into optimized shared libraries, explains its branch‑prediction and comparison‑simplification techniques, and provides step‑by‑step Python examples showing significant inference speed gains across different batch sizes.

LightGBM · Model Deployment · Python
0 likes · 6 min read
DataFunSummit
May 10, 2024 · Artificial Intelligence

LLMOps: Definition, Fine‑tuning Techniques, Application Architecture, Challenges and Solutions

This article introduces LLMOps by defining large language model operations, explains the three stages of LLM development, details modern fine‑tuning methods such as PEFT, Adapter, Prefix, Prompt and LoRA, outlines the architecture for building LLM applications, discusses the main difficulties of agent‑based deployments, and presents practical solutions including Prompt IDE, low‑code deployment, monitoring and cost control.

AI Operations · Fine-tuning · LLMOps
0 likes · 14 min read
360 Tech Engineering
Apr 15, 2024 · Artificial Intelligence

Fine‑Tuning Large Language Models: A Practical Guide Using Qwen‑14B on the 360AI Platform

This article explains the concept, motivations, and step‑by‑step workflow for fine‑tuning large language models—specifically Qwen‑14B—covering data preparation, training commands with DeepSpeed, hyper‑parameter settings, evaluation, and deployment via FastChat, all illustrated with code snippets and configuration details.

AI · DeepSpeed · FastChat
0 likes · 10 min read
OPPO Kernel Craftsman
Mar 22, 2024 · Artificial Intelligence

InternLM Model Fine-Tuning Tutorial with XTuner: Chat Format and Practical Implementation Guide

This tutorial walks through fine‑tuning Shanghai AI Lab’s open‑source InternLM models with XTuner, explaining chat‑format conventions, loading and inference (including multimodal InternLM‑XComposer), dataset preparation, configuration sections, DeepSpeed acceleration, and memory‑efficient QLoRA details for 7‑B‑parameter chat models.

Chat Format · DeepSpeed · Fine-tuning
0 likes · 22 min read
Alibaba Cloud Big Data AI Platform
Feb 29, 2024 · Artificial Intelligence

Deploy and Fine‑Tune Qwen1.5 LLM with Alibaba PAI‑QuickStart

This article introduces Alibaba Cloud's open‑source Qwen1.5 large language model series, highlights its multilingual, human‑preference alignment, and long‑context capabilities, and provides step‑by‑step guidance on using PAI‑QuickStart for model deployment, fine‑tuning, and Python SDK integration.

Fine-tuning · Model Deployment · PAI-QuickStart
0 likes · 9 min read
DataFunSummit
Feb 25, 2024 · Artificial Intelligence

Tencent FinTech AI Development Platform: Architecture, Challenges, and Solutions

This article introduces Tencent FinTech’s AI development platform, outlining its business background and goals, the technical challenges encountered in feature engineering, model training, and inference stability, and the comprehensive solutions—including a unified feature engine, distributed training framework, optimized deployment, and future plans for large‑scale graph training and AutoML.

AI Platform · FinTech · Model Deployment
0 likes · 13 min read
21CTO
Feb 22, 2024 · Artificial Intelligence

How Google’s Open‑Source Gemma Model Brings LLM Power to Your Laptop

Google’s newly released open‑source Gemma models let developers run powerful large‑language‑model workloads on notebooks, workstations, or cloud platforms, offering competitive performance, extensive tooling, and built‑in safety measures for responsible AI deployment.

AI Safety · Gemma · Google AI
0 likes · 6 min read
DataFunSummit
Feb 3, 2024 · Artificial Intelligence

Practical Application of Large Language Models in MaShang Consumer Finance: From Model Building to Deployment

This article details how MaShang Consumer Finance leverages large language models for sales, collection, and customer service, covering company background, AI research achievements, model training infrastructure, data‑quality and compliance challenges, prompt engineering, inference acceleration, evaluation methods, and lessons learned from real‑world deployment.

Data Quality · LLM · Model Deployment
0 likes · 21 min read
Alibaba Cloud Big Data AI Platform
Jan 12, 2024 · Artificial Intelligence

Deploy and Fine‑Tune Mixtral‑8x7B on Alibaba Cloud PAI: A Step‑by‑Step Guide

This guide introduces the open‑source Mixtral‑8x7B large language model, explains its architecture and performance, and provides detailed instructions for using Alibaba Cloud PAI‑QuickStart to deploy, invoke via API or SDK, and fine‑tune the model with LoRA on Lingjun GPU resources.

Alibaba Cloud PAI · Fine-tuning · Mixtral
0 likes · 16 min read
Alibaba Cloud Native
Jan 6, 2024 · Cloud Computing

Deploy ModelScope Models to Alibaba Cloud Function Compute in 5 Minutes

This guide walks readers through using ModelScope’s SwingDeploy service to locate, configure, and instantly deploy open‑source AI models to Alibaba Cloud Function Compute, explaining the resources created, how to invoke the model via HTTP triggers, and how to optimize performance with provisioned instances, logging, and concurrency settings.

AI model serving · Alibaba Cloud · Function Compute
0 likes · 15 min read
Rare Earth Juejin Tech Community
Dec 27, 2023 · Artificial Intelligence

Comprehensive Overview of Large Language Models: Capabilities, Limitations, Deployment, and Future Trends

This article provides a detailed examination of large language models, covering their underlying technologies, capabilities and constraints, model families, training processes, cloud and edge deployment challenges, agent architectures, and emerging trends, offering practical insights for developers, product managers, and researchers.

Artificial Intelligence · Edge Computing · LLM
0 likes · 43 min read
Baobao Algorithm Notes
Dec 1, 2023 · Operations

Deploy Hugging Face Transformers with One Click Using LMDeploy

This article explains how LMDeploy streamlines the deployment of Hugging Face transformer models by adding online conversion, offering an OpenAI‑compatible API server, a Gradio WebUI, and 4‑bit weight‑only quantization with AWQ, providing step‑by‑step commands, code examples, and performance insights.

AI inference · API Server · Hugging Face
0 likes · 9 min read
Ant R&D Efficiency
Oct 26, 2023 · Artificial Intelligence

TestAgent: Open-Source 7B LLM That Supercharges Automated Test Generation

TestAgent is an open-source 7B test-domain LLM that delivers multi-language test-case generation, automatic assert completion, and a rapid deployment framework, offering industry-leading pass@1 scores, a ChatBot UI, and detailed setup instructions for diverse hardware environments.

AI testing · Model Deployment · Software Testing
0 likes · 8 min read
DaTaobao Tech
Oct 25, 2023 · Artificial Intelligence

Prompt Engineering, LLM Supervised Fine‑Tuning, and Mobile Tmall AI Assistant Application

The article explains prompt engineering techniques, supervised fine‑tuning of large language models, and their practical deployment in the Mobile Tmall AI shopping assistant, detailing ChatGPT’s generation steps, Transformer architecture, prompt clarity, delimiters, role‑play, few‑shot and chain‑of‑thought prompting, SFT versus pre‑training, LoRA adapters, data collection, Qwen‑14B training configuration, SDK‑based inference, and comprehensive evaluation.

AI Assistant · LLM fine-tuning · Model Deployment
0 likes · 14 min read
Baidu Geek Talk
Oct 11, 2023 · Artificial Intelligence

How Baidu’s Qianfan 2.0 Supercharges Large‑Model Development and Deployment

The article reviews Baidu Cloud’s Qianfan 2.0 platform, detailing its expanded model catalog, dataset library, Chinese‑language enhancements, compression and speed gains, robust AI infrastructure, application templates, and end‑to‑end data‑labeling pipeline that together lower cost and accelerate large‑model adoption across industries.

AI Platform · Cloud AI · Model Deployment
0 likes · 14 min read
JD Tech
Aug 4, 2023 · Artificial Intelligence

Deploying and Evaluating the Vicuna Open‑Source Large Language Model on a Single Machine

This article details a step‑by‑step guide to deploying the Vicuna open‑source LLM on a single server, covering model preparation, environment setup, dependency installation, GPU and CUDA configuration, inference commands, performance evaluation, and attempted fine‑tuning, while sharing practical observations and results.

Fine‑tuning · GPU · Inference
0 likes · 16 min read
360 Quality & Efficiency
Aug 4, 2023 · Artificial Intelligence

Machine Learning Model Testing Workflow and Best Practices

This article outlines the essential concepts, data preparation, model creation, training, deployment, and verification steps for testing machine‑learning models, highlighting dataset requirements, algorithm categories, framework choices, resource considerations, and provides a sample inference request.

AI · Model Deployment · XGBoost
0 likes · 7 min read
Alibaba Cloud Big Data AI Platform
Jul 25, 2023 · Artificial Intelligence

Fine‑Tune and Deploy Llama 2 on Alibaba Cloud PAI in Minutes

This guide walks you through using Meta's open‑source Llama 2 models on Alibaba Cloud's PAI platform, covering low‑code LoRA fine‑tuning, full‑parameter fine‑tuning with PAI‑DSW, and rapid WebUI deployment via PAI‑EAS, complete with step‑by‑step instructions, code snippets, and resource requirements.

AI · Alibaba Cloud · Fine-tuning
0 likes · 16 min read
DataFunTalk
Jul 11, 2023 · Artificial Intelligence

Sunshine Insurance Group's Zhèngyán Large Model Open Platform: Architecture, Tools, and Business Applications

The article describes Sunshine Insurance Group's Zhèngyán Large Model Open Platform, detailing its three‑layer architecture, AutoTrain tool, self‑developed LLM, smart routing, plugin marketplace, intelligent review, and how these capabilities empower insurance marketing, sales, service, and management through AI‑driven solutions.

AI Platform · Insurance Technology · Model Deployment
0 likes · 13 min read
DataFunSummit
Jun 24, 2023 · Artificial Intelligence

From Model to Service: Alibaba Cloud Machine Learning PAI One‑Stop Model Development and Deployment Practice

This article presents an end‑to‑end overview of Alibaba Cloud’s Machine Learning PAI platform, detailing the three‑stage ML workflow, challenges in model development, the role of pre‑trained and open‑source models, PAI’s architecture, a hands‑on demo, and MLOps best practices for efficient model deployment.

Alibaba Cloud · MLOps · Model Deployment
0 likes · 11 min read
JD Retail Technology
May 18, 2023 · Artificial Intelligence

Local Deployment, Inference, and Fine‑tuning of the Vicuna‑7B Large Language Model

This article details the step‑by‑step process of preparing the environment, merging weights, installing dependencies, running inference, evaluating Vicuna‑7B against other models, and attempting fine‑tuning, while highlighting performance results, encountered issues, and future work for large language model deployment.

Fine-tuning · GPU · Inference
0 likes · 11 min read
HelloTech
Apr 19, 2023 · Cloud Native

How FaaS Transforms AI Platforms: Lessons from Haro’s Cloud‑Native Journey

The article analyzes the operational, stability, and cost challenges of Haro’s AI platform, explains why a serverless FaaS architecture—specifically Knative—was selected, and details the implementation steps, performance gains, and future scenarios for AI workloads.

AI Platform · Cloud Native · Cost Optimization
0 likes · 8 min read
HelloTech
Apr 12, 2023 · Artificial Intelligence

Integrating Machine Learning Ranking into Elasticsearch: Architecture, Components, and Performance

The team embedded a full machine‑learning ranking pipeline as an Elasticsearch plug‑in—combining real‑time and offline feature stores, hot‑loadable model jars via Dragonfly, an MLeap execution engine, and a DSL for feature definition—replacing the coarse‑ranking logistic‑regression with a tree model that adds ~10 ms latency but yields a 1.2 % AB‑test lift, while maintaining high throughput, low CPU usage, and supporting future batch deep‑learning rescoring.

Model Deployment · feature engineering · online prediction
0 likes · 16 min read
Tencent Advertising Technology
Mar 30, 2023 · Artificial Intelligence

Tencent's Taiji Machine Learning Platform: End-to-End MLOps for Advertising

Tencent’s Taiji machine learning platform, a cloud‑native, distributed parameter‑server system, provides end‑to‑end MLOps for advertising by integrating data ingestion, feature engineering, model training, evaluation, deployment, and monitoring, supporting massive models up to billions of parameters while improving efficiency, scalability, and resource management.

Distributed Training · MLOps · Machine Learning Platform
0 likes · 18 min read