Tagged articles

Model Optimization

117 articles · Page 1 of 2

Jun 29, 2026 · Artificial Intelligence

DeepSeek’s DSpark Boosts AI Inference Speed Up to 400% with Speculative Decoding

DeepSeek’s open‑source DSpark applies speculative decoding to its V4 Flash and Pro models, delivering 51%‑400% inference throughput gains that vary by task, while also supporting other models such as Gemma and Qwen, positioning it as a versatile, cross‑model acceleration solution.

AI Inference Acceleration · DeepSeek · Gemma

0 likes · 6 min read
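For readers unfamiliar with the technique named in the summary, the following is a minimal sketch of greedy speculative decoding in general, not DSpark's implementation. The toy `target_next` and `draft_next` functions and the parameter `k` are hypothetical stand-ins for a large model, a small draft model, and the draft length.

```python
def greedy_decode(target_next, prompt, max_new):
    """Baseline: one target-model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(target_next, draft_next, prompt, max_new, k=4):
    """Greedy speculative decoding with a cheap draft model."""
    tokens = list(prompt)
    goal = len(prompt) + max_new
    while len(tokens) < goal:
        # 1. The draft model guesses the next k tokens autoregressively.
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target model checks the guesses in order. A real system
        #    scores all k positions in one batched forward pass; this toy
        #    version calls the target once per position for clarity.
        for guess in proposal:
            verified = target_next(tokens)
            tokens.append(verified)  # always keep the target's token
            if verified != guess or len(tokens) >= goal:
                break  # first mismatch: discard the remaining guesses
    return tokens


# Toy models (hypothetical): count upward modulo 10; the draft errs after 4.
def target_next(seq):
    return (seq[-1] + 1) % 10


def draft_next(seq):
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


print(speculative_decode(target_next, draft_next, [0], max_new=8))
```

Because the target model's token is always the one kept, the output matches plain greedy decoding; the speedup in practice comes from verifying several drafted tokens per target pass, which is why gains vary with how often the draft guesses correctly.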

AI Large-Model Wave and Transformation Guide

Jun 11, 2026 · Artificial Intelligence

How a 4B Ontology Model Beats Trillion-Parameter LLMs with 89.47% Enterprise Inference Accuracy

A 4‑billion‑parameter Large Ontology Model (LOM) outperforms the trillion‑parameter DeepSeek‑V3.2 on complex enterprise reasoning tasks, achieving 89.47% accuracy by embedding a dual‑layer ontology into the model through a three‑stage Build‑Align‑Reason framework, dramatically lowering deployment cost and latency.

Enterprise AI · Knowledge Graph · LOM

0 likes · 12 min read

Machine Heart

May 26, 2026 · Artificial Intelligence

Grok Survives xAI Shutdown with 1.5‑T V9‑Medium Model – Musk Announces

After xAI’s dissolution, Elon Musk revealed that the new Grok V9‑Medium model, a 1.5‑trillion‑parameter foundation model optimized for Blackwell GPUs and enriched with Cursor data, has completed training, will undergo fine‑tuning and reinforcement learning, and is slated for public release within weeks, while the older 0.5‑T model will be open‑sourced later this year.

AI Agent · Blackwell GPU · Cursor data

0 likes · 6 min read

Machine Learning Algorithms & Natural Language Processing

May 4, 2026 · Artificial Intelligence

DeepSeek‑TUI: A Claude‑Code‑Style Terminal Agent Optimized for DeepSeek

DeepSeek‑TUI is a Rust‑based terminal coding agent modeled after Claude Code, specially tuned for DeepSeek V4, offering chain‑of‑thought streaming, a 1M‑token context window with automatic compression, cost‑saving RLM mode, multiple operation tiers, and a rapid release cadence that has driven its popularity to over 2.3k GitHub stars.

AI · DeepSeek · Model Optimization

0 likes · 9 min read

AIWalker

Mar 23, 2026 · Artificial Intelligence

Dynamic Dense Computing and Minimal End‑to‑End Design: YOLO-Master & YOLO26

By introducing a dynamic mixture‑of‑experts routing scheme and an end‑to‑end architecture that eliminates NMS and DFL, YOLO‑Master and YOLO26 dramatically cut compute waste and latency on edge devices, achieving up to 43% faster CPU inference while keeping model accuracy, with all code openly released.

Dynamic Routing · Mixture of Experts · Model Optimization

0 likes · 7 min read

Machine Learning Algorithms & Natural Language Processing

Mar 10, 2026 · Artificial Intelligence

Why the First Token Becomes a Value Garbage Bin – LeCun Team Dissects Spike and Attention Sink Mechanics

The paper by Yann LeCun’s team reveals that massive activation spikes and attention sinks in Transformers are not inherently coupled; spikes arise from position‑0 token interactions and specific feed‑forward dynamics, while attention sinks emerge from Pre‑norm normalization and head dimension, offering practical insights for model quantization and long‑context inference.

Attention Sink · LLM · Massive Activations

0 likes · 9 min read

AIWalker

Mar 7, 2026 · Artificial Intelligence

YOLO-Master v2026.02 Unveils Four Innovations for SOTA Object Detection

Tencent’s YOLO-Master v2026.02 adds a Mixture‑of‑Experts architecture, zero‑overhead LoRA fine‑tuning, Sparse SAHI inference for large images, and Cluster‑Weighted NMS, delivering 3‑5× faster inference, up to 70% reduced training resources, and markedly higher detection accuracy across diverse benchmarks.

LoRA · Mixture of Experts · Model Optimization

0 likes · 15 min read

PaperAgent

Jan 19, 2026 · Artificial Intelligence

How Reinforcement Learning Can Boost LLM Reasoning by Shaping Token Distributions

Recent research shows that applying reinforcement learning to large language models can dramatically improve reasoning performance, but its effectiveness depends on the token distribution produced during pre‑training, prompting a novel rewrite of cross‑entropy as a single‑step policy gradient with controllable entropy parameters.

LLM · Model Optimization · RL

0 likes · 6 min read

AI Frontier Lectures

Jan 15, 2026 · Artificial Intelligence

What Makes YOLO26 the Next Leap in Edge AI Object Detection?

YOLO26, the latest Ultralytics release, introduces a unified model family with five sizes, removes distribution focal loss, offers end‑to‑end inference without NMS, adds progressive loss balancing and the MuSGD optimizer, and delivers up to 43% faster CPU performance, making it ideal for edge and real‑world vision applications.

Model Optimization · YOLO26 · edge AI

0 likes · 12 min read

Old Meng AI Explorer

Jan 10, 2026 · Artificial Intelligence

Run Large Language Models on a Laptop: How ktransformers Breaks the GPU Barrier

ktransformers is an open‑source AI model optimization framework that uses dynamic quantization, layer fusion and memory reuse to cut memory usage by up to 50%, double loading speed and reduce inference cost, enabling 7B‑13B models to run smoothly on ordinary CPUs or low‑end GPUs.

KTransformers · Model Optimization · Python

0 likes · 11 min read

Architect

Jan 1, 2026 · Artificial Intelligence

How Manifold-Constrained Hyper-Connections Boost Large Model Training Efficiency

DeepSeek’s new paper introduces mHC, a manifold‑constrained version of Hyper‑Connections that stabilizes gradient flow, adds only 6.7% training overhead, and enables reliable training of 27‑billion‑parameter models while improving benchmark performance by about 2%.

AI Architecture · Large‑Scale Training · Manifold-Constrained

0 likes · 7 min read

Old Meng AI Explorer

Dec 29, 2025 · Artificial Intelligence

Run 100B LLMs on a Laptop: How BitNet’s 1‑bit Quantization Makes It Possible

BitNet’s 1‑bit quantization shrinks model size and compute needs tenfold, enabling ordinary CPUs and low‑power ARM devices to run 2B‑100B language models locally with acceptable speed, low power consumption, and near‑original quality, while providing simple installation and optional GPU acceleration.

BitNet · CPU inference · LLM Quantization

0 likes · 10 min read

Alibaba Cloud Developer

Dec 18, 2025 · Artificial Intelligence

How to Build a Real‑Time AI‑Powered Anime‑Style Video Generator for Social Apps

This technical report details the end‑to‑end workflow for integrating an AIGC video generation module into a social app, covering requirement analysis, model and hardware selection, dataset construction, LoRA and full‑parameter training, multiple acceleration techniques such as Sage Attention, TeaCache, XDiT, gradient‑checkpointing offload, tiled VAE, and quantization, followed by extensive performance evaluation and metric‑based ranking of the final models.

AI video generation · Diffusion Models · LoRA fine-tuning

0 likes · 38 min read

Data Party THU

Dec 10, 2025 · Artificial Intelligence

How DeepSeek‑V3.2 Cuts Inference Cost and Boosts Agent Skills with Sparse Attention

DeepSeek's V3.2 release introduces a dual‑model lineup, a Sparse Attention architecture that halves long‑context inference cost, a post‑training reinforcement‑learning pipeline that exceeds 10% of pre‑training compute, and a revamped agent framework that dramatically improves tool‑use and reasoning performance across benchmarks.

Agentic AI · DeepSeek · Large Language Model

0 likes · 11 min read

Baobao Algorithm Notes

Dec 7, 2025 · Artificial Intelligence

Can RL Really Boost LLM Reasoning? A Critical Review of Recent Findings

This article critically examines recent RL‑for‑LLM studies, revealing that reinforcement learning improves search efficiency but does not extend the intrinsic reasoning capabilities of base models, and explores the underlying model‑conditioned optimization bias, comparisons with SFT distillation, and the trade‑off with catastrophic forgetting.

Catastrophic Forgetting · LLM · Model Optimization

0 likes · 11 min read

Alibaba Cloud Developer

Nov 5, 2025 · Artificial Intelligence

How TinyAI Brings a Full‑Stack AI Framework to Pure Java

TinyAI is a completely Java‑implemented, lightweight full‑stack AI framework that demonstrates how to build a production‑grade deep‑learning system—from low‑level numeric tensors and automatic differentiation to modular neural‑network layers, training pipelines, large‑language‑model implementations, and intelligent agent architectures—while remaining education‑friendly and free of external dependencies.

AI Framework · Agent System · Code examples

0 likes · 33 min read

Data Party THU

Oct 18, 2025 · Artificial Intelligence

Can Classic Graph Autoencoders Rival SOTA? Surprising Optimizations Reveal Their Power

Researchers from Peking University demonstrate that, by applying modern optimization techniques to the decades‑old Graph Autoencoder (GAE), the model can achieve state‑of‑the‑art link‑prediction performance on benchmarks like ogbl‑ppa, while delivering orders‑of‑magnitude speed improvements, challenging the trend toward ever‑more complex GNNs.

Efficiency · Graph Neural Networks · Model Optimization

0 likes · 10 min read

JD Tech Talk

Sep 11, 2025 · Artificial Intelligence

How to Seamlessly Migrate AI Workloads from Nvidia GPUs to Domestic Accelerators

This article explains why migrating AI applications from Nvidia GPUs to domestic graphics cards is urgent, outlines the technical challenges, and introduces JoyScale’s zero‑perception migration stack that enables end‑to‑end hardware, software, and model adaptation for reliable, high‑performance AI deployment.

AI migration · JoyScale · Model Optimization

0 likes · 11 min read

Data Party THU

Aug 18, 2025 · Artificial Intelligence

Unlock XGBoost Performance: Master the Core Parameters

This article provides a detailed, visual guide to XGBoost's most important hyper‑parameters—such as max_depth, min_child_weight, learning_rate, gamma, subsample, colsample_bytree, scale_pos_weight, alpha, and lambda—explaining how each influences tree complexity, regularization, and model generalization, and offering practical examples for effective tuning.

Model Optimization · Regularization · XGBoost

0 likes · 12 min read
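As an illustration of the hyper-parameters listed in the summary, here is a minimal sketch of a parameter dictionary in the form accepted by XGBoost's native `xgboost.train` API. The numeric values are illustrative starting points, not recommendations from the article, and the helper names `imbalance_ratio` and `make_params` are hypothetical. The block builds only the dictionary, so it runs without XGBoost installed.

```python
from collections import Counter


def imbalance_ratio(labels):
    """negatives / positives, a common starting value for scale_pos_weight."""
    counts = Counter(labels)
    return counts[0] / counts[1]


def make_params(labels):
    """Illustrative settings only; tune each value with cross-validation."""
    return {
        "objective": "binary:logistic",
        # Tree complexity
        "max_depth": 4,            # deeper trees fit more, overfit sooner
        "min_child_weight": 5,     # minimum hessian sum required in a leaf
        "gamma": 0.1,              # minimum loss reduction to make a split
        # Shrinkage and sampling
        "learning_rate": 0.05,     # alias of eta
        "subsample": 0.8,          # fraction of rows sampled per tree
        "colsample_bytree": 0.8,   # fraction of columns sampled per tree
        # Regularization on leaf weights
        "alpha": 0.0,              # L1 penalty
        "lambda": 1.0,             # L2 penalty
        # Class imbalance
        "scale_pos_weight": imbalance_ratio(labels),
    }


params = make_params([0, 0, 0, 1])
print(params["scale_pos_weight"])
```

With XGBoost installed, a dictionary like this would be passed as the first argument to `xgboost.train` together with a `DMatrix` built from the training data.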

DevOps

Aug 16, 2025 · Artificial Intelligence

Google Unveils Gemma 3 270M: A Tiny, High‑Efficiency Open‑Source AI Model

Google has released the open‑source Gemma 3 270M model—a compact, 270‑million‑parameter AI that runs on as little as 2 GB RAM, supports over 140 languages, handles images, and offers strong instruction‑following performance, making it ideal for edge devices and custom fine‑tuning.

Gemma 3 · Google AI · Model Optimization

0 likes · 5 min read

Data Thinking Notes

Jul 30, 2025 · Artificial Intelligence

Tracing the Evolution of Large Language Models: Key Papers and Breakthroughs

This article reviews the most influential papers in large language model research since 2017, covering foundational works such as the Transformer, GPT‑3, BERT, scaling laws, and recent innovations like FlashAttention, Mamba, and QLoRA, highlighting their core contributions and impact on AI development.

AI research · Model Optimization · Transformer

0 likes · 28 min read

AI Algorithm Path

Jul 13, 2025 · Artificial Intelligence

How to Calculate the Right AI Model Size for Your PC (3B, 7B, 13B)

This article explains how to estimate the GPU memory required for running large language models of 3B, 7B, and 13B parameters, walks through step‑by‑step calculations, shows how hardware limits affect feasibility, and offers practical optimization techniques such as quantization and CPU offloading.

AI model sizing · CPU offloading · FP16

0 likes · 5 min read
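The kind of estimate the summary describes can be sketched as parameter count times bytes per parameter. This is a rough approximation, not the article's own worked calculation: the 20% `overhead` margin for activations and KV cache is a hypothetical assumption, and the function names are illustrative.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(params_billion, dtype="fp16"):
    """Memory for the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * BYTES_PER_PARAM[dtype]
    return total_bytes / 2**30


def fits_on_gpu(params_billion, dtype, vram_gib, overhead=1.2):
    """overhead is an assumed margin for activations and KV cache."""
    return weight_memory_gib(params_billion, dtype) * overhead <= vram_gib


for size in (3, 7, 13):
    print(f"{size}B fp16: {weight_memory_gib(size):.1f} GiB of weights")

# Quantization lowers bytes per parameter, so a larger model can fit.
print(fits_on_gpu(13, "fp16", vram_gib=24))
print(fits_on_gpu(13, "int4", vram_gib=8))
```

Under these assumptions a 13B model in FP16 does not fit a 24 GiB card, while the same model at 4 bits fits in 8 GiB, which is the trade-off that makes quantization and CPU offloading relevant.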

JD Cloud Developers

Jun 24, 2025 · Artificial Intelligence

How JD Retail’s xLLM Architecture Revolutionizes AI Inference for E‑Commerce

At GAITC2025, JD Retail’s AI Infra lead Zhang Ke detailed the challenges of e‑commerce AI inference and introduced the xLLM edge‑cloud unified large‑model architecture, highlighting adaptive scheduling, offline unified scheduling, multi‑layer pipelines, and agent collaboration that boost performance, cut costs, and pave the way for future AI advancements.

AI inference · Model Optimization · e-commerce

0 likes · 6 min read

JD Retail Technology

Jun 20, 2025 · Artificial Intelligence

How JD Retail’s xLLM Architecture Revolutionizes AI Inference for E‑Commerce

The article details JD Retail’s collaboration with Tsinghua University to build the xLLM edge‑cloud unified large‑model inference framework, addressing e‑commerce AI challenges such as diverse inputs, task scheduling, model compression, and cost, while outlining future research directions and performance gains.

AI inference · Model Optimization · e-commerce

0 likes · 7 min read

Kuaishou Tech

Jun 4, 2025 · Artificial Intelligence

KwaiCoder-AutoThink-preview: An Automatic‑Thinking Large Model Enhanced with Step‑SRPO Reinforcement Learning

The KwaiPilot team released the KwaiCoder‑AutoThink‑preview model, which introduces a novel automatic‑thinking training paradigm and a process‑supervised reinforcement‑learning method called Step‑SRPO, enabling the model to dynamically switch between thinking and non‑thinking modes, reduce inference cost, and achieve up to 20‑point gains on code and math benchmarks while handling large‑scale codebases.

AI research · Large Language Model · Model Optimization

0 likes · 12 min read

JD Tech

May 26, 2025 · Artificial Intelligence

Solving Technical Challenges at JD Retail: Multi‑Reward Models, LLM‑Based Query Expansion, Model Pruning, and Reinforcement Learning

This article details how JD Retail's young algorithm engineers tackled a series of AI engineering problems—including advertising image quality assessment with multi‑reward models, large‑language‑model‑driven query expansion, FFT‑and‑RDP‑based model pruning, and agent‑centric reinforcement learning—while sharing practical growth insights and code snippets.

AI · Model Optimization · Query Expansion

0 likes · 15 min read

ZhongAn Tech Team

Apr 28, 2025 · Artificial Intelligence

Weekly Tech Overview: Major AI Model Updates, Industry Funding, and Expert Perspectives on AI Agents and Consciousness

This weekly technology digest highlights significant advancements in artificial intelligence, including OpenAI's GPT-4o upgrades, Tencent's Hunyuan 3D v2.5 release, and major funding rounds for xAI and Manus, alongside expert discussions on the future evolution of AI agent networks and the theoretical possibility of machine consciousness.

AI Agents · AI funding · Model Optimization

0 likes · 7 min read

Baidu Tech Salon

Apr 28, 2025 · Artificial Intelligence

Inside Baidu’s Wenxin 4.5 Turbo & X1 Turbo: Architecture, Training Tricks, and Real-World Impact

At the Create2025 AI Developer Conference, Baidu unveiled the multimodal Wenxin 4.5 Turbo and X1 Turbo models, detailing their innovative architecture, self‑feedback post‑training, composite reasoning chains, data pipelines, and the new Wenxin KuaiMa 3.5 code assistant, while also showcasing ecosystem growth and cultural AI applications.

AI Conference · Baidu · Large Language Model

0 likes · 9 min read

JD Tech

Mar 19, 2025 · Artificial Intelligence

JD Retail's End‑to‑End AI Engine Compatible with GPU and Domestic NPU: Architecture, Optimization, and Real‑World Applications

This article details JD Retail's AI engine that seamlessly supports both GPU and domestic NPU hardware, describing its heterogeneous cluster architecture, unified training and inference APIs, performance optimizations, extensive model coverage, and multiple production use cases across e‑commerce, logistics, and intelligent assistance.

AI Engine · GPU · JD Retail

0 likes · 20 min read

MaGe Linux Operations

Mar 8, 2025 · Artificial Intelligence

How Cloud‑Large Models and Edge‑Small Models Can Revolutionize AI Deployment

The article explains why combining powerful cloud AI models with lightweight edge models is essential for overcoming compute‑cost trade‑offs, privacy constraints, and scenario gaps, and provides a four‑step guide, real‑world case studies, and future directions for collaborative AI deployment.

AI Deployment · Model Optimization · cloud AI

0 likes · 8 min read

Meituan Technology Team

Mar 6, 2025 · Artificial Intelligence

INT8 Quantization and Inference Optimization of DeepSeek R1 Model

Meituan’s search and recommendation team converted the FP8‑only DeepSeek‑R1 model to INT8 by first casting weights to BF16 and then applying block‑wise or channel‑wise quantization, which preserves GSM8K and MMLU accuracy while delivering 33% to 50% higher throughput on A100‑80G GPUs. The team released the SGLang‑based inference scripts and quantized weights publicly, enabling deployment on older NVIDIA hardware without accuracy loss.

DeepSeek-R1 · GPU deployment · INT8 Quantization

0 likes · 11 min read
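To make the channel-wise step in the summary concrete, here is a minimal sketch of symmetric per-channel INT8 quantization in plain Python. It is not Meituan's released code: treating each weight-matrix row as one channel, and the function names, are assumptions made for illustration. Block-wise quantization follows the same idea with one scale per tile instead of per row.

```python
def quantize_row(row):
    """Symmetric INT8: the scale maps the largest magnitude to 127."""
    max_abs = max(abs(v) for v in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in row]
    return quantized, scale


def dequantize_row(quantized, scale):
    """Recover approximate float weights from INT8 values and the scale."""
    return [q * scale for q in quantized]


def quantize_channelwise(weights):
    """One (int8 values, scale) pair per row, i.e. per output channel."""
    return [quantize_row(row) for row in weights]


weights = [[0.5, -1.0, 0.25], [10.0, 0.0, -5.0]]
for original, (q, scale) in zip(weights, quantize_channelwise(weights)):
    restored = dequantize_row(q, scale)
    worst = max(abs(a - b) for a, b in zip(original, restored))
    print(q, f"scale={scale:.5f}", f"max error={worst:.5f}")
```

Giving each channel its own scale keeps a row of small weights from being crushed by a large outlier in another row, which is one reason per-channel and per-block schemes tend to preserve accuracy better than a single tensor-wide scale.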

Open Source Linux

Mar 5, 2025 · Artificial Intelligence

How DeepSeek‑R1 Redefines Prompt Engineering and Real‑World AI Deployment

The article analyzes DeepSeek‑R1’s low‑cost inference architecture, Chinese language optimizations, novel prompt‑engineering techniques, and the practical challenges of deploying large domestic models, offering insights into vertical AI applications and the evolving open‑source ecosystem in China.

AI Deployment · DeepSeek · Large Language Model

0 likes · 8 min read

JD Retail Technology

Mar 4, 2025 · Artificial Intelligence

JD Retail End-to-End AI Engine Compatible with GPU and Domestic NPU: Architecture, Optimization, and Applications

JD Retail’s Nine‑Number Algorithm Platform delivers an end‑to‑end AI engine that unifies GPU and domestic NPU resources across a thousand‑card cluster, offering zero‑cost model migration, optimized training and inference pipelines, support for over 40 LLM and multimodal models, and proven business‑level performance that reduces dependence on overseas chips.

AI · GPU · Model Optimization

0 likes · 19 min read

Software Engineering 3.0 Era

Feb 21, 2025 · Artificial Intelligence

How NSA and MoE Are Shaping the Future of Large‑Model Development

The article examines Native Sparse Attention (NSA) and Mixture‑of‑Experts (MoE) as complementary innovations that improve data quality, model architecture, and inference efficiency for large models, while also discussing their challenges and potential research directions.

Mixture of Experts · Model Optimization · Native Sparse Attention

0 likes · 11 min read

DaTaobao Tech

Feb 21, 2025 · Artificial Intelligence

AI-Powered Face Swapping for the Spring Festival Gala: System Design and Deployment

The paper details the design and deployment of an AI‑driven face‑swap platform for the 2025 CCTV Spring Festival Gala. The platform uses a dual‑model SDXL pipeline with ControlNet and LoRA fine‑tuning, plus optimized preprocessing and GPU‑specific acceleration, to achieve sub‑3‑second latency at over 10k QPS. With support for scaling, throttling, and multi‑region load balancing, it ultimately served ten million users and generated hundreds of millions of personalized gala images.

AI Engineering · AIGC · Model Optimization

0 likes · 28 min read

Tencent Technical Engineering

Feb 14, 2025 · Artificial Intelligence

Technical Overview of DeepSeek Series Models and Innovations

The DeepSeek series introduces a refined Mixture‑of‑Experts architecture with fine‑grained expert partitioning, shared experts, and learnable load‑balancing, alongside innovations such as Group Relative Policy Optimization, Multi‑Head Latent Attention, Multi‑Token Prediction, mixed‑precision FP8 training, and the R1/R1‑Zero models that use Long‑CoT reasoning, reinforcement‑learning pipelines, and distillation to achieve OpenAI‑comparable performance at lower cost.

AI · DeepSeek · Mixture of Experts

0 likes · 25 min read

Huawei Cloud Developer Alliance

Feb 8, 2025 · Artificial Intelligence

Why DeepSeek V3 and R1 Are Redefining Low‑Cost AI: Architecture, Training Tricks, and Industry Impact

This article analyses DeepSeek's V3 and R1 models, explaining how their innovative MoE architecture, Multi‑Head Latent Attention, low‑cost training strategies, and distributed‑training optimizations deliver high‑performance large language models while reducing GPU/NPU demand and sparking industry excitement.

AI inference · DeepSeek · Mixture of Experts

0 likes · 16 min read

IT Architects Alliance

Feb 8, 2025 · Artificial Intelligence

Inside DeepSeek: How Its Innovative Architecture Redefines AI Performance

This article examines DeepSeek's advanced Transformer‑based architecture, dynamic routing, MoE system, multi‑stage training, efficient inference, multimodal capabilities, real‑world applications, technical challenges, and future prospects, providing a comprehensive technical analysis of the model's strengths and limitations.

AI Architecture · DeepSeek · Large Language Model

0 likes · 15 min read

Tencent Cloud Developer

Feb 6, 2025 · Artificial Intelligence

DeepSeek V Series: Technical Overview of Scaling Laws, Grouped Query Attention, and Mixture‑of‑Experts

The article reviews DeepSeek’s V‑series papers, explaining how scaling‑law insights, Grouped Query Attention, a depth‑first design, loss‑free load balancing, multi‑token prediction and Multi‑Head Latent Attention together enable economical mixture‑of‑experts LLMs that rival closed‑source models while cutting compute and hardware costs.

DeepSeek · Grouped Query Attention · Mixture of Experts

0 likes · 13 min read

DevOps

Jan 25, 2025 · Artificial Intelligence

DeepSeek R1: An Open‑Source Large Model Matching OpenAI’s o1 at a Fraction of the Cost

DeepSeek’s newly released R1 model delivers performance comparable to OpenAI’s o1 while cutting inference costs by 90‑95%, leveraging innovative MLA and MoE architectures, low‑cost hardware training, an open‑source strategy, and a youthful, flat‑structured team that challenges the AI industry’s high‑spending model.

AI startup · Cost‑Efficient Training · DeepSeek

0 likes · 12 min read

DevOps

Dec 8, 2024 · Artificial Intelligence

Understanding Fine-Tuning in Machine Learning: Concepts, Importance, Steps, and Applications

This article explains fine‑tuning in machine learning, covering its definition, why it matters, the role of pre‑trained models, detailed step‑by‑step procedures, advantages, and diverse applications such as NLP, computer vision, speech and finance, with practical examples like face recognition and object detection.

AI Applications · Model Optimization · fine-tuning

0 likes · 16 min read

Model Perspective

Dec 5, 2024 · Artificial Intelligence

Choosing the Right Activation Function: Pros, Cons, and Best Practices

Activation functions are crucial for neural networks, providing non‑linearity, normalization, and gradient flow; this article reviews common functions such as Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, Noisy ReLU, Softmax, and Swish, comparing their characteristics, advantages, drawbacks, and guidance for selecting the appropriate one.

Model Optimization · activation functions · machine learning

0 likes · 10 min read
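For reference, most of the functions named in the summary have short closed forms. The sketch below uses their standard textbook definitions in plain Python; the default `slope` and `alpha` values are common conventions rather than values taken from the article, and Noisy ReLU is omitted because it depends on a random noise term.

```python
import math


def sigmoid(x):
    """Squashes to (0, 1); written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def tanh(x):
    """Squashes to (-1, 1) and is zero-centered."""
    return math.tanh(x)


def relu(x):
    """Zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Keeps a small gradient for negative inputs."""
    return x if x > 0 else slope * x


def elu(x, alpha=1.0):
    """Smoothly saturates to -alpha for very negative inputs."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def swish(x):
    """x * sigmoid(x); smooth and non-monotonic near zero."""
    return x * sigmoid(x)


def softmax(xs):
    """Turns a score vector into probabilities that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


for fn in (sigmoid, tanh, relu, leaky_relu, elu, swish):
    print(f"{fn.__name__:>10}: f(-2)={fn(-2.0):+.4f}  f(2)={fn(2.0):+.4f}")
print("softmax:", [round(p, 4) for p in softmax([1.0, 2.0, 3.0])])
```

Comparing the values at -2 shows the main practical difference: ReLU outputs zero there, so its gradient vanishes, while Leaky ReLU, ELU, and Swish all keep a nonzero response.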

Zhuanzhuan Tech

Oct 24, 2024 · Artificial Intelligence

Pre‑Ranking in Recommendation Systems: Model and Sample Optimization Practices at Zhuanzhuan Home Page

This article reviews the role of pre‑ranking in multi‑stage recommendation pipelines, compares dual‑tower and fully‑connected DNN models, discusses negative and positive sample selection strategies, and presents Zhuanzhuan's practical improvements in model architecture and traffic‑pool allocation to boost precision and diversity.

Model Optimization · dual-tower · pre‑ranking

0 likes · 16 min read

Tencent Advertising Technology

Oct 14, 2024 · Artificial Intelligence

Generative Retrieval Based on Yuan Large Model: Implementation and Practice in Tencent Advertising

This paper presents the implementation and practice of generative retrieval based on the Yuan large model in Tencent Advertising, addressing three key challenges: user intent capture, model alignment in the advertising domain, and high-performance platform design under ROI constraints.

Generative Retrieval · High-performance computing · Model Optimization

0 likes · 17 min read

DataFunSummit

Oct 3, 2024 · Artificial Intelligence

A Survey of Multimodal Recommendation Systems: From Background to Future Directions

This article reviews the latest academic advances in multimodal recommendation systems, covering background, system workflow, modal encoders, feature interaction (connection, fusion, filtering), feature enhancement, model optimization, and future research challenges.

AI · Model Optimization · feature enhancement

0 likes · 18 min read

iQIYI Technical Product Team

Jul 26, 2024 · Artificial Intelligence

Optimizing Advertising Feature Evaluation Process with the Opal Machine Learning Platform

By migrating iQIYI’s advertising feature‑evaluation workflow to the Opal machine‑learning platform, the team replaced a manual, engineer‑heavy process with a unified, automated pipeline that cut evaluation cycles from five days to 1.5 days, tripling iteration speed while lowering barriers and improving consistency for future feature optimization.

Feature Evaluation · Model Optimization · Opal Platform

0 likes · 6 min read

Kuaishou Tech

Jul 17, 2024 · Artificial Intelligence

Key Technical Innovations in Kuaishou’s “Kuaiyi” Large Model and Its Real-World Applications

The article details Kuaishou’s development of the 175B “Kuaiyi” multimodal large model, presenting eight novel technical innovations—from Temporal Scaling Law and MiLe Loss to MoE‑enhanced reward modeling—and describes how these advances enable high‑performance AI services such as the AI Xiao Kuai chatbot across diverse real‑world scenarios.

AI Applications · Large Language Model · Model Optimization

0 likes · 12 min read

360 Smart Cloud

Jul 4, 2024 · Artificial Intelligence

Optimizing Mixture-of-Experts (MoE) Training with the QLM Framework

This article introduces the background and challenges of large language model training, explains the Mixture-of-Experts (MoE) architecture, and details several optimization techniques implemented in the QLM framework (fine-grained and shared experts, top‑k gating, token distribution, expert parallelism, and grouped GEMM) that improve training efficiency and performance.

AI · Mixture of Experts · Model Optimization

0 likes · 10 min read
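Of the techniques listed in the summary, top-k gating is the easiest to show in a few lines. The sketch below is a generic illustration in plain Python, not QLM's implementation: renormalizing the selected gate weights so they sum to 1 is one common convention and is assumed here, and the scalar toy experts are hypothetical.

```python
import math


def top_k_gate(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    m = max(router_logits)
    exps = [math.exp(v - m) for v in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = top_k_gate(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())


# Hypothetical scalar experts standing in for feed-forward sub-networks.
experts = [lambda v: v, lambda v: 2 * v, lambda v: v + 1, lambda v: 0.0]
logits = [0.1, 2.0, 1.0, -1.0]

print(top_k_gate(logits, k=2))
print(moe_forward(3.0, experts, logits, k=2))
```

Only two of the four experts run for this token, which is the source of MoE's compute savings; it also means tokens must be dispatched to whichever devices host their chosen experts, the problem that token distribution and expert parallelism address.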

Bilibili Tech

Jun 14, 2024 · Artificial Intelligence

Technical Report on the Index-1.9B Series: Model Variants, Pre‑training Optimizations, and Alignment Experiments

The report presents the open‑source Index‑1.9B family—base, pure, chat, and character variants—detailing benchmark results, pre‑training optimizations such as a normalized LM‑Head and deeper‑slim architectures, the importance of modest instruction data, alignment via SFT/DPO, role‑play enhancements with RAG, and acknowledges remaining safety and factual limitations.

Evaluation · Instruction Tuning · LLM

0 likes · 15 min read

Baobao Algorithm Notes

May 9, 2024 · Artificial Intelligence

Inside DeepSeek‑V2: How Multi‑Head Latent Attention Cuts KV‑Cache and Boosts Performance

This article provides an in‑depth technical analysis of DeepSeek‑V2, covering its 236B parameter size, Multi‑Head Latent Attention optimization that reduces KV‑cache memory, architectural details, training pipelines, infrastructure choices, and performance results on benchmarks such as MMLU and instruction following.

AI Architecture · DeepSeek · Large Language Model

0 likes · 17 min read

Baidu Geek Talk

Mar 27, 2024 · Industry Insights

How Baidu’s Qianfan Platform Is Accelerating Enterprise AI Adoption

The article reviews Baidu’s Qianfan AI platform, highlighting rapid large‑model advances, enterprise challenges, new AppBuilder features, lightweight model releases, and cost‑effective model routing that together aim to boost AI adoption across industries.

AI · Enterprise AI · Industry Trends

0 likes · 16 min read

Ximalaya Technology Team

Feb 20, 2024 · Artificial Intelligence

Optimization of Deep Learning-Based CTR Models in Advertising

This report presents recent advances in optimizing deep learning click‑through‑rate models for advertising, including improved embedding mechanisms, novel feature‑interaction and architecture designs such as attention‑based behavior sequencing, multi‑tower and Mixture‑of‑Experts networks, dynamic ID handling, hourly updates, incremental training, and outlines future multi‑modal and embedding‑importance research.

CTR model · Embedding Techniques · Model Optimization

0 likes · 13 min read

Sohu Tech Products

Jan 3, 2024 · Artificial Intelligence

OPPO Advertising Recall Algorithm: Architecture, Model Selection, Evaluation, and Optimization

OPPO revamped its advertising recall system by replacing a latency‑prone directional pipeline with an ANN‑based full‑ad personalized architecture, employing a dual‑tower LTR model, multi‑path auxiliary branches, refined offline metrics, price‑sensitive and hard‑negative sampling, and hybrid joint training, which together boosted ARPU by about 15%.

Advertising · Model Optimization · large-scale classification

0 likes · 24 min read

DataFunSummit

Dec 3, 2023 · Artificial Intelligence

Shopee Live Personalized CTR Optimization via Calibration‑Based Meta‑Learning

This article presents Shopee's calibration‑based meta‑learning approach for personalized click‑through‑rate prediction in live streaming, detailing business context, modeling goals, model evolution from Calibration4CVR to CBMR, EmbCB and MlpCB optimizations, and multi‑task and multi‑scene extensions that achieve significant AUC and business metric improvements.

CTR · Model Optimization · Shopee

0 likes · 11 min read

Baidu Tech Salon

Nov 10, 2023 · Artificial Intelligence

Baidu Search Deep Learning Model Architecture and Optimization Practices

Baidu's Search Architecture team details how its deep‑learning models have evolved to deliver direct answer results via semantic embeddings, describes a massive online inference pipeline that rewrites queries, ranks relevance, and classifies types, and outlines optimization techniques—including data I/O, CPU/GPU balancing, pruning, quantization, and distillation—to achieve high‑throughput, low‑latency search.

Baidu · GPU Optimization · Inference System

0 likes · 13 min read

Baidu Geek Talk

Nov 9, 2023 · Artificial Intelligence

Deep Learning Model Architecture Evolution in Baidu Search

The article chronicles Baidu Search’s Model Architecture Group’s evolution of deep‑learning‑driven search, detailing the shift from inverted‑index to semantic vector indexing, the use of transformer‑based models for text and image queries, large‑scale offline/online pipelines, and extensive GPU‑centric optimizations such as pruning, quantization and distillation, all aimed at delivering precise, cost‑effective results to hundreds of millions of users.

ERNIE · GPU inference · Model Optimization

0 likes · 14 min read

DaTaobao Tech

Sep 11, 2023 · Artificial Intelligence

Large Language Model Upgrade Paths and Architecture Selection

This article analyzes upgrade paths of major LLMs—ChatGLM, LLaMA, Baichuan—detailing performance, context length, and architectural changes, then examines essential capabilities, data cleaning, tokenizer and attention design, and offers practical guidance for balanced scaling and efficient model construction.

Baichuan · ChatGLM · Data preprocessing

0 likes · 32 min read

NetEase Media Technology Team

Aug 9, 2023 · Artificial Intelligence

GPU Model Inference Optimization Practices in NetEase News Recommendation System

The article outlines practical GPU inference optimization for NetEase’s news recommendation, covering model analysis with Netron, multi‑GPU parallelism, memory‑copy reduction, batch sizing, TensorRT conversion and tuning, custom plugins, and the GRPS serving framework to achieve significant latency and utilization gains.

GPU inference · Model Optimization · Profiling

0 likes · 44 min read

DataFunTalk

Aug 9, 2023 · Artificial Intelligence

Key Technologies for Domain‑Specific Large Models: Insights from the World AI Conference

This report, based on Professor Xiao Yanghua’s presentation at the World AI Conference, examines why vertical domains need general large models, outlines their key capabilities such as open‑world understanding, combinatorial innovation, evaluation, complex instruction execution, task planning, and symbolic reasoning, and discusses current limitations and optimization strategies for domain‑specific deployment.

AI evaluation · Model Optimization · large language models

0 likes · 17 min read

DataFunSummit

Jun 28, 2023 · Artificial Intelligence

OPPO's CHAOS Pretrained Large Model and GammaE Knowledge‑Graph Multi‑hop Reasoning: Techniques and Insights

This article presents OPPO Research Institute's recent advances in large‑model AI, detailing the CHAOS pretrained model that topped the CLUE leaderboard, the knowledge‑enhanced training pipeline, and the GammaE model for multi‑hop reasoning over knowledge graphs, together with experimental results and practical training tips.

AI research · GammaE · Knowledge Graph

0 likes · 20 min read

Bilibili Tech

Jun 13, 2023 · Artificial Intelligence

InferX Inference Framework and Its Integration with Triton for High‑Performance AI Model Serving

Bilibili’s self‑developed InferX framework, combined with NVIDIA Triton Inference Server, streamlines AI model serving by adding quantization, structured sparsity, and custom kernels, delivering up to eight‑fold throughput gains, cutting GPU usage by half, and enabling faster, cost‑effective OCR and large‑model deployments.

AI inference · GPU Utilization · InferX

0 likes · 10 min read

DataFunTalk

Apr 25, 2023 · Artificial Intelligence

DAMO-YOLO: An Efficient Target Detection Framework with NAS, Multi‑Scale Fusion, and Full‑Scale Distillation

This article introduces DAMO‑YOLO, a high‑performance object detection framework that combines low‑cost model customization via MAE‑NAS, an Efficient RepGFPN with HeavyNeck for superior multi‑scale detection, and a full‑scale distillation technique, delivering faster inference, lower FLOPs, and higher accuracy across diverse industrial scenarios.

Distillation · Model Optimization · NAS

0 likes · 15 min read

Alibaba Cloud Big Data AI Platform

Mar 21, 2023 · Artificial Intelligence

How We Tripled CTR Model Training Speed in the Alibaba‑Intel DeepRec Challenge

The MetaSpore team detailed a three‑pronged optimization—sparse model tuning, training‑pipeline acceleration, and low‑level framework tweaks—that boosted DeepRec CTR model training efficiency by over three times without sacrificing AUC, securing first place in the global AI competition.

AI competition · CTR · DeepRec

0 likes · 9 min read

Alibaba Cloud Big Data AI Platform

Mar 9, 2023 · Artificial Intelligence

How We Won the DeepRec CTR Contest: 36% Faster Training with Operator Tweaks

The NicePerf team, after clinching the top spot in the Tianchi DeepRec CTR model performance competition, shares a detailed walkthrough of their CPU‑only training optimizations—including operator selection, custom C++ kernels, and workflow tweaks—that cut overall training time by over a third.

CPU training · DIEN · DeepRec

0 likes · 9 min read

Bilibili Tech

Feb 28, 2023 · Artificial Intelligence

High‑Quality Automatic Speech Recognition (ASR) Solutions at Bilibili: Data, Model, and Deployment Optimizations

Bilibili’s high‑quality ASR system combines large‑scale filtered business data, semi‑supervised Noisy‑Student training, an end‑to‑end CTC model with lattice‑free MMI decoding, and FP16‑optimized FasterTransformer inference on Triton, delivering top‑ranked accuracy, low latency, and scalable deployment for diverse Chinese‑English video content.

ASR · Bilibili · End-to-End

0 likes · 18 min read

58 Tech

Jan 12, 2023 · Artificial Intelligence

Efficient Conformer for End‑to‑End Speech Recognition: Model, Implementation, Streaming Inference, and Experimental Results

This article presents a comprehensive overview of the Efficient Conformer model for large‑scale end‑to‑end speech recognition, detailing its architectural improvements such as progressive downsampling and grouped multi‑head self‑attention, the PyTorch implementation in WeNet, streaming inference handling, experimental CER gains on AISHELL‑1 and production data, and future development plans.

ASR · Efficient Conformer · Model Optimization

0 likes · 16 min read

Baidu Geek Talk

Jan 5, 2023 · Artificial Intelligence

How Baidu’s AIAK‑Inference Supercharges AI Model Inference on GPUs

This article provides an end‑to‑end analysis of AI inference bottlenecks, reviews common industry acceleration techniques, and details Baidu Intelligent Cloud’s AIAK‑Inference suite—including its architecture, optimization strategies such as model pruning, operator fusion, and single‑operator tuning—followed by a demo showing significant latency reductions on ResNet‑50 and other models.

AI inference · AIAK-Inference · Baidu Cloud

0 likes · 16 min read

DataFunTalk

Jan 2, 2023 · Artificial Intelligence

Tail Traffic Modeling and Data‑Driven Risk Strategies at 360 Shuke

This article presents 360 Shuke's practical approach to modeling low‑volume (tail) credit traffic using accumulated data, covering the characteristics of tail traffic, sample expansion under low approval rates, timeliness‑based data clustering, and ranking optimization for high‑quality head customers.

Data Clustering · Model Optimization · risk modeling

0 likes · 19 min read

Baidu Intelligent Cloud Tech Hub

Dec 27, 2022 · Artificial Intelligence

How to Supercharge AI Inference: End‑to‑End Acceleration Strategies and Baidu’s AIAK‑Inference

This article presents a comprehensive analysis of AI inference bottlenecks, explores industry acceleration techniques such as model simplification, operator fusion, and single‑operator optimization, and details Baidu Cloud's AIAK‑Inference suite with practical demos showing up to 90% latency reduction.

AI inference · AIAK-Inference · Baidu Cloud

0 likes · 16 min read

Alipay Experience Technology

Dec 8, 2022 · Artificial Intelligence

How xNN Revolutionizes Edge AI with Scalable Modeling and Optimization

This article explains the evolution of Ant Group's xNN edge‑AI framework, detailing its four‑layer model‑optimization space, the lightweight modeling of version 1.0, and the transition to scalable modeling in version 2.0 to better exploit fragmented device compute resources.

Model Optimization · deep learning · edge AI

0 likes · 21 min read

21CTO

Sep 26, 2022 · Artificial Intelligence

Unlocking Live-Streaming Recommendations: Strategies from Tencent Music’s Interactive Systems

This article explores the evolution of recommendation systems for interactive live‑streaming scenarios, covering common system traits, user cold‑start solutions, prior knowledge modeling, scene‑specific modeling, and practical Q&A insights drawn from Tencent Music’s real‑world deployments.

AI · Live Streaming · Model Optimization

0 likes · 19 min read

DaTaobao Tech

Sep 7, 2022 · Artificial Intelligence

Online Deep Learning (ODL) Model Optimization for Real‑Time Recommendation

The team enhanced real‑time recommendation by redesigning TensorFlow graphs—using constant‑folding, a custom CallGraphOP cache, a simplified dense layer, and CUDA‑Graph compatibility—boosting single‑machine throughput ~40%, raising GPU utilization from 30% to 43%, cutting latency and saving roughly 30% of hardware resources.

CUDA Graph · GPU performance · Model Optimization

0 likes · 11 min read

Alibaba Terminal Technology

Jun 22, 2022 · Artificial Intelligence

How Fast Can Your Smartphone Run ML Models? Exploring Edge AI Optimization

This article examines the computational capabilities of modern mobile devices for machine learning, compares training times on a MacBook and iPhone, explains model evaluation metrics like FLOPs, and provides step‑by‑step guides for converting and optimizing models using TensorFlow, PyTorch, ONNX, JAX, and TVM for edge deployment.

JAX · Model Optimization · TVM

0 likes · 29 min read

DataFunTalk

May 25, 2022 · Artificial Intelligence

Optimizing E-commerce Product Copy Generation: Challenges, Framework, and System Practices

This article presents a comprehensive overview of the challenges in e‑commerce product copy generation, introduces a unified framework comprising a copy generation system, a copy‑cleaning subsystem, and a quality evaluation module, and details practical optimization techniques applied to short and long copy scenarios.

AI · Model Optimization · Text Generation

0 likes · 17 min read

DaTaobao Tech

May 18, 2022 · Artificial Intelligence

Deep Ranking Optimization for E-commerce Recommendation

The 2021 Taobao New‑Product team boosted e‑commerce recommendation by redesigning the coarse‑ranking stage with a dual‑tower DSSM, low‑cost feature‑crossing, NOVA attention and multi‑task distillation from a fine‑ranking teacher, delivering up to +30‰ GAUC gain and 3‑5 % online CTR and click improvements.

Model Optimization · deep ranking · e-commerce

0 likes · 17 min read

Baidu Geek Talk

Apr 28, 2022 · Artificial Intelligence

How AI Powers Financial Form Automation and Insurance Q&A: Open‑Source Solutions

This article presents open‑source AI solutions for financial form recognition and insurance smart Q&A, detailing the challenges, model choices, optimization strategies, performance results, and deployment methods using PaddleOCR, PaddleNLP, LayoutXLM, RocketQA and SimCSE.

AI · FinTech · Form Recognition

0 likes · 10 min read

DaTaobao Tech

Apr 26, 2022 · Artificial Intelligence

Optimization of Recall, Ranking, and Downward Modeling for the "Every Square Every House" Infinite-Scroll Light App

This article details a year‑long series of experiments on the Taobao “Every Square Every House” infinite‑scroll light app, describing how added recall paths, a coarse‑ranking filter, multi‑task MMOE sorting, a lightweight down‑scroll predictor, and relevance‑enhanced features together boosted click‑through, scroll depth and per‑user engagement by double‑digit percentages.

A/B testing · Model Optimization · Multi-Task Learning

0 likes · 14 min read

Tencent Cloud Developer

Apr 20, 2022 · Artificial Intelligence

Coarse Ranking in Recommendation Systems: Architecture, Models, and Optimization

Coarse ranking bridges recall and fine ranking by trimming tens of thousands of candidates to a few hundred or thousand using a three‑part framework—sample construction, ordinary and cross‑feature engineering, and evolving deep models—from rule‑based to lightweight MLPs, while employing distillation, feature crossing, pruning, quantization, and bias mitigation to balance accuracy with strict latency constraints.

Model Optimization · Recommendation Systems · artificial-intelligence

0 likes · 9 min read

Tencent Cloud Developer

Mar 15, 2022 · Artificial Intelligence

Comprehensive Overview of Ranking Models in Recommendation Systems

The article provides a thorough guide to ranking in recommendation systems, detailing the pipeline architecture, sample handling challenges, extensive feature engineering categories, the evolution from collaborative filtering to deep and attention‑based models, and key optimization trade‑offs between memorization, generalization, and efficient user‑interest modeling.

CTR Prediction · Model Optimization · Ranking

0 likes · 19 min read

Baidu Geek Talk

Mar 9, 2022 · Artificial Intelligence

Communication Tower Recognition Using PaddlePaddle: An Industrial AI Practice

The article describes an industrial AI system that uses PaddlePaddle’s PP‑PicoDet model, enhanced with COCO pre‑training and quantization, to accurately recognize communication towers in diverse outdoor conditions, achieving 94.5% mAP at 78 ms inference and supporting edge deployment via PaddleLite and ONNX.

Edge deployment · Model Optimization · PP-PicoDet

0 likes · 6 min read

DataFunTalk

Jan 26, 2022 · Artificial Intelligence

Exploring and Practicing Generative Chat in OPPO's XiaoBu Assistant

This article presents a comprehensive overview of OPPO's XiaoBu Assistant, detailing its research background, chat skill architecture, evolution from retrieval and rule‑based methods to generative models, industry model comparisons, decoding and ranking strategies, safety mechanisms, performance optimizations, and evaluation results.

Chatbot · Dialogue Systems · Generative AI

0 likes · 20 min read

ByteDance Terminal Technology

Nov 9, 2021 · Artificial Intelligence

Edge AI Video Preloading: Case Study and Implementation with ByteDance's Client AI Platform

This article presents a comprehensive case study of applying edge AI to video preloading on the Xigua Video platform, detailing scenario analysis, predictive modeling of user behavior, feature engineering, on‑device model inference, dynamic algorithm package deployment, experimental evaluation, and the resulting performance and cost improvements.

A/B testing · Model Optimization · client inference

0 likes · 18 min read

iQIYI Technical Product Team

Nov 5, 2021 · Artificial Intelligence

Accelerating 4K Video Super‑Resolution with TensorRT: iQIYI’s Optimization and Production Practices

iQIYI optimized a 4K video super-resolution model with TensorRT, using graph splitting, operator fusion, custom CUDA kernels, and INT8 quantization to achieve a tenfold speedup (≈180 ms per 1080p frame) and demonstrate the potential of deep customization for large‑scale production.

INT8 Quantization · Model Optimization · TensorRT

0 likes · 17 min read

DataFunTalk

Oct 22, 2021 · Artificial Intelligence

Applying AI Techniques to Credit Reporting and Risk Modeling: Model Structure, Pre‑training, Ranking and Interpretability

This article presents a comprehensive overview of how AI technologies are applied to credit reporting and loan risk modeling, detailing data characteristics, end‑to‑end model architectures, pre‑training strategies, risk‑ranking methods, and interpretability techniques for financial risk assessment.

AI · Model Optimization · Risk Ranking

0 likes · 17 min read

DataFunSummit

Oct 22, 2021 · Artificial Intelligence

Applying AI Techniques to Credit Reporting and Risk Modeling

This article presents a comprehensive overview of how AI technologies are applied to credit reporting, covering data characteristics, end‑to‑end model architectures, pre‑training strategies, risk ranking objectives, and interpretability methods to improve financial risk assessment.

AI · Model Optimization · credit risk

0 likes · 16 min read

58 Tech

Sep 24, 2021 · Artificial Intelligence

58.com AI Algorithm Competition: Award Ceremony, Top Teams, and Solution Sharing

The 58.com AI algorithm competition showcased over 210 teams competing to improve job recommendation click‑through and conversion rates, featured an award ceremony with speeches, highlighted the ten winning teams, and presented detailed solution shares—including tree models, feature‑engineering techniques, and deep‑learning approaches—while offering GPU resources on the WPAI platform for continued participation.

AI competition · CTR Prediction · Model Optimization

0 likes · 10 min read

HaoDF Tech Team

Sep 15, 2021 · Artificial Intelligence

Optimizing Question‑Answer Search Similarity in Haodf Online: A Semantic Similarity Model Case Study

This article describes how Haodf Online improved its medical question‑answer search by analyzing search challenges, adopting semantic similarity models based on pre‑trained language embeddings, designing contrastive training tasks, and evaluating the resulting increase in click‑through rate and user engagement.

Model Optimization · medical AI · natural language processing

0 likes · 12 min read

Meituan Technology Team

Sep 9, 2021 · Artificial Intelligence

GPU Optimization Practices for CTR Models at Meituan

Meituan accelerates CTR model inference by fusing operators with TVM, optimizing CPU‑GPU data transfers, manually tuning high‑frequency subgraphs, and dynamically offloading workloads, achieving up to ten‑fold throughput gains on Tesla T4 GPUs. Latency stays stable and rises only modestly beyond 128 QPS, though compilation remains slow and large‑model support needs improvement.

CTR · GPU · Model Optimization

0 likes · 16 min read

Alimama Tech

Sep 8, 2021 · Artificial Intelligence

Engineering Optimizations for Large‑Scale Advertising Recall Models: Full‑Cache Scoring and Index Flattening

Alimama’s advertising platform modernized its Tree‑based Deep Model along two lines: a dual‑tower full‑library DNN with aggressive pre‑filtering and custom GPU TopK kernels, and a flattened‑tree model that retains beam search with multi‑head attention. Memory‑aware tricks such as attention swapping, softmax approximation, tiled‑matmul splitting, TensorCore batching, INT8 quantization, and cache‑resident ad vectors enable multi‑fold latency reductions with minimal recall loss.

Beam Search · GPU Acceleration · Model Optimization

0 likes · 15 min read

Baidu Geek Talk

Sep 8, 2021 · Artificial Intelligence

How PP‑OCRv2 Boosts OCR Speed and Accuracy with Five Key Innovations

The article provides a comprehensive technical overview of PaddleOCR's PP‑OCRv2, detailing its five major algorithmic enhancements, performance improvements over previous versions, historical milestones, core capabilities, and links to the open‑source repositories for developers interested in state‑of‑the‑art OCR solutions.

Data Augmentation · Model Optimization · OCR

0 likes · 10 min read

58 Tech

Jul 7, 2021 · Artificial Intelligence

Multi‑Objective Modeling for CRM Opportunity Allocation: Iterative Deep Learning Approaches

This article details the development and iterative optimization of multi‑task deep learning models—including XGBoost‑based baselines, MMoE, ESMM‑enhanced MMoE, PLE, and bias‑aware ranking—to simultaneously improve call‑out and connect‑out rates in a CRM opportunity distribution system, presenting offline gains and online deployment results for each version.

CRM · Model Optimization · Multi-Task Learning

0 likes · 33 min read

DataFunTalk

Jun 4, 2021 · Artificial Intelligence

Advances in Ranking Algorithms for the "Good Goods" Recommendation Scenario

This article presents a comprehensive overview of recent advancements in ranking algorithms for the Good Goods recommendation scenario, covering long‑sequence modeling, category‑retrieval attention, multi‑objective ranking, model structure optimizations, loss functions, and LTR techniques, along with experimental results and practical insights.

LTR · Model Optimization · Ranking

0 likes · 13 min read

AntTech

Apr 13, 2021 · Artificial Intelligence

Ant Financial’s ZhiXiaoBao Team Achieves Human-Level Scores on SQuAD 2.0 and Advances Machine Reading Comprehension

The ZhiXiaoBao technical team at Ant Financial broke the SQuAD 2.0 leaderboard with a model that surpasses human performance, detailing the challenges of natural‑language understanding, the specific ranking and data‑augmentation techniques they employed, and the broader impact on fintech knowledge‑base automation and future AI research.

FinTech · Knowledge Base · Model Optimization

0 likes · 9 min read

DataFunTalk

Apr 5, 2021 · Artificial Intelligence

Summary of Methods and Findings from the NLP Chinese Pre‑training Model Generalization Challenge

The article reviews the Chinese NLP pre‑training model generalization competition, detailing data preprocessing, augmentation, external data usage, model scaling and architecture tweaks, loss functions, learning‑rate and adversarial training strategies, regularization techniques, post‑processing optimizations, and ineffective methods, highlighting their impact on performance metrics.

Data Augmentation · Loss Functions · Model Optimization

0 likes · 15 min read

58 Tech

Mar 24, 2021 · Artificial Intelligence

Automated Detection of Illegal Watermarks in Images Using Deep Learning at 58.com

This article describes how 58.com built an end‑to‑end deep‑learning watermark detection service, covering business needs, data collection and augmentation, model selection and iterative improvements (Faster‑RCNN, SSD, YOLOv3, anchor‑free methods), deployment results, and future research directions.

Image Moderation · Model Optimization · computer vision

0 likes · 14 min read

DataFunTalk

Mar 17, 2021 · Artificial Intelligence

Deep Ranking Model Evolution and Applications in Taobao Live: DBMTL, DMR, and RUI Ranking

This article presents a comprehensive overview of Taobao Live's deep ranking system evolution, detailing the DBMTL multi‑task learning framework, the two‑tower DMR matching‑ranking architecture, and the RUI Ranking refer‑item model, together with their offline formulas, online deployment scenarios, and measured performance gains across click‑through, watch‑time, and conversion metrics.

AI · Model Optimization · Multi-Task Learning

0 likes · 27 min read

360 Tech Engineering

Mar 1, 2021 · Artificial Intelligence

Deploying BERT as an Online Service: Challenges and Optimizations at 360 Search

This article details the engineering challenges of serving a large BERT model in real‑time for 360 Search and describes a series of optimizations—including TensorRT‑based kernel fusion, model quantization, knowledge distillation, multi‑stream execution, caching, and dynamic sequence handling—that together achieve low latency, high throughput, and stable deployment on GPU clusters.

BERT · GPU · Model Optimization

0 likes · 10 min read

DataFunTalk

Feb 10, 2021 · Artificial Intelligence

Deep Learning Based Search Ranking Optimization for 58.com Rental Services

This article describes how 58.com’s rental platform leverages deep learning models such as Wide&Deep, DeepFM, DCN, DIN, and DIEN to improve search ranking, detailing data pipelines, feature engineering, model iteration, multi‑task training, prediction optimizations, and resulting online performance gains.

Model Optimization · Multi-Task Learning · Recommendation Systems

0 likes · 27 min read

58 Tech

Jan 25, 2021 · Artificial Intelligence

Deep Learning Ranking Models for 58.com Rental Search: Architecture, Model Iterations, and Optimization

This article presents the end‑to‑end design, feature engineering, model evolution (Wide&Deep, DeepFM, DCN, DIN, DIEN), multi‑task training, and deployment optimizations that 58.com applied to improve search ranking for its rental business, demonstrating significant gains in click‑through and conversion rates.

Model Optimization · Multi-Task Learning · feature engineering

0 likes · 28 min read

DeWu Technology

Nov 18, 2020 · Artificial Intelligence

Evolution and Technical Analysis of Dewu Photo Search

Dewu Photo Search evolved from a limited Aliyun‑based prototype to a self‑developed pipeline using EfficientNet detection and 128‑dim embeddings, boosting top‑1 shoe accuracy by over 100 % and overall precision by up to 41 %, while reducing latency and improving scalability despite remaining stability challenges.

Model Optimization · deep learning · feature extraction

0 likes · 10 min read