Tagged articles
113 articles
Machine Learning Algorithms & Natural Language Processing
May 4, 2026 · Artificial Intelligence

DeepSeek‑TUI: A Claude‑Code‑Style Terminal Agent Optimized for DeepSeek

DeepSeek‑TUI is a Rust‑based terminal coding agent modeled after Claude Code, specially tuned for DeepSeek V4, offering chain‑of‑thought streaming, a 1 M‑token context window with automatic compression, cost‑saving RLM mode, multiple operation tiers, and a rapid release cadence that has driven its popularity to over 2.3k GitHub stars.

AI · Coding Agent · DeepSeek
0 likes · 9 min read
AIWalker
Mar 23, 2026 · Artificial Intelligence

Dynamic Dense Computing and Minimal End‑to‑End Design: YOLO-Master & YOLO26

By introducing a dynamic mixture‑of‑experts routing scheme and an end‑to‑end architecture that eliminates NMS and DFL, YOLO‑Master and YOLO26 dramatically cut compute waste and latency on edge devices, achieving up to 43% faster CPU inference while keeping model accuracy, with all code openly released.

Computer Vision · Mixture of Experts · Model Optimization
0 likes · 7 min read
Machine Learning Algorithms & Natural Language Processing
Mar 10, 2026 · Artificial Intelligence

Why the First Token Becomes a Value Garbage Bin – LeCun Team Dissects Spike and Attention Sink Mechanics

The paper by Yann LeCun’s team reveals that massive activation spikes and attention sinks in Transformers are not inherently coupled; spikes arise from position‑0 token interactions and specific feed‑forward dynamics, while attention sinks emerge from Pre‑norm normalization and head dimension, offering practical insights for model quantization and long‑context inference.

Attention Sink · LLM · Massive Activations
0 likes · 9 min read
AIWalker
Mar 7, 2026 · Artificial Intelligence

YOLO-Master v2026.02 Unveils Four Innovations for SOTA Object Detection

Tencent’s YOLO-Master v2026.02 adds a Mixture‑of‑Experts architecture, zero‑overhead LoRA fine‑tuning, Sparse SAHI inference for large images, and Cluster‑Weighted NMS, delivering 3‑5× faster inference, up to 70% reduced training resources, and markedly higher detection accuracy across diverse benchmarks.

Computer Vision · LoRA · Mixture of Experts
0 likes · 15 min read
PaperAgent
Jan 19, 2026 · Artificial Intelligence

How Reinforcement Learning Can Boost LLM Reasoning by Shaping Token Distributions

Recent research shows that applying reinforcement learning to large language models can dramatically improve reasoning performance, but that its effectiveness depends on the token distribution produced during pre‑training. This motivates rewriting cross‑entropy as a single‑step policy gradient with controllable entropy parameters.

LLM · Model Optimization · RL
0 likes · 6 min read
AI Frontier Lectures
Jan 15, 2026 · Artificial Intelligence

What Makes YOLO26 the Next Leap in Edge AI Object Detection?

YOLO26, the latest Ultralytics release, introduces a unified model family with five sizes, removes distribution focal loss, offers end‑to‑end inference without NMS, adds progressive loss balancing and the MuSGD optimizer, and delivers up to 43% faster CPU performance, making it ideal for edge and real‑world vision applications.

Model Optimization · YOLO26 · edge AI
0 likes · 12 min read
Architect
Jan 1, 2026 · Artificial Intelligence

How Manifold-Constrained Hyper-Connections Boost Large Model Training Efficiency

DeepSeek’s new paper introduces mHC, a manifold‑constrained version of Hyper‑Connections that stabilizes gradient flow, adds only 6.7% training overhead, and enables reliable training of 27‑billion‑parameter models while improving benchmark performance by about 2%.

AI Architecture · Deep Learning · Large-Scale Training
0 likes · 7 min read
Old Meng AI Explorer
Dec 29, 2025 · Artificial Intelligence

Run 100B LLMs on a Laptop: How BitNet’s 1‑bit Quantization Makes It Possible

BitNet’s 1‑bit quantization cuts model size and compute needs tenfold, enabling ordinary CPUs and low‑power ARM devices to run 2B‑100B language models locally with acceptable speed, low power consumption, and near‑original quality, while providing simple installation and optional GPU acceleration.

BitNet · CPU inference · LLM quantization
0 likes · 10 min read
Alibaba Cloud Developer
Dec 18, 2025 · Artificial Intelligence

How to Build a Real‑Time AI‑Powered Anime‑Style Video Generator for Social Apps

This technical report details the end‑to‑end workflow for integrating an AIGC video generation module into a social app, covering requirement analysis, model and hardware selection, dataset construction, LoRA and full‑parameter training, multiple acceleration techniques such as Sage Attention, TeaCache, XDiT, gradient‑checkpointing offload, tiled VAE, and quantization, followed by extensive performance evaluation and metric‑based ranking of the final models.

AI video generation · LoRA fine-tuning · Model Optimization
0 likes · 38 min read
Data Party THU
Dec 10, 2025 · Artificial Intelligence

How DeepSeek‑V3.2 Cuts Inference Cost and Boosts Agent Skills with Sparse Attention

DeepSeek's V3.2 release introduces a dual‑model lineup, a Sparse Attention architecture that halves long‑context inference cost, a post‑training reinforcement‑learning pipeline whose compute budget exceeds 10% of pre‑training compute, and a revamped agent framework that dramatically improves tool‑use and reasoning performance across benchmarks.

Agentic AI · DeepSeek · Model Optimization
0 likes · 11 min read
Baobao Algorithm Notes
Dec 7, 2025 · Artificial Intelligence

Can RL Really Boost LLM Reasoning? A Critical Review of Recent Findings

This article critically examines recent RL‑for‑LLM studies, revealing that reinforcement learning improves search efficiency but does not extend the intrinsic reasoning capabilities of base models, and explores the underlying model‑conditioned optimization bias, comparisons with SFT distillation, and the trade‑off with catastrophic forgetting.

Catastrophic Forgetting · LLM · Model Optimization
0 likes · 11 min read
Alibaba Cloud Developer
Nov 5, 2025 · Artificial Intelligence

How TinyAI Brings a Full‑Stack AI Framework to Pure Java

TinyAI is a completely Java‑implemented, lightweight full‑stack AI framework that demonstrates how to build a production‑grade deep‑learning system—from low‑level numeric tensors and automatic differentiation to modular neural‑network layers, training pipelines, large‑language‑model implementations, and intelligent agent architectures—while remaining education‑friendly and free of external dependencies.

AI Framework · Agent System · Code Examples
0 likes · 33 min read
Data Party THU
Oct 18, 2025 · Artificial Intelligence

Can Classic Graph Autoencoders Rival SOTA? Surprising Optimizations Reveal Their Power

Researchers from Peking University demonstrate that, by applying modern optimization techniques to the classic Graph Autoencoder (GAE), the model can achieve state‑of‑the‑art link‑prediction performance on benchmarks like ogbl‑ppa, while delivering orders‑of‑magnitude speed improvements, challenging the trend toward ever‑more complex GNNs.

Model Optimization · efficiency · graph autoencoder
0 likes · 10 min read
JD Tech Talk
Sep 11, 2025 · Artificial Intelligence

How to Seamlessly Migrate AI Workloads from Nvidia GPUs to Domestic Accelerators

This article explains why migrating AI applications from Nvidia GPUs to domestic accelerators is urgent, outlines the technical challenges, and introduces JoyScale’s seamless (“zero‑perception”) migration stack that enables end‑to‑end hardware, software, and model adaptation for reliable, high‑performance AI deployment.

AI migration · JoyScale · Model Optimization
0 likes · 11 min read
Data Party THU
Aug 18, 2025 · Artificial Intelligence

Unlock XGBoost Performance: Master the Core Parameters

This article provides a detailed, visual guide to XGBoost's most important hyper‑parameters—such as max_depth, min_child_weight, learning_rate, gamma, subsample, colsample_bytree, scale_pos_weight, alpha, and lambda—explaining how each influences tree complexity, regularization, and model generalization, and offering practical examples for effective tuning.

Model Optimization · Regularization · XGBoost
0 likes · 12 min read
DevOps
Aug 16, 2025 · Artificial Intelligence

Google Unveils Gemma 3 270M: A Tiny, High‑Efficiency Open‑Source AI Model

Google has released the open‑source Gemma 3 270M model—a compact, 270‑million‑parameter AI that runs on as little as 2 GB RAM, supports over 140 languages, handles images, and offers strong instruction‑following performance, making it ideal for edge devices and custom fine‑tuning.

Gemma 3 · Google AI · Model Optimization
0 likes · 5 min read
Data Thinking Notes
Jul 30, 2025 · Artificial Intelligence

Tracing the Evolution of Large Language Models: Key Papers and Breakthroughs

This article reviews the most influential papers in large language model research since 2017, covering foundational works such as the Transformer, GPT‑3, BERT, scaling laws, and recent innovations like FlashAttention, Mamba, and QLoRA, highlighting their core contributions and impact on AI development.

AI research · Model Optimization · Transformer
0 likes · 28 min read
AI Algorithm Path
Jul 13, 2025 · Artificial Intelligence

How to Calculate the Right AI Model Size for Your PC (3B, 7B, 13B)

This article explains how to estimate the GPU memory required for running large language models of 3 B, 7 B, and 13 B parameters, walks through step‑by‑step calculations, shows how hardware limits affect feasibility, and offers practical optimization techniques such as quantization and CPU offloading.

AI model sizing · CPU offloading · FP16
0 likes · 5 min read
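As a rough illustration of the sizing arithmetic this article describes, the sketch below estimates weight-only memory as parameter count times bytes per parameter. The function name `weight_memory_gb` and the dtype table are assumptions made for this example, not taken from the article. KV cache, activations, and framework overhead are ignored, so real requirements will be higher.

```python
# Bytes needed to store one parameter at each precision (an illustrative table).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, dtype: str = "fp16") -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * BYTES_PER_PARAM[dtype]
    return total_bytes / 1e9


if __name__ == "__main__":
    # Compare the three model sizes from the article title at two precisions.
    for size in (3, 7, 13):
        fp16 = weight_memory_gb(size, "fp16")
        int4 = weight_memory_gb(size, "int4")
        print(f"{size}B model: fp16 ~ {fp16:.1f} GB, int4 ~ {int4:.1f} GB")
```

Under these assumptions a 7B model needs about 14 GB for FP16 weights and about 3.5 GB at 4-bit, which is why quantization is usually the first thing to try on a consumer GPU.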
JD Cloud Developers
Jun 24, 2025 · Artificial Intelligence

How JD Retail’s xLLM Architecture Revolutionizes AI Inference for E‑Commerce

At GAITC2025, JD Retail’s AI Infra lead Zhang Ke detailed the challenges of e‑commerce AI inference and introduced the xLLM edge‑cloud unified large‑model architecture, highlighting adaptive scheduling, offline unified scheduling, multi‑layer pipelines, and agent collaboration that boost performance, cut costs, and pave the way for future AI advancements.

AI inference · Large Model · Model Optimization
0 likes · 6 min read
JD Retail Technology
Jun 20, 2025 · Artificial Intelligence

How JD Retail’s xLLM Architecture Revolutionizes AI Inference for E‑Commerce

The article details JD Retail’s collaboration with Tsinghua University to build the xLLM edge‑cloud unified large‑model inference framework, addressing e‑commerce AI challenges such as diverse inputs, task scheduling, model compression, and cost, while outlining future research directions and performance gains.

AI inference · Model Optimization · edge-cloud
0 likes · 7 min read
Kuaishou Tech
Jun 4, 2025 · Artificial Intelligence

KwaiCoder-AutoThink-preview: An Automatic‑Thinking Large Model Enhanced with Step‑SRPO Reinforcement Learning

The KwaiPilot team released the KwaiCoder‑AutoThink‑preview model, which introduces a novel automatic‑thinking training paradigm and a process‑supervised reinforcement‑learning method called Step‑SRPO, enabling the model to dynamically switch between thinking and non‑thinking modes, reduce inference cost, and achieve up to 20‑point gains on code and math benchmarks while handling large‑scale codebases.

AI research · Code Generation · Model Optimization
0 likes · 12 min read
JD Tech
May 26, 2025 · Artificial Intelligence

Solving Technical Challenges at JD Retail: Multi‑Reward Models, LLM‑Based Query Expansion, Model Pruning, and Reinforcement Learning

This article details how JD Retail's young algorithm engineers tackled a series of AI engineering problems—including advertising image quality assessment with multi‑reward models, large‑language‑model‑driven query expansion, FFT‑and‑RDP‑based model pruning, and agent‑centric reinforcement learning—while sharing practical growth insights and code snippets.

AI · Computer Vision · Model Optimization
0 likes · 15 min read
ZhongAn Tech Team
Apr 28, 2025 · Artificial Intelligence

Weekly Tech Overview: Major AI Model Updates, Industry Funding, and Expert Perspectives on AI Agents and Consciousness

This weekly technology digest highlights significant advancements in artificial intelligence, including OpenAI's GPT-4o upgrades, Tencent's Hunyuan 3D v2.5 release, and major funding rounds for xAI and Manus, alongside expert discussions on the future evolution of AI agent networks and the theoretical possibility of machine consciousness.

AI agents · AI funding · Model Optimization
0 likes · 7 min read
Baidu Tech Salon
Apr 28, 2025 · Artificial Intelligence

Inside Baidu’s Wenxin 4.5 Turbo & X1 Turbo: Architecture, Training Tricks, and Real-World Impact

At the Create2025 AI Developer Conference, Baidu unveiled the multimodal Wenxin 4.5 Turbo and X1 Turbo models, detailing their innovative architecture, self‑feedback post‑training, composite reasoning chains, data pipelines, and the new Wenxin KuaiMa 3.5 code assistant, while also showcasing ecosystem growth and cultural AI applications.

AI Conference · Baidu · Code Generation
0 likes · 9 min read
JD Tech
Mar 19, 2025 · Artificial Intelligence

JD Retail's End‑to‑End AI Engine Compatible with GPU and Domestic NPU: Architecture, Optimization, and Real‑World Applications

This article details JD Retail's AI engine that seamlessly supports both GPU and domestic NPU hardware, describing its heterogeneous cluster architecture, unified training and inference APIs, performance optimizations, extensive model coverage, and multiple production use cases across e‑commerce, logistics, and intelligent assistance.

AI Engine · GPU · JD Retail
0 likes · 20 min read
Meituan Technology Team
Mar 6, 2025 · Artificial Intelligence

INT8 Quantization and Inference Optimization of DeepSeek R1 Model

Meituan’s search and recommendation team converted the FP8‑only DeepSeek‑R1 model to INT8 by first casting weights to BF16 and then applying block‑wise or channel‑wise quantization, which preserves GSM8K and MMLU accuracy while delivering 33% to 50% higher throughput on A100‑80G GPUs, and they released the SGLang‑based inference scripts and quantized weights publicly, enabling deployment on older NVIDIA hardware without accuracy loss.

DeepSeek-R1 · GPU deployment · INT8 Quantization
0 likes · 11 min read
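The channel‑wise variant mentioned in this summary can be illustrated with a minimal pure‑Python sketch of symmetric per‑channel INT8 quantization. This is not Meituan's released code: the function names are hypothetical, the FP8‑to‑BF16 cast is not modeled, and plain Python floats stand in for the weights.

```python
def quantize_per_channel(weights):
    """Symmetric INT8 quantization with one scale per output channel (row)."""
    q_rows, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # Map the largest magnitude in the row to 127; guard all-zero rows.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows, scales):
    """Recover approximate float weights from INT8 values and per-row scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


# Two rows with very different ranges: separate scales keep the small row precise.
weights = [[0.5, -1.0, 0.25], [0.02, 0.01, -0.04]]
q, s = quantize_per_channel(weights)
restored = dequantize(q, s)

if __name__ == "__main__":
    for row, back in zip(weights, restored):
        print(row, "->", [round(v, 4) for v in back])
```

Giving each row its own scale means a row of small weights is not crushed by a large outlier elsewhere in the matrix. A block‑wise scheme applies the same idea to fixed‑size tiles instead of whole rows.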
Open Source Linux
Mar 5, 2025 · Artificial Intelligence

How DeepSeek‑R1 Redefines Prompt Engineering and Real‑World AI Deployment

The article analyzes DeepSeek‑R1’s low‑cost inference architecture, Chinese language optimizations, novel prompt‑engineering techniques, and the practical challenges of deploying large domestic models, offering insights into vertical AI applications and the evolving open‑source ecosystem in China.

AI deployment · DeepSeek · Model Optimization
0 likes · 8 min read
JD Retail Technology
Mar 4, 2025 · Artificial Intelligence

JD Retail End-to-End AI Engine Compatible with GPU and Domestic NPU: Architecture, Optimization, and Applications

JD Retail’s Nine‑Number Algorithm Platform delivers an end‑to‑end AI engine that unifies GPU and domestic NPU resources across a thousand‑card cluster, offering zero‑cost model migration, optimized training and inference pipelines, support for over 40 LLM and multimodal models, and proven business‑level performance that reduces dependence on overseas chips.

AI · Distributed Training · GPU
0 likes · 19 min read
DaTaobao Tech
Feb 21, 2025 · Artificial Intelligence

AI-Powered Face Swapping for the Spring Festival Gala: System Design and Deployment

The paper details the design and deployment of an AI‑driven face‑swap platform for the 2025 CCTV Spring Festival Gala, featuring a dual‑model SDXL pipeline with ControlNet and LoRA fine‑tuning, optimized preprocessing and GPU‑specific acceleration to achieve sub‑3‑second latency at over 10 k QPS, supporting scaling, throttling, and multi‑region load balancing, and ultimately serving ten million users and generating hundreds of millions of personalized gala images.

AI Engineering · AIGC · Model Optimization
0 likes · 28 min read
Tencent Technical Engineering
Feb 14, 2025 · Artificial Intelligence

Technical Overview of DeepSeek Series Models and Innovations

The DeepSeek series introduces a refined Mixture‑of‑Experts architecture with fine‑grained expert partitioning, shared experts, and learnable load‑balancing, alongside innovations such as Group Relative Policy Optimization, Multi‑Head Latent Attention, Multi‑Token Prediction, mixed‑precision FP8 training, and the R1/R1‑Zero models that use Long‑CoT reasoning, reinforcement‑learning pipelines, and distillation to achieve OpenAI‑comparable performance at lower cost.

AI · DeepSeek · Mixture of Experts
0 likes · 25 min read
Huawei Cloud Developer Alliance
Feb 8, 2025 · Artificial Intelligence

Why DeepSeek V3 and R1 Are Redefining Low‑Cost AI: Architecture, Training Tricks, and Industry Impact

This article analyses DeepSeek's V3 and R1 models, explaining how their innovative MoE architecture, Multi‑Head Latent Attention, low‑cost training strategies, and distributed‑training optimizations deliver high‑performance large language models while reducing GPU/NPU demand and sparking industry excitement.

AI inference · DeepSeek · Mixture of Experts
0 likes · 16 min read
IT Architects Alliance
Feb 8, 2025 · Artificial Intelligence

Inside DeepSeek: How Its Innovative Architecture Redefines AI Performance

This article examines DeepSeek's advanced Transformer‑based architecture, dynamic routing, MoE system, multi‑stage training, efficient inference, multimodal capabilities, real‑world applications, technical challenges, and future prospects, providing a comprehensive technical analysis of the model's strengths and limitations.

AI Architecture · DeepSeek · Model Optimization
0 likes · 15 min read
Tencent Cloud Developer
Feb 6, 2025 · Artificial Intelligence

DeepSeek V Series: Technical Overview of Scaling Laws, Grouped Query Attention, and Mixture‑of‑Experts

The article reviews DeepSeek’s V‑series papers, explaining how scaling‑law insights, Grouped Query Attention, a depth‑first design, loss‑free load balancing, multi‑token prediction and Multi‑Head Latent Attention together enable economical mixture‑of‑experts LLMs that rival closed‑source models while cutting compute and hardware costs.

DeepSeek · Grouped Query Attention · Mixture of Experts
0 likes · 13 min read
DevOps
Jan 25, 2025 · Artificial Intelligence

DeepSeek R1: An Open‑Source Large Model Matching OpenAI’s o1 at a Fraction of the Cost

DeepSeek’s newly released R1 model delivers performance comparable to OpenAI’s o1 while cutting inference costs by 90‑95%, leveraging innovative MLA and MoE architectures, low‑cost hardware training, an open‑source strategy, and a youthful, flat‑structured team that challenges the AI industry’s high‑spending model.

AI startup · Cost‑Efficient Training · DeepSeek
0 likes · 12 min read
DevOps
Dec 8, 2024 · Artificial Intelligence

Understanding Fine-Tuning in Machine Learning: Concepts, Importance, Steps, and Applications

This article explains fine‑tuning in machine learning, covering its definition, why it matters, the role of pre‑trained models, detailed step‑by‑step procedures, advantages, and diverse applications such as NLP, computer vision, speech and finance, with practical examples like face recognition and object detection.

AI applications · Fine-tuning · Model Optimization
0 likes · 16 min read
Model Perspective
Dec 5, 2024 · Artificial Intelligence

Choosing the Right Activation Function: Pros, Cons, and Best Practices

Activation functions are crucial for neural networks, providing non‑linearity, normalization, and gradient flow; this article reviews common functions such as Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, Noisy ReLU, Softmax, and Swish, comparing their characteristics, advantages, drawbacks, and guidance for selecting the appropriate one.

Model Optimization · Neural Networks · activation functions
0 likes · 10 min read
Zhuanzhuan Tech
Oct 24, 2024 · Artificial Intelligence

Pre‑Ranking in Recommendation Systems: Model and Sample Optimization Practices at Zhuanzhuan Home Page

This article reviews the role of pre‑ranking in multi‑stage recommendation pipelines, compares dual‑tower and fully‑connected DNN models, discusses negative and positive sample selection strategies, and presents Zhuanzhuan's practical improvements in model architecture and traffic‑pool allocation to boost precision and diversity.

Model Optimization · dual-tower · pre‑ranking
0 likes · 16 min read
Tencent Advertising Technology
Oct 14, 2024 · Artificial Intelligence

Generative Retrieval Based on Yuan Large Model: Implementation and Practice in Tencent Advertising

This paper presents the implementation and practice of generative retrieval based on the Yuan large model in Tencent Advertising, addressing three key challenges: capturing user intent, aligning the model to the advertising domain, and designing a high-performance platform under ROI constraints.

Generative Retrieval · High‑performance computing · Model Optimization
0 likes · 17 min read
iQIYI Technical Product Team
Jul 26, 2024 · Artificial Intelligence

Optimizing Advertising Feature Evaluation Process with the Opal Machine Learning Platform

By migrating iQIYI’s advertising feature‑evaluation workflow to the Opal machine‑learning platform, the team replaced a manual, engineer‑heavy process with a unified, automated pipeline that cut evaluation cycles from five days to 1.5 days, tripling iteration speed while lowering barriers and improving consistency for future feature optimization.

Feature Evaluation · Model Optimization · Opal Platform
0 likes · 6 min read
Kuaishou Tech
Jul 17, 2024 · Artificial Intelligence

Key Technical Innovations in Kuaishou’s “Kuaiyi” Large Model and Its Real-World Applications

The article details Kuaishou’s development of the 175B “Kuaiyi” multimodal large model, presenting eight novel technical innovations—from Temporal Scaling Law and MiLe Loss to MoE‑enhanced reward modeling—and describes how these advances enable high‑performance AI services such as the AI Xiao Kuai chatbot across diverse real‑world scenarios.

AI applications · Model Optimization · Multimodal AI
0 likes · 12 min read
360 Smart Cloud
Jul 4, 2024 · Artificial Intelligence

Optimizing Mixture-of-Experts (MoE) Training with the QLM Framework

This article introduces the background and challenges of large language model training, explains the Mixture-of-Experts (MoE) architecture, and details several optimization techniques implemented in the QLM framework (fine-grained and shared experts, top‑k gating, token distribution, expert parallelism, and grouped GEMM) to improve training efficiency and performance.

AI · Distributed Training · Mixture of Experts
0 likes · 10 min read
Bilibili Tech
Jun 14, 2024 · Artificial Intelligence

Technical Report on the Index-1.9B Series: Model Variants, Pre‑training Optimizations, and Alignment Experiments

The report presents the open‑source Index‑1.9B family (base, pure, chat, and character variants), detailing benchmark results, pre‑training optimizations such as a normalized LM‑Head and deeper, slimmer architectures, the importance of modest instruction data, alignment via SFT/DPO, and role‑play enhancements with RAG, while acknowledging remaining safety and factual limitations.

Alignment · Instruction Tuning · LLM
0 likes · 15 min read
Baobao Algorithm Notes
May 9, 2024 · Artificial Intelligence

Inside DeepSeek‑V2: How Multi‑Head Latent Attention Cuts KV‑Cache and Boosts Performance

This article provides an in‑depth technical analysis of DeepSeek‑V2, covering its 236B parameter size, Multi‑Head Latent Attention optimization that reduces KV‑cache memory, architectural details, training pipelines, infrastructure choices, and performance results on benchmarks such as MMLU and instruction following.

AI Architecture · DeepSeek · Model Optimization
0 likes · 17 min read
Baidu Geek Talk
Mar 27, 2024 · Industry Insights

How Baidu’s Qianfan Platform Is Accelerating Enterprise AI Adoption

The article reviews Baidu’s Qianfan AI platform, highlighting rapid large‑model advances, enterprise challenges, new AppBuilder features, lightweight model releases, and cost‑effective model routing that together aim to boost AI adoption across industries.

AI · Enterprise AI · Model Optimization
0 likes · 16 min read
Ximalaya Technology Team
Feb 20, 2024 · Artificial Intelligence

Optimization of Deep Learning-Based CTR Models in Advertising

This report presents recent advances in optimizing deep learning click‑through‑rate models for advertising, including improved embedding mechanisms, novel feature‑interaction and architecture designs such as attention‑based behavior sequencing, multi‑tower and Mixture‑of‑Experts networks, dynamic ID handling, hourly updates, incremental training, and outlines future multi‑modal and embedding‑importance research.

CTR model · Deep Learning · Embedding Techniques
0 likes · 13 min read
Sohu Tech Products
Jan 3, 2024 · Artificial Intelligence

OPPO Advertising Recall Algorithm: Architecture, Model Selection, Evaluation, and Optimization

OPPO revamped its advertising recall system by replacing a latency‑prone directional pipeline with an ANN‑based full‑ad personalized architecture, employing a dual‑tower LTR model, multi‑path auxiliary branches, refined offline metrics, price‑sensitive and hard‑negative sampling, and hybrid joint training, which together boosted ARPU by about 15%.

Advertising · Model Optimization · large-scale classification
0 likes · 24 min read
DataFunSummit
Dec 3, 2023 · Artificial Intelligence

Shopee Live Personalized CTR Optimization via Calibration‑Based Meta‑Learning

This article presents Shopee's calibration‑based meta‑learning approach for personalized click‑through‑rate prediction in live streaming, detailing business context, modeling goals, model evolution from Calibration4CVR to CBMR, EmbCB and MlpCB optimizations, and multi‑task and multi‑scene extensions that achieve significant AUC and business metric improvements.

CTR · Model Optimization · Shopee
0 likes · 11 min read
Baidu Tech Salon
Nov 10, 2023 · Artificial Intelligence

Baidu Search Deep Learning Model Architecture and Optimization Practices

Baidu's Search Architecture team details how its deep‑learning models have evolved to deliver direct answer results via semantic embeddings, describes a massive online inference pipeline that rewrites queries, ranks relevance, and classifies types, and outlines optimization techniques—including data I/O, CPU/GPU balancing, pruning, quantization, and distillation—to achieve high‑throughput, low‑latency search.

Baidu · GPU Optimization · Inference System
0 likes · 13 min read
Baidu Geek Talk
Nov 9, 2023 · Artificial Intelligence

Deep Learning Model Architecture Evolution in Baidu Search

The article chronicles Baidu Search’s Model Architecture Group’s evolution of deep‑learning‑driven search, detailing the shift from inverted‑index to semantic vector indexing, the use of transformer‑based models for text and image queries, large‑scale offline/online pipelines, and extensive GPU‑centric optimizations such as pruning, quantization and distillation, all aimed at delivering precise, cost‑effective results to hundreds of millions of users.

Ernie · GPU inference · Model Optimization
0 likes · 14 min read
DaTaobao Tech
Sep 11, 2023 · Artificial Intelligence

Large Language Model Upgrade Paths and Architecture Selection

This article analyzes upgrade paths of major LLMs—ChatGLM, LLaMA, Baichuan—detailing performance, context length, and architectural changes, then examines essential capabilities, data cleaning, tokenizer and attention design, and offers practical guidance for balanced scaling and efficient model construction.

Baichuan · ChatGLM · LLM architecture
0 likes · 32 min read
NetEase Media Technology Team
Aug 9, 2023 · Artificial Intelligence

GPU Model Inference Optimization Practices in NetEase News Recommendation System

The article outlines practical GPU inference optimization for NetEase’s news recommendation, covering model analysis with Netron, multi‑GPU parallelism, memory‑copy reduction, batch sizing, TensorRT conversion and tuning, custom plugins, and the GRPS serving framework to achieve significant latency and utilization gains.

GPU inference · Model Optimization · Profiling
0 likes · 44 min read
DataFunTalk
Aug 9, 2023 · Artificial Intelligence

Key Technologies for Domain‑Specific Large Models: Insights from the World AI Conference

This report, based on Professor Xiao Yanghua’s presentation at the World AI Conference, examines why vertical domains need general large models, outlines their key capabilities such as open‑world understanding, combinatorial innovation, evaluation, complex instruction execution, task planning, and symbolic reasoning, and discusses current limitations and optimization strategies for domain‑specific deployment.

AI Evaluation · Model Optimization · Vertical AI
0 likes · 17 min read
DataFunSummit
Jun 28, 2023 · Artificial Intelligence

OPPO's CHAOS Pretrained Large Model and GammaE Knowledge‑Graph Multi‑hop Reasoning: Techniques and Insights

This article presents OPPO Research Institute's recent advances in large‑model AI, detailing the CHAOS pretrained model that topped the CLUE leaderboard, the knowledge‑enhanced training pipeline, and the GammaE model for multi‑hop reasoning over knowledge graphs, together with experimental results and practical training tips.

AI research · GammaE · Knowledge Graph
0 likes · 20 min read
Bilibili Tech
Jun 13, 2023 · Artificial Intelligence

InferX Inference Framework and Its Integration with Triton for High‑Performance AI Model Serving

Bilibili’s self‑developed InferX framework, combined with NVIDIA Triton Inference Server, streamlines AI model serving by adding quantization, structured sparsity, and custom kernels, delivering up to eight‑fold throughput gains, cutting GPU usage by half, and enabling faster, cost‑effective OCR and large‑model deployments.

AI inference · GPU utilization · InferX
0 likes · 10 min read
DataFunTalk
Apr 25, 2023 · Artificial Intelligence

DAMO-YOLO: An Efficient Target Detection Framework with NAS, Multi‑Scale Fusion, and Full‑Scale Distillation

This article introduces DAMO‑YOLO, a high‑performance object detection framework that combines low‑cost model customization via MAE‑NAS, an Efficient RepGFPN with HeavyNeck for superior multi‑scale detection, and a full‑scale distillation technique, delivering faster inference, lower FLOPs, and higher accuracy across diverse industrial scenarios.

Distillation · Model Optimization · NAS
0 likes · 15 min read
Bilibili Tech
Feb 28, 2023 · Artificial Intelligence

High‑Quality Automatic Speech Recognition (ASR) Solutions at Bilibili: Data, Model, and Deployment Optimizations

Bilibili’s high‑quality ASR system combines large‑scale filtered business data, semi‑supervised Noisy‑Student training, an end‑to‑end CTC model with lattice‑free MMI decoding, and FP16‑optimized FasterTransformer inference on Triton, delivering top‑ranked accuracy, low latency, and scalable deployment for diverse Chinese‑English video content.

ASR · Bilibili · End-to-End
0 likes · 18 min read
58 Tech
Jan 12, 2023 · Artificial Intelligence

Efficient Conformer for End‑to‑End Speech Recognition: Model, Implementation, Streaming Inference, and Experimental Results

This article presents a comprehensive overview of the Efficient Conformer model for large‑scale end‑to‑end speech recognition, detailing its architectural improvements such as progressive downsampling and grouped multi‑head self‑attention, the PyTorch implementation in WeNet, streaming inference handling, experimental CER gains on AISHELL‑1 and production data, and future development plans.

ASR · Efficient Conformer · Model Optimization
0 likes · 16 min read
Baidu Geek Talk
Jan 5, 2023 · Artificial Intelligence

How Baidu’s AIAK‑Inference Supercharges AI Model Inference on GPUs

This article provides an end‑to‑end analysis of AI inference bottlenecks, reviews common industry acceleration techniques, and details Baidu Intelligent Cloud’s AIAK‑Inference suite—including its architecture, optimization strategies such as model pruning, operator fusion, and single‑operator tuning—followed by a demo showing significant latency reductions on ResNet‑50 and other models.

AI inference · AIAK-Inference · Baidu Cloud
0 likes · 16 min read
DataFunTalk
Jan 2, 2023 · Artificial Intelligence

Tail Traffic Modeling and Data‑Driven Risk Strategies at 360 Shuke

This article presents 360 Shuke's practical approach to modeling low‑volume (tail) credit traffic using accumulated data, covering the characteristics of tail traffic, sample expansion under low approval rates, timeliness‑based data clustering, and ranking optimization for high‑quality head customers.

Data Clustering · Model Optimization · Risk Modeling
0 likes · 19 min read
Baidu Intelligent Cloud Tech Hub
Dec 27, 2022 · Artificial Intelligence

How to Supercharge AI Inference: End‑to‑End Acceleration Strategies and Baidu’s AIAK‑Inference

This article presents a comprehensive analysis of AI inference bottlenecks, explores industry acceleration techniques such as model simplification, operator fusion, and single‑operator optimization, and details Baidu Cloud's AIAK‑Inference suite with practical demos showing up to 90% latency reduction.

AI inference · AIAK-Inference · Baidu Cloud
0 likes · 16 min read
21CTO
Sep 26, 2022 · Artificial Intelligence

Unlocking Live-Streaming Recommendations: Strategies from Tencent Music’s Interactive Systems

This article explores the evolution of recommendation systems for interactive live‑streaming scenarios, covering common system traits, user cold‑start solutions, prior knowledge modeling, scene‑specific modeling, and practical Q&A insights drawn from Tencent Music’s real‑world deployments.

AI · Model Optimization · Recommendation Systems
0 likes · 19 min read
DaTaobao Tech
Sep 7, 2022 · Artificial Intelligence

Online Deep Learning (ODL) Model Optimization for Real‑Time Recommendation

The team enhanced real‑time recommendation by redesigning TensorFlow graphs—using constant‑folding, a custom CallGraphOP cache, a simplified dense layer, and CUDA‑Graph compatibility—boosting single‑machine throughput ~40%, raising GPU utilization from 30% to 43%, cutting latency and saving roughly 30% of hardware resources.

CUDA Graph · GPU performance · Model Optimization
0 likes · 11 min read
Alibaba Terminal Technology
Jun 22, 2022 · Artificial Intelligence

How Fast Can Your Smartphone Run ML Models? Exploring Edge AI Optimization

This article examines the computational capabilities of modern mobile devices for machine learning, compares training times on a MacBook and iPhone, explains model evaluation metrics like FLOPs, and provides step‑by‑step guides for converting and optimizing models using TensorFlow, PyTorch, ONNX, JAX, and TVM for edge deployment.

JAX · Model Optimization · TVM
0 likes · 29 min read
DataFunTalk
May 25, 2022 · Artificial Intelligence

Optimizing E-commerce Product Copy Generation: Challenges, Framework, and System Practices

This article presents a comprehensive overview of the challenges in e‑commerce product copy generation, introduces a unified framework comprising a copy generation system, a copy‑cleaning subsystem, and a quality evaluation module, and details practical optimization techniques applied to short and long copy scenarios.

AI · Model Optimization · Text Generation
0 likes · 17 min read
DaTaobao Tech
May 18, 2022 · Artificial Intelligence

Deep Ranking Optimization for E-commerce Recommendation

The 2021 Taobao New‑Product team boosted e‑commerce recommendation by redesigning the coarse‑ranking stage with a dual‑tower DSSM, low‑cost feature‑crossing, NOVA attention and multi‑task distillation from a fine‑ranking teacher, delivering up to +30‰ GAUC gain and 3‑5 % online CTR and click improvements.

Model Optimization · deep ranking · e‑commerce
0 likes · 17 min read
DaTaobao Tech
Apr 26, 2022 · Artificial Intelligence

Optimization of Recall, Ranking, and Downward Modeling for the "Every Square Every House" Infinite-Scroll Light App

This article details a year‑long series of experiments on the Taobao “Every Square Every House” infinite‑scroll light app, describing how added recall paths, a coarse‑ranking filter, multi‑task MMOE sorting, a lightweight down‑scroll predictor, and relevance‑enhanced features together boosted click‑through, scroll depth and per‑user engagement by double‑digit percentages.

A/B testing · Model Optimization · infinite scroll
0 likes · 14 min read
Tencent Cloud Developer
Apr 20, 2022 · Artificial Intelligence

Coarse Ranking in Recommendation Systems: Architecture, Models, and Optimization

Coarse ranking bridges recall and fine ranking by trimming tens of thousands of candidates to a few hundred or thousand using a three‑part framework—sample construction, ordinary and cross‑feature engineering, and evolving deep models—from rule‑based to lightweight MLPs, while employing distillation, feature crossing, pruning, quantization, and bias mitigation to balance accuracy with strict latency constraints.

Model Optimization · Recommendation Systems · artificial intelligence
0 likes · 9 min read
Tencent Cloud Developer
Mar 15, 2022 · Artificial Intelligence

Comprehensive Overview of Ranking Models in Recommendation Systems

The article provides a thorough guide to ranking in recommendation systems, detailing the pipeline architecture, sample handling challenges, extensive feature engineering categories, the evolution from collaborative filtering to deep and attention‑based models, and key optimization trade‑offs between memorization, generalization, and efficient user‑interest modeling.

CTR prediction · Deep Learning · Model Optimization
0 likes · 19 min read
Baidu Geek Talk
Mar 9, 2022 · Artificial Intelligence

Communication Tower Recognition Using PaddlePaddle: An Industrial AI Practice

The article describes an industrial AI system that uses PaddlePaddle’s PP‑PicoDet model, enhanced with COCO pre‑training and quantization, to accurately recognize communication towers in diverse outdoor conditions, achieving 94.5% mAP at 78 ms inference and supporting edge deployment via PaddleLite and ONNX.

Industrial AI · Model Optimization · PP-PicoDet
0 likes · 6 min read
DataFunTalk
Jan 26, 2022 · Artificial Intelligence

Exploring and Practicing Generative Chat in OPPO's XiaoBu Assistant

This article presents a comprehensive overview of OPPO's XiaoBu Assistant, detailing its research background, chat skill architecture, evolution from retrieval and rule‑based methods to generative models, industry model comparisons, decoding and ranking strategies, safety mechanisms, performance optimizations, and evaluation results.

Chatbot · Dialogue Systems · Model Optimization
0 likes · 20 min read
ByteDance Terminal Technology
Nov 9, 2021 · Artificial Intelligence

Edge AI Video Preloading: Case Study and Implementation with ByteDance's Client AI Platform

This article presents a comprehensive case study of applying edge AI to video preloading on the Xigua Video platform, detailing scenario analysis, predictive modeling of user behavior, feature engineering, on‑device model inference, dynamic algorithm package deployment, experimental evaluation, and the resulting performance and cost improvements.

A/B testing · Model Optimization · client inference
0 likes · 18 min read
iQIYI Technical Product Team
Nov 5, 2021 · Artificial Intelligence

Accelerating 4K Video Super‑Resolution with TensorRT: iQIYI’s Optimization and Production Practices

iQIYI optimized a 4K video super-resolution model using TensorRT, employing graph splitting, operator fusion, custom CUDA kernels, and INT8 quantization to achieve a tenfold speedup (≈180 ms per 1080p frame) and demonstrate the potential of deep customization for large‑scale production.

INT8 Quantization · Model Optimization · TensorRT
0 likes · 17 min read
DataFunTalk
Oct 22, 2021 · Artificial Intelligence

Applying AI Techniques to Credit Reporting and Risk Modeling: Model Structure, Pre‑training, Ranking and Interpretability

This article presents a comprehensive overview of how AI technologies are applied to credit reporting and loan risk modeling, detailing data characteristics, end‑to‑end model architectures, pre‑training strategies, risk‑ranking methods, and interpretability techniques for financial risk assessment.

AI · Interpretability · Model Optimization
0 likes · 17 min read
DataFunSummit
Oct 22, 2021 · Artificial Intelligence

Applying AI Techniques to Credit Reporting and Risk Modeling

This article presents a comprehensive overview of how AI technologies are applied to credit reporting, covering data characteristics, end‑to‑end model architectures, pre‑training strategies, risk ranking objectives, and interpretability methods to improve financial risk assessment.

AI · Interpretability · Model Optimization
0 likes · 16 min read
58 Tech
Sep 24, 2021 · Artificial Intelligence

58.com AI Algorithm Competition: Award Ceremony, Top Teams, and Solution Sharing

The 58.com AI algorithm competition showcased over 210 teams competing to improve job recommendation click‑through and conversion rates, featured an award ceremony with speeches, highlighted the ten winning teams, and presented detailed solution shares—including tree models, feature‑engineering techniques, and deep‑learning approaches—while offering GPU resources on the WPAI platform for continued participation.

AI competition · CTR prediction · Model Optimization
0 likes · 10 min read
HaoDF Tech Team
Sep 15, 2021 · Artificial Intelligence

Optimizing Question‑Answer Search Similarity in Haodf Online: A Semantic Similarity Model Case Study

This article describes how Haodf Online improved its medical question‑answer search by analyzing search challenges, adopting semantic similarity models based on pre‑trained language embeddings, designing contrastive training tasks, and evaluating the resulting increase in click‑through rate and user engagement.

Model Optimization · medical-ai · natural language processing
0 likes · 12 min read
Meituan Technology Team
Sep 9, 2021 · Artificial Intelligence

GPU Optimization Practices for CTR Models at Meituan

Meituan accelerates CTR model inference by fusing operators with TVM, optimizing CPU‑GPU data transfers, manually tuning high‑frequency subgraphs, and dynamically offloading workloads. The result is up to ten‑fold throughput gains on Tesla T4 GPUs, with latency that stays stable and rises only modestly beyond 128 QPS; compilation remains slow, however, and large‑model support needs improvement.

CTR · Deep Learning · GPU
0 likes · 16 min read
Alimama Tech
Sep 8, 2021 · Artificial Intelligence

Engineering Optimizations for Large‑Scale Advertising Recall Models: Full‑Cache Scoring and Index Flattening

Alimama's advertising platform modernized its Tree‑based Deep Model in two ways: a dual‑tower full‑library DNN with aggressive pre‑filtering and custom GPU TopK kernels, and a flattened‑tree model that retains beam search with multi‑head attention. Memory‑aware tricks such as attention swapping, softmax approximation, tiled‑matmul splitting, TensorCore batching, INT8 quantization and cache‑resident ad vectors enable multi‑fold latency reductions with minimal recall loss.

Beam Search · GPU Acceleration · Model Optimization
0 likes · 15 min read
Baidu Geek Talk
Sep 8, 2021 · Artificial Intelligence

How PP‑OCRv2 Boosts OCR Speed and Accuracy with Five Key Innovations

The article provides a comprehensive technical overview of PaddleOCR's PP‑OCRv2, detailing its five major algorithmic enhancements, performance improvements over previous versions, historical milestones, core capabilities, and links to the open‑source repositories for developers interested in state‑of‑the‑art OCR solutions.

Computer Vision · Model Optimization · OCR
0 likes · 10 min read
58 Tech
Jul 7, 2021 · Artificial Intelligence

Multi‑Objective Modeling for CRM Opportunity Allocation: Iterative Deep Learning Approaches

This article details the development and iterative optimization of multi‑task deep learning models—including XGBoost‑based baselines, MMoE, ESMM‑enhanced MMoE, PLE, and bias‑aware ranking—to simultaneously improve call‑out and connect‑out rates in a CRM opportunity distribution system, presenting offline gains and online deployment results for each version.

CRM · Model Optimization · multi-task learning
0 likes · 33 min read
DataFunTalk
Jun 4, 2021 · Artificial Intelligence

Advances in Ranking Algorithms for the "Good Goods" Recommendation Scenario

This article presents a comprehensive overview of recent advancements in ranking algorithms for the Good Goods recommendation scenario, covering long‑sequence modeling, category‑retrieval attention, multi‑objective ranking, model structure optimizations, loss functions, and LTR techniques, along with experimental results and practical insights.

LTR · Model Optimization · attention
0 likes · 13 min read
AntTech
Apr 13, 2021 · Artificial Intelligence

Ant Financial’s ZhiXiaoBao Team Achieves Human-Level Scores on SQuAD 2.0 and Advances Machine Reading Comprehension

The ZhiXiaoBao technical team at Ant Financial broke the SQuAD 2.0 leaderboard with a model that surpasses human performance, detailing the challenges of natural‑language understanding, the specific ranking and data‑augmentation techniques they employed, and the broader impact on fintech knowledge‑base automation and future AI research.

FinTech · Knowledge Base · Model Optimization
0 likes · 9 min read
DataFunTalk
Apr 5, 2021 · Artificial Intelligence

Summary of Methods and Findings from the NLP Chinese Pre‑training Model Generalization Challenge

The article reviews the Chinese NLP pre‑training model generalization competition, detailing data preprocessing, augmentation, external data usage, model scaling and architecture tweaks, loss functions, learning‑rate and adversarial training strategies, regularization techniques, post‑processing optimizations, and ineffective methods, highlighting their impact on performance metrics.

Loss Functions · Model Optimization · NLP
0 likes · 15 min read
58 Tech
Mar 24, 2021 · Artificial Intelligence

Automated Detection of Illegal Watermarks in Images Using Deep Learning at 58.com

This article describes how 58.com built an end‑to‑end deep‑learning watermark detection service, covering business needs, data collection and augmentation, model selection and iterative improvements (Faster‑RCNN, SSD, YOLOv3, anchor‑free methods), deployment results, and future research directions.

Computer Vision · Image Moderation · Model Optimization
0 likes · 14 min read
DataFunTalk
Mar 17, 2021 · Artificial Intelligence

Deep Ranking Model Evolution and Applications in Taobao Live: DBMTL, DMR, and RUI Ranking

This article presents a comprehensive overview of Taobao Live's deep ranking system evolution, detailing the DBMTL multi‑task learning framework, the two‑tower DMR matching‑ranking architecture, and the RUI Ranking refer‑item model, together with their offline formulas, online deployment scenarios, and measured performance gains across click‑through, watch‑time, and conversion metrics.

AI · Deep Learning · Model Optimization
0 likes · 27 min read
360 Tech Engineering
Mar 1, 2021 · Artificial Intelligence

Deploying BERT as an Online Service: Challenges and Optimizations at 360 Search

This article details the engineering challenges of serving a large BERT model in real‑time for 360 Search and describes a series of optimizations—including TensorRT‑based kernel fusion, model quantization, knowledge distillation, multi‑stream execution, caching, and dynamic sequence handling—that together achieve low latency, high throughput, and stable deployment on GPU clusters.

BERT · GPU · Model Optimization
0 likes · 10 min read
DataFunTalk
Feb 10, 2021 · Artificial Intelligence

Deep Learning Based Search Ranking Optimization for 58.com Rental Services

This article describes how 58.com’s rental platform leverages deep learning models such as Wide&Deep, DeepFM, DCN, DIN, and DIEN to improve search ranking, detailing data pipelines, feature engineering, model iteration, multi‑task training, prediction optimizations, and resulting online performance gains.

Deep Learning · Model Optimization · Recommendation Systems
0 likes · 27 min read
58 Tech
Jan 25, 2021 · Artificial Intelligence

Deep Learning Ranking Models for 58.com Rental Search: Architecture, Model Iterations, and Optimization

This article presents the end‑to‑end design, feature engineering, model evolution (Wide&Deep, DeepFM, DCN, DIN, DIEN), multi‑task training, and deployment optimizations that 58.com applied to improve search ranking for its rental business, demonstrating significant gains in click‑through and conversion rates.

Model Optimization · feature engineering · multi-task learning
0 likes · 28 min read
DeWu Technology
Nov 18, 2020 · Artificial Intelligence

Evolution and Technical Analysis of Dewu Photo Search

Dewu Photo Search evolved from a limited Aliyun‑based prototype to a self‑developed pipeline using EfficientNet detection and 128‑dim embeddings, boosting top‑1 shoe accuracy by over 100 % and overall precision by up to 41 %, while reducing latency and improving scalability despite remaining stability challenges.

Deep Learning · Model Optimization · feature extraction
0 likes · 10 min read
Ctrip Technology
Nov 12, 2020 · Artificial Intelligence

Ctrip Machine Translation Platform: Architecture, Data Construction, Algorithm Design, and Performance Optimization

This article presents a comprehensive overview of Ctrip's multilingual machine translation platform, detailing demand analysis, system architecture, data pipeline, algorithmic innovations such as task‑space fusion and term‑translation interventions, as well as extensive performance optimizations for low‑resource languages.

AI · Ctrip · Model Optimization
0 likes · 20 min read
DataFunTalk
Aug 18, 2020 · Artificial Intelligence

COLD: A Next‑Generation Pre‑Ranking System for Online Advertising

The article introduces COLD, a computing‑power‑aware online and lightweight deep pre‑ranking system for Alibaba's targeted ads, detailing its evolution from static CTR models to vector‑inner‑product models, its flexible network architecture with feature‑selection via SE blocks, engineering optimizations such as parallelism, column‑wise computation, Float16 and MPS, and demonstrates superior offline and online performance through extensive experiments.

COLD · Model Optimization · feature selection
0 likes · 11 min read
360 Quality & Efficiency
Aug 7, 2020 · Artificial Intelligence

Replacing Fully Connected Layers with Fully Convolutional Networks for Variable‑Scale Image Tasks

This article analyses the drawbacks of using fully‑connected layers in convolutional neural networks for image tasks, proposes fully‑convolutional alternatives with 1×1 convolutions and strategic max‑pooling, provides TensorFlow code examples, compares model sizes and performance, and discusses deployment considerations for variable‑size inputs.

CNN · Fully Convolutional Network · Image Classification
0 likes · 7 min read
Qunar Tech Salon
May 13, 2020 · Artificial Intelligence

Intelligent Hotel Post‑Sale QA System: Model Selection, Evaluation, and Engineering Optimization

This article describes the design, model selection, experimental evaluation, and engineering optimization of an AI‑driven post‑sale question‑answering system for hotel services, covering FAQ construction, intent detection, deep‑learning matching models such as DSSM, ESIM, BERT, and their performance and latency trade‑offs.

AI · BERT · DSSM
0 likes · 14 min read