Tagged articles
137 articles
Page 2 of 2
Alibaba Terminal Technology
Dec 15, 2021 · Artificial Intelligence

Unlock Real-Time Mobile OCR: Inside Ant’s xNN-OCR Engine and Its Tiny, Fast AI

Ant’s self‑developed xNN‑OCR demonstrates how advanced OCR can run offline on smartphones by combining GAN‑based data synthesis, lightweight ShuffleNet‑inspired detection, NAS‑optimized recognition, and aggressive model compression, delivering accurate, near‑real‑time recognition for diverse mobile scenarios while preserving privacy and keeping costs low.

NAS · data synthesis · edge AI
Alimama Tech
Nov 17, 2021 · Artificial Intelligence

Binary Code Based Hash Embedding for Efficient Deep Recommendation Models

Binary Code based Hash Embedding (BH) dramatically compresses deep recommendation model storage by converting feature IDs to binary codes and partitioning them into flexible blocks, yielding deterministic, collision‑free indices that achieve up to 1,000× size reduction while retaining about 99% of original accuracy, making it ideal for storage‑constrained deployments.

Embedding Storage · binary code · deep recommendation
Alimama Tech
Nov 17, 2021 · Artificial Intelligence

Low‑Carbon Model Compression for Alimama Search Advertising CTR: Feature Volume and Embedding Dimension Optimizations

The article details Alibaba’s low‑carbon CTR model slimming, showing how binary‑code hash embeddings compress massive feature volumes while the Adaptive‑Masked Twins‑based Layer dynamically reduces embedding dimensions, together cutting storage and compute, lowering collisions, and preserving accuracy for large‑scale search advertising.

CTR · Embedding · feature volume
Aotu Lab
Sep 30, 2021 · Artificial Intelligence

Bringing AI to the Browser: Edge Intelligence, Frameworks & Model Compression

This article explains how AI is extending into front‑end development, defines edge AI, outlines its application scenarios, discusses advantages and limitations, reviews web‑based inference frameworks and hardware acceleration, and details model compression techniques for deploying AI directly in browsers.

AI · TensorFlow.js · Web
Ctrip Technology
Sep 16, 2021 · Artificial Intelligence

Automated AI Model Optimization Platform for Travel Services

This article describes the design, automated workflow, functional modules, and performance results of a comprehensive AI model optimization platform built for Ctrip's travel business, covering operator libraries, graph optimization, model compression techniques such as distillation, quantization, pruning, and deployment integration.

AI Optimization · AutoML · Inference Acceleration
DataFunTalk
Sep 14, 2021 · Artificial Intelligence

AI Model Deployment on Edge Devices: Adaptation, Optimization, and Continuous Iteration – Interview Insights

The article shares a programmer's interview experience at Baidu, discussing how to adapt AI algorithms for edge deployment, balance model performance and efficiency, apply model compression techniques, and continuously iterate models, while also promoting an upcoming AI deployment online course.

AI deployment · Edge Computing · framework support
DataFunTalk
Jun 14, 2021 · Artificial Intelligence

From Massive to Compact: Model Compression Strategies for Large‑Scale CTR Prediction in Alibaba Search Advertising

This article describes how Alibaba's search advertising team transformed trillion‑parameter CTR models into lightweight, high‑precision systems by compressing embedding layers through feature‑space reduction, dimension quantization, and multi‑hash techniques, while also introducing graph‑based pre‑training and dropout‑driven feature selection to maintain accuracy.

CTR prediction · embedding reduction · feature selection
DataFunSummit
Jun 5, 2021 · Artificial Intelligence

Compression Techniques for BERT: Analysis, Quantization, Pruning, Distillation, and Structure‑Preserving Methods

This article reviews BERT’s architecture, analyzes the storage and compute costs of each layer, and systematically presents compression methods—including quantization, pruning, knowledge distillation (Distilled BiLSTM and MobileBERT), and structure‑preserving techniques—aimed at enabling efficient deployment on resource‑constrained mobile devices.

BERT · Mobile Deployment · knowledge distillation
DataFunTalk
Jun 3, 2021 · Artificial Intelligence

Compression Techniques for BERT: Analysis, Quantization, Pruning, Distillation, and Structure-Preserving Methods

This article examines the internal structure of BERT and systematically presents various model‑compression strategies—including quantization, pruning, knowledge distillation, and structure‑preserving techniques—highlighting their impact on storage, computational cost, and inference speed for deployment on resource‑constrained mobile devices.

BERT · Mobile AI · knowledge distillation
Alimama Tech
Jun 2, 2021 · Artificial Intelligence

Model Compression and Feature Optimization for Large-Scale CTR Prediction in Advertising

Alimama’s advertising team shrank multi‑terabyte CTR models to just tens of gigabytes by applying row‑dimension embedding compression, multi‑hash embeddings, graph‑based relationship networks, PCF‑GNN pre‑training, and droprank feature selection, preserving accuracy while halving training time, doubling online QPS, and retiring hundreds of servers.

Large-scale ML · embedding reduction · feature selection
Kuaishou Tech
May 27, 2021 · Artificial Intelligence

Kuaishou’s Award‑Winning AI Research Projects and Their Industry Impact

Kuaishou’s R&D team has earned top national science and AI awards for its video transcoding and adaptive visual perception projects, which have been open‑sourced, adopted by major cloud CDN providers, and produced notable model‑compression research published at ICLR 2021, illustrating strong industry‑academic collaboration and contribution to China’s technology goals.

AI · Academic Publishing · Industry collaboration
DataFunTalk
Feb 3, 2021 · Artificial Intelligence

Towards Best Possible Deep Learning Acceleration on the Edge – A Compression-Compilation Co-Design Framework

The lecture presented by Assistant Professor Yanzhi Wang introduces a compression‑compilation co‑design framework (CoCoPIE) that achieves real‑time deep‑learning inference on edge devices through novel pruning and quantization techniques, delivering up to 180× speedup without accuracy loss.

AI · Deep Learning · Edge Computing
DataFunTalk
Jan 15, 2021 · Artificial Intelligence

Zhihu Search Text Relevance Evolution and BERT Knowledge Distillation Practices

This talk by Zhihu search algorithm engineer Shen Zhan details the evolution of text relevance models from TF‑IDF/BM25 to deep semantic matching and BERT, explains the challenges of deploying BERT at scale, and describes practical knowledge‑distillation techniques that improve both online latency and offline storage while maintaining search quality.

BERT · knowledge distillation · machine learning
Sohu Tech Products
Jan 6, 2021 · Artificial Intelligence

Overview of Main Model Compression and Acceleration Techniques: Structural Optimization, Pruning, Quantization, and Knowledge Distillation

This article reviews four mainstream model compression and acceleration methods—structural optimization, pruning, quantization, and knowledge distillation—explaining their principles, implementations, and performance, and presents practical examples such as DistilBERT, TinyBERT, and FastBERT with comparative results.

AI · Deep Learning · knowledge distillation
Amap Tech
Dec 30, 2020 · Artificial Intelligence

LRC-BERT: Contrastive Learning based Knowledge Distillation with COS‑NCE Loss for Efficient NLP Models

The Amap team introduced LRC‑BERT, a contrastive‑learning‑based knowledge‑distillation framework that employs a novel COS‑NCE loss, gradient‑perturbation, and a two‑stage training schedule, enabling a 4‑layer student model to retain about 97% of BERT‑Base accuracy while being 7.5× smaller and 9.6× faster, and it has already improved real‑world traffic‑event extraction performance.

BERT · COS-NCE loss · NLP
Suning Technology
Oct 29, 2020 · Artificial Intelligence

Accelerating Deep Learning for Retail: Model Compression, Speed & Energy

This lecture outlines the key challenges of deep learning in retail—growing model size, speed, and energy consumption—and presents a comprehensive acceleration framework covering algorithmic optimizations like network design, pruning, and hardware acceleration, with practical examples such as MobileNet, model compression, and edge deployment.

Deep Learning · Hardware Optimization · model acceleration
Didi Tech
Oct 21, 2020 · Artificial Intelligence

Deep Model Compression Techniques for Intelligent Automotive Cockpits

The article reviews deep‑model compression methods—ADMM‑based structured pruning, low‑bit quantization, and teacher‑student knowledge distillation—and their automated AutoCompress workflow, demonstrating how these techniques shrink neural networks enough to run real‑time driver‑monitoring and other intelligent cockpit functions on resource‑limited automotive hardware while preserving accuracy.

ADMM · Deep Learning · edge AI
Didi Tech
Oct 16, 2020 · Artificial Intelligence

Mask Detection System and Visual AI Competition Achievements

Didi’s COVID‑19 mask‑detection system, built on a DFS‑based face detector and an attention‑enhanced ResNet‑50 mask classifier achieving over 99.5% accuracy, has been deployed in vehicles, open‑sourced, and complemented by top‑ranked results in international visual AI contests, including first place in driver‑gaze prediction and podium finishes in emotion recognition and model‑compression challenges.

AI · Computer Vision · Deep Learning
DataFunTalk
Sep 23, 2020 · Artificial Intelligence

PaddleOCR: 2020’s Outstanding Open‑Source OCR Suite with a 3.5 MB Ultra‑Light Model

PaddleOCR, the 2020 breakthrough in open‑source OCR, offers ultra‑light 3.5 MB multilingual models, high F1‑score performance across diverse scenarios, easy installation via pip, comprehensive documentation, custom training support, and deployment options for both server and mobile platforms, all backed by detailed benchmarks and code examples.

OCR · PaddleOCR · Python
Meituan Technology Team
Aug 6, 2020 · Artificial Intelligence

Meituan SIGIR2020 Workshop: MT‑BERT, KDD Cup Solutions, and Knowledge Graph Applications

At the SIGIR 2020 Meituan workshop, researchers unveiled MT‑BERT’s large‑scale pre‑training and compression techniques, a KDD Cup winning solution that tackles bias with graph‑ and multimodal learning for search advertising, and a massive food‑delivery knowledge graph powering personalized recommendations, all demonstrating significant real‑world performance gains.

Multimodal Learning · model compression · pretrained language models
Didi Tech
Aug 5, 2020 · Artificial Intelligence

DiDi IFX AI Inference Platform: Architecture, Performance, and Productization

DiDi’s IFX AI inference platform, built since 2018, uses a four‑layer architecture spanning access, software, engine, and compute to deliver cloud, edge, and device inference with high‑performance kernel optimizations, model and binary compression, uniform multi‑framework deployment, automated testing, and end‑to‑end security for billions of daily calls.

AI inference · Edge Computing · Performance Optimization
Ctrip Technology
Jul 23, 2020 · Artificial Intelligence

Inference Performance Optimization for AI Applications: Methods, Case Studies, and Future Directions

This article examines the challenges of deep learning inference, outlines general optimization methodologies—including system-level and model-level techniques—presents practical case studies such as Transformer translation model improvements, and discusses future trends in automated compilation and performance tuning for AI services.

AI inference · Deep Learning · Performance Optimization
AntTech
Jun 9, 2020 · Artificial Intelligence

Deep Learning Model Compression and Acceleration Techniques for Mobile AI

This article reviews the motivations, challenges, and a comprehensive set of algorithmic, framework, and hardware methods—including structural optimization, quantization, pruning, and knowledge distillation—to compress and accelerate deep learning models for deployment on mobile devices, highlighting benefits such as reduced server load, lower latency, improved reliability, and enhanced privacy.

Mobile AI · knowledge distillation · model compression
DataFunTalk
May 26, 2020 · Artificial Intelligence

Knowledge Distillation Techniques for Recommendation Systems: Methods, Scenarios, and Practical Insights

This article reviews how knowledge distillation—using a large teacher model to guide a smaller student model—can be applied across the recall, coarse‑ranking, and fine‑ranking stages of recommendation systems, detailing logits‑based and feature‑based approaches, joint and two‑stage training, and point‑wise, pair‑wise, and list‑wise loss designs.

Recommendation Systems · knowledge distillation · machine learning
DataFunTalk
Apr 16, 2020 · Artificial Intelligence

Comprehensive Survey of Pre-trained Models for Natural Language Processing

This article provides a detailed survey of pre‑trained models (PTMs) for natural language processing, classifying them into shallow embeddings and contextual encoders, discussing training paradigms such as knowledge integration and model compression, and offering guidance on transfer learning and future challenges.

knowledge integration · model compression · natural language processing
DataFunTalk
Nov 15, 2019 · Artificial Intelligence

MT-BERT: Domain‑Adapted BERT Pre‑training and Fine‑tuning for Meituan‑Dianping NLP Tasks

This article describes the development of MT‑BERT, a BERT‑based language model pre‑trained on Meituan‑Dianping business data, its distributed mixed‑precision training pipeline, domain adaptation, knowledge‑graph integration, model compression techniques, and the wide range of downstream NLP applications achieved in the platform.

BERT · Knowledge Graph · Meituan
Meituan Technology Team
Nov 14, 2019 · Artificial Intelligence

MT-BERT: Pre‑training and Fine‑tuning Practices at Meituan‑Dianping

MT‑BERT at Meituan‑Dianping combines mixed‑precision, domain‑adapted continual pre‑training, knowledge‑graph‑aware masking, and extensive compression techniques to produce fast, accurate BERT models that power fine‑grained sentiment analysis, intent classification, recommendation reasoning, and other NLP tasks across the platform.

BERT · Knowledge Graph · MT-BERT
Alibaba Cloud Developer
May 21, 2019 · Artificial Intelligence

How Alibaba’s Offline AI Advances Model Compression and Edge Inference

Alibaba’s Machine Intelligence Lab shares two years of breakthroughs in offline AI, detailing low‑bit quantization, unified sparsity frameworks, hardware‑software co‑design, lightweight networks, and on‑device detection, alongside standardized training tools, multi‑platform inference engines, and productized edge solutions such as smart boxes and integrated cameras.

AI · edge inference · hardware-software co-design
Alibaba Cloud Developer
Apr 2, 2019 · Mobile Development

How xNN-OCR Brings High‑Precision, Real‑Time OCR to Mobile Devices

This article explains how the lightweight xNN-OCR engine achieves high accuracy and real‑time performance on mobile devices through deep‑learning model compression, novel detection and recognition techniques, and showcases its practical applications such as bank‑card, gas‑meter, license‑plate, and ID recognition.

Deep Learning · edge AI · mobile OCR
Alibaba Cloud Developer
Dec 28, 2018 · Artificial Intelligence

Elastic Feature Scaling: Boosting Alibaba’s Online Recommendation CTR by 4%

This article describes how Ant Financial’s AI team redesigned TensorFlow to enable elastic feature scaling, introduced a Group‑Lasso optimizer and streaming frequency filtering, compressed models by 90%, and achieved significant CTR and efficiency gains in Alipay’s online recommendation system.

Online Learning · Recommendation Systems · TensorFlow
Alibaba Cloud Developer
Dec 4, 2018 · Artificial Intelligence

Unlocking Elastic TensorFlow: Boosting Online Recommendation CTR by 30%

This article presents a comprehensive set of innovations—including elastic feature scaling, a Group Lasso optimizer, streaming frequency filtering, and graph‑cut model compression—that transform TensorFlow for large‑scale online learning, delivering significant CTR gains and up to 90% model size reduction in Alibaba's recommendation systems.

Online Learning · Recommendation Systems · feature engineering
Alibaba Cloud Developer
Oct 9, 2018 · Artificial Intelligence

How Rocket Launching Boosts Online CTR Prediction Without Slowing Inference

Rocket Launching introduces a novel co‑training framework that jointly trains a lightweight network and a more powerful booster network, sharing parameters and using gradient‑blocking and hint loss to improve click‑through‑rate prediction accuracy while keeping online inference latency unchanged, validated on public datasets and Alibaba’s ad system.

CTR prediction · co-training · gradient block
Alibaba Cloud Developer
Jun 20, 2018 · Mobile Development

How to Supercharge Mobile Deep Learning: Model Compression & Engine Optimizations

This article explains how to overcome the performance, size, memory, and compatibility challenges of deploying deep‑learning inference engines on mobile devices by jointly optimizing model compression and engine implementation, covering speed tricks, cache‑friendly coding, multithreading, sparsity, quantization, NEON intrinsics, package size reduction, memory pooling, and reliability techniques.

Memory Management · NEON SIMD · mobile deep learning
Alibaba Cloud Developer
Jun 15, 2018 · Mobile Development

How Alipay’s xNN Engine Brings Deep Learning to Mobile Apps

This article explains how Alipay’s xNN deep‑learning engine tackles the challenges of deploying AI on billions of mobile devices by using aggressive model compression, a lightweight SDK, and joint algorithm‑ and instruction‑level optimizations to achieve high accuracy, tiny package size, and real‑time performance.

Alipay · Deep Learning · Mobile AI
Alibaba Cloud Developer
Sep 28, 2017 · Artificial Intelligence

How Alipay’s xNN Brings Deep Learning to Millions of Mobile Devices

This article explains how Alipay’s xNN engine overcomes mobile deep‑learning challenges through aggressive model compression, lightweight SDK design, algorithm‑ and instruction‑level optimizations, enabling high‑accuracy AI inference on a wide range of Android and iOS devices with minimal app‑size impact.

Alipay · Deep Learning · Inference Optimization