Tagged articles

model compression

146 articles · Page 2 of 2

Apr 14, 2022 · Artificial Intelligence

PaddlePaddle Deep Learning Platform: Architecture, Core Technologies, and Real‑World Applications

The article presents a comprehensive overview of Baidu's open‑source deep learning platform PaddlePaddle. It details the full‑stack architecture and core technologies, including unified dynamic and static graphs, large‑scale distributed training, multi‑platform inference, an extensive model zoo, and hardware adaptation, and showcases a real‑world deployment case in power‑grid monitoring.

AI Framework · PaddlePaddle · distributed training

0 likes · 15 min read

DataFunTalk

Apr 5, 2022 · Artificial Intelligence

Applying AI Technologies in the Youdao Dictionary Pen: Scanning, Offline Translation, and Edge ML Library

This article presents a technical overview of the Youdao Dictionary Pen, describing its hardware design, real‑time scanning and point‑query image processing, on‑device offline translation with model compression techniques, and the high‑performance Edge ML Library (EMLL) that enables efficient AI inference on constrained edge hardware.

AI · Edge ML Library · OCR

0 likes · 18 min read

Baidu Geek Talk

Apr 1, 2022 · Artificial Intelligence

How Paddle Lite & PaddleSlim Supercharge Edge AI Inference Performance

With the rapid rise of edge computing, deploying AI models for tasks like object detection, OCR, and speech recognition on resource‑constrained devices faces speed challenges; the upgraded Paddle Lite inference engine and PaddleSlim compression tools claim up to 23% faster inference and significant model size reductions, offering a practical solution.

AI Deployment · Inference Optimization · Paddle-Lite

0 likes · 6 min read

Tencent Cloud Developer

Mar 3, 2022 · Artificial Intelligence

Model Distillation for Query-Document Matching: Techniques and Optimizations

We applied knowledge distillation to a video query‑document BERT matcher, compressing the 12‑layer teacher into production‑ready 1‑layer ALBERT and tiny TextCNN students using combined soft, hard, and relevance losses plus AutoML‑tuned hyper‑parameters, achieving sub‑5 ms latency and up to 2.4% AUC improvement over the original model.

ALBERT · AutoML · BERT

0 likes · 12 min read
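The combination of soft and hard losses described in this summary can be sketched in plain Python. This is a minimal illustration that assumes a classification-style matcher, a distillation temperature `temperature` and a mixing weight `alpha`; the function names are hypothetical, and the article's additional relevance loss and AutoML-tuned hyper-parameters are not reproduced.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Hypothetical helper: alpha * soft loss + (1 - alpha) * hard loss."""
    # Soft loss: cross-entropy from the softened teacher distribution to the
    # softened student distribution, scaled by T^2 (a common convention that
    # keeps its gradient magnitude comparable to the hard loss).
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
    soft *= temperature ** 2
    # Hard loss: ordinary cross-entropy against the ground-truth label at T = 1.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1.0 - alpha) * hard


# Example: a 2-class relevant / not-relevant decision for one query-document pair.
print(distillation_loss([1.2, -0.3], [2.5, -1.0], label=0))
```

Setting `alpha=0.0` reduces the loss to plain supervised training, while `alpha=1.0` trains the student purely to imitate the teacher.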

DataFunSummit

Jan 29, 2022 · Artificial Intelligence

Survey of Model Pruning and Quantization Techniques for Deep Learning

This article provides a comprehensive overview of recent advances in deep learning model compression, focusing on pruning methods—including unstructured, structured, filter-wise, channel-wise, shape-wise, and stripe-wise approaches—and quantization techniques such as linear, non‑linear, clustering, power‑of‑two, binary, and 8‑bit quantization, while discussing evaluation criteria, sparsity ratios, fine‑tuning, and training‑aware quantization.

Quantization · deep learning · model compression

0 likes · 23 min read
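Of the quantization schemes this survey lists, linear 8‑bit quantization is the simplest to illustrate. The sketch below assumes per-tensor asymmetric (affine) quantization calibrated from the tensor's own min and max; the helper names are hypothetical, and non-linear, clustering, power-of-two and binary schemes are not covered.

```python
def quantize_linear(values, num_bits=8):
    """Hypothetical helper: per-tensor asymmetric (affine) quantization to unsigned ints."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = min(qmax, max(qmin, round(qmin - lo / scale)))
    # Round to the nearest code, then clamp into the representable range.
    q = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_linear(q, scale, zero_point):
    """Map integer codes back to approximate real values."""
    return [scale * (x - zero_point) for x in q]


weights = [-1.0, -0.5, 0.0, 0.3, 2.0]
codes, scale, zp = quantize_linear(weights)
print(codes, scale, zp)
print(dequantize_linear(codes, scale, zp))
```

Each reconstructed value lies within half a quantization step (`scale / 2`) of the original, and storing `codes` as bytes instead of 32-bit floats cuts storage by 4×.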

Laiye Technology Team

Jan 28, 2022 · Artificial Intelligence

Survey of Model Compression and Quantization Techniques for Deep Neural Networks

This article provides a comprehensive overview of deep learning model compression and acceleration methods, detailing pruning strategies, various pruning types, evaluation criteria, sparsity ratios, fine‑tuning procedures, as well as linear and non‑linear quantization approaches, their implementations, and practical considerations.

Efficiency · Quantization · deep learning

0 likes · 26 min read
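Two of the pruning types named here, unstructured magnitude pruning at a target sparsity ratio and filter-wise pruning by L1 norm, can be sketched as follows. The magnitude and L1-norm criteria are common choices assumed for illustration; the function names are hypothetical, and the weights are flat Python lists rather than framework tensors.

```python
def magnitude_prune(weights, sparsity):
    """Unstructured pruning: zero the `sparsity` fraction of weights with the smallest |w|."""
    n_prune = int(len(weights) * sparsity)
    # Rank indices by magnitude (smallest first); sorted() is stable, so ties keep their order.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    mask = [0 if i in pruned_idx else 1 for i in range(len(weights))]
    return [w * m for w, m in zip(weights, mask)], mask


def prune_filters_l1(filters, keep_ratio):
    """Structured (filter-wise) pruning: keep the filters with the largest L1 norm."""
    n_keep = max(1, round(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)),
                    key=lambda i: sum(abs(w) for w in filters[i]), reverse=True)
    kept = sorted(ranked[:n_keep])  # restore the original filter order
    return [filters[i] for i in kept], kept


w = [0.1, -0.9, 0.05, 0.4, -0.02, 0.7]
pruned, mask = magnitude_prune(w, sparsity=0.5)
print(mask)  # [0, 1, 0, 1, 0, 1]
```

Unstructured pruning keeps the tensor shape and needs sparse kernels to run faster, while filter-wise pruning removes whole filters and shrinks the dense computation directly. In either case the surveys discuss fine-tuning afterwards; in this sketch that would mean re-training with `mask` held fixed.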

Code DAO

Jan 15, 2022 · Artificial Intelligence

Compressing Unsupervised fastText Models 300× Smaller with Near‑Identical NLP Performance

This article shows how the compress‑fasttext Python library can shrink a 7 GB fastText word‑embedding model to about 21 MB—a 300‑fold reduction—while preserving almost the same accuracy on downstream NLP tasks, and explains the underlying compression techniques, usage examples, and evaluation results.

NLP · compress-fasttext · fastText

0 likes · 9 min read

DataFunTalk

Dec 24, 2021 · Artificial Intelligence

Large-Scale Pretrained Model Compression and Distillation: AdaBERT, L2A, and Meta‑KD

This article reviews three consecutive works from Alibaba DAMO Academy on compressing and distilling large pretrained language models—AdaBERT, L2A, and Meta‑KD—detailing their motivations, neural‑architecture‑search‑based designs, loss formulations, experimental results, and insights from a Q&A session.

AI · Neural Architecture Search · knowledge distillation

0 likes · 10 min read

DataFunSummit

Dec 21, 2021 · Artificial Intelligence

Large‑Scale Pretrained Model Compression and Distillation: AdaBERT, L2A, and Meta‑KD

This talk presents Alibaba DAMO Academy’s recent work on compressing large pretrained language models, covering task‑adaptive AdaBERT, data‑augmented L2A, and meta‑knowledge distillation Meta‑KD, describing their motivations, architectures, NAS‑based search, loss designs, and experimental results across multiple NLP tasks.

NLP · Neural Architecture Search · knowledge distillation

0 likes · 13 min read

Alibaba Terminal Technology

Dec 15, 2021 · Artificial Intelligence

Unlock Real-Time Mobile OCR: Inside Ant’s xNN-OCR Engine and Its Tiny, Fast AI

Ant’s self‑developed xNN‑OCR demonstrates how advanced OCR can run offline on smartphones by combining GAN‑based data synthesis, lightweight ShuffleNet‑inspired detection, NAS‑optimized recognition, and aggressive model compression, delivering accurate, near‑real‑time recognition across diverse mobile scenarios while preserving privacy and keeping costs low.

Data Synthesis · NAS · edge AI

0 likes · 11 min read

Alimama Tech

Nov 17, 2021 · Artificial Intelligence

Binary Code Based Hash Embedding for Efficient Deep Recommendation Models

Binary Code based Hash Embedding (BH) dramatically compresses deep recommendation model storage by converting feature IDs to binary codes and partitioning them into flexible blocks, yielding deterministic, collision‑free indices that achieve up to 1,000× size reduction while retaining about 99% of original accuracy, making it ideal for storage‑constrained deployments.

Embedding Storage · binary code · deep recommendation

0 likes · 13 min read
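The core mechanism summarized here (ID to binary code, code split into blocks, each block indexing a small table) can be sketched as a hypothetical class. The 16-bit code, 4-bit blocks, sum pooling and random initialization are assumptions for illustration, not the published BH configuration, which partitions codes more flexibly.

```python
import random


class BinaryCodeHashEmbedding:
    """Sketch: feature ID -> binary code -> fixed-width blocks -> one small table per block."""

    def __init__(self, num_bits=16, block_bits=4, dim=8, seed=0):
        if num_bits % block_bits:
            raise ValueError("num_bits must be a multiple of block_bits")
        self.num_bits, self.block_bits, self.dim = num_bits, block_bits, dim
        self.num_blocks = num_bits // block_bits
        rng = random.Random(seed)
        # One table with 2**block_bits rows per block, instead of one row per ID.
        self.tables = [
            [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(2 ** block_bits)]
            for _ in range(self.num_blocks)
        ]

    def block_indices(self, feature_id):
        """Split the ID's binary code into blocks, least significant block first."""
        if not 0 <= feature_id < 2 ** self.num_bits:
            raise ValueError("feature_id does not fit in num_bits")
        mask = 2 ** self.block_bits - 1
        return [(feature_id >> (k * self.block_bits)) & mask for k in range(self.num_blocks)]

    def lookup(self, feature_id):
        """Sum-pool the rows selected by each block into one embedding vector."""
        vec = [0.0] * self.dim
        for table, idx in zip(self.tables, self.block_indices(feature_id)):
            vec = [a + b for a, b in zip(vec, table[idx])]
        return vec


emb = BinaryCodeHashEmbedding()
print(emb.block_indices(0x1234))  # [4, 3, 2, 1]
```

With these settings, 65,536 possible IDs share 4 × 16 = 64 table rows, 1,024× fewer rows than a full table. Rows are shared across IDs, but every ID still maps to a unique tuple of block indices, which is what makes the indexing deterministic and collision-free.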

Alimama Tech

Nov 17, 2021 · Artificial Intelligence

Low‑Carbon Model Compression for Alibaba Mama Search Advertising CTR: Feature Volume and Embedding Dimension Optimizations

The article details Alibaba’s low‑carbon CTR model slimming, showing how binary‑code hash embeddings compress massive feature volumes while the Adaptive‑Masked Twins‑based Layer dynamically reduces embedding dimensions, together cutting storage and compute, lowering collisions, and preserving accuracy for large‑scale search advertising.

CTR · Embedding · feature volume

0 likes · 11 min read

Aotu Lab

Sep 30, 2021 · Artificial Intelligence

Bringing AI to the Browser: Edge Intelligence, Frameworks & Model Compression

This article explains how AI is extending into front‑end development, defines edge AI, outlines its application scenarios, discusses advantages and limitations, reviews web‑based inference frameworks and hardware acceleration, and details model compression techniques for deploying AI directly in browsers.

AI · TensorFlow.js · edge AI

0 likes · 15 min read

Ctrip Technology

Sep 16, 2021 · Artificial Intelligence

Automated AI Model Optimization Platform for Travel Services

This article describes the design, automated workflow, functional modules, and performance results of a comprehensive AI model optimization platform built for Ctrip's travel business, covering operator libraries, graph optimization, model compression techniques such as distillation, quantization, pruning, and deployment integration.

AutoML · ai-optimization · inference acceleration

0 likes · 16 min read

DataFunTalk

Sep 14, 2021 · Artificial Intelligence

AI Model Deployment on Edge Devices: Adaptation, Optimization, and Continuous Iteration – Interview Insights

The article shares a programmer's interview experience at Baidu, discussing how to adapt AI algorithms for edge deployment, balance model performance and efficiency, apply model compression techniques, and continuously iterate models, while also promoting an upcoming AI deployment online course.

AI Deployment · edge computing · framework support

0 likes · 6 min read

DataFunTalk

Jun 14, 2021 · Artificial Intelligence

From Massive to Compact: Model Compression Strategies for Large‑Scale CTR Prediction in Alibaba Search Advertising

This article describes how Alibaba's search advertising team transformed trillion‑parameter CTR models into lightweight, high‑precision systems by compressing embedding layers through feature‑space reduction, dimension quantization, and multi‑hash techniques, while also introducing graph‑based pre‑training and dropout‑driven feature selection to maintain accuracy.

CTR Prediction · embedding reduction · feature selection

0 likes · 15 min read
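The multi-hash technique mentioned here can be illustrated with a small sketch. Each feature is hashed by several independent hash functions into small tables, and the selected rows are summed. The SHA-256-based hash, two hashes, 1,000 buckets and sum pooling are assumptions for illustration, not the team's production design.

```python
import hashlib
import random


def stable_hash(value, salt, buckets):
    """Deterministic hash of (salt, value) into [0, buckets) using SHA-256."""
    digest = hashlib.sha256(f"{salt}:{value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


class MultiHashEmbedding:
    """Hypothetical sketch: k small tables indexed by k independent hashes, summed."""

    def __init__(self, num_hashes=2, buckets=1000, dim=8, seed=0):
        rng = random.Random(seed)
        self.num_hashes, self.buckets, self.dim = num_hashes, buckets, dim
        # Table size depends only on num_hashes * buckets, not on the vocabulary size.
        self.tables = [
            [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(buckets)]
            for _ in range(num_hashes)
        ]

    def lookup(self, feature):
        vec = [0.0] * self.dim
        for k in range(self.num_hashes):
            row = self.tables[k][stable_hash(feature, k, self.buckets)]
            vec = [a + b for a, b in zip(vec, row)]
        return vec


emb = MultiHashEmbedding()
print(emb.lookup("user_42"))
```

A single hashed table makes colliding features indistinguishable. With several hashes, two features receive identical embeddings only if they collide under every hash, which is why multi-hashing can shrink the table while limiting accuracy loss.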

DataFunSummit

Jun 5, 2021 · Artificial Intelligence

Compression Techniques for BERT: Analysis, Quantization, Pruning, Distillation, and Structure‑Preserving Methods

This article reviews BERT’s architecture, analyzes the storage and compute costs of each layer, and systematically presents compression methods—including quantization, pruning, knowledge distillation (Distilled BiLSTM and MobileBERT), and structure‑preserving techniques—aimed at enabling efficient deployment on resource‑constrained mobile devices.

BERT · Mobile Deployment · Quantization

0 likes · 15 min read
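The per-layer storage analysis described here can be reproduced with simple arithmetic. The sketch below counts the parameters of a BERT encoder layer by layer, using the standard BERT-base sizes as defaults (vocabulary 30,522, hidden size 768, 12 layers, feed-forward size 3,072, 512 positions). It includes the pooler but not the pre-training heads, and the function name is hypothetical.

```python
def bert_param_counts(vocab=30522, hidden=768, layers=12, ffn=3072, max_pos=512, type_vocab=2):
    """Count parameters of a BERT-style encoder, layer by layer (biases included)."""
    # Token + position + segment embeddings, plus one LayerNorm (gamma and beta).
    embeddings = (vocab + max_pos + type_vocab) * hidden + 2 * hidden
    # Q, K, V and attention-output projections: four hidden x hidden matrices with biases.
    attention = 4 * (hidden * hidden + hidden)
    # Two feed-forward projections: hidden -> ffn -> hidden.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    # Two LayerNorms per encoder layer.
    layer_norms = 2 * (2 * hidden)
    per_layer = attention + feed_forward + layer_norms
    pooler = hidden * hidden + hidden
    total = embeddings + layers * per_layer + pooler
    return {
        "embeddings": embeddings,
        "attention_per_layer": attention,
        "ffn_per_layer": feed_forward,
        "per_layer": per_layer,
        "total": total,
    }


counts = bert_param_counts()
print(counts["total"])  # 109482240
print(f"fp32: {counts['total'] * 4 / 1e6:.1f} MB, int8: {counts['total'] / 1e6:.1f} MB")
print(f"embedding share: {counts['embeddings'] / counts['total']:.1%}")
```

This gives about 438 MB as 32-bit floats versus about 110 MB after 8-bit quantization. The embedding layer holds roughly 22% of the parameters, and the feed-forward block accounts for about two thirds of each encoder layer.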

DataFunTalk

Jun 3, 2021 · Artificial Intelligence

Compression Techniques for BERT: Analysis, Quantization, Pruning, Distillation, and Structure-Preserving Methods

This article examines the internal structure of BERT and systematically presents various model‑compression strategies—including quantization, pruning, knowledge distillation, and structure‑preserving techniques—highlighting their impact on storage, computational cost, and inference speed for deployment on resource‑constrained mobile devices.

BERT · Quantization · knowledge distillation

0 likes · 16 min read

Alimama Tech

Jun 2, 2021 · Artificial Intelligence

Model Compression and Feature Optimization for Large-Scale CTR Prediction in Advertising

Alibaba‑Mama’s advertising team shrank multi‑terabyte CTR models to just tens of gigabytes by applying row‑dimension embedding compression, multi‑hash embeddings, graph‑based relationship networks, PCF‑GNN pre‑training, and droprank feature selection, preserving accuracy while halving training time, doubling online QPS, and retiring hundreds of servers.

Large-scale ML · embedding reduction · feature selection

0 likes · 14 min read

Kuaishou Tech

May 27, 2021 · Artificial Intelligence

Kuaishou’s Award‑Winning AI Research Projects and Their Industry Impact

Kuaishou’s R&D team has earned top national science and AI awards for its video transcoding and adaptive visual perception projects, which have been open‑sourced, adopted by major cloud CDN providers, and produced notable model‑compression research published at ICLR 2021, illustrating strong industry‑academic collaboration and contribution to China’s technology goals.

AI · Industry collaboration · academic publishing

0 likes · 5 min read

DataFunTalk

Feb 3, 2021 · Artificial Intelligence

Towards Best Possible Deep Learning Acceleration on the Edge – A Compression-Compilation Co-Design Framework

The lecture presented by Assistant Professor Yanzhi Wang introduces a compression‑compilation co‑design framework (CoCoPIE) that achieves real‑time deep‑learning inference on edge devices through novel pruning and quantization techniques, delivering up to 180× speedup without accuracy loss.

AI · deep learning · edge computing

0 likes · 5 min read

DataFunTalk

Jan 15, 2021 · Artificial Intelligence

Zhihu Search Text Relevance Evolution and BERT Knowledge Distillation Practices

This talk by Zhihu search algorithm engineer Shen Zhan details the evolution of text relevance models from TF‑IDF/BM25 to deep semantic matching and BERT, explains the challenges of deploying BERT at scale, and describes practical knowledge‑distillation techniques that improve both online latency and offline storage while maintaining search quality.

BERT · Semantic Retrieval · knowledge distillation

0 likes · 14 min read

Sohu Tech Products

Jan 6, 2021 · Artificial Intelligence

Overview of Main Model Compression and Acceleration Techniques: Structural Optimization, Pruning, Quantization, and Knowledge Distillation

This article reviews four mainstream model compression and acceleration methods (structural optimization, pruning, quantization, and knowledge distillation), explaining their principles, implementations, and performance, and presents practical examples such as DistilBERT, TinyBERT, and FastBERT with comparative results.

AI · Quantization · deep learning

0 likes · 14 min read

Amap Tech

Dec 30, 2020 · Artificial Intelligence

LRC-BERT: Contrastive Learning based Knowledge Distillation with COS‑NCE Loss for Efficient NLP Models

The Amap team introduced LRC‑BERT, a contrastive‑learning‑based knowledge‑distillation framework that employs a novel COS‑NCE loss, gradient perturbation, and a two‑stage training schedule. It enables a 4‑layer student model to retain about 97% of BERT‑Base accuracy while being 7.5× smaller and 9.6× faster, and it has already improved real‑world traffic‑event extraction performance.

BERT · COS-NCE loss · NLP

0 likes · 16 min read

Suning Technology

Oct 29, 2020 · Artificial Intelligence

Accelerating Deep Learning for Retail: Model Compression, Speed & Energy

This lecture outlines the key challenges of deep learning in retail (growing model size, inference speed, and energy consumption) and presents a comprehensive acceleration framework that covers algorithmic optimizations such as efficient network design and pruning as well as hardware acceleration, with practical examples including MobileNet, model compression, and edge deployment.

deep learning · hardware optimization · model acceleration

0 likes · 15 min read

Didi Tech

Oct 21, 2020 · Artificial Intelligence

Deep Model Compression Techniques for Intelligent Automotive Cockpits

The article reviews deep‑model compression methods—ADMM‑based structured pruning, low‑bit quantization, and teacher‑student knowledge distillation—and their automated AutoCompress workflow, demonstrating how these techniques shrink neural networks enough to run real‑time driver‑monitoring and other intelligent cockpit functions on resource‑limited automotive hardware while preserving accuracy.

ADMM · Quantization · deep learning

0 likes · 16 min read

Didi Tech

Oct 16, 2020 · Artificial Intelligence

Mask Detection System and Visual AI Competition Achievements

Didi’s COVID‑19 mask‑detection system, built on a DFS‑based face detector and an attention‑enhanced ResNet‑50 mask classifier achieving over 99.5% accuracy, has been deployed in vehicles and open‑sourced. It is complemented by top‑ranked results in international visual AI contests, including first place in driver‑gaze prediction and podium finishes in emotion recognition and model‑compression challenges.

AI · computer vision · deep learning

0 likes · 22 min read

DataFunTalk

Sep 23, 2020 · Artificial Intelligence

PaddleOCR: 2020’s Outstanding Open‑Source OCR Suite with a 3.5 MB Ultra‑Light Model

PaddleOCR, the 2020 breakthrough in open‑source OCR, offers ultra‑light 3.5 MB multilingual models, high F1‑score performance across diverse scenarios, easy installation via pip, comprehensive documentation, custom training support, and deployment options for both server and mobile platforms, all backed by detailed benchmarks and code examples.

OCR · PaddleOCR · Python

0 likes · 8 min read

Meituan Technology Team

Aug 6, 2020 · Artificial Intelligence

Meituan SIGIR2020 Workshop: MT‑BERT, KDD Cup Solutions, and Knowledge Graph Applications

At the SIGIR 2020 Meituan workshop, researchers unveiled MT‑BERT’s large‑scale pre‑training and compression techniques, a KDD Cup winning solution that tackles bias with graph‑ and multimodal learning for search advertising, and a massive food‑delivery knowledge graph powering personalized recommendations, all demonstrating significant real‑world performance gains.

Multimodal Learning · model compression · pretrained language models

0 likes · 18 min read

Didi Tech

Aug 5, 2020 · Artificial Intelligence

DiDi IFX AI Inference Platform: Architecture, Performance, and Productization

DiDi’s IFX AI inference platform, built since 2018, uses a four‑layer architecture spanning access, software, engine, and compute to deliver cloud, edge, and device inference with high‑performance kernel optimizations, model and binary compression, uniform multi‑framework deployment, automated testing, and end‑to‑end security for billions of daily calls.

AI inference · Performance Optimization · edge computing

0 likes · 9 min read

Ctrip Technology

Jul 23, 2020 · Artificial Intelligence

Inference Performance Optimization for AI Applications: Methods, Case Studies, and Future Directions

This article examines the challenges of deep learning inference, outlines general optimization methodologies—including system-level and model-level techniques—presents practical case studies such as Transformer translation model improvements, and discusses future trends in automated compilation and performance tuning for AI services.

AI inference · Performance Optimization · TVM

0 likes · 15 min read

Alibaba Cloud Developer

Jul 16, 2020 · Artificial Intelligence

How BERT‑to‑TextCNN Knowledge Distillation Boosts Spam Opinion Detection

This article examines how a large pretrained BERT model can be compressed via knowledge distillation into a lightweight TextCNN classifier for efficient spam opinion detection, detailing traditional distillation methods, several practical schemes, experimental results, and the advantages of the approach.

BERT · NLP · TextCNN

0 likes · 9 min read

AntTech

Jun 9, 2020 · Artificial Intelligence

Deep Learning Model Compression and Acceleration Techniques for Mobile AI

This article reviews the motivations, challenges, and a comprehensive set of algorithmic, framework, and hardware methods—including structural optimization, quantization, pruning, and knowledge distillation—to compress and accelerate deep learning models for deployment on mobile devices, highlighting benefits such as reduced server load, lower latency, improved reliability, and enhanced privacy.

Quantization · knowledge distillation · mobile AI

0 likes · 17 min read

DataFunTalk

May 26, 2020 · Artificial Intelligence

Knowledge Distillation Techniques for Recommendation Systems: Methods, Scenarios, and Practical Insights

This article reviews how knowledge distillation—using a large teacher model to guide a smaller student model—can be applied across the recall, coarse‑ranking, and fine‑ranking stages of recommendation systems, detailing logits‑based and feature‑based approaches, joint and two‑stage training, and point‑wise, pair‑wise, and list‑wise loss designs.

Ranking · Recommendation Systems · knowledge distillation

0 likes · 31 min read

DataFunTalk

Apr 16, 2020 · Artificial Intelligence

Comprehensive Survey of Pre-trained Models for Natural Language Processing

This article provides a detailed survey of pre‑trained models (PTMs) for natural language processing, classifying them into shallow embeddings and contextual encoders, discussing training paradigms such as knowledge integration and model compression, and offering guidance on transfer learning and future challenges.

Knowledge Integration · Pretrained Models · model compression

0 likes · 25 min read

DataFunTalk

Nov 15, 2019 · Artificial Intelligence

MT-BERT: Domain‑Adapted BERT Pre‑training and Fine‑tuning for Meituan‑Dianping NLP Tasks

This article describes the development of MT‑BERT, a BERT‑based language model pre‑trained on Meituan‑Dianping business data, its distributed mixed‑precision training pipeline, domain adaptation, knowledge‑graph integration, model compression techniques, and the wide range of downstream NLP applications achieved in the platform.

BERT · Domain Adaptation · Knowledge Graph

0 likes · 31 min read

Meituan Technology Team

Nov 14, 2019 · Artificial Intelligence

MT-BERT: Pre‑training and Fine‑tuning Practices at Meituan‑Dianping

MT‑BERT at Meituan‑Dianping combines mixed‑precision, domain‑adapted continual pre‑training, knowledge‑graph‑aware masking, and extensive compression techniques to produce fast, accurate BERT models that power fine‑grained sentiment analysis, intent classification, recommendation reasoning, and other NLP tasks across the platform.

BERT · Knowledge Graph · MT-BERT

0 likes · 33 min read

Alibaba Cloud Developer

May 21, 2019 · Artificial Intelligence

How Alibaba’s Offline AI Advances Model Compression and Edge Inference

Alibaba’s Machine Intelligence Lab shares two years of breakthroughs in offline AI, detailing low‑bit quantization, unified sparsity frameworks, hardware‑software co‑design, lightweight networks, and on‑device detection, alongside standardized training tools, multi‑platform inference engines, and productized edge solutions such as smart boxes and integrated cameras.

AI · Quantization · edge inference

0 likes · 16 min read

Alibaba Cloud Developer

Apr 2, 2019 · Mobile Development

How xNN-OCR Brings High‑Precision, Real‑Time OCR to Mobile Devices

This article explains how the lightweight xNN-OCR engine achieves high accuracy and real‑time performance on mobile devices through deep‑learning model compression, novel detection and recognition techniques, and showcases its practical applications such as bank‑card, gas‑meter, license‑plate, and ID recognition.

deep learning · edge AI · mobile OCR

0 likes · 12 min read

Alibaba Cloud Developer

Dec 28, 2018 · Artificial Intelligence

Elastic Feature Scaling: Boosting Alibaba’s Online Recommendation CTR by 4%

This article describes how Ant Financial’s AI team redesigned TensorFlow to enable elastic feature scaling, introduced a Group‑Lasso optimizer and streaming frequency filtering, reduced model size by 90%, and achieved significant CTR and efficiency gains in Alipay’s online recommendation system.

Recommendation Systems · TensorFlow · feature scaling

0 likes · 20 min read

Alibaba Cloud Developer

Dec 4, 2018 · Artificial Intelligence

Unlocking Elastic TensorFlow: Boosting Online Recommendation CTR by 30%

This article presents a comprehensive set of innovations—including elastic feature scaling, a Group Lasso optimizer, streaming frequency filtering, and graph‑cut model compression—that transform TensorFlow for large‑scale online learning, delivering significant CTR gains and up to 90% model size reduction in Alibaba's recommendation systems.

Recommendation Systems · feature engineering · group lasso

0 likes · 19 min read
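The Group Lasso idea mentioned here can be illustrated with the standard group-lasso proximal step. The sketch assumes that each feature ID's embedding vector is one group and that the step follows an ordinary gradient update with learning rate `lr` and penalty `lam`. It shows the textbook operator, not the article's TensorFlow optimizer, and the names are hypothetical.

```python
import math


def group_lasso_prox(embeddings, lr, lam):
    """Proximal step for group lasso, treating each feature's embedding as one group.

    Returns the shrunk table and the IDs whose whole vector was driven to zero.
    """
    threshold = lr * lam
    updated, removed = {}, []
    for feature_id, vec in embeddings.items():
        norm = math.sqrt(sum(v * v for v in vec))
        if norm <= threshold:
            # The entire group is zeroed, so the feature can be evicted from the table.
            removed.append(feature_id)
            continue
        # Shrink the whole vector toward zero; its L2 norm drops by exactly `threshold`.
        scale = 1.0 - threshold / norm
        updated[feature_id] = [scale * v for v in vec]
    return updated, removed


table = {"item_1": [3.0, 4.0], "item_2": [0.01, 0.0]}
table, evicted = group_lasso_prox(table, lr=0.1, lam=0.5)
print(evicted)  # ['item_2']
```

Because the penalty acts on whole vectors rather than individual weights, weak features are dropped entirely rather than partially sparsified, which is how group sparsity can shrink a large feature table.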

Alibaba Cloud Developer

Oct 23, 2018 · Artificial Intelligence

How DFSMN Cuts Speech Synthesis Model Size by 75% While Quadrupling Speed

This paper introduces a Deep Feedforward Sequential Memory Network (DFSMN) for statistical parametric speech synthesis that matches BLSTM quality with only a quarter of the model size and four times faster inference, making it ideal for memory‑constrained, real‑time IoT devices.

DFSMN · IoT devices · Real-time inference

0 likes · 10 min read

Alibaba Cloud Developer

Oct 9, 2018 · Artificial Intelligence

How Rocket Launching Boosts Online CTR Prediction Without Slowing Inference

Rocket Launching introduces a novel co‑training framework that jointly trains a lightweight network and a more powerful booster network, sharing parameters and using gradient‑blocking and hint loss to improve click‑through‑rate prediction accuracy while keeping online inference latency unchanged, validated on public datasets and Alibaba’s ad system.

CTR Prediction · Online Advertising · co-training

0 likes · 13 min read

Alibaba Cloud Developer

Jun 20, 2018 · Mobile Development

How to Supercharge Mobile Deep Learning: Model Compression & Engine Optimizations

This article explains how to overcome the performance, size, memory, and compatibility challenges of deploying deep‑learning inference engines on mobile devices by jointly optimizing model compression and engine implementation, covering speed tricks, cache‑friendly coding, multithreading, sparsity, quantization, NEON intrinsics, package size reduction, memory pooling, and reliability techniques.

Memory Management · NEON SIMD · mobile deep learning

0 likes · 22 min read

Alibaba Cloud Developer

Jun 15, 2018 · Mobile Development

How Alipay’s xNN Engine Brings Deep Learning to Mobile Apps

This article explains how Alipay’s xNN deep‑learning engine tackles the challenges of deploying AI on billions of mobile devices by using aggressive model compression, a lightweight SDK, and joint algorithm‑ and instruction‑level optimizations to achieve high accuracy, tiny package size, and real‑time performance.

Alipay · deep learning · mobile AI

0 likes · 10 min read

Alibaba Cloud Developer

Sep 28, 2017 · Artificial Intelligence

How Alipay’s xNN Brings Deep Learning to Millions of Mobile Devices

This article explains how Alipay’s xNN engine overcomes mobile deep‑learning challenges through aggressive model compression, lightweight SDK design, algorithm‑ and instruction‑level optimizations, enabling high‑accuracy AI inference on a wide range of Android and iOS devices with minimal app‑size impact.

Alipay · Inference Optimization · deep learning

0 likes · 13 min read
