Tagged articles

model compression

146 articles · Page 2 of 2
DataFunTalk
Apr 14, 2022 · Artificial Intelligence

PaddlePaddle Deep Learning Platform: Architecture, Core Technologies, and Real‑World Applications

The article presents a comprehensive overview of Baidu's open‑source deep learning platform PaddlePaddle, detailing its full‑stack architecture and core technologies such as unified dynamic‑static graph, large‑scale distributed training, multi‑platform inference, an extensive model zoo, and hardware adaptation, and showcasing a real‑world deployment case in power‑grid monitoring.

AI Framework · PaddlePaddle · distributed training
0 likes · 15 min read
DataFunTalk
Apr 5, 2022 · Artificial Intelligence

Applying AI Technologies in the Youdao Dictionary Pen: Scanning, Offline Translation, and Edge ML Library

This article presents a technical overview of the Youdao Dictionary Pen, describing its hardware design, real‑time scanning and point‑query image processing, on‑device offline translation with model compression techniques, and the high‑performance Edge ML Library (EMLL) that enables efficient AI inference on constrained edge hardware.

AI · Edge ML Library · OCR
0 likes · 18 min read
Baidu Geek Talk
Apr 1, 2022 · Artificial Intelligence

How Paddle Lite & PaddleSlim Supercharge Edge AI Inference Performance

With the rapid rise of edge computing, deploying AI models for tasks like object detection, OCR, and speech recognition on resource‑constrained devices faces speed challenges; the upgraded Paddle Lite inference engine and PaddleSlim compression tools claim up to 23% faster inference and significant model size reductions, offering a practical solution.

AI Deployment · Inference Optimization · Paddle-Lite
0 likes · 6 min read
Tencent Cloud Developer
Mar 3, 2022 · Artificial Intelligence

Model Distillation for Query-Document Matching: Techniques and Optimizations

We applied knowledge distillation to a video query‑document BERT matcher, compressing the 12‑layer teacher into production‑ready 1‑layer ALBERT and tiny TextCNN students using combined soft, hard, and relevance losses plus AutoML‑tuned hyper‑parameters, achieving sub‑5 ms latency and up to 2.4% AUC improvement over the original model.

ALBERT · AutoML · BERT
0 likes · 12 min read
DataFunSummit
Jan 29, 2022 · Artificial Intelligence

Survey of Model Pruning and Quantization Techniques for Deep Learning

This article provides a comprehensive overview of recent advances in deep learning model compression, focusing on pruning methods (unstructured, structured, filter-wise, channel-wise, shape-wise, and stripe-wise) and quantization techniques such as linear, non‑linear, clustering, power‑of‑two, binary, and 8‑bit quantization, while discussing evaluation criteria, sparsity ratios, fine‑tuning, and quantization‑aware training.

Quantization · deep learning · model compression
0 likes · 23 min read
Laiye Technology Team
Jan 28, 2022 · Artificial Intelligence

Survey of Model Compression and Quantization Techniques for Deep Neural Networks

This article provides a comprehensive overview of deep learning model compression and acceleration methods, detailing pruning strategies, various pruning types, evaluation criteria, sparsity ratios, fine‑tuning procedures, as well as linear and non‑linear quantization approaches, their implementations, and practical considerations.

Efficiency · Quantization · deep learning
0 likes · 26 min read
Code DAO
Jan 15, 2022 · Artificial Intelligence

Compressing Unsupervised fastText Models 300× Smaller with Near‑Identical NLP Performance

This article shows how the compress‑fasttext Python library can shrink a 7 GB fastText word‑embedding model to about 21 MB—a 300‑fold reduction—while preserving almost the same accuracy on downstream NLP tasks, and explains the underlying compression techniques, usage examples, and evaluation results.

NLP · compress-fasttext · fastText
0 likes · 9 min read
DataFunTalk
Dec 24, 2021 · Artificial Intelligence

Large-Scale Pretrained Model Compression and Distillation: AdaBERT, L2A, and Meta‑KD

This article reviews three consecutive works from Alibaba DAMO Academy on compressing and distilling large pretrained language models—AdaBERT, L2A, and Meta‑KD—detailing their motivations, neural‑architecture‑search‑based designs, loss formulations, experimental results, and insights from a Q&A session.

AI · Neural Architecture Search · knowledge distillation
0 likes · 10 min read
DataFunSummit
Dec 21, 2021 · Artificial Intelligence

Large‑Scale Pretrained Model Compression and Distillation: AdaBERT, L2A, and Meta‑KD

This talk presents Alibaba DAMO Academy’s recent work on compressing large pretrained language models, covering task‑adaptive AdaBERT, data‑augmented L2A, and meta‑knowledge distillation Meta‑KD, describing their motivations, architectures, NAS‑based search, loss designs, and experimental results across multiple NLP tasks.

NLP · Neural Architecture Search · knowledge distillation
0 likes · 13 min read
Alibaba Terminal Technology
Dec 15, 2021 · Artificial Intelligence

Unlock Real-Time Mobile OCR: Inside Ant’s xNN-OCR Engine and Its Tiny, Fast AI

Ant’s self‑developed xNN‑OCR demonstrates how advanced OCR can run offline on smartphones by combining GAN‑based data synthesis, lightweight ShuffleNet‑inspired detection, NAS‑optimized recognition, and aggressive model compression, delivering near‑real‑time accuracy for diverse mobile scenarios while preserving privacy and low cost.

Data Synthesis · NAS · edge AI
0 likes · 11 min read
Alimama Tech
Nov 17, 2021 · Artificial Intelligence

Binary Code Based Hash Embedding for Efficient Deep Recommendation Models

Binary Code based Hash Embedding (BH) dramatically compresses deep recommendation model storage by converting feature IDs to binary codes and partitioning them into flexible blocks, yielding deterministic, collision‑free indices that achieve up to 1,000× size reduction while retaining about 99% of original accuracy, making it ideal for storage‑constrained deployments.

Embedding Storage · binary code · deep recommendation
0 likes · 13 min read
Alimama Tech
Nov 17, 2021 · Artificial Intelligence

Low‑Carbon Model Compression for Alibaba Mama Search Advertising CTR: Feature Volume and Embedding Dimension Optimizations

The article details Alibaba’s low‑carbon CTR model slimming, showing how binary‑code hash embeddings compress massive feature volumes while the Adaptive‑Masked Twins‑based Layer dynamically reduces embedding dimensions, together cutting storage and compute, lowering collisions, and preserving accuracy for large‑scale search advertising.

CTR · Embedding · feature volume
0 likes · 11 min read
Aotu Lab
Sep 30, 2021 · Artificial Intelligence

Bringing AI to the Browser: Edge Intelligence, Frameworks & Model Compression

This article explains how AI is extending into front‑end development, defines edge AI, outlines its application scenarios, discusses advantages and limitations, reviews web‑based inference frameworks and hardware acceleration, and details model compression techniques for deploying AI directly in browsers.

AI · TensorFlow.js · edge AI
0 likes · 15 min read
Ctrip Technology
Sep 16, 2021 · Artificial Intelligence

Automated AI Model Optimization Platform for Travel Services

This article describes the design, automated workflow, functional modules, and performance results of a comprehensive AI model optimization platform built for Ctrip's travel business, covering operator libraries, graph optimization, model compression techniques such as distillation, quantization, pruning, and deployment integration.

AutoML · ai-optimization · inference acceleration
0 likes · 16 min read
DataFunTalk
Sep 14, 2021 · Artificial Intelligence

AI Model Deployment on Edge Devices: Adaptation, Optimization, and Continuous Iteration – Interview Insights

The article shares a programmer's interview experience at Baidu, discussing how to adapt AI algorithms for edge deployment, balance model performance and efficiency, apply model compression techniques, and continuously iterate models, while also promoting an upcoming AI deployment online course.

AI Deployment · edge computing · framework support
0 likes · 6 min read
DataFunTalk
Jun 14, 2021 · Artificial Intelligence

From Massive to Compact: Model Compression Strategies for Large‑Scale CTR Prediction in Alibaba Search Advertising

This article describes how Alibaba's search advertising team transformed trillion‑parameter CTR models into lightweight, high‑precision systems by compressing embedding layers through feature‑space reduction, dimension quantization, and multi‑hash techniques, while also introducing graph‑based pre‑training and dropout‑driven feature selection to maintain accuracy.

CTR Prediction · embedding reduction · feature selection
0 likes · 15 min read
DataFunSummit
Jun 5, 2021 · Artificial Intelligence

Compression Techniques for BERT: Analysis, Quantization, Pruning, Distillation, and Structure‑Preserving Methods

This article reviews BERT’s architecture, analyzes the storage and compute costs of each layer, and systematically presents compression methods—including quantization, pruning, knowledge distillation (Distilled BiLSTM and MobileBERT), and structure‑preserving techniques—aimed at enabling efficient deployment on resource‑constrained mobile devices.

BERT · Mobile Deployment · Quantization
0 likes · 15 min read
DataFunTalk
Jun 3, 2021 · Artificial Intelligence

Compression Techniques for BERT: Analysis, Quantization, Pruning, Distillation, and Structure-Preserving Methods

This article examines the internal structure of BERT and systematically presents various model‑compression strategies—including quantization, pruning, knowledge distillation, and structure‑preserving techniques—highlighting their impact on storage, computational cost, and inference speed for deployment on resource‑constrained mobile devices.

BERT · Quantization · knowledge distillation
0 likes · 16 min read
Alimama Tech
Jun 2, 2021 · Artificial Intelligence

Model Compression and Feature Optimization for Large-Scale CTR Prediction in Advertising

Alimama’s advertising team shrank multi‑terabyte CTR models to just tens of gigabytes by applying row‑dimension embedding compression, multi‑hash embeddings, graph‑based relationship networks, PCF‑GNN pre‑training, and droprank feature selection, preserving accuracy while halving training time, doubling online QPS, and retiring hundreds of servers.

Large-scale ML · embedding reduction · feature selection
0 likes · 14 min read
Kuaishou Tech
May 27, 2021 · Artificial Intelligence

Kuaishou’s Award‑Winning AI Research Projects and Their Industry Impact

Kuaishou’s R&D team has earned top national science and AI awards for its video transcoding and adaptive visual perception projects, which have been open‑sourced, adopted by major cloud CDN providers, and produced notable model‑compression research published at ICLR 2021, illustrating strong industry‑academic collaboration and contribution to China’s technology goals.

AI · Industry collaboration · academic publishing
0 likes · 5 min read
DataFunTalk
Feb 3, 2021 · Artificial Intelligence

Towards Best Possible Deep Learning Acceleration on the Edge – A Compression-Compilation Co-Design Framework

The lecture presented by Assistant Professor Yanzhi Wang introduces a compression‑compilation co‑design framework (CoCoPIE) that achieves real‑time deep‑learning inference on edge devices through novel pruning and quantization techniques, delivering up to 180× speedup without accuracy loss.

AI · deep learning · edge computing
0 likes · 5 min read
DataFunTalk
Jan 15, 2021 · Artificial Intelligence

Zhihu Search Text Relevance Evolution and BERT Knowledge Distillation Practices

This talk by Zhihu search algorithm engineer Shen Zhan details the evolution of text relevance models from TF‑IDF/BM25 to deep semantic matching and BERT, explains the challenges of deploying BERT at scale, and describes practical knowledge‑distillation techniques that improve both online latency and offline storage while maintaining search quality.

BERT · Semantic Retrieval · knowledge distillation
0 likes · 14 min read
Sohu Tech Products
Jan 6, 2021 · Artificial Intelligence

Overview of Main Model Compression and Acceleration Techniques: Structural Optimization, Pruning, Quantization, and Knowledge Distillation

This article reviews four mainstream model compression and acceleration methods (structural optimization, pruning, quantization, and knowledge distillation), explaining their principles, implementations, and performance, and presents practical examples such as DistilBERT, TinyBERT, and FastBERT with comparative results.

AI · Quantization · deep learning
0 likes · 14 min read
Amap Tech
Dec 30, 2020 · Artificial Intelligence

LRC-BERT: Contrastive Learning based Knowledge Distillation with COS‑NCE Loss for Efficient NLP Models

The Amap team introduced LRC‑BERT, a contrastive‑learning‑based knowledge‑distillation framework that employs a novel COS‑NCE loss, gradient‑perturbation, and a two‑stage training schedule, enabling a 4‑layer student model to retain about 97 % of BERT‑Base accuracy while being 7.5× smaller and 9.6× faster, and it has already improved real‑world traffic‑event extraction performance.

BERT · COS-NCE loss · NLP
0 likes · 16 min read
Suning Technology
Oct 29, 2020 · Artificial Intelligence

Accelerating Deep Learning for Retail: Model Compression, Speed & Energy

This lecture outlines the key challenges of deep learning in retail—growing model size, speed, and energy consumption—and presents a comprehensive acceleration framework covering algorithmic optimizations like network design, pruning, and hardware acceleration, with practical examples such as MobileNet, model compression, and edge deployment.

deep learning · hardware optimization · model acceleration
0 likes · 15 min read
Didi Tech
Oct 21, 2020 · Artificial Intelligence

Deep Model Compression Techniques for Intelligent Automotive Cockpits

The article reviews deep‑model compression methods—ADMM‑based structured pruning, low‑bit quantization, and teacher‑student knowledge distillation—and their automated AutoCompress workflow, demonstrating how these techniques shrink neural networks enough to run real‑time driver‑monitoring and other intelligent cockpit functions on resource‑limited automotive hardware while preserving accuracy.

ADMM · Quantization · deep learning
0 likes · 16 min read
Didi Tech
Oct 16, 2020 · Artificial Intelligence

Mask Detection System and Visual AI Competition Achievements

Didi’s COVID‑19 mask‑detection system, built on a DFS‑based face detector and an attention‑enhanced ResNet‑50 mask classifier achieving over 99.5 % accuracy, has been deployed in vehicles, open‑sourced, and complemented by top‑ranked results in international visual AI contests, including first place in driver‑gaze prediction and podium finishes in emotion recognition and model‑compression challenges.

AI · computer vision · deep learning
0 likes · 22 min read
DataFunTalk
Sep 23, 2020 · Artificial Intelligence

PaddleOCR: 2020’s Outstanding Open‑Source OCR Suite with a 3.5 MB Ultra‑Light Model

PaddleOCR, the 2020 breakthrough in open‑source OCR, offers ultra‑light 3.5 MB multilingual models, high F1‑score performance across diverse scenarios, easy installation via pip, comprehensive documentation, custom training support, and deployment options for both server and mobile platforms, all backed by detailed benchmarks and code examples.

OCR · PaddleOCR · Python
0 likes · 8 min read
Meituan Technology Team
Aug 6, 2020 · Artificial Intelligence

Meituan SIGIR2020 Workshop: MT‑BERT, KDD Cup Solutions, and Knowledge Graph Applications

At the SIGIR 2020 Meituan workshop, researchers unveiled MT‑BERT’s large‑scale pre‑training and compression techniques, a KDD Cup winning solution that tackles bias with graph‑ and multimodal learning for search advertising, and a massive food‑delivery knowledge graph powering personalized recommendations, all demonstrating significant real‑world performance gains.

Multimodal Learning · model compression · pretrained language models
0 likes · 18 min read
Didi Tech
Aug 5, 2020 · Artificial Intelligence

DiDi IFX AI Inference Platform: Architecture, Performance, and Productization

DiDi’s IFX AI inference platform, built since 2018, uses a four‑layer architecture spanning access, software, engine, and compute to deliver cloud, edge, and device inference with high‑performance kernel optimizations, model and binary compression, uniform multi‑framework deployment, automated testing, and end‑to‑end security for billions of daily calls.

AI inference · Performance Optimization · edge computing
0 likes · 9 min read
Ctrip Technology
Jul 23, 2020 · Artificial Intelligence

Inference Performance Optimization for AI Applications: Methods, Case Studies, and Future Directions

This article examines the challenges of deep learning inference, outlines general optimization methodologies—including system-level and model-level techniques—presents practical case studies such as Transformer translation model improvements, and discusses future trends in automated compilation and performance tuning for AI services.

AI inference · Performance Optimization · TVM
0 likes · 15 min read
AntTech
Jun 9, 2020 · Artificial Intelligence

Deep Learning Model Compression and Acceleration Techniques for Mobile AI

This article reviews the motivations, challenges, and a comprehensive set of algorithmic, framework, and hardware methods—including structural optimization, quantization, pruning, and knowledge distillation—to compress and accelerate deep learning models for deployment on mobile devices, highlighting benefits such as reduced server load, lower latency, improved reliability, and enhanced privacy.

Quantization · knowledge distillation · mobile AI
0 likes · 17 min read
DataFunTalk
May 26, 2020 · Artificial Intelligence

Knowledge Distillation Techniques for Recommendation Systems: Methods, Scenarios, and Practical Insights

This article reviews how knowledge distillation—using a large teacher model to guide a smaller student model—can be applied across the recall, coarse‑ranking, and fine‑ranking stages of recommendation systems, detailing logits‑based and feature‑based approaches, joint and two‑stage training, and point‑wise, pair‑wise, and list‑wise loss designs.

Ranking · Recommendation Systems · knowledge distillation
0 likes · 31 min read
DataFunTalk
Apr 16, 2020 · Artificial Intelligence

Comprehensive Survey of Pre-trained Models for Natural Language Processing

This article provides a detailed survey of pre‑trained models (PTMs) for natural language processing, classifying them into shallow embeddings and contextual encoders, discussing training paradigms such as knowledge integration and model compression, and offering guidance on transfer learning and future challenges.

Knowledge Integration · Pretrained Models · model compression
0 likes · 25 min read
DataFunTalk
Nov 15, 2019 · Artificial Intelligence

MT-BERT: Domain‑Adapted BERT Pre‑training and Fine‑tuning for Meituan‑Dianping NLP Tasks

This article describes the development of MT‑BERT, a BERT‑based language model pre‑trained on Meituan‑Dianping business data, covering its distributed mixed‑precision training pipeline, domain adaptation, knowledge‑graph integration, model compression techniques, and the wide range of downstream NLP applications it supports on the platform.

BERT · Domain Adaptation · Knowledge Graph
0 likes · 31 min read
Meituan Technology Team
Nov 14, 2019 · Artificial Intelligence

MT-BERT: Pre‑training and Fine‑tuning Practices at Meituan‑Dianping

MT‑BERT at Meituan‑Dianping combines mixed‑precision, domain‑adapted continual pre‑training, knowledge‑graph‑aware masking, and extensive compression techniques to produce fast, accurate BERT models that power fine‑grained sentiment analysis, intent classification, recommendation reasoning, and other NLP tasks across the platform.

BERT · Knowledge Graph · MT-BERT
0 likes · 33 min read
Alibaba Cloud Developer
May 21, 2019 · Artificial Intelligence

How Alibaba’s Offline AI Advances Model Compression and Edge Inference

Alibaba’s Machine Intelligence Lab shares two years of breakthroughs in offline AI, detailing low‑bit quantization, unified sparsity frameworks, hardware‑software co‑design, lightweight networks, and on‑device detection, alongside standardized training tools, multi‑platform inference engines, and productized edge solutions such as smart boxes and integrated cameras.

AI · Quantization · edge inference
0 likes · 16 min read
Alibaba Cloud Developer
Apr 2, 2019 · Mobile Development

How xNN-OCR Brings High‑Precision, Real‑Time OCR to Mobile Devices

This article explains how the lightweight xNN-OCR engine achieves high accuracy and real‑time performance on mobile devices through deep‑learning model compression, novel detection and recognition techniques, and showcases its practical applications such as bank‑card, gas‑meter, license‑plate, and ID recognition.

deep learning · edge AI · mobile OCR
0 likes · 12 min read
Alibaba Cloud Developer
Dec 28, 2018 · Artificial Intelligence

Elastic Feature Scaling: Boosting Alibaba’s Online Recommendation CTR by 4%

This article describes how Ant Financial’s AI team redesigned TensorFlow to enable elastic feature scaling, introduced a Group‑Lasso optimizer and streaming frequency filtering, compressed models by 90%, and achieved significant CTR and efficiency gains in Alipay’s online recommendation system.

Recommendation Systems · TensorFlow · feature scaling
0 likes · 20 min read
Alibaba Cloud Developer
Dec 4, 2018 · Artificial Intelligence

Unlocking Elastic TensorFlow: Boosting Online Recommendation CTR by 30%

This article presents a comprehensive set of innovations—including elastic feature scaling, a Group Lasso optimizer, streaming frequency filtering, and graph‑cut model compression—that transform TensorFlow for large‑scale online learning, delivering significant CTR gains and up to 90% model size reduction in Alibaba's recommendation systems.

Recommendation Systems · feature engineering · group lasso
0 likes · 19 min read
Alibaba Cloud Developer
Oct 9, 2018 · Artificial Intelligence

How Rocket Launching Boosts Online CTR Prediction Without Slowing Inference

Rocket Launching introduces a novel co‑training framework that jointly trains a lightweight network and a more powerful booster network, sharing parameters and using gradient‑blocking and hint loss to improve click‑through‑rate prediction accuracy while keeping online inference latency unchanged, validated on public datasets and Alibaba’s ad system.

CTR Prediction · Online Advertising · co-training
0 likes · 13 min read
Alibaba Cloud Developer
Jun 20, 2018 · Mobile Development

How to Supercharge Mobile Deep Learning: Model Compression & Engine Optimizations

This article explains how to overcome the performance, size, memory, and compatibility challenges of deploying deep‑learning inference engines on mobile devices by jointly optimizing model compression and engine implementation, covering speed tricks, cache‑friendly coding, multithreading, sparsity, quantization, NEON intrinsics, package size reduction, memory pooling, and reliability techniques.

Memory Management · NEON SIMD · mobile deep learning
0 likes · 22 min read
Alibaba Cloud Developer
Jun 15, 2018 · Mobile Development

How Alipay’s xNN Engine Brings Deep Learning to Mobile Apps

This article explains how Alipay’s xNN deep‑learning engine tackles the challenges of deploying AI on billions of mobile devices by using aggressive model compression, a lightweight SDK, and joint algorithm‑ and instruction‑level optimizations to achieve high accuracy, tiny package size, and real‑time performance.

Alipay · deep learning · mobile AI
0 likes · 10 min read
Alibaba Cloud Developer
Sep 28, 2017 · Artificial Intelligence

How Alipay’s xNN Brings Deep Learning to Millions of Mobile Devices

This article explains how Alipay’s xNN engine overcomes mobile deep‑learning challenges through aggressive model compression, lightweight SDK design, algorithm‑ and instruction‑level optimizations, enabling high‑accuracy AI inference on a wide range of Android and iOS devices with minimal app‑size impact.

Alipay · Inference Optimization · deep learning
0 likes · 13 min read