How AI Model Inference Optimization Boosted Address Standardization Speed by 4×

By applying high‑performance operators, quantization, and AI compiler optimizations with Alibaba Cloud PAI Blade and Intel Xeon back‑ends, the address‑standardization service’s deep‑learning models achieved up to 4.11× faster end‑to‑end inference without sacrificing accuracy, enabling more complex models and lower latency.

Alibaba Cloud Big Data AI Platform

Overview

Deep learning inference performance is critical for services such as address standardization. Optimizing inference can reduce response time, lower cost, and allow more complex models without degrading latency.

Inference Optimization Methodology

Natural language processing models such as RNNs and BERT are difficult to serve with low latency on x86 CPUs. The proposed solution combines three techniques to accelerate inference: high‑performance operators, model quantization, and AI compiler optimizations.
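To make the quantization step concrete, here is a minimal sketch of symmetric per‑tensor INT8 quantization in plain Python. It is an illustration only, not the scheme PAI‑Blade applies; the function names and sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to the range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map INT8 codes back to approximate float values."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -1.27, 0.05, 0.4, -0.33]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q_weights, scale, max_error)
```

Each weight is stored in 8 bits instead of 32, and the rounding error per value is at most half of `scale`. Whether that error affects model accuracy has to be measured on the task, as the evaluation later in the article does.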

Key Techniques

Model compression: quantization, sparsity, pruning.

High‑performance operators tailored to the model graph.

AI compiler optimizations: graph fusion, operator fusion, code generation.
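Operator fusion, the last item above, can be sketched in a few lines. The example below is a toy illustration in plain Python with hypothetical function names; a real compiler performs this rewrite on the model graph and emits vectorized machine code.

```python
def scale_bias_relu_unfused(x, w, b):
    """Three separate operators; each one materializes an intermediate list."""
    scaled = [xi * w for xi in x]          # operator 1: multiply
    shifted = [s + b for s in scaled]      # operator 2: add bias
    return [max(0.0, s) for s in shifted]  # operator 3: ReLU


def scale_bias_relu_fused(x, w, b):
    """One fused operator: a single pass over the input, no intermediates."""
    return [max(0.0, xi * w + b) for xi in x]


x = [-2.0, -0.5, 0.0, 1.5, 3.0]
unfused = scale_bias_relu_unfused(x, w=2.0, b=1.0)
fused = scale_bias_relu_fused(x, w=2.0, b=1.0)
print(unfused == fused)
```

Both versions compute the same values. The fused version reads the input once and never writes the two intermediate buffers, which is where the memory‑traffic savings come from.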

Address Standardization Service

The service addresses non‑standard address data across many industries. Alibaba DAMO‑Lab provides an address purification service that standardizes address inputs, builds a unified address library, and offers high‑performance search, vector recall, and re‑ranking models.

RNN structure diagram
BERT model structure diagram

Blade Optimization Platform

PAI‑Blade exposes all of the above optimizations through a unified interface. It integrates high‑performance operators, the Intel Custom Backend, and the BladeDISC compiler to deliver end‑to‑end inference acceleration.

Blade optimization architecture

High‑Performance Operators on Intel Xeon

Optimizations for LSTM on Intel Xeon leverage AVX‑512 instructions, operator fusion, and cache‑aware scheduling. Input batching is performed with pack_padded_sequence() to handle variable‑length sequences efficiently.
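The packed layout can be illustrated without any framework. The sketch below is a plain‑Python approximation of what PyTorch's pack_padded_sequence() produces (a flat, time‑major data buffer plus per‑step batch sizes); the helper name `pack_sequences` and the sample tokens are hypothetical.

```python
def pack_sequences(sequences):
    """Pack variable-length sequences into a flat, time-major layout.

    Returns (data, batch_sizes, order), where order holds the original
    indices sorted by descending sequence length.
    """
    order = sorted(range(len(sequences)),
                   key=lambda i: len(sequences[i]), reverse=True)
    sorted_seqs = [sequences[i] for i in order]
    max_len = len(sorted_seqs[0]) if sorted_seqs else 0

    data, batch_sizes = [], []
    for t in range(max_len):
        # Only sequences that still have a token at step t take part.
        active = [seq[t] for seq in sorted_seqs if len(seq) > t]
        data.extend(active)
        batch_sizes.append(len(active))
    return data, batch_sizes, order


sequences = [["a1", "a2"], ["b1", "b2", "b3"], ["c1"]]
data, batch_sizes, order = pack_sequences(sequences)
print(data)         # ['b1', 'a1', 'c1', 'b2', 'a2', 'b3']
print(batch_sizes)  # [3, 2, 1]
```

A padded batch of these three sequences would occupy 3 × 3 = 9 slots. The packed form holds only the 6 real tokens, so the LSTM does no work on padding at any time step.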

Original LSTM input data

Custom Backend Features

The Intel Custom Backend introduces a primitive cache to reuse compiled primitives, graph fusion to eliminate intermediate tensors, and memory optimizations that reduce runtime overhead.
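The idea behind a primitive cache can be sketched as a lookup table keyed by operator configuration. This is a hypothetical illustration, not the Intel Custom Backend's implementation; the class, the key layout, and the stand‑in `build_primitive` function are all assumptions.

```python
class PrimitiveCache:
    """Hypothetical cache: builds each primitive once per (op, shape, dtype)."""

    def __init__(self, build_fn):
        self._build_fn = build_fn
        self._primitives = {}
        self.build_count = 0  # how many times the expensive build ran

    def get(self, op_name, shape, dtype):
        key = (op_name, tuple(shape), dtype)
        if key not in self._primitives:
            self._primitives[key] = self._build_fn(*key)
            self.build_count += 1
        return self._primitives[key]


def build_primitive(op_name, shape, dtype):
    # Stand-in for an expensive setup step (kernel selection, code generation).
    # For simplicity this toy builder always returns a ReLU kernel.
    def run(values):
        return [max(0.0, v) for v in values]
    return run


cache = PrimitiveCache(build_primitive)
first = cache.get("relu", [1, 128], "f32")
second = cache.get("relu", [1, 128], "f32")  # same key: reused, not rebuilt
third = cache.get("relu", [1, 256], "f32")   # new shape: built once more
print(cache.build_count)  # 2
```

Repeated requests with the same configuration skip the setup cost entirely, which matters for latency‑sensitive services where the same operator shapes recur across requests.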

Custom backend architecture

Performance Evaluation

Two representative address‑search models were evaluated on an Alibaba ECS g7.large instance equipped with an Intel Xeon Platinum 8369B CPU.

ESIM (LSTM‑based) – Latency of the LSTM‑A operator fell from 0.199 ms to 0.066 ms (a 3.02× speedup), and end‑to‑end latency fell from 6.3 ms to 3.4 ms (a 1.85× speedup), with accuracy maintained.

BERT – The 4‑layer INT8‑quantized model reduced latency from 37.0 ms to 9.0 ms (a 4.11× speedup). The macro F1 score rose from 77.24 to 78.85, so in this evaluation the optimized model was both faster and slightly more accurate.
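The speedup factors follow directly from the reported latencies, as baseline latency divided by optimized latency. A short check using only the numbers quoted above (the dictionary labels are ours):

```python
def speedup(baseline_ms, optimized_ms):
    """Speedup factor: how many times faster the optimized run is."""
    return baseline_ms / optimized_ms


# (baseline latency, optimized latency) in milliseconds, from the results above.
results = {
    "ESIM LSTM-A operator": (0.199, 0.066),
    "ESIM end-to-end": (6.3, 3.4),
    "BERT 4-layer INT8": (37.0, 9.0),
}

speedups = {name: round(speedup(base, opt), 2)
            for name, (base, opt) in results.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor}x")
```

Note that the ESIM end‑to‑end gain (1.85×) is smaller than the operator‑level gain (3.02×), which is what one would expect when the LSTM accounts for only part of the total request time.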
