How WeChat’s TFCC Boosts Deep Learning Inference Performance Across Platforms

The TFCC framework, developed by WeChat's backend team, delivers high‑performance, easy‑to‑use, and universal deep‑learning inference by supporting numerous ONNX and TensorFlow operations, optimizing model structures, constants, and operators, and providing a versatile runtime and math library for both CPU and GPU platforms.

WeChat Backend Team

Introduction

WeChat TFCC is a server‑side deep‑learning inference framework developed by the WeChat Technical Architecture Backend team and open‑sourced in collaboration with Tencent's Oteam Cloud‑Fan.

Key Features

TFCC offers high performance, ease of use, and universality. It supports 81 ONNX operators and 108 TensorFlow operators, covering recommendation, NLP, and speech models, and is widely used in WeChat Video, Official Accounts, User Portrait, and voice services.

Performance Optimizations

Through model‑structure optimization, constant tracking, and operator optimization, TFCC achieves high throughput (QPS) on Intel CPUs and NVIDIA GPUs, with BERT‑base used as a representative benchmark. Model‑structure optimization includes constant folding, pruning, and operator fusion. Constant tracking aggressively identifies locally constant tensors so that more of the graph can be folded. The Math Library provides AVX2/AVX512‑accelerated CPU kernels and assembly‑level optimized operators.
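To give a feel for the kind of CPU kernel such a math library provides, below is a minimal AVX2/FMA sketch of an axpy‑style loop (y += a·x). It is illustrative only and assumes nothing about TFCC's actual kernels; the function name and tail handling are our own.

```cpp
#include <immintrin.h>  // AVX2/FMA intrinsics
#include <cstddef>

// y[i] += a * x[i], 8 floats per iteration. Illustrative sketch only;
// not TFCC's actual kernel. Compile with -mavx2 -mfma (or equivalent).
void axpy_avx2(float a, const float* x, float* y, std::size_t n) {
    const __m256 va = _mm256_set1_ps(a);        // broadcast scalar a
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);     // load 8 floats (unaligned)
        __m256 vy = _mm256_loadu_ps(y + i);
        vy = _mm256_fmadd_ps(va, vx, vy);       // vy = va * vx + vy in one op
        _mm256_storeu_ps(y + i, vy);
    }
    for (; i < n; ++i)                          // scalar tail for n % 8 != 0
        y[i] += a * x[i];
}
```

An AVX‑512 variant would widen this to 16 floats per iteration (`__m512`, `_mm512_fmadd_ps`); assembly‑level tuning then controls register use and unrolling beyond what intrinsics express.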

Ease of Use

TFCC supplies a complete toolchain: a single command converts an ONNX or TensorFlow model to the TFCC format. The runtime loads TFCC models, JIT‑compiles them into execution graphs, and ships with a simple validation program.
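As a sketch of what this looks like from application code, here is a hypothetical end‑to‑end example. The `tfcc::Model` class and the `tfcc_convert` command below are stand‑ins we define for illustration, not TFCC's documented API; consult the project itself for the real toolchain and headers.

```cpp
// Hypothetical workflow sketch. The tfcc::Model stub below is defined here
// for illustration only -- it is NOT TFCC's real API.
#include <iostream>
#include <string>
#include <vector>

namespace tfcc {
// Stand-in for the runtime's model loader and inference entry point.
struct Model {
    explicit Model(const std::string& path) { (void)path; /* would load the model */ }
    std::vector<float> run(const std::vector<float>& input) {
        return std::vector<float>(input.size(), 0.0f);  // placeholder output
    }
};
}  // namespace tfcc

int main() {
    // Offline, once: convert with the toolchain in a single command,
    // e.g. (hypothetical) `tfcc_convert model.onnx model.tfccrt`.
    tfcc::Model model("model.tfccrt");                 // load converted model
    std::vector<float> input(128, 1.0f);
    std::vector<float> output = model.run(input);      // run inference
    std::cout << "got " << output.size() << " outputs\n";
}
```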

Universality

TFCC is universal across both model formats and hardware platforms: it converts ONNX/TensorFlow models to a unified IR and supports x86‑64 CPUs and NVIDIA GPUs, with platform‑specific assembly optimizations.

Architecture Overview

The solution consists of three components: Code Generator (converts models to C++ code or tfccrt description), Runtime (loads tfccrt files and provides inference APIs), and Math Library (hardware‑agnostic matrix‑operation APIs backed by TFCC Core, MKL, and CUDA libraries).

Code Generator

The Code Generator is implemented as a compiler with three stages: a Frontend (ONNX/TensorFlow to IR), an Optimizer (pruning, pre‑computation, duplicate removal, node fusion, and equivalent replacement), and a Backend (either the Runtime Backend or the C++ Code Backend).
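To make the Optimizer stage concrete, here is a minimal constant‑folding pass over a toy IR. The `Node` representation is our own simplification for illustration and is not TFCC's actual IR; a real pass would dispatch over many operator types and iterate to a fixed point.

```cpp
// Constant-folding sketch over a toy graph IR (illustrative only).
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct Node {
    std::string op;               // "Const", "Add", "MatMul", ...
    std::vector<Node*> inputs;    // producer nodes
    std::vector<float> value;     // tensor data, filled when op == "Const"
};

static bool isConst(const Node* n) { return n->op == "Const"; }

// Rewrite Add(Const, Const) into a single precomputed Const node.
void foldConstants(std::vector<std::unique_ptr<Node>>& graph) {
    for (auto& n : graph) {
        if (n->op == "Add" && n->inputs.size() == 2 &&
            isConst(n->inputs[0]) && isConst(n->inputs[1])) {
            const auto& a = n->inputs[0]->value;
            const auto& b = n->inputs[1]->value;
            n->value.resize(a.size());
            for (std::size_t i = 0; i < a.size(); ++i)
                n->value[i] = a[i] + b[i];   // precompute at conversion time
            n->op = "Const";                 // the add leaves the graph
            n->inputs.clear();
        }
    }
}
```

Pruning then removes Const nodes that no longer have consumers, and node fusion merges patterns such as a MatMul followed by an Add into a single fused operator.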

Runtime

The Runtime loads the tfccrt model description, generates machine code on the fly, and executes inference with minimal overhead.
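Xbyak, which appears in the references, is a typical tool for this kind of on‑the‑fly x86‑64 code generation. The snippet below is a minimal Xbyak example of emitting machine code at runtime and calling it as a function; it illustrates the JIT technique only and says nothing about the code TFCC actually emits.

```cpp
// Minimal runtime code generation with Xbyak (x86-64, System V calling
// convention). Demonstrates the JIT technique, not TFCC's generated code.
#include <xbyak/xbyak.h>

struct AddConst : Xbyak::CodeGenerator {
    explicit AddConst(int c) {
        lea(eax, ptr[edi + c]);   // eax = first int argument (edi) + c
        ret();                    // integer return value travels in eax
    }
};

int main() {
    AddConst gen(5);                         // emit the code into a buffer
    auto fn = gen.getCode<int (*)(int)>();   // view the buffer as a function
    return fn(37) == 42 ? 0 : 1;             // exits 0 on success
}
```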

Math Library

The Math Library provides a platform‑independent core plus platform‑specific MKL and CUDA backends, exposing constants, variables, views, devices, and sessions through a uniform API.
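One common way to structure a platform‑independent core over MKL and CUDA backends is an abstract device interface. The sketch below shows that pattern; the class names are our illustration rather than TFCC's actual hierarchy, though the `cblas_sgemm` call is the real MKL/CBLAS routine.

```cpp
// Platform-dispatch sketch: callers program against Device, never against
// MKL or CUDA directly. Class names are illustrative, not TFCC's own.
#include <mkl_cblas.h>   // Intel MKL's CBLAS interface (cblas_sgemm)

struct Device {
    // C[m x n] = A[m x k] * B[k x n], all row-major.
    virtual void matmul(const float* A, const float* B, float* C,
                        int m, int n, int k) = 0;
    virtual ~Device() = default;
};

struct MklDevice : Device {
    void matmul(const float* A, const float* B, float* C,
                int m, int n, int k) override {
        cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, n, k, 1.0f, A, k, B, n, 0.0f, C, n);
    }
};
// A CudaDevice would wrap cuBLAS behind the same interface, so sessions can
// bind work to whichever device is available at runtime.
```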

Conclusion

TFCC focuses on server‑side deep‑learning inference and will continue to add features and performance improvements.

References

oneDNN – https://github.com/oneapi-src/oneDNN

Xbyak – https://github.com/herumi/xbyak

CUDA – https://developer.nvidia.com/

MKL – https://software.intel.com/content/www/us/en/develop/tools/oneapi.html

ONNX – https://github.com/onnx/onnx

TensorFlow – https://www.tensorflow.org/

PyTorch – https://pytorch.org/
