
FPGA Acceleration for AI: Benefits, Comparisons, and Tencent Cloud Case Studies

Tencent Cloud’s FPGA service brings ASIC‑level performance with CPU‑like flexibility to AI workloads: it cuts deployment time from months to minutes, delivers up to 30× CPU performance, 20‑100× latency reductions, and severalfold throughput gains, while lowering costs through pay‑as‑you‑go cloud access.

Tencent Cloud Developer

Background

With the rapid rise of AI, industry enthusiasm for artificial intelligence is soaring, and AI is becoming a key trend for future development. Small and medium-sized enterprises (SMEs) also hope to hop on the AI train.

In the past, SMEs faced many difficulties when deploying FPGA: high hardware costs, low flexibility, large upfront procurement investment, opaque transaction prices, high operational costs for stable services, and the need for dedicated hardware and software engineers.

Furthermore, SMEs risk stranding their investment whenever FPGA chips are upgraded. And although FPGA IP cores provide hardware acceleration, their long development cycles, high up‑front investment, and project risk deter many companies.

What is FPGA

Artificial intelligence rests on three elements: algorithms, computing, and data. The most prevalent AI algorithms today are deep‑learning models, and the computing platforms include CPUs, GPUs, FPGAs, and ASICs. With the explosion of user‑generated data from mobile internet services (e.g., QQ, WeChat), the number of images uploaded daily reaches hundreds of millions. If data is the ore, the computing platform is the excavator, and how efficiently each platform processes that data becomes the benchmark for choosing among them.

General‑purpose processors (CPU) offer high flexibility and ease of use at low cost but lack efficiency for intensive workloads.

Application‑specific integrated circuits (ASIC) deliver high performance but are inflexible and expensive to produce.

Between CPU and ASIC, heterogeneous processors such as GPU and FPGA are widely used.

FPGA is a programmable logic device (PLD) that combines the performance advantage of ASICs with the reconfigurability of CPUs. In simple terms, an FPGA is a re‑configurable “general‑purpose integrated circuit”.

GPU sits between FPGA and CPU in flexibility. GPUs have many cores and excel at parallel computation, but their advantage diminishes when algorithms contain many branches or data dependencies.

Compared with GPUs, FPGA offers finer control granularity, higher flexibility, and better algorithm adaptability. By using flip‑flops (FF) for sequential logic and lookup tables (LUT) for combinational logic, FPGA can allocate most resources to computation or control as needed, providing superior efficiency for specific workloads.

FPGA (Field‑Programmable Gate Array) integrates a large number of basic gates and memory cells on a chip. Users can program the interconnections via configuration files, allowing the same chip to serve as an image codec one day and an audio codec the next. This re‑configurability reduces development risk and cost compared with ASICs.
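The reconfigurability described above ultimately comes down to the LUT: a k‑input lookup table is just a 2^k‑entry truth table, and "programming" the fabric largely means filling in those tables. A minimal software model of the idea (illustrative only; real bitstream formats are far more involved):

```python
# Illustrative model of a 2-input FPGA lookup table (LUT):
# the "configuration" is simply the contents of a 4-entry truth table.

def make_lut2(truth_table):
    """Return a 2-input combinational function backed by a 4-entry table."""
    def lut(a, b):
        # Pack the two input bits into an index into the table.
        return truth_table[(a << 1) | b]
    return lut

# "Program" the same logic element as XOR, then reprogram it as AND --
# the hardware is unchanged; only the table contents differ.
xor_gate = make_lut2([0, 1, 1, 0])
and_gate = make_lut2([0, 0, 0, 1])
```

The same principle scales up: by rewriting the tables (and the routing between them), one chip can implement entirely different circuits.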

Processor Comparison

| Processor | Advantages | Disadvantages |
| --- | --- | --- |
| CPU | High flexibility, low cost, reusable | Lower performance efficiency |
| ASIC | High performance | Low flexibility; difficult and costly to produce |
| GPU | Well suited to parallel computation | Performance drops with many branches or data dependencies |
| FPGA | Combines ASIC‑like performance with CPU‑like flexibility; fine‑grained control | Peak performance below ASIC |

Tencent Cloud’s First Domestic FPGA Service

On January 20, Tencent Cloud launched China’s first high‑performance heterogeneous computing infrastructure – the FPGA cloud server – making FPGA resources available to more enterprises via a cloud service model.

The breakthrough reduces FPGA deployment time from months to minutes, enables pay‑as‑you‑go usage, and dramatically lowers cost. The FPGA cloud can deliver more than 30× the performance of a general‑purpose CPU server. Tencent Cloud also opened the first third‑party FPGA IP marketplace, facilitating efficient IP trading.

For FPGA users, purchasing verified IP from the marketplace can save months of development time and reduce hardware investment through on‑demand billing.

For FPGA developers, Tencent Cloud’s FPGA framework boosts development efficiency, allowing focus on core functions while re‑using existing image‑processing or deep‑learning IP blocks.

FPGA Application Cases

Case 1 – Image Transcoding (JPEG → WEBP)

Background: Massive daily image uploads on QQ and WeChat require efficient transcoding. WEBP offers ~30% storage savings over JPEG but is ~10× more computationally intensive, making CPU‑based transcoding costly.

Result: FPGA reduced latency by 20×, achieved 6× higher throughput than CPU, and cut unit cost to one‑third of the CPU solution.

Performance comparison (CPU: Xeon E3‑1230 V2 @ 3.3 GHz vs. FPGA):

| Metric | CPU | FPGA |
| --- | --- | --- |
| Latency | 1170 ms | 60 ms |
| Throughput | 20 frames/s | 133 frames/s |
| Cost ratio | 1.00 | 0.33 |
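For reference, the CPU‑side transcode that the FPGA replaces can be expressed in a few lines with the Pillow library (a sketch, assuming Pillow is built with WEBP support; a real pipeline would operate on uploaded files rather than a generated test image, and the quality settings here are arbitrary):

```python
import io
from PIL import Image

# Stand-in for an uploaded photo: a generated RGB image held in memory.
img = Image.new("RGB", (640, 480), color=(80, 120, 200))

# Encode to JPEG, as the client would have uploaded it.
jpeg_buf = io.BytesIO()
img.save(jpeg_buf, format="JPEG", quality=90)

# Decode the JPEG and re-encode as lossy WEBP -- the compute-heavy step
# that the FPGA offloads at scale.
webp_buf = io.BytesIO()
Image.open(io.BytesIO(jpeg_buf.getvalue())).save(
    webp_buf, format="WEBP", quality=80)
```

On real photographic content this re‑encode is where the roughly 10× compute cost over JPEG shows up, which is why offloading it pays off at hundreds of millions of images per day.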

Case 2 – DNN Acceleration for Search Advertising

Background: Search engines and recommendation systems use deep neural networks (DNN) that demand high compute. A 4‑layer DNN (17×200×20×1) required <5 ms latency for 4000 samples, which CPUs could not meet (120.55 ms).

Result: Using 50 % of FPGA resources, latency dropped to 1.2 ms (≈100× reduction), throughput reached 6000 samples/s (≈5× CPU), and cost fell to one‑fifth of the CPU solution.

Performance comparison (CPU: 2× Xeon E5‑2620 vs. FPGA):

| Metric | CPU | FPGA |
| --- | --- | --- |
| Latency | 120.55 ms | 1.2 ms |
| Throughput | 1200 samples/s | 6000 samples/s |
| Cost ratio | 1.00 | 0.20 |
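As a CPU baseline, the forward pass of that 4‑layer network is just three dense matrix multiplies. A NumPy sketch with the layer sizes from the case (the weights are random placeholders and the ReLU activation is an assumption; the case does not specify the activation function):

```python
import numpy as np

rng = np.random.default_rng(0)

# The 4-layer DNN from the case: 17 -> 200 -> 20 -> 1
sizes = [17, 200, 20, 1]
weights = [rng.standard_normal((m, n)) * 0.1
           for m, n in zip(sizes, sizes[1:])]

def forward(x):
    """Forward pass: ReLU on the hidden layers, linear output layer."""
    for w in weights[:-1]:
        x = np.maximum(x @ w, 0.0)
    return x @ weights[-1]

# One request scores 4000 samples, each with 17 features.
batch = rng.standard_normal((4000, 17))
scores = forward(batch)
```

Timing this loop on a CPU versus a pipelined FPGA datapath is what produces the latency gap in the table above: the FPGA keeps all three matrix multiplies streaming through dedicated multiply‑accumulate units instead of round‑tripping through memory.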

Case 3 – CNN (AlexNet) Acceleration

Background: Convolutional neural networks (CNN) such as AlexNet are widely used for image classification. Accelerating CNN inference reduces latency and cost.

Result: FPGA achieved 4× the performance of a CPU server while costing only one‑third of the CPU solution.
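AlexNet’s compute is dominated by its convolution layers, which both CPUs and FPGAs typically evaluate as one large matrix multiply (the "im2col + GEMM" formulation). A scaled‑down NumPy sketch of that formulation (shapes reduced from AlexNet’s real first layer of 96 filters over a 227×227×3 input to keep the example small; weights are random placeholders):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(1)

# Scaled-down stand-in for AlexNet's first conv layer
# (real layer: 96 filters, 11x11 kernel, stride 4, 227x227x3 input).
image = rng.standard_normal((3, 32, 32))        # channels, height, width
filters = rng.standard_normal((8, 3, 11, 11))   # out_ch, channels, kH, kW
stride = 4

# Extract every 11x11 patch, then subsample by the stride.
patches = sliding_window_view(image, (11, 11), axis=(1, 2))  # (3, 22, 22, 11, 11)
patches = patches[:, ::stride, ::stride]                     # (3, 6, 6, 11, 11)

# Convolution as a single matrix multiply -- the GEMM that both
# CPU BLAS libraries and FPGA systolic arrays are built to accelerate.
cols = patches.transpose(1, 2, 0, 3, 4).reshape(6 * 6, -1)   # (36, 3*11*11)
out = cols @ filters.reshape(8, -1).T                        # (36, 8)
feature_maps = out.T.reshape(8, 6, 6)
```

Casting convolution as GEMM is exactly what makes it a good FPGA target: the datapath reduces to a regular grid of multiply‑accumulate units fed by on‑chip buffers.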

Conclusion

AI’s current boom benefits from FPGA’s high‑density computation and low power consumption, leading to large‑scale deployments in online deep‑learning inference (advertising recommendation, image and speech recognition, etc.). Users often compare FPGA with GPU; GPUs excel at programmability and throughput, while FPGAs offer lower latency, lower power, and re‑configurability. Compared with ASICs, FPGAs provide flexibility and cost advantages.

With Tencent Cloud’s FPGA service, users can provision and deploy FPGA instances within minutes, program custom hardware accelerators, and iterate without redesigning hardware, allowing them to focus on business innovation.

Tags: cloud computing, deep learning, AI acceleration, hardware acceleration, performance comparison, FPGA
Written by Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.