
Kunlun Core AI Chips: Making Computing Smarter

In a report at the 2022 Beijing Zhiyuan Conference, Kunlun Core's chip R&D director outlines the opportunities and challenges in the AI chip market, describes the company's shift from FPGA clusters to the programmable XPU‑R architecture (7nm process, 256 TOPS of INT8 compute, GDDR6 memory, and PCIe 4.0), and details current deployments and plans for third‑ and fourth‑generation chips.

Baidu Tech Salon

This article presents a technical report from the 2022 Beijing Zhiyuan Conference, delivered by Qi Wei, Chip R&D Director at Kunlun Core Technology. The presentation explores the opportunities and challenges in AI chip development, along with the company's chip architecture and product roadmap.

AI Chip Opportunities: The AI ecosystem is growing rapidly. Algorithms for speech, vision, and natural language processing keep advancing, and large-scale models such as GPT-3, Baidu's ERNIE, and Zhiyuan's WuDao show how quickly the field is moving. AI is also reaching beyond its traditional applications, with autonomous driving and AlphaFold's protein structure prediction as examples. As Moore's Law slows, GPU architectures have evolved to meet the demand for compute, and companies such as Google have developed their own custom AI chips.

AI Chip Challenges: Four major challenges face AI chip development:

1) Algorithm diversification: different business scenarios require various models and precision requirements, with algorithms continuously evolving.
2) Industry giant barriers: established players have built strong ecosystems with over a decade of experience and comprehensive framework adaptations.
3) Demanding customer requirements: clients care about latency, throughput, and TCO (Total Cost of Ownership), not just single metrics.
4) Complex real deployment environments: challenges include hardware stability at scale, cost considerations, and software stack adaptation across different frameworks and operating systems.

From Customization to Generalization: Kunlun Core's development progressed through two phases. From 2011 to 2017, the team built FPGA-based AI acceleration clusters and deployed them at large scale. In 2017 and 2018, it moved to developing general-purpose AI processors. The team recognized that mass production requires a flexible, programmable design that can adapt to evolving business needs while keeping software costs and user migration overhead low.

Kunlun Core 2nd Generation Architecture: The XPU-R architecture has two core components, Cluster and SDNN. Cluster is a general-purpose computing unit with a custom instruction set that supports scalar and vector computation, so software can program it much like a conventional processor. SDNN (Software Defined Neural Network) is an AI acceleration unit for convolution, matrix multiplication, and other frequently used, compute-intensive operators. For memory, a large on-chip Shared Memory handles data exchange between Cluster and SDNN, while off-chip Device Memory uses GDDR6, which the presentation describes as making the 2nd generation Kunlun Core the first AI chip in China to use GDDR6. The chip integrates PCIe 4.0 and supports inter-chip communication for training and large-scale inference. A self-developed scheduling system performs nanosecond-level scheduling to keep hardware utilization high.
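The division of labor between the two units can be sketched in a few lines of Python. This is only an illustrative model, not Kunlun Core's software stack: the `Unit` enum, the `SDNN_OPERATORS` set, and the `assign_unit` and `plan` functions are hypothetical names, and the operator list simply mirrors the examples given in the talk (convolution and matrix multiplication on SDNN, everything else on the programmable Cluster).

```python
from enum import Enum


class Unit(Enum):
    CLUSTER = "Cluster"  # general-purpose scalar/vector unit
    SDNN = "SDNN"        # AI acceleration unit for high-compute operators


# Hypothetical operator names. The talk names only convolution and
# matrix multiplication as examples of operators handled by SDNN.
SDNN_OPERATORS = frozenset({"conv2d", "matmul"})


def assign_unit(op_name: str) -> Unit:
    """Route a high-compute operator to SDNN and anything else to Cluster."""
    return Unit.SDNN if op_name in SDNN_OPERATORS else Unit.CLUSTER


def plan(ops: list[str]) -> list[tuple[str, Unit]]:
    """Return (operator, unit) pairs for a model's operator sequence."""
    return [(op, assign_unit(op)) for op in ops]


if __name__ == "__main__":
    for op, unit in plan(["conv2d", "relu", "matmul", "softmax"]):
        print(f"{op:8s} -> {unit.value}")
```

In this simplified picture, any operator that has no dedicated path in SDNN falls back to the programmable Cluster, which is the flexibility the architecture aims for as algorithms change.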

Product Specifications and Performance: The 2nd generation AI chip is built on a 7nm process and delivers 256 TOPS of INT8 compute. It adds hardware virtualization along with integrated video encoding/decoding and image processing. The benchmarks presented cover GEMM, BERT/ERNIE, YOLOv3, and ResNet-50, and the talk reports leading results on all of them.
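To put the 256 TOPS figure in perspective, the sketch below computes the ideal lower bound on execution time for an INT8 matrix multiplication at peak throughput. It assumes the common convention of counting one multiply-accumulate as two operations, and it ignores memory bandwidth, scheduling, and utilization, so real latency will be higher. The function names and the example matrix sizes are illustrative and do not come from the talk.

```python
PEAK_INT8_OPS_PER_SECOND = 256e12  # 256 TOPS at INT8, as stated for the chip


def gemm_ops(m: int, n: int, k: int) -> int:
    """Operations in an (m x k) @ (k x n) product, counting one MAC as 2 ops."""
    return 2 * m * n * k


def ideal_gemm_seconds(m: int, n: int, k: int,
                       peak_ops_per_second: float = PEAK_INT8_OPS_PER_SECOND) -> float:
    """Lower bound on GEMM time if the chip sustained its peak rate."""
    return gemm_ops(m, n, k) / peak_ops_per_second


if __name__ == "__main__":
    m = n = k = 4096  # hypothetical sizes chosen for illustration
    microseconds = ideal_gemm_seconds(m, n, k) * 1e6
    print(f"{gemm_ops(m, n, k):.3e} ops, at least {microseconds:.1f} us at peak")
```

Comparing a measured GEMM time against this bound gives a rough utilization figure, which is one way to read vendor benchmark results alongside the headline TOPS number.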

Deployment and Future Plans: Kunlun Core 2nd generation has been deployed across multiple scenarios including internet applications, intelligent computing centers (such as collaboration with Zhiyuan Research Institute), and emerging fields like bio-computing. The third-generation AI chip is already in development, with fourth-generation products also being planned.

Written by

Baidu Tech Salon

Baidu Tech Salon, organized by Baidu's Technology Management Department, is a monthly offline event that shares cutting‑edge tech trends from Baidu and the industry, providing a free platform for mid‑to‑senior engineers to exchange ideas.
