
Tencent Alluxio: Accelerating the Next Generation of Big Data and AI

This article presents a comprehensive overview of Tencent's Alluxio project, covering the evolution of big‑data architecture, recent Alluxio research progress, typical deployment cases, and future work, while highlighting performance improvements, integration with cloud and AI workloads, and community contributions.

DataFunTalk

Introduction

Today's topic is Tencent Alluxio: accelerating the next generation of big data and AI.

The article is organized into four main parts: big‑data architecture evolution, Alluxio R&D progress, typical case studies, and future work.

1. Big‑Data Architecture Evolution

1.1 The early days of the big‑data ecosystem and Alluxio

Alluxio originated in 2012 as Tachyon, a project in UC Berkeley's AMPLab. At that time the big‑data ecosystem was relatively closed, with isolated compute and storage solutions and few contributors.

1.2 The present big‑data ecosystem and Alluxio

Open‑sourced on GitHub in 2013, Alluxio has since accumulated over 32,000 commits from more than 1,100 contributors across over 100 organizations, and in 2020 it was ranked 9th among the most influential Java open‑source projects.

Alluxio has evolved from a memory‑management tool to a data‑orchestration platform that can span private, public, and hybrid clouds, providing memory‑speed data access for analytics and AI/ML workloads.

1.3 Tencent and Alluxio

Tencent is a key Alluxio contributor, with over 20 contributors, several PMC maintainers, and hundreds of merged pull requests. Future collaboration will focus on community releases and staged (gray‑release) rollout testing.

1.4 Outlook

Based on recent big‑data trends, further work will focus on data sharing, fine‑grained data permissions, and elastic cloud clusters.

2. Alluxio R&D Progress

2.1 Tencent contributions

JNIFUSE module: a high‑performance POSIX API implementation for AI workloads, which includes a FUSE shell for cache inspection and management.
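As a rough illustration of this POSIX access path, an Alluxio namespace can be mounted locally through the FUSE integration. The paths and script location below are illustrative only; exact commands vary by Alluxio version, so consult the Alluxio FUSE documentation for your release:

```
# Mount the Alluxio root at a local path via FUSE (illustrative paths)
$ ./integration/fuse/bin/alluxio-fuse mount /mnt/alluxio /

# Applications can now read cached data through ordinary POSIX calls
$ ls /mnt/alluxio

# Unmount when done
$ ./integration/fuse/bin/alluxio-fuse unmount /mnt/alluxio
```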

Worker decommissioning: graceful removal of workers with data migration.

Integration with underlying storage systems such as Ozone, CephFS, and COSN.

Capacity‑aware worker selection strategy to balance load across heterogeneous disks.
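The idea behind capacity‑aware selection can be sketched as a weighted‑random choice proportional to each worker's free space, so that small disks in a heterogeneous cluster fill at roughly the same relative rate as large ones. This is a simplified illustration, not Alluxio's actual policy code; `pick_worker` and its input shape are hypothetical:

```python
import random

def pick_worker(workers, block_size, rng=random):
    """Choose a worker for a new block, weighting by free capacity.

    `workers` maps worker id -> (capacity_bytes, used_bytes). Workers that
    cannot hold the block are excluded; among the rest, the probability of
    selection is proportional to remaining free space.
    """
    # Keep only workers with enough free space, keyed by free bytes.
    candidates = {
        wid: cap - used
        for wid, (cap, used) in workers.items()
        if cap - used >= block_size
    }
    if not candidates:
        raise RuntimeError("no worker has enough free capacity")
    ids = list(candidates)
    weights = [candidates[w] for w in ids]
    # Weighted random draw: bigger free space => higher selection odds.
    return rng.choices(ids, weights=weights, k=1)[0]
```

A purely greedy "most free space" policy would hammer the newest or largest disk; the weighted draw spreads load while still favoring capacity.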

2.2 Core improvements for 2022

Metadata service (Master) optimization: Raft‑based high‑availability, larger clusters, reduced memory usage.

Worker optimization: better storage efficiency and lower memory consumption.

Job service enhancements: more efficient distributed load, persist, and async‑write.

2.3 Application‑level optimizations

Deep integration with Kubernetes for easier deployment.

Improved support for Hudi and Iceberg metadata caching.

AI‑specific optimizations for massive small‑file datasets and FUSE memory usage.

OLAP (e.g., Presto) hot‑data estimation and monitoring.

2.4 Additional valuable contributions

JMX metrics collection compatible with HDFS tools.

Thread‑stack visibility for debugging.

Mount‑table inspection.

Worker pre‑registration to avoid stale block pollution.

IP‑based host configuration and audit logging for the Job service.

Ratis‑shell has been contributed to Apache Ratis, providing commands such as election, stepDown, pause, resume, group, and snapshot for cluster management.
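As a sketch of how such commands are invoked (the peer addresses below are placeholders, and exact subcommands and flags depend on the Ratis version; check the Apache Ratis documentation):

```
# Show information about a Raft group
$ ratis sh group info -peers host1:9872,host2:9872,host3:9872

# Transfer leadership to a chosen peer
$ ratis sh election transfer -peers host1:9872,host2:9872,host3:9872 \
    -address host2:9872
```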

3. Typical Alluxio Cases

3.1 Case 1: An internal Tencent project similar to Presto on HDFS achieved a 2.44× performance boost, with a median query improvement of 121%.

3.2 Case 2: Supersql, a Tencent internal cross‑data‑source SQL engine, combined with Presto and Alluxio, achieved an average acceleration factor of 2.6 on TPC‑DS benchmarks.

3.3 Case 3: Alluxio Local Cache deployed in Presto provides 3‑10× performance gains with simple configuration and multi‑disk support.
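For reference, enabling Alluxio local caching in Presto's Hive connector typically comes down to a few catalog properties. The values below are placeholders, and property names can differ across Presto versions, so verify against the caching documentation for your release:

```
# etc/catalog/hive.properties (illustrative values)
cache.enabled=true
cache.type=ALLUXIO
cache.base-directory=file:///mnt/flash/presto-cache
cache.alluxio.max-cache-size=400GB
```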

3.4 Case 4: In a game‑AI scenario, Alluxio accelerated CephFS access, reducing the FUSE failure rate from 2.8% to 0.73% and alleviating MDS pressure.

4. Future Work

Infrastructure: automated testing platform, conflict‑free code integration, deployment tooling.

Security: more authentication methods, proxy users, multi‑tenant support.

Ecosystem tools: continued development of ratis‑shell, intelligent pre‑warming, dynamic cache balancing.

Feature extensions: seamless hot‑cold data migration, traffic control, multi‑master federation.

5. Q&A

Q1: Can Alluxio be used on Tencent Cloud? Yes, this is an ongoing effort with multiple deployment modes.

Q2: How does Alluxio achieve seamless hot‑cold data migration? The solution is under design, leveraging AI‑driven data‑temperature inference and tiered storage placement.
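One common way to infer data temperature, shown here only as a generic illustration and not as the design under discussion (the names `temperature` and `classify` and the thresholds are hypothetical), is to score each file by an exponentially decayed count of its accesses and place it on a tier accordingly:

```python
def temperature(access_times, now, half_life=3600.0):
    """Exponentially decayed access count: recent hits dominate.

    Each access at time t contributes 0.5 ** ((now - t) / half_life),
    so a file accessed once right now scores 1.0, and that contribution
    halves for every `half_life` seconds of inactivity.
    """
    return sum(0.5 ** ((now - t) / half_life) for t in access_times)

def classify(score, hot=2.0, cold=0.25):
    """Map a temperature score onto a storage tier decision."""
    if score >= hot:
        return "hot"    # keep in cache / fast tier
    if score <= cold:
        return "cold"   # candidate for demotion to cheap storage
    return "warm"
```

The half‑life and thresholds would be tuned per workload; the decayed score lets migration react to shifting access patterns without oscillating on a single access.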

For further reading, see the referenced materials and the upcoming Alluxio books.

Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
