Tagged articles

Alluxio

59 articles

Feb 2, 2026 · Artificial Intelligence

How Alluxio Boosts GPU Utilization to 99.57% for Embodied AI – Inside the MLPerf Success

This article explains how Alluxio’s distributed caching architecture tackles the massive, multimodal data challenges of embodied AI, delivers near‑zero‑millisecond access, achieves 99.57% GPU utilization in MLPerf Storage v2.0, and validates its value through real‑world enterprise deployments.

AI Data Platform · Alluxio · Data Infrastructure

0 likes · 21 min read

DataFunTalk

Sep 3, 2025 · Artificial Intelligence

How Alluxio’s Distributed Cache Boosts AI Training to 99.57% GPU Utilization

Alluxio’s distributed caching dramatically accelerates AI training and checkpointing workloads, achieving up to 99.57% GPU utilization and linear scaling across clusters in the MLPerf Storage v2.0 benchmark, while using cost‑effective commodity hardware to eliminate I/O bottlenecks.

AI training · Alluxio · GPU Utilization

0 likes · 11 min read

Bilibili Tech

Aug 12, 2025 · Artificial Intelligence

How Bilibili Scaled AI Model Training with Alluxio Cache Acceleration

This article details Bilibili's multi-layer storage architecture and Alluxio‑based cache acceleration for large‑scale AI model training, covering challenges of high‑throughput, low‑latency file access, metadata scalability, fault tolerance, and the engineering solutions that boosted I/O performance up to ten‑fold.

.ai · Alluxio · Caching

0 likes · 24 min read

iQIYI Technical Product Team

Nov 21, 2024 · Big Data

Alluxio Integration and Optimization for Multi‑AZ Big Data Analytics at iQIYI

iQIYI integrates Alluxio with its QBFS multi‑AZ unified scheduling system, automatically caching hot tables, applying table‑level policies, page‑level storage and AZ‑aware worker selection, which together cut cross‑zone traffic, halve query latency, achieve up to 20× I/O speedup and a three‑fold overall performance boost.

Alluxio · Cache Optimization · Data Lake

0 likes · 23 min read

DataFunSummit

Jul 23, 2024 · Big Data

Multi-Cloud Unified Data Acceleration Layer at Xiaohongshu: Challenges, Alluxio Solution, and Performance Gains

This article presents Xiaohongshu's multi‑cloud unified data acceleration layer built with Alluxio, detailing the challenges of multi‑cloud architectures, the design goals, Alluxio's architecture and features, real‑world case studies in AI training and recommendation indexing, performance improvements, and future plans.

AI training · Alluxio · Big Data

0 likes · 22 min read

DataFunSummit

May 24, 2024 · Big Data

Ctrip's Experience with Alluxio in Its Big Data Platform: Architecture, Transparent Access, Custom Authentication, CallerContext, and Dynamic Configuration

This article details how Ctrip, a leading travel company, leverages Alluxio as a distributed cache within its extensive big‑data infrastructure to improve data access speed, implement transparent storage access, support custom authentication and multi‑tenant features, enhance audit logging with CallerContext, and dynamically distribute client configurations via Kyuubi.

Alluxio · Big Data · CallerContext

0 likes · 14 min read

DataFunTalk

May 21, 2024 · Big Data

Applying Alluxio to Autonomous Driving Model Training: Deployment, Performance, and Operational Insights

This article details how Alluxio was adopted to replace NAS in autonomous driving model training, describing the data closed‑loop workflow, the challenges of the previous system, Alluxio's architectural benefits, deployment strategies across single and multiple data centers, functional and performance testing, operational tuning, and the resulting cost and efficiency gains.

Alluxio · Distributed storage · Model Training

0 likes · 15 min read

DataFunTalk

May 14, 2024 · Cloud Computing

Hybrid Cloud Architecture and AI Storage Evolution at Zhihu: From UnionStore to Alluxio

This article describes Zhihu's hybrid cloud architecture—including offline, online, and GPU data centers—its self‑built UnionStore cache, the performance and latency challenges faced during large‑scale AI model training, and the subsequent evaluation and migration to Alluxio community and enterprise editions to achieve higher throughput, stability, and lower operational overhead.

AI storage · Alluxio · Big Data

0 likes · 14 min read

DataFunSummit

May 5, 2024 · Big Data

Alluxio in Lakehouse Architecture: Benefits, Challenges, and Real‑World Use Cases

This article explains how Alluxio enables a unified lakehouse architecture by decoupling compute and storage, outlines its core capabilities, evaluates the cost‑saving and performance benefits, discusses the technical challenges, and presents several practical deployment scenarios in finance and AI workloads.

Alluxio · Big Data · Data Orchestration

0 likes · 15 min read

DataFunSummit

Mar 30, 2024 · Big Data

Alluxio in Data & AI Lakehouse: Architecture, Performance Optimizations, and Cloud Practices at OPPO

OPPO's data architects combined their self‑developed Shuttle service with Alluxio to double performance, halve system pressure, and double throughput, while building a unified Data & AI lakehouse that integrates structured and unstructured data, metadata management, real‑time ingestion, and cloud cost reductions.

.ai · Alluxio · Big Data

0 likes · 11 min read

DataFunTalk

Mar 3, 2024 · Big Data

Alluxio Local Cache for Presto on S3: Architecture, Implementation, and Performance Evaluation at NewsBreak

This article presents NewsBreak's practical deployment of Alluxio Local Cache with Presto on S3, detailing the system architecture, cache design considerations, implementation steps, performance metrics, and future optimization directions to reduce query latency and storage costs.

Alluxio · Big Data · Cache

0 likes · 12 min read

DataFunTalk

Feb 18, 2024 · Cloud Computing

Research on the Unified Storage Platform for the Supercomputing Internet

This article presents a comprehensive overview of the challenges, key technologies, and future applications of a unified storage platform built on Alluxio for China's national supercomputing internet, detailing its architecture, data flow strategies, deployment status, and industry use cases across multiple sectors.

Alluxio · Cloud Computing · Data Flow

0 likes · 13 min read

DataFunTalk

Feb 9, 2024 · Big Data

Alluxio’s Role in Lakehouse Architecture: Benefits, Challenges, and Real‑World Use Cases

This article explains how Alluxio supports lakehouse architectures by providing a data orchestration layer that caches data near compute, reduces the cost of separating storage from compute, improves performance, and addresses challenges such as security, scalability, and multi‑cloud deployment, illustrated with several industry case studies.

.ai · Alluxio · Big Data

0 likes · 16 min read

DataFunTalk

Feb 3, 2024 · Big Data

Alluxio: Introduction, Architecture, and Practical Experience for Big Data Construction

This article introduces Alluxio as an open‑source data orchestration layer, explains its architecture and core features such as unified namespace, caching strategies, and cloud‑native deployment, and shares practical experiences on using Alluxio to simplify data lakehouse construction, migration, and hot‑cold data separation in complex big‑data environments.

Alluxio · Big Data · Caching

0 likes · 13 min read

DataFunTalk

Jan 14, 2024 · Big Data

Optimizing Object Storage and Impala Engine in NetEase NDH: Performance Enhancements and Feature Additions

This presentation outlines NetEase's NDH big‑data platform, detailing its background, object‑storage upload and rename optimizations, Impala engine adaptations—including file‑handle caching, transparent URI handling, and getFileBlockLocations improvements—and a suite of operational enhancements such as dynamic proxy user configuration and audit‑log extensions.

Alluxio · Big Data · Impala

0 likes · 14 min read

DataFunTalk

Nov 26, 2023 · Big Data

Data Orchestration in Hybrid Storage Architectures with Alluxio

This article explains how Alluxio, an open‑source data orchestration system, improves data access efficiency in hybrid multi‑cloud and multi‑storage environments by providing caching, a unified namespace, interface translation, automated data management, and federation capabilities for modern big‑data workloads.

Alluxio · Caching · Data Orchestration

0 likes · 18 min read

DataFunSummit

Nov 18, 2023 · Artificial Intelligence

PyTorch Model Training Performance Tuning Guide with Alluxio

This guide explains how Ant Group uses Alluxio to overcome storage I/O, capacity, and latency challenges, delivering stability, performance, and scalability improvements for large‑scale PyTorch model training while reducing infrastructure costs and providing practical optimization techniques and code examples.

.ai · Alluxio · Performance Tuning

0 likes · 4 min read

DataFunTalk

Nov 9, 2023 · Artificial Intelligence

Coeus: Bilibili's Cloud‑Native AI Platform and the PyTorch Training Performance Tuning Handbook

The article introduces Coeus, Bilibili's cloud‑native AI platform built on Kubernetes with Alluxio integration, explains how it solves major data and compute challenges, improves training performance, and promotes a free PyTorch performance‑tuning guide for engineers.

AI platform · Alluxio · Kubernetes

0 likes · 4 min read

DataFunTalk

Oct 27, 2023 · Big Data

PrestoDB vs Trino: Testing, Selection, Alluxio Acceleration, and Deployment Practices at Zhihu

This article details Zhihu's evaluation of PrestoDB and Trino, the integration of Alluxio for query acceleration, the architectural choices and deployment modes, extensive TPC‑DS and production performance tests, encountered challenges, and future optimization directions for their OLAP platform.

Alluxio · Caching · OLAP

0 likes · 28 min read

Programmer DD

Sep 15, 2023 · Big Data

How Alluxio Manages Massive Metadata: Inode, Block, MountTable, and Worker Insights

This article examines Alluxio's open-source distributed file system, detailing the core types of metadata—inode, block, mount table, and worker—along with the mechanisms for their storage, management, and optimization in both HEAP and ROCKS modes, and provides practical configuration guidance for scaling large-scale data environments.

Alluxio · Big Data · Distributed File System

0 likes · 15 min read

DataFunTalk

Sep 9, 2023 · Big Data

Presto + Tencent DOP (Alluxio) Architecture and Optimization Practices for Financial OLAP

This article presents the practical implementation of Presto combined with Tencent DOP (Alluxio) in a financial OLAP scenario, detailing background and architectural evolution, the Presto‑Alluxio design, optimization techniques for caching, storage scalability, ORC handling, and performance results, followed by conclusions and future directions.

Alluxio · Big Data · OLAP

0 likes · 15 min read

DataFunTalk

Aug 18, 2023 · Operations

Prometheus and Grafana Tutorial for Monitoring Alluxio: Introduction, Environment Setup, and Manual Tuning

This article introduces Prometheus and Grafana, guides readers through setting up a monitoring environment for Alluxio—including installing and configuring Prometheus Server, Grafana, and Alluxio data sources—and explains manual dashboard tuning and data export techniques.

Alluxio · Grafana · Manual Tuning

0 likes · 8 min read

DataFunTalk

Jun 25, 2023 · Big Data

Multi‑Cloud Cache Evolution at Zhihu: From Multi‑HDFS to UnionStore to Alluxio

This technical presentation details Zhihu's journey in multi‑cloud caching, covering the motivations for a multi‑cloud architecture, the design and limitations of the self‑built UnionStore component, and the adoption of Alluxio to achieve significant performance, stability, and cost improvements across model serving and training workloads.

Alluxio · Big Data · Caching

0 likes · 24 min read

DataFunTalk

Jun 16, 2023 · Cloud Native

Kubernetes Operator Deployment Challenges and Alluxio Operator Case Study

This article reviews the challenges of deploying applications on Kubernetes, introduces the operator concept as a mainstream solution, explains how to design and implement custom operators for services, and demonstrates these ideas with a detailed Alluxio Operator case study, including maturity levels and future enhancements.

Alluxio · Cloud Native · Deployment

0 likes · 17 min read

DataFunTalk

Jun 5, 2023 · Cloud Computing

Comcast Hybrid Cloud Data Platform Case Study: Seamless and Secure Data Access with Alluxio

Comcast’s hybrid‑cloud data platform, built on Trino and Amazon S3, faced challenges such as fragmented data access, costly data copies, and latency, leading the DX team to adopt Alluxio as a unified, cache‑enabled, secure middle‑layer that bridges storage and compute.

Alluxio · Amazon S3 · Data Platform

0 likes · 3 min read

DataFunTalk

May 25, 2023 · Artificial Intelligence

Optimizing Distributed Cache for Large-Scale Deep Learning Training with Alluxio and SiloD

This article examines the storage bottlenecks in large‑scale AI training, evaluates local‑disk and Alluxio‑based distributed caching strategies, proposes uniform cache eviction and replica‑aware global policies, and introduces the SiloD framework for coordinated compute‑storage scheduling to dramatically improve GPU utilization and overall cluster throughput.

AI training · Alluxio · Cache Eviction

0 likes · 16 min read

DataFunTalk

Feb 24, 2023 · Big Data

Presto and Alluxio Integration for Iceberg: Architecture, Best Practices, and Future Work

This article explains how Presto and Alluxio work together to query Iceberg tables, describes their architectures, deployment options, best‑practice recommendations such as using Iceberg native catalogs and local caches, and outlines future research directions for improving CPU usage and off‑heap caching.

Alluxio · Big Data · Cache

0 likes · 14 min read

DataFunTalk

Feb 17, 2023 · Big Data

Tencent Alluxio (DOP) Deployment and Optimization in Financial Data Analytics

This article describes how Tencent's Alluxio-based Data Orchestration Platform (DOP) was applied to financial analytics, detailing the business background, challenges of large‑scale OLAP workloads, the Alluxio architecture and usage modes, performance results, and the series of optimizations and tuning performed to achieve significant speedups.

Alluxio · Big Data · Data Orchestration

0 likes · 15 min read

DataFunTalk

Feb 15, 2023 · Big Data

Alluxio Deployment at Ant Group: Stability Building, Performance Optimization, and Scale‑up for Large‑Scale Model Training

This article summarizes how Ant Group introduced Alluxio to address storage I/O, capacity, and latency challenges in large‑scale model training, detailing stability improvements through worker‑register follower and master migration, performance gains via follower‑only reads, and horizontal scaling using metadata sharding and multi‑cluster deployment.

Alluxio · Big Data · Model Training

0 likes · 15 min read

DataFunTalk

Feb 12, 2023 · Big Data

Optimizing Bilibili Presto Cluster Query Performance with Alluxio and Local Cache

This article presents a comprehensive technical overview of Bilibili's Presto cluster architecture, the challenges of query performance on Hadoop, and the systematic optimizations—including Alluxio integration, local cache mechanisms, multi‑active coordinators, label‑based scheduling, and real‑time penalties—that together improve availability, stability, and latency for large‑scale analytics workloads.

Alluxio · Big Data · Cache

0 likes · 23 min read

DataFunTalk

Jan 19, 2023 · Big Data

Tencent Alluxio: Accelerating the Next Generation of Big Data and AI

This article presents a comprehensive overview of Tencent's Alluxio project, covering the evolution of big‑data architecture, recent Alluxio research progress, typical deployment cases, and future work, while highlighting performance improvements, integration with cloud and AI workloads, and community contributions.

.ai · Alluxio · Big Data

0 likes · 21 min read

DataFunTalk

Jan 18, 2023 · Big Data

Five Major Trends Shaping Big Data, AI, and Cloud Industries in 2023

The article forecasts five key trends for 2023—including cloud cost optimization, multi‑cloud freedom, rapid AI model adoption, expanding data‑sharing ecosystems, and the convergence of data warehouses and lakes—highlighting how they will reshape the big data, artificial intelligence, and cloud landscapes.

Alluxio · Kubernetes · data sharing

0 likes · 6 min read

DataFunSummit

Jan 1, 2023 · Big Data

Shopee Data Infra Presentation: Storage Status, Acceleration, Serviceization, and Future Plans

The Shopee Data Infra talk details the current storage architecture, Presto‑based acceleration with Alluxio caching, service‑oriented storage solutions using Alluxio Fuse and S3 APIs, and outlines future enhancements for Spark/Hive integration and CSI/Fuse optimizations, providing a comprehensive view of large‑scale big data storage engineering.

Alluxio · Cache Manager · Data Infrastructure

0 likes · 16 min read

DataFunTalk

Nov 19, 2022 · Big Data

Improving Bilibili Offline Cluster Performance with Presto and Alluxio

This technical presentation explains how Bilibili reduced database pressure and query latency in its production environment by integrating Presto with Alluxio, detailing the offline cluster architecture, challenges of compute‑storage separation, caching strategies, consistency mechanisms, performance gains, and future work.

Alluxio · Cache · presto

0 likes · 17 min read

Past Memory Big Data

Nov 15, 2022 · Big Data

How Uber Accelerated Presto Queries with Alluxio Local Cache

Uber processes over 500,000 daily Presto queries across 20 clusters handling more than 50 PB of data, and by deploying Alluxio Local Cache on NVMe disks they raised cache‑hit rates from roughly 65% to over 90% while addressing real‑time partition updates, node churn, and cache‑size constraints.

Alluxio · Big Data · Consistent Hashing

0 likes · 15 min read

DataFunTalk

Sep 4, 2022 · Big Data

Alluxio 2.8 New Features Overview

This article summarizes the Alluxio 2.8 release, detailing enhancements in API support, enterprise‑grade security features, and data‑movement capabilities, while also covering new encryption options, master‑proxy S3 token handling, OPA integration, and various performance and observability optimizations.

API · Alluxio · Data Orchestration

0 likes · 9 min read

DataFunTalk

Aug 31, 2022 · Big Data

Alluxio Data Orchestration and Cache Acceleration in China Unicom: Use Cases and Performance Gains

This article presents Zhang Ce's detailed overview of Alluxio's deployment at China Unicom, covering cache acceleration, compute‑storage separation, mixed‑load workloads, and lightweight analysis, and demonstrates how these strategies dramatically improve performance, scalability, and cost efficiency for big data processing.

Alluxio · Cache Acceleration · Data Orchestration

0 likes · 19 min read

DataFunSummit

Aug 21, 2022 · Big Data

Alluxio Stress Testing Methods and Practices

This article explains the purpose, sources, and manifestations of pressure in Alluxio, describes its built‑in stress testing framework, outlines how to run and configure stress tools, and provides guidance on result calculation, reporting, common issues, and debugging for effective performance evaluation.

Alluxio · Big Data · Distributed storage

0 likes · 11 min read

DataFunTalk

Aug 20, 2022 · Artificial Intelligence

Atlas Supercomputing Platform: Architecture, Alluxio‑Fluid Integration, and Performance Improvements for AI Workloads

The article presents Unisound's Atlas supercomputing platform, detailing its AI‑focused architecture, early storage and bandwidth challenges, the integration of Alluxio and Fluid for distributed caching, various business adaptations, and experimental results showing significant performance gains across speech denoising, image classification, large‑file processing, and speech recognition workloads.

.ai · Alluxio · Fluid

0 likes · 16 min read

DataFunTalk

Aug 8, 2022 · Artificial Intelligence

Accelerating Cloud Deep Learning Training with Alluxio: Overview, Usage Levels, and POSIX API Development

This article explains how Alluxio, an open‑source data abstraction layer, can accelerate cloud‑based deep‑learning training by providing POSIX‑compatible caching, simplifying data source integration, and offering three usage levels—from basic read‑through caching to full data‑as‑a‑service abstraction—backed by real‑world case studies and performance results.

.ai · Alluxio · Cloud Training

0 likes · 10 min read

DataFunTalk

Aug 1, 2022 · Big Data

Bilibili Lakehouse Integration: Iceberg and Alluxio Optimization Practices

This article details Bilibili's lakehouse implementation using Apache Iceberg and Alluxio, covering background challenges, architectural components, data organization techniques like Z‑order and bitmap indexes, performance benchmarks, and future optimization plans for large‑scale analytics.

Alluxio · Bitmap Index · Iceberg

0 likes · 21 min read

DataFunTalk

Jun 25, 2022 · Big Data

Alluxio Metadata and Data Synchronization: Design, Implementation, and Optimization

This article provides a comprehensive overview of Alluxio's metadata and data synchronization mechanisms, covering its unified namespace, mounting strategies, consistency models, various write modes, read workflows, metadata sync techniques, performance optimizations, and recommended configurations for different deployment scenarios.

Alluxio · Data Consistency · metadata synchronization

0 likes · 26 min read

DataFunTalk

Jan 19, 2022 · Artificial Intelligence

Alluxio for AI and Machine Learning: Architecture, Optimizations, and Performance Evaluation

This article presents a comprehensive technical overview of Alluxio, covering its role as a distributed data orchestration layer for AI workloads, core features such as caching and unified namespace, performance challenges in large‑scale machine‑learning pipelines, and the extensive optimizations and testing performed at Tencent to achieve high throughput and scalability.

.ai · Alluxio · CephFS

0 likes · 23 min read

Architecture Digest

Jul 25, 2021 · Big Data

Design and Architecture of Hera Data Service for Unified Data Access at Vipshop

The article details the background, architecture, core features, scheduling mechanisms, Lisp‑based query DSL, and Alluxio integration of Vipshop's self‑developed Hera data service, illustrating how it unifies multi‑engine data access, improves SLA, and accelerates large‑scale crowd computing tasks.

Alluxio · Big Data · Data Service

0 likes · 21 min read

Big Data Technology Architecture

Jul 20, 2021 · Big Data

PB‑Level Ad‑hoc Query Practice with Flink: Threat Hunting Platform Architecture and IO‑Reducing Optimizations

This article details 360's Threat Hunting platform built on Flink, covering its evolution, architecture, block‑index design, Hilbert‑curve data ordering, like‑pushdown, join optimizations, Alluxio caching, and future plans for BI and multi‑user concurrency, all aimed at efficient PB‑scale data querying.

Alluxio · Block Index · Flink

0 likes · 18 min read

Alibaba Cloud Native

Apr 2, 2021 · Cloud Native

How Fluid Turns Kubernetes into a High‑Performance Data Logistics System

This article explains how the open‑source Fluid project addresses the inefficiencies of data‑intensive AI and big‑data workloads in cloud‑native Kubernetes environments by introducing a data‑centric abstraction, dual orchestration mechanisms, and seamless integration with Alluxio to achieve faster, secure, and scalable data access.

Alluxio · Big Data · Cloud Native

0 likes · 19 min read

Alibaba Cloud Native

Mar 5, 2021 · Artificial Intelligence

How Alluxio Supercharges Cloud Deep Learning: Benchmarks, Architecture, and Tuning

This article examines why accelerating cloud‑based deep learning is essential, presents benchmark results comparing GPU generations and distributed training, introduces Alluxio as a distributed memory‑level cache, details its architecture on Kubernetes, and offers concrete tuning strategies to overcome I/O bottlenecks and boost training performance.

.ai · Alluxio · Deep Learning

0 likes · 16 min read

Tencent Cloud Developer

Dec 30, 2020 · Big Data

How Alluxio Boosts Tencent Cloud EMR: Cutting Bandwidth by 50% and Accelerating IO‑Intensive Workloads

This article analyzes the challenges of traditional monolithic big‑data architectures, explains how Tencent Cloud EMR integrates Alluxio for compute‑storage separation, presents detailed performance benchmarks showing 20‑50% bandwidth reduction and 5‑40% query speedup, and outlines the specific tuning measures applied.

Alluxio · Big Data · Cloud Computing

0 likes · 10 min read

Alibaba Cloud Native

Nov 16, 2020 · Cloud Native

What’s New in Fluid 0.4? DataLoad, Small‑File Boost, HDFS Support & Multi‑Dataset Deployment

Fluid 0.4 introduces a DataLoad custom resource for declarative data pre‑warming, enhances support for massive small‑file datasets, adds HDFS‑compatible access for Spark and other big‑data frameworks, and enables mixed‑deployment of multiple datasets on a single node, all backed by significant performance gains.

.ai · Alluxio · Big Data

0 likes · 8 min read

Big Data Technology Architecture

Aug 15, 2020 · Big Data

Alluxio: Open‑Source Data Orchestration Platform – Overview, Benefits, Innovations, and Getting‑Started Resources

Alluxio is an open‑source, memory‑centric data orchestration layer that bridges compute frameworks such as Spark, Presto, and TensorFlow with diverse storage systems, offering high‑speed I/O, unified namespace, multi‑level caching, and easy deployment, while providing extensive documentation, download links, and community resources for rapid adoption.

Alluxio · Analytics · Data Orchestration

0 likes · 7 min read

Big Data Technology & Architecture

Jul 11, 2020 · Big Data

Alluxio Tiered Metadata Management and Asynchronous Cache Eviction Implementation

The article explains Alluxio's tiered metadata management architecture, describing how the system separates hot and cold metadata into cached and persisted layers, and details the custom asynchronous eviction thread and cache implementation that replace Guava cache for efficient large‑scale metadata handling.

Alluxio · Cache · Distributed storage

0 likes · 15 min read

Alibaba Cloud Native

May 12, 2020 · Artificial Intelligence

Boosting Cloud‑Native AI Training with Alluxio: Performance Tuning on Kubernetes

This article examines the challenges of large‑scale deep‑learning model training on Kubernetes, analyzes performance bottlenecks caused by Alluxio‑FUSE integration, and presents a series of configuration and system‑level optimizations that dramatically improve data‑access speed and overall training throughput.

AI training · Alluxio · Cloud Native

0 likes · 22 min read

Architects' Tech Alliance

Jul 28, 2019 · Big Data

Alluxio: A Virtual Distributed File System for Unified Big Data Access and Cost‑Effective Storage

The article explains how Alluxio, a memory‑speed virtual distributed file system, acts as a virtual data lake to unify access to structured and unstructured big‑data across heterogeneous storage systems, offering on‑demand fast local access, intelligent caching, reduced storage costs, and enterprise‑grade security and fault tolerance.

Alluxio · Big Data · Caching

0 likes · 15 min read

Beike Product & Technology

Jan 10, 2019 · Big Data

Accelerating QueryEngine with Alluxio in Spark SQL: Architecture, Features, and Performance Evaluation

This article presents the integration of Alluxio as an in‑memory caching layer for QueryEngine's Spark SQL engine, detailing Alluxio's architecture, key features, deployment practice, performance testing methodology, results, and lessons learned for large‑scale ad‑hoc query acceleration.

Alluxio · Performance · Spark SQL

0 likes · 13 min read

NetEase Game Operations Platform

Dec 5, 2018 · Big Data

Presto + Alluxio Architecture for Interactive Ad‑hoc Queries in NetEase Game Data Warehouse

This article describes how NetEase Games built a Presto‑based interactive ad‑hoc query platform backed by Alluxio caching to achieve sub‑10‑second query latency, outlines the architectural design, performance comparisons with other Hadoop‑based solutions, encountered issues, and future improvement plans.

Alluxio · Big Data · Data Warehouse

0 likes · 10 min read

Architects' Tech Alliance

Nov 5, 2018 · Big Data

Alluxio as a Virtual Distributed File System for Data Lake Solutions

The article explains how Alluxio provides a virtual distributed file system that acts as a "virtual data lake," enabling unified, high‑performance access to structured and unstructured data across heterogeneous storage back‑ends while reducing storage costs through intelligent caching and eliminating the need for permanent data copies.

Alluxio · Big Data · Caching

0 likes · 16 min read

Suning Technology

May 11, 2018 · Big Data

How Suning Scaled HDFS with Alluxio: Multi‑Cluster Architecture and Performance Gains

This article details Suning's approach to overcoming HDFS Namenode performance bottlenecks by partitioning into multiple clusters, leveraging Alluxio's unified namespace, and presenting design decisions, implementation challenges, and performance test results that show significant throughput and latency improvements.

Alluxio · Distributed storage · HDFS

0 likes · 12 min read

Ctrip Technology

Feb 28, 2018 · Big Data

Using Alluxio to Mitigate HDFS Maintenance Impact on Real-Time Jobs in Ctrip's Big Data Platform

The article explains how Ctrip's big‑data platform introduced Alluxio to isolate real‑time Spark Streaming jobs from HDFS NameNode maintenance, reduce NameNode pressure, improve Spark SQL performance, and provide a unified storage layer across multiple HDFS clusters.

Alluxio · Big Data · Data Lake

0 likes · 9 min read

Architects' Tech Alliance

Apr 4, 2017 · Big Data

Alluxio: Memory‑Centric Distributed File System for Big Data Storage and Compute

Alluxio, formerly Tachyon, is a memory‑centric distributed file system that unifies heterogeneous big‑data storage backends, optimizes small files, and provides a fast, unified data access layer between storage systems like S3 or HDFS and compute frameworks such as Spark or Hadoop.

Alluxio · Compute Frameworks · Distributed File System

0 likes · 7 min read
