Tagged articles

Distributed Computing

181 articles · Page 1 of 2

May 26, 2026 · Big Data

How MaxCompute Evolves into an AI‑Ready Data Platform: Architecture, Core Capabilities, and Real‑World Cases

The article details MaxCompute's transformation into a cloud‑native, AI‑centric data warehouse, covering multi‑modal storage, model management, heterogeneous CPU/GPU scheduling, SQL AI functions, the MaxFrame Python framework, and several production case studies that demonstrate performance gains of up to 50% and elastic resource scaling to 160,000 cores.

Data+AI · Distributed Computing · Large‑model preprocessing

0 likes · 13 min read

DataFunTalk

May 25, 2026 · Big Data

MaxCompute’s AI‑Ready Evolution: Architecture, Features, and Real‑World Use Cases

This article examines how Alibaba Cloud’s MaxCompute platform has been transformed for AI workloads, detailing its multi‑layer architecture, multimodal data storage, SQL AI functions, the Python‑based MaxFrame framework, and real‑world deployments in large‑model preprocessing, autonomous driving, and multimodal image labeling.

AI · Big Data · Distributed Computing

0 likes · 12 min read

AI Engineering

May 17, 2026 · Artificial Intelligence

How OpenAI’s Updated Codex Turns Your Old PCs into a Personal AI Compute Fleet

OpenAI’s latest Codex update adds locked‑screen system control and lets any device with the client join a distributed network, enabling users to repurpose idle computers as a coordinated AI compute fleet while raising security and platform‑compatibility questions.

AI Agents · Codex · Distributed Computing

0 likes · 5 min read

DataFunSummit

Apr 27, 2026 · Big Data

How MaxCompute Evolves Big Data Platforms for AI: Architecture, Core Capabilities, and Real‑World Cases

The article details MaxCompute's AI‑driven evolution, covering its multilayer architecture, multimodal storage management, SQL AI functions, the Python‑based MaxFrame framework, and several industry case studies that demonstrate performance gains and flexible resource scheduling for large‑scale AI workloads.

Cloud Data Warehouse · Data+AI · Distributed Computing

0 likes · 12 min read

Alibaba Cloud Big Data AI Platform

Apr 22, 2026 · Artificial Intelligence

How to Build an End‑to‑End Hand‑Video to VLA Data Pipeline on Alibaba Cloud PAI with Data‑Juicer

This article details a step‑by‑step, distributed pipeline built on Alibaba Cloud PAI using Data‑Juicer and Ray that transforms raw egocentric hand videos into LeRobot v2.0‑compatible Vision‑Language‑Action (VLA) training data, covering video splitting, frame extraction, camera calibration, 3D hand reconstruction, pose estimation, action captioning, and export, with code snippets, performance numbers, and references.

Data-Juicer · Distributed Computing · Embodied AI

0 likes · 29 min read

Ctrip Technology

Apr 16, 2026 · Big Data

How Ray + DuckDB Cut 900M-Row Attribution Queries from 40s to 15s

When attribution analysis on over 900 million rows slowed to more than 40 seconds and threatened cluster stability, Ctrip's smart attribution team rebuilt the architecture with Ray and DuckDB, achieving sub‑15‑second query times, a 160% performance gain, and complete resource isolation.

Attribution Analysis · Big Data · Distributed Computing

0 likes · 22 min read

JakartaEE China Community

Apr 1, 2026 · Artificial Intelligence

Top Java AI Development Tools for 2025

This guide reviews eight leading AI development tools for Java in 2025, explaining how each library or framework—such as DJL, TensorFlow Java, Hugging Face, LangChain, Apache Kafka, Ray, Deeplearning4j, and Neo4j—enables Java developers to build, train, and deploy intelligent applications without switching languages.

AI · Distributed Computing · Java

0 likes · 9 min read

DataFunSummit

Mar 3, 2026 · Backend Development

How Ant Group Supercharged AI Data Pipelines with Ray: Boosting Index Build Speed and Reliability

This article details Ant Group's use of the Ray distributed computing framework to accelerate massive data indexing, migrate a C++ engine to Ray, implement elastic resource scheduling, improve long‑tail task efficiency, and build a robust RAG operator system with comprehensive governance, achieving up to 2× speed gains and 99.9% success rates.

Backend Development · Distributed Computing · Ray

0 likes · 15 min read

Big Data Technology Tribe

Mar 2, 2026 · Big Data

How Ray Data’s LogicalOptimizer Transforms Plans for Faster Execution

This article explains Ray Data’s execution pipeline, detailing the LogicalOptimizer’s architecture, core abstractions, rule‑based optimization process, and both logical and physical rule sets, with concrete code examples and practical illustrations of each optimization technique.

Big Data · Distributed Computing · Logical Optimizer

0 likes · 14 min read
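
A minimal sketch of the rule-based idea, assuming a plan is just a linear list of operators: one rewrite rule fuses adjacent map operators so rows pass through a single composed function. The `Read`, `MapRows`, `fuse_maps`, `optimize`, and `execute` names are hypothetical stand-ins, not Ray Data's actual `LogicalOptimizer` classes or rules.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

# Hypothetical operator classes for illustration; not Ray Data's API.
@dataclass(frozen=True)
class Read:
    """Source operator producing a fixed set of rows."""
    rows: tuple

@dataclass(frozen=True)
class MapRows:
    """Per-row transform operator."""
    fn: Callable
    name: str

Op = Union[Read, MapRows]
Rule = Callable[[List[Op]], List[Op]]

def fuse_maps(plan: List[Op]) -> List[Op]:
    """Rule: collapse each run of adjacent MapRows into one composed operator."""
    out: List[Op] = []
    for op in plan:
        prev = out[-1] if out else None
        if isinstance(op, MapRows) and isinstance(prev, MapRows):
            f, g = prev.fn, op.fn
            # Compose f then g; default args bind the current functions.
            out[-1] = MapRows(lambda x, f=f, g=g: g(f(x)), f"{prev.name}->{op.name}")
        else:
            out.append(op)
    return out

def optimize(plan: List[Op], rules: List[Rule]) -> List[Op]:
    """Apply each rule in order, as a simple rule-based optimizer would."""
    for rule in rules:
        plan = rule(plan)
    return plan

def execute(plan: List[Op]) -> list:
    """Naive executor: read the source rows, then apply every map operator."""
    rows = list(plan[0].rows)
    for op in plan[1:]:
        rows = [op.fn(r) for r in rows]
    return rows

plan = [Read((1, 2, 3)), MapRows(lambda x: x + 1, "inc"), MapRows(lambda x: x * 10, "scale")]
optimized = optimize(plan, [fuse_maps])
print(len(optimized), optimized[1].name)
print(execute(optimized))
```

Fusion removes an operator boundary (and the intermediate rows it would hand off) while producing the same output as the unfused plan.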

DataFunSummit

Jan 18, 2026 · Big Data

How Ray Reinvents AI Data Pipelines for Massive Multimodal Inference

This article examines the shortcomings of traditional big‑data engines for AI workloads, presents a Ray‑based heterogeneous fusion architecture that unifies CPU/GPU scheduling, Python ecosystems, and streaming‑batch processing, and details fault‑tolerance, checkpointing, compute‑storage separation, resource‑utilization, scalability, and observability improvements that enable thousands of nodes and dramatically higher GPU efficiency.

Big Data · Cloud Native · Distributed Computing

0 likes · 31 min read

Ray's Galactic Tech

Dec 23, 2025 · Backend Development

How Apache Ignite Powers Low‑Latency Real‑Time Bidding at Scale

This article explains how Apache Ignite's memory‑first architecture, distributed compute grid, and event‑driven streaming enable sub‑100 ms decision making, high throughput, and strong consistency for real‑time bidding platforms, with practical code examples, Spring Boot integration, monitoring tips, and security considerations.

Apache Ignite · Distributed Computing · In-Memory Data Grid

0 likes · 8 min read

ByteDance Data Platform

Dec 23, 2025 · Artificial Intelligence

How Daft and Ray Supercharge Million‑Hour Video Processing for AI‑Powered Robotics

This article details a scalable, distributed pipeline that uses LAS AI Data Lake, Daft on Ray, and advanced video‑processing techniques—scene detection, splitting, frame sampling, filtering, and caption generation—to transform tens of millions of hours of robot‑captured video into high‑quality, searchable semantic data while dramatically boosting CPU and GPU utilization.

AI Pipeline · Daft · Distributed Computing

0 likes · 21 min read

Data Party THU

Nov 21, 2025 · Artificial Intelligence

Unlocking 2025 Multi-Agent AI: Core Tech, Frameworks, and Emerging Trends

This article analyzes the technical foundations, development frameworks, real‑time inference optimizations, typical industry deployments, and future research directions of multi‑agent systems in 2025, highlighting protocols like FIPA‑ACL and MCP, tools such as LangGraph and ADP3.0, and edge‑computing breakthroughs.

AI Architecture · Distributed Computing · Model Quantization

0 likes · 16 min read

Big Data Technology Tribe

Nov 21, 2025 · Fundamentals

Mastering Ray: Core Concepts of Tasks, Actors, and Objects for Distributed Computing

This guide explains Ray's fundamental building blocks—including Tasks, Actors, remote Objects, Placement Groups, and environment dependencies—showing how to define, schedule, and retrieve distributed workloads with code examples and command‑line utilities.

Actors · Distributed Computing · Object Store

0 likes · 8 min read

Architects' Tech Alliance

Oct 27, 2025 · Artificial Intelligence

How AI Super Nodes Are Redefining Scalable AI Infrastructure

The article examines the emerging AI Super Node ecosystem, detailing its core concepts, four‑layer architecture, key enabling technologies, current challenges such as compatibility and energy consumption, and future directions like quantum‑classical hybrids and green low‑carbon designs, illustrating how it overcomes scaling bottlenecks in modern AI deployments.

AI Infrastructure · Distributed Computing · Secure AI

0 likes · 13 min read

DataFunSummit

Sep 20, 2025 · Artificial Intelligence

How We Scaled WeChat AI Services with Ray: Lessons from Million‑Node Deployments

This article examines how WeChat’s Astra platform leverages the Ray distributed framework to manage million‑node AI workloads, addressing challenges of scale, heterogeneous GPU resources, operational complexity, and cost, and outlines the architecture that unifies Ray services across multiple Kubernetes clusters.

AI scaling · Astra Platform · Distributed Computing

0 likes · 5 min read

DataFunSummit

Sep 18, 2025 · Artificial Intelligence

How We Scaled WeChat AI Services with Ray: Lessons from Million‑Node Deployments

This article examines how Tencent's WeChat team leveraged the Ray distributed computing framework within the Astra platform to tackle massive AI workloads, addressing challenges of scale, GPU diversity, operational complexity, and cost while outlining their architecture and practical insights.

AI Infrastructure · Astra Platform · Distributed Computing

0 likes · 6 min read

DataFunSummit

Sep 13, 2025 · Artificial Intelligence

How Pinterest Scaled LLM Data Pipelines with Ray: Boosting Throughput and Cutting Costs

This article details how Pinterest’s senior staff engineer Dr. Luo leveraged the open‑source Ray framework to overcome LLM data‑preprocessing bottlenecks, describing the system’s architecture, key features such as map_batches, Carry‑Over Columns and Accumulators, and the dramatic performance and cost improvements achieved.

Data preprocessing · Distributed Computing · LLM

0 likes · 12 min read

DataFunSummit

Sep 11, 2025 · Artificial Intelligence

How Ray Powers Massive AI Computing on WeChat: Lessons from Tencent

This article examines how Tencent leverages the Ray distributed framework within the Astra platform to handle WeChat's massive AI workloads, addressing challenges of scale, heterogeneous GPU resources, operational complexity, and cost while outlining the architecture and practical benefits.

AI scaling · Astra Platform · Distributed Computing

0 likes · 5 min read

DataFunSummit

Sep 3, 2025 · Artificial Intelligence

Inside Ant Group’s Ragent: Building Scalable AI Agents on Ray

This article explains how Ant Group’s Ragent framework leverages Ray to create scalable, multi‑tenant AI agents, detailing its background, motivation, and design while outlining the core modules—Profile, Memory, Planning, and Action—that power large‑language‑model agents.

Ant Group · Distributed Computing · Ray

0 likes · 5 min read

ByteDance Data Platform

Sep 3, 2025 · Artificial Intelligence

Revolutionizing AI Data Lakes: How Daft + Lance Enable Multimodal Processing

This article explores how the LAS team's AI‑driven data lake solution, built on Daft for lake computing and Lance for lake storage, tackles the emerging challenges of multimodal data handling, offering faster I/O, heterogeneous CPU‑GPU scheduling, and seamless integration for AI workloads.

AI · Daft · Distributed Computing

0 likes · 11 min read

DataFunSummit

Aug 28, 2025 · Artificial Intelligence

How We Scaled AI Compute to Millions of Nodes with Ray on WeChat

This article explains how Tencent's WeChat team built the Astra platform on Ray to manage millions of AI compute nodes, addressing challenges of massive scale, heterogeneous GPU resources, low‑priority node instability, deployment complexity, and cost, while detailing architecture, scheduling strategies, and practical usage examples.

AI scaling · Distributed Computing · Ray

0 likes · 21 min read

Alibaba Cloud Big Data AI Platform

Aug 26, 2025 · Big Data

How MaxCompute Evolves for Python & AI: From SDK to Native Distributed Engine

This article outlines MaxCompute's decade‑long evolution—from the early PyODPS SDK to the native Distributed Python Engine—highlights the challenges big‑data platforms face in the AI era, and showcases Data+AI solutions and real‑world case studies across multimodal processing, massive text deduplication, and autonomous‑driving data pipelines.

AI Functions · Big Data · Data+AI

0 likes · 15 min read

Baobao Algorithm Notes

Aug 1, 2025 · Artificial Intelligence

Why Training Large Language Models Feels Like Alchemy—and How to Master It

This article breaks down the hardware bottlenecks of large‑scale LLM training, explains the Roofline performance model, arithmetic intensity, and how computation and communication costs interact on GPUs and TPUs, offering concrete formulas and examples for efficient scaling.

Arithmetic intensity · Distributed Computing · GPU

0 likes · 12 min read
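
The Roofline relationship can be written as attainable FLOP/s = min(peak FLOP/s, arithmetic intensity × memory bandwidth). The sketch below applies it to a bf16 matrix multiply, taking intensity as FLOPs divided by the bytes read and written once each; the accelerator figures (1e15 FLOP/s, 2e12 B/s) are hypothetical round numbers, not specs of any GPU or TPU discussed in the article.

```python
def matmul_intensity(b: int, d: int, f: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for C[b,f] = A[b,d] @ W[d,f], reading A and W once and writing C once."""
    flops = 2 * b * d * f  # one multiply and one add per (b, d, f) triple
    bytes_moved = bytes_per_elem * (b * d + d * f + b * f)
    return flops / bytes_moved

def attainable_flops(intensity: float, peak_flops: float, mem_bw: float) -> float:
    """Roofline: throughput is capped by compute or by memory bandwidth, whichever is lower."""
    return min(peak_flops, intensity * mem_bw)

# Hypothetical accelerator numbers, chosen only for illustration.
PEAK, BW = 1e15, 2e12
ridge = PEAK / BW  # intensity (FLOPs/byte) at which the kernel becomes compute-bound

for batch in (8, 512, 4096):
    ai = matmul_intensity(batch, 8192, 8192)
    perf = attainable_flops(ai, PEAK, BW)
    bound = "compute" if ai >= ridge else "memory"
    print(f"batch={batch:5d} intensity={ai:7.1f} FLOP/B -> {perf:.2e} FLOP/s ({bound}-bound)")
```

With these assumed numbers the ridge point is 500 FLOPs/byte, so the 8192 × 8192 layer stays memory-bound at batch 8 and 512 and only becomes compute-bound once the batch pushes intensity past that point.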

Ops Development & AI Practice

Jul 29, 2025 · Artificial Intelligence

How Ray Transforms Distributed Training for Large Language Models

In the era of data‑driven AI, Ray offers an open‑source unified compute framework that abstracts distributed system complexity, enabling developers to seamlessly scale Python code from a laptop to large GPU clusters, and provides the Ray AI Runtime (AIR) with libraries such as Ray Data, Train, Tune, and Serve to accelerate LLM training, hyper‑parameter tuning, and model serving.

AI Runtime · Distributed Computing · LLM training

0 likes · 10 min read

360 Zhihui Cloud Developer

Jul 22, 2025 · Big Data

How Apache SeaTunnel Revolutionizes Heterogeneous Data Integration with Decoupled Connectors

This article explores how Apache SeaTunnel addresses modern data integration challenges by providing a high‑performance, distributed, plugin‑based platform that decouples connectors from execution engines, enabling seamless batch and streaming synchronization across heterogeneous sources such as databases, message queues, and data lakes.

Apache SeaTunnel · Batch Processing · Connector Architecture

0 likes · 24 min read

DataFunSummit

Jun 20, 2025 · Artificial Intelligence

EasyRec Deep Dive: Training & Inference Architecture, Optimizations, and Online Learning

This article explains EasyRec's end‑to‑end recommendation system, covering its training‑inference architecture, a series of CPU/GPU and distributed optimizations, and a real‑time online‑learning pipeline that together improve throughput, latency, and model freshness.

AI Infrastructure · Distributed Computing · Inference Optimization

0 likes · 15 min read

Alibaba Cloud Developer

Jun 10, 2025 · Big Data

How Ray Data Streams Data: From Logical Plans to Distributed Execution

This deep‑dive explains how Ray Data transforms user‑level Dataset APIs into a logical plan, optimizes it, converts it into a physical streaming execution graph, and runs it on a cluster using task and actor pools, detailing each component from read sources to write sinks with code examples.

Distributed Computing · Python · Ray Data

0 likes · 69 min read

Big Data Tech Team

Jun 8, 2025 · Big Data

Master Hadoop: A Step-by-Step Learning Roadmap for Big Data Professionals

This guide outlines a comprehensive Hadoop learning roadmap, covering essential prerequisites, core concepts such as HDFS, MapReduce, and YARN, hands‑on projects, advanced ecosystem tools like Hive, Pig, HBase and Spark, plus curated resources and community channels for aspiring big‑data engineers.

Distributed Computing · HDFS · Hadoop

0 likes · 7 min read
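
To make the MapReduce concept concrete, here is an in-process sketch of the map, shuffle, and reduce phases for word counting. It runs locally in plain Python and only mimics the data flow between phases; it does not use Hadoop's APIs, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle_phase(pairs: Iterable[Tuple[str, int]], num_reducers: int) -> List[Dict[str, List[int]]]:
    """Shuffle: route each key to a reducer partition by hash, grouping its values."""
    partitions: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[hash(key) % num_reducers][key].append(value)
    return partitions

def reduce_phase(partition: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the counts collected for each word."""
    return {key: sum(values) for key, values in partition.items()}

def word_count(lines: Iterable[str], num_reducers: int = 2) -> Dict[str, int]:
    result: Dict[str, int] = {}
    for partition in shuffle_phase(map_phase(lines), num_reducers):
        result.update(reduce_phase(partition))  # partitions hold disjoint keys
    return result

counts = word_count(["HDFS stores data", "YARN schedules jobs", "MapReduce processes data"])
print(counts)
```

Because the shuffle sends every occurrence of a key to the same partition, each reducer can finish its keys independently, which is what lets Hadoop run reducers in parallel across nodes.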

Alibaba Cloud Infrastructure

Jun 3, 2025 · Artificial Intelligence

Deploying and Managing Ray on Alibaba Cloud ACK with KubeRay: Architecture, Code Samples, and Scheduling Strategies

This article explains how to build a flexible machine‑learning infrastructure on Alibaba Cloud ACK using Ray and KubeRay, covering Ray's core components, AI libraries, deployment options on VMs and Kubernetes, code examples for data processing, model serving, and advanced scheduling and quota management techniques.

AI · Alibaba Cloud · Distributed Computing

0 likes · 17 min read

Big Data Technology & Architecture

Apr 17, 2025 · Big Data

MaxCompute: Intelligent Data Warehouse Platform for the Data+AI Era

This article, based on a meetup presentation, details Alibaba Cloud's MaxCompute platform—its evolution, serverless architecture, AI integration, distributed Python framework, Object Table, near‑real‑time processing, and intelligent warehouse features—addressing the challenges of data warehouses in the Data+AI era.

Big Data · Data Warehouse · Distributed Computing

0 likes · 11 min read

Alibaba Cloud Big Data AI Platform

Mar 17, 2025 · Big Data

How MaxFrame Enables Scalable Python AI Workloads on MaxCompute

This article introduces MaxFrame, a cloud‑native distributed Python compute service built on MaxCompute, detailing its architecture, seamless integration with the Python ecosystem, and real‑world use cases ranging from large‑scale data analysis and machine learning to offline LLM inference and custom image deployments.

Big Data · Data Warehouse · Distributed Computing

0 likes · 18 min read

Volcano Engine Developer Services

Mar 5, 2025 · Artificial Intelligence

How DeepSeek Smallpond Powers AI Data Processing with Ray and DuckDB

This article introduces DeepSeek Smallpond, a lightweight yet high‑performance AI data‑processing engine built on Ray and DuckDB, explains its dual Dataframe and LogicalPlan APIs, showcases integration with Volcano Engine's AI Data Lake LAS, and provides practical code examples for distributed processing, multimodal storage, and RAG pipelines.

AI data processing · Data Lake · Distributed Computing

0 likes · 18 min read

Big Data Technology & Architecture

Mar 3, 2025 · Big Data

The Turning Point for Data Development: From Traditional Data Engineering to AI Data Engineering

The article analyzes how the rapid rise of open‑source large models in 2025 is reshaping the data development profession, and urges developers to move beyond specialized data‑engineering roles toward full‑stack AI data‑engineering skills such as distributed computing, lakehouse architectures, and model tuning.

AI · Big Data · Data Engineering

0 likes · 7 min read

Alibaba Cloud Big Data AI Platform

Feb 14, 2025 · Big Data

How MaxCompute Powers Intelligent Data Warehousing in the Data+AI Era

This article summarizes a meetup talk by Alibaba Cloud expert Yu Deshui, detailing MaxCompute’s evolution, serverless architecture, AI‑enabled features, and the platform’s comprehensive solutions—including OpenLake, MaxFrame, Object Table, near‑real‑time computing, and AI Functions—to address the challenges of modern data‑centric AI workloads.

AI integration · Big Data · Data Warehouse

0 likes · 13 min read

JD Cloud Developers

Dec 26, 2024 · Databases

How ClickHouse Powers Billion‑User Tagging with Efficient Bitmap Storage

This article explains how ClickHouse’s columnar storage, compression, and bitmap functions enable fast, scalable processing of billions of user tags and groups in a CDP, covering data storage design, bitmap generation, and distributed query optimization.

CDP · ClickHouse · Columnar Database

0 likes · 11 min read
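
ClickHouse exposes bitmap functions such as `bitmapAnd`, `bitmapOr`, and `bitmapCardinality` over Roaring bitmaps. The sketch below only mimics the idea in Python, using arbitrary-precision integers as uncompressed bitmaps where bit *i* marks user *i*; the tag names, user IDs, and helper functions are hypothetical, and a real deployment would rely on ClickHouse's compressed bitmaps instead.

```python
from typing import Dict, Iterable, List

def build_bitmap(user_ids: Iterable[int]) -> int:
    """Set bit i for every user id i (an uncompressed stand-in for a Roaring bitmap)."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap

def cardinality(bitmap: int) -> int:
    """Number of users in the bitmap (popcount)."""
    return bin(bitmap).count("1")

def members(bitmap: int) -> List[int]:
    """Decode the bitmap back into sorted user ids."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]

# Hypothetical tag -> user bitmap mapping.
tags: Dict[str, int] = {
    "female": build_bitmap([1, 3, 5, 7, 9]),
    "age_18_25": build_bitmap([2, 3, 5, 8]),
    "gamer": build_bitmap([3, 4, 5, 9]),
}

# Crowd selection: (female AND age_18_25) OR gamer, evaluated with bitwise ops.
crowd = (tags["female"] & tags["age_18_25"]) | tags["gamer"]
print(members(crowd), cardinality(crowd))
```

Set algebra over tags reduces to word-level AND/OR operations, which is why bitmap storage scales to combining many tags over very large user populations.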

dbaplus Community

Dec 24, 2024 · Big Data

How Bilibili Scaled Its Tag System for Massive Data and Real‑Time Accuracy

The article details Bilibili's comprehensive redesign of its tag system—including background challenges, architectural layers, technical upgrades like Iceberg integration and shard‑based ClickHouse writes, crowd selection methods, online service guarantees, performance metrics, and future plans—showcasing a data‑driven solution that boosts stability, speed, and business coverage.

ClickHouse · Data Engineering · Distributed Computing

0 likes · 24 min read

AntData

Dec 11, 2024 · Big Data

Flex: A Stream‑Batch Integrated Vectorized Engine for Flink

This article introduces Flex, a Flink‑compatible stream‑batch vectorized engine built on Velox and Gluten, explains the SIMD‑based execution model, details native operator optimizations, fallback mechanisms, correctness and usability improvements, and presents performance results and future development plans.

Distributed Computing · Flink · SIMD

0 likes · 17 min read

Rare Earth Juejin Tech Community

Nov 29, 2024 · Big Data

How ByteDance Builds Large-Scale Data Processing Pipelines for Multimodal Models with Ray

The article details ByteDance's use of Ray and RayData to construct scalable audio and video data processing pipelines for multimodal AI models, addressing challenges of massive data volume, resource constraints, and fault tolerance through pipeline design, RayCore enhancements, and custom scheduling optimizations.

AI · Big Data · ByteDance

0 likes · 16 min read

Architects' Tech Alliance

Nov 24, 2024 · Industry Insights

What’s Driving the Next Wave of Large‑Model Compute Infrastructure?

As AI accelerates, large‑model compute infrastructure becomes a cornerstone of digital transformation, with specialized accelerators, heterogeneous architectures, massive distributed clusters, intelligent scheduling, soaring costs, energy concerns, software‑hardware co‑design challenges, and data‑privacy issues shaping its future development.

AI hardware · Compute infrastructure · Distributed Computing

0 likes · 9 min read

Smart Era Software Development

Nov 4, 2024 · Artificial Intelligence

How eBay’s Data+AI Platform Leverages Ray for Faster Model Development and Deployment

eBay upgraded its AI infrastructure by adopting Ray, cutting model development and deployment time by roughly 50% and boosting GPU utilization from about 10% to over 75% through automated cluster scaling and high‑throughput batch inference.

AI Infrastructure · Data+AI · Distributed Computing

0 likes · 5 min read

IT Services Circle

Oct 23, 2024 · Fundamentals

World’s Largest Known Prime Discovered Using GPUs: 2^136279841−1

A former Nvidia engineer, working through the GIMPS distributed project and leveraging thousands of GPUs across dozens of data centers, showed that 2^136279841−1, a 41,024,320‑digit Mersenne prime, is the largest known prime, surpassing the previous record by more than 16 million digits.

Distributed Computing · GIMPS · GPU computing

0 likes · 7 min read
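
Two quick checks relate to the numbers above. The decimal digit count of 2^p − 1 is floor(p · log10 2) + 1, and the Lucas–Lehmer test is the classical primality test for Mersenne numbers with an odd prime exponent. This toy version is only practical for small exponents; GIMPS relies on heavily optimized FFT-based arithmetic that this sketch does not attempt.

```python
import math

def mersenne_digits(p: int) -> int:
    """Decimal digits of 2**p - 1 (same as 2**p, since 2**p is never a power of 10)."""
    return math.floor(p * math.log10(2)) + 1

def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime, assuming p itself is prime."""
    if p == 2:
        return True  # M2 = 3 is prime; the recurrence below assumes odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

new_record = mersenne_digits(136_279_841)
old_record = mersenne_digits(82_589_933)  # previous record holder, 2**82589933 - 1
print(new_record, old_record, new_record - old_record)
print([p for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 31) if lucas_lehmer(p)])
```

The digit formula reproduces the 41,024,320 figure and puts the gap to the previous record at roughly 16.2 million digits; the Lucas–Lehmer loop correctly rejects exponents such as 11 and 23, whose Mersenne numbers are composite.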

WeChat Backend Team

Oct 23, 2024 · Artificial Intelligence

How We Scaled AI Computing in WeChat with Ray: From Challenges to AstraRay

This article details the AI computing challenges faced by WeChat, explains why the Ray distributed engine was chosen, and describes the design and large‑scale deployment of the AstraRay platform—including scheduling, resource management, and multi‑model support—to achieve low‑cost, high‑efficiency AI services.

AI platform · AstraRay · Distributed Computing

0 likes · 20 min read

360 Tech Engineering

Oct 15, 2024 · Artificial Intelligence

Implementation and Optimization of 360 AI Compute Center: Infrastructure, Network, Kubernetes, and Training/Inference Acceleration

The article details the design and deployment of 360's AI Compute Center, covering GPU server selection, high‑performance networking, Kubernetes‑based cluster management, advanced scheduling, training and inference acceleration techniques, and a comprehensive AI development platform with visualization and fault‑tolerance features.

AI Infrastructure · Distributed Computing · GPU Cluster

0 likes · 21 min read

AsiaInfo Technology: New Tech Exploration

Sep 6, 2024 · Cloud Computing

How Unity Cloud Rendering Powers the Metaverse: Architecture and Use Cases

This article examines Unity's cloud rendering technology, detailing its distributed architecture, workflow steps, key innovations such as low‑latency transmission and real‑time rendering, and explores how these capabilities enable large‑scale, immersive experiences in the emerging metaverse.

Digital Content Creation · Distributed Computing · Metaverse

0 likes · 10 min read

DataFunSummit

Aug 1, 2024 · Big Data

Deep Dive into Apache Spark SQL: Concepts, Core Components, and API

This article provides a comprehensive overview of Apache Spark SQL, covering its fundamental concepts such as TreeNode, AST, and QueryPlan, the distinction between logical and physical plans, the rule‑execution framework, core components like SparkSqlParser and Analyzer, as well as the Spark Session, Dataset/DataFrame, and various writer APIs, supplemented by a detailed Q&A session.

Apache Spark · Big Data · Distributed Computing

0 likes · 19 min read
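
Spark's Catalyst optimizer rewrites plan trees by repeatedly applying batches of rules until the tree stops changing (a fixed point). The Python sketch below mirrors that idea with a tiny expression tree, a bottom-up transform, and two rules; the `Lit`/`Col`/`Add`/`Mul` classes, `transform_up`, and `run_to_fixed_point` are hypothetical illustrations, not Spark's Scala `TreeNode` or `RuleExecutor` API.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

# Hypothetical expression nodes; not Spark's TreeNode classes.
@dataclass(frozen=True)
class Lit:
    value: int

@dataclass(frozen=True)
class Col:
    name: str

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"

Expr = Union[Lit, Col, Add, Mul]
Rule = Callable[[Expr], Expr]

def transform_up(e: Expr, rule: Rule) -> Expr:
    """Rewrite children first, then apply the rule to the rebuilt node (bottom-up)."""
    if isinstance(e, (Add, Mul)):
        e = type(e)(transform_up(e.left, rule), transform_up(e.right, rule))
    return rule(e)

def constant_folding(e: Expr) -> Expr:
    """Replace an operator whose inputs are both literals with its result."""
    if isinstance(e, (Add, Mul)) and isinstance(e.left, Lit) and isinstance(e.right, Lit):
        op = (lambda a, b: a + b) if isinstance(e, Add) else (lambda a, b: a * b)
        return Lit(op(e.left.value, e.right.value))
    return e

def simplify_add_zero(e: Expr) -> Expr:
    """Rewrite x + 0 to x."""
    if isinstance(e, Add) and e.right == Lit(0):
        return e.left
    return e

def run_to_fixed_point(e: Expr, rules: List[Rule], max_iters: int = 100) -> Expr:
    """Apply the whole rule batch repeatedly until the tree stops changing."""
    for _ in range(max_iters):
        new = e
        for rule in rules:
            new = transform_up(new, rule)
        if new == e:
            return new
        e = new
    return e

# price * (2 + 3) + (1 * 0)
expr = Add(Mul(Col("price"), Add(Lit(2), Lit(3))), Mul(Lit(1), Lit(0)))
print(run_to_fixed_point(expr, [constant_folding, simplify_add_zero]))
```

Folding `1 * 0` into `Lit(0)` is what lets the second rule remove the addition, which is why rule batches are rerun until nothing changes rather than applied once.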

Mike Chen's Internet Architecture

Jul 15, 2024 · Big Data

Master Distributed Computing: Hadoop, Spark, and Flink Explained

This article introduces the fundamentals of distributed computing, compares major frameworks such as Hadoop, Spark, and Flink, and outlines their key components, performance characteristics, and typical application scenarios including big‑data analytics, cloud services, real‑time streaming, and scientific computing.

Big Data · Distributed Computing · Flink

0 likes · 7 min read

DataFunSummit

Jul 11, 2024 · Big Data

Design Principles of the Spark Core – DataFun Introduction to Apache Spark (Part 1)

This article provides a comprehensive overview of Apache Spark, covering its origins, key characteristics, core concepts such as RDD, DAG, partitioning and dependencies, the internal architecture including SparkConf, SparkContext, SparkEnv, storage and scheduling systems, as well as deployment models and the company behind the product.

Apache Spark · Big Data · Distributed Computing

0 likes · 16 min read

Alibaba Cloud Developer

Jul 3, 2024 · Big Data

How to Scale Global Dictionary Indexing with Distributed SQL in Minutes

This article explains a distributed‑computing approach for generating a globally unique integer index from massive string datasets, replacing single‑reducer sorting with hash‑bucket partitioning and parallel processing to cut runtime from 30 minutes to just 2 minutes.

Big Data · Distributed Computing · global index

0 likes · 5 min read
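
The bucketed approach described above can be sketched in plain Python: hash each distinct string into a bucket, sort and number each bucket independently (the part a cluster would run in parallel), then turn per-bucket sizes into starting offsets with a prefix sum so every string gets a unique, gap-free integer. The bucket count, the use of `zlib.crc32` as the hash, and the function names are illustrative assumptions, not the article's SQL.

```python
import zlib
from itertools import accumulate
from typing import Dict, Iterable, List

def bucket_of(s: str, num_buckets: int) -> int:
    """Stable hash partitioning (crc32), so every worker agrees on the bucket."""
    return zlib.crc32(s.encode("utf-8")) % num_buckets

def build_global_index(values: Iterable[str], num_buckets: int = 4) -> Dict[str, int]:
    # 1) Partition distinct values by hash bucket (a shuffle in a real engine).
    buckets: List[set] = [set() for _ in range(num_buckets)]
    for v in values:
        buckets[bucket_of(v, num_buckets)].add(v)

    # 2) Sort each bucket locally; in a cluster each bucket is an independent task.
    sorted_buckets = [sorted(b) for b in buckets]

    # 3) Exclusive prefix sum of bucket sizes gives each bucket's starting offset.
    sizes = [len(b) for b in sorted_buckets]
    offsets = [0] + list(accumulate(sizes))[:-1]

    # 4) Global id = bucket offset + local rank within the bucket.
    return {
        v: offsets[i] + rank
        for i, bucket in enumerate(sorted_buckets)
        for rank, v in enumerate(bucket)
    }

index = build_global_index(["apple", "banana", "cherry", "apple", "date", "elderberry"])
print(sorted(index.items(), key=lambda kv: kv[1]))
```

Only the small list of bucket sizes has to be gathered centrally, while the expensive sorting is spread across buckets instead of running in a single reducer. The resulting ids are unique and dense but not in global lexicographic order, which is typically fine when the goal is simply an integer code per string.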

Tencent Cloud Developer

May 29, 2024 · Artificial Intelligence

Distributed Network Embedding Algorithm for Billion‑Scale Graph Data in Tencent Games

Tencent’s Game Social Algorithm Team presents a Spark‑based distributed network embedding framework that recursively partitions hundred‑billion‑edge game graphs into manageable subgraphs, runs node2vec locally, and fuses results, enabling efficient link prediction and node classification across multiple games within hours.

Distributed Computing · Game Analytics · Spark

0 likes · 7 min read

Architects' Tech Alliance

Mar 27, 2024 · Industry Insights

Why AI Large‑Model Training Needs Ultra‑High‑Bandwidth, Low‑Latency Networks

The rapid growth of AI model sizes has created unprecedented demands on network bandwidth, latency, stability, and automation, making efficient RDMA‑based interconnects, advanced congestion control, and intelligent deployment essential for scaling distributed training clusters to thousands of GPUs.

AI Infrastructure · AI training · Distributed Computing

0 likes · 11 min read

JD Tech

Mar 18, 2024 · Artificial Intelligence

High‑Performance Inference Architecture: Distributed Graph Heterogeneous Computing Framework and GPU Multi‑Stream Optimization

The article describes how JD’s advertising team tackled the high‑concurrency, low‑latency challenges of online recommendation inference by designing a distributed graph heterogeneous computing framework, optimizing GPU kernel launches with TensorBatch, deep‑learning compiler techniques, and a multi‑stream GPU architecture, achieving significant throughput and latency improvements.

AI inference · Deep Learning Compiler · Distributed Computing

0 likes · 14 min read
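
The summaries mention TensorBatch request aggregation and bucketed pre-compilation for the deep-learning compiler without giving details. As a generic sketch under our own assumptions (not JD's implementation), rows from many small requests can be concatenated into one batch and padded up to the nearest pre-compiled bucket size, so the compiler only ever sees a few fixed shapes. `BUCKETS`, `pick_bucket`, `aggregate`, `split`, and `fake_model` are all hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

Row = List[float]
BUCKETS = (8, 16, 32, 64)  # hypothetical batch sizes compiled ahead of time

def pick_bucket(n: int, buckets: Sequence[int] = BUCKETS) -> int:
    """Return the smallest pre-compiled bucket that can hold n rows."""
    i = bisect_left(buckets, n)
    if i == len(buckets):
        raise ValueError(f"{n} rows exceed the largest bucket ({buckets[-1]})")
    return buckets[i]

def aggregate(requests: List[List[Row]], width: int) -> Tuple[List[Row], List[Tuple[int, int]]]:
    """Concatenate the rows of many small requests into one batch padded to a bucket size.

    Returns the padded batch and, per request, the (start, end) row span it occupies.
    """
    batch: List[Row] = []
    spans: List[Tuple[int, int]] = []
    for req in requests:
        start = len(batch)
        batch.extend(req)
        spans.append((start, len(batch)))
    padding = pick_bucket(len(batch)) - len(batch)
    batch.extend([0.0] * width for _ in range(padding))  # zero rows, discarded later
    return batch, spans

def split(outputs: List[float], spans: List[Tuple[int, int]]) -> List[List[float]]:
    """Hand each request its slice of the batched output; padding rows are never returned."""
    return [outputs[start:end] for start, end in spans]

def fake_model(batch: List[Row]) -> List[float]:
    """Hypothetical stand-in for a compiled model: one score per row."""
    return [sum(row) for row in batch]

requests = [[[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0]], [[7.0, 8.0]]]
batch, spans = aggregate(requests, width=2)
print(len(batch), spans)  # 4 real rows padded to bucket size 8
print(split(fake_model(batch), spans))
```

Padding wastes some compute on zero rows, but it trades that for one kernel launch per batch and a bounded set of compiled shapes instead of recompiling for every batch size.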

JD Cloud Developers

Mar 14, 2024 · Artificial Intelligence

How JD Retail Boosted Online Recommendation Inference with Distributed Heterogeneous Computing

This article details JD Retail's ad‑tech team's deep‑compute optimizations—including a distributed graph‑based heterogeneous framework, GPU‑focused inference engine enhancements, TensorBatch request aggregation, deep‑learning compiler bucket pre‑compilation, asynchronous compilation, and multi‑stream GPU processing—to overcome high‑concurrency, low‑latency online recommendation challenges.

Deep Learning Compiler · Distributed Computing · GPU inference

0 likes · 14 min read

DataFunTalk

Jan 29, 2024 · Artificial Intelligence

PAI‑ChatLearn: A Flexible Large‑Scale RLHF Training Framework for Massive Models

The article introduces PAI‑ChatLearn, a flexible and high‑performance framework developed by Alibaba Cloud's PAI team that supports full‑pipeline RLHF training for large models, explains the evolution of parallel training strategies, details the framework’s architecture and configuration, and showcases performance results and practical usage examples.

AI Framework · Distributed Computing · PAI-ChatLearn

0 likes · 17 min read

JD Retail Technology

Jan 25, 2024 · Artificial Intelligence

Optimizing High‑Concurrency Online Inference for Recommendation Models with Distributed Heterogeneous Computing and GPU Acceleration

This article describes how JD Retail's advertising technology team tackled the high‑compute demands of modern recommendation models by designing a distributed graph‑partitioned heterogeneous computing framework, introducing TensorBatch request aggregation, leveraging deep‑learning compiler bucketing and asynchronous compilation, and implementing a multi‑stream GPU architecture to dramatically improve online inference throughput and latency.

Deep Learning Compiler · Distributed Computing · GPU Acceleration

0 likes · 13 min read

Architects' Tech Alliance

Dec 23, 2023 · Artificial Intelligence

Future Development Paths of Computing Power Technology (2023): Chip Architecture, Near‑Memory Computing, and Distributed xPU Systems

The article outlines the accelerating demand for high‑performance computing driven by AI, AR/VR, biotech and other workloads, examines the limits of Moore's law, and presents emerging solutions such as advanced chip architectures, chiplet integration, near‑memory/in‑memory computing, and distributed xPU‑based systems for scalable, efficient compute.

AI acceleration · Chiplet · Distributed Computing

0 likes · 11 min read

Volcano Engine Developer Services

Dec 21, 2023 · Artificial Intelligence

How ByteDance Scales AI Workloads with Ray, KubeRay, and Kueue

This article explains why Ray is popular among AI researchers, how ByteDance uses KubeRay to host Ray applications, and how Kueue manages and schedules RayJob workloads, covering Ray's architecture, KubeRay components, real-world use cases, and job scheduling strategies.

AI · Distributed Computing · KubeRay

0 likes · 12 min read

HomeTech

Nov 24, 2023 · Backend Development

Implementing Task Scheduling and Distributed Processing with Celery and Redis in Python

This article explains how to use Celery together with Redis to manage and execute periodic and asynchronous tasks in Python, covering basic concepts, architecture, configuration steps, single‑worker and multi‑worker setups, distributed processing strategies, and practical considerations for reliable task execution.

Distributed Computing · Python · Task scheduling

0 likes · 8 min read

政采云技术

Sep 19, 2023 · Big Data

Techniques for Processing Massive Data: Sorting, Querying, Top‑K, and Deduplication

This article explains core concepts and practical solutions for handling massive datasets that cannot fit into memory, covering batch processing, distributed sorting, bitmap indexing, hash‑based lookups, top‑K extraction, and deduplication techniques with code examples and multi‑machine strategies.

Big Data · Deduplication · Distributed Computing

0 likes · 18 min read
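A common pattern for top‑K on data that does not fit in memory is divide and conquer: hash‑partition the items so that every occurrence of a key lands in the same bucket, count each bucket independently, keep a local top‑K, then merge. The sketch below simulates this in memory; in practice each bucket would be a file on disk or a separate machine. The function names and the default bucket count are illustrative assumptions, not taken from the article.

```python
import heapq
from collections import Counter
from typing import Iterable, List, Tuple


def partition_by_hash(items: Iterable[str], num_buckets: int) -> List[List[str]]:
    """Route each item to a bucket by hash so all copies of a key share one bucket.
    In a real job each bucket would be a file on disk or a separate machine."""
    buckets: List[List[str]] = [[] for _ in range(num_buckets)]
    for item in items:
        buckets[hash(item) % num_buckets].append(item)
    return buckets


def top_k_frequent(items: Iterable[str], k: int, num_buckets: int = 4) -> List[Tuple[str, int]]:
    """Return the k most frequent items as (item, count), highest count first."""
    candidates: List[Tuple[int, str]] = []
    for bucket in partition_by_hash(items, num_buckets):
        counts = Counter(bucket)  # exact counts, since a key never spans two buckets
        candidates.extend((count, item) for item, count in counts.most_common(k))  # local top-K
    best = heapq.nlargest(k, candidates)  # merge the per-bucket winners
    return [(item, count) for count, item in best]
```

Because a key never spans two buckets, per‑bucket counts are exact and the global top‑K (up to ties) is always among the local winners. The same partitioning supports deduplication: a `set` per bucket yields exact distinct values without holding the whole dataset at once.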

Baidu Geek Talk

Aug 28, 2023 · Cloud Native

Baidu Search Vertical Offline Computing System Architecture Evolution

Baidu's search vertical offline computing system evolved through four stages—from a fragmented pre‑2018 processing setup to a unified business framework, then serverless functions, and finally a data‑intelligent architecture with multi‑layer abstraction, graph and multi‑language engines, achieving 5‑10× efficiency gains and dramatically reducing failures.

Baidu Search · Cloud Native · DAG Processing

0 likes · 23 min read

DataFunSummit

Aug 25, 2023 · Big Data

Big Data Meets Cloud Native: Tencent's Cloud‑Native Big Data Architecture, Challenges, and Practices

This article explores how Tencent integrates big data with cloud‑native technologies, detailing the evolution, opportunities, challenges, the peak‑range (FengLuan) architecture, engine and scheduling layers, mixed‑workload strategies, runtime optimizations, and future directions for large‑scale data platforms.

Cloud Native · Distributed Computing · Resource Scheduling

0 likes · 17 min read

DataFunTalk

Aug 22, 2023 · Artificial Intelligence

Building Complex Distributed Systems with Ray: An AutoML Case Study and Cloud‑Native Deployment

This article explains how the Ray distributed computing engine simplifies the design, deployment, and operation of complex cloud‑native distributed systems—illustrated through an AutoML service example—by detailing system complexity, Ray’s core concepts, resource customization, runtime environments, monitoring, and ecosystem integrations.

AI · AutoML · Cloud Native

0 likes · 26 min read

DataFunSummit

Aug 2, 2023 · Big Data

Loop Detection in Risk Control: Challenges, Distributed Graph Computing Optimizations, and ArcNeural Engine Case Studies

This article discusses the challenges of loop detection in financial risk control, presents distributed graph computing optimization techniques—including pruning, multi‑graph handling, and memory‑efficient algorithms—shows experimental results, and shares real‑world ArcNeural engine case studies and future directions.

ArcNeural · Big Data · Distributed Computing

0 likes · 13 min read

AntTech

Jun 27, 2023 · Artificial Intelligence

Fanglue: An Interactive System for Decision Rule Crafting in Fraud Detection

Fanglue is an interactive, web‑based rule‑development platform that combines expert domain knowledge with distributed AI algorithms to efficiently generate and evaluate decision rules for anti‑fraud scenarios. It uses Ray for real‑time processing and was accepted at VLDB 2023.

AI · Distributed Computing · Ray

0 likes · 10 min read

Big Data Technology & Architecture

Jun 16, 2023 · Big Data

Optimizing Big Data SQL: Handling Data Skew and Data Explosion

This article examines common performance issues in big data SQL queries, such as data skew and data explosion, and provides systematic troubleshooting steps and practical optimization techniques across the Map, Reduce, and Join stages, including partition merging, column pruning, predicate pushdown, and join strategies.

Data Explosion · Data Skew · Distributed Computing

0 likes · 10 min read
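For an aggregation skewed by one hot key, a widely used mitigation is two‑stage aggregation with a random salt: first group by `(key, salt)` so the hot key's rows spread over several reducers, then strip the salt and combine the partial results. Whether the article uses exactly this recipe is an assumption; the Python below simulates both stages in one process, with `salted_sum` and `num_salts` as illustrative names.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def salted_sum(rows: Iterable[Tuple[str, int]], num_salts: int, seed: int = 0) -> Dict[str, int]:
    """Two-stage SUM aggregation that spreads hot keys over num_salts partial groups."""
    rng = random.Random(seed)  # seeded so the partial groups are reproducible between runs
    # Stage 1: group by (key, salt). On a cluster each (key, salt) pair could be
    # handled by a different reducer, so one hot key no longer lands on one task.
    partial: Dict[Tuple[str, int], int] = defaultdict(int)
    for key, value in rows:
        partial[(key, rng.randrange(num_salts))] += value
    # Stage 2: drop the salt and combine at most num_salts partial sums per key.
    final: Dict[str, int] = defaultdict(int)
    for (key, _salt), subtotal in partial.items():
        final[key] += subtotal
    return dict(final)
```

In SQL the same idea is usually written as an inner `GROUP BY key, <random bucket>` followed by an outer `GROUP BY key`. It works for decomposable aggregates such as `SUM` and `COUNT`; for `COUNT(DISTINCT ...)` the salt has to be derived from the distinct column (for example its hash) rather than chosen at random, or values would be double‑counted.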

ByteDance Cloud Native

Jun 13, 2023 · Artificial Intelligence

How Ray and Cloud‑Native Tech Supercharge Large‑Model Offline Inference

This article explains the challenges of large‑model offline (batch) inference, such as GPU memory limits and distributed scheduling, and shows how Ray’s cloud‑native architecture, model partitioning, and Ray Datasets can be used to build efficient, elastic inference frameworks deployed with KubeRay.

Distributed Computing · GPU memory · Ray

0 likes · 18 min read

DataFunSummit

Apr 9, 2023 · Big Data

Expert Interview: Architecture and Trends of Big Data Platforms

This article presents a comprehensive interview with several big‑data platform experts, outlining the core components such as data integration, storage and computation, distributed scheduling, and query analysis, while also highlighting current challenges, best‑practice tools, and future trends in big‑data architecture.

Big Data · Data Integration · Distributed Computing

0 likes · 10 min read

Tencent Cloud Developer

Mar 22, 2023 · Artificial Intelligence

Tencent Star Network: High‑Performance GPU Cluster Architecture for Large‑Scale AI Model Training

Tencent's Star Network combines a 1.6 Tbps Ethernet RDMA fabric, a fat‑tree topology supporting up to 4K GPUs, multi‑track traffic aggregation, adaptive heterogeneous links, and a custom TCCL communication library. Together these cut AllReduce overhead from 35% to 3.7% and speed up AI training iterations by 32%, while also automating deployment and providing sub‑second self‑healing.

AI training · Distributed Computing · GPU clusters

0 likes · 19 min read
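AllReduce cost, which the Star Network work reduces, depends heavily on the collective algorithm. For reference, the Python below simulates the textbook ring AllReduce (reduce‑scatter followed by all‑gather) across `n` in‑memory workers; it is not TCCL, whose internals the summary does not describe. The sketch assumes every buffer has the same length and divides evenly into `n` chunks.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate ring AllReduce: every worker ends with the element-wise sum."""
    n = len(buffers)
    length = len(buffers[0])
    assert length % n == 0, "sketch assumes each buffer splits evenly into n chunks"
    size = length // n
    data = [list(b) for b in buffers]  # copy so the inputs are untouched

    def chunk(c: int) -> range:
        return range(c * size, (c + 1) * size)  # element indices of chunk c

    # Phase 1, reduce-scatter: at each step worker i sends chunk (i - step) to
    # worker i + 1, which adds it. After n - 1 steps worker i owns the full
    # sum of chunk (i + 1) % n.
    for step in range(n - 1):
        msgs = [(i, (i - step) % n) for i in range(n)]
        payloads = [[data[i][k] for k in chunk(c)] for i, c in msgs]  # snapshot sends
        for (src, c), payload in zip(msgs, payloads):
            dst = (src + 1) % n
            for k, v in zip(chunk(c), payload):
                data[dst][k] += v

    # Phase 2, all-gather: each worker forwards the finished chunk it holds
    # (or most recently received), and the receiver overwrites its copy.
    for step in range(n - 1):
        msgs = [(i, (i + 1 - step) % n) for i in range(n)]
        payloads = [[data[i][k] for k in chunk(c)] for i, c in msgs]
        for (src, c), payload in zip(msgs, payloads):
            dst = (src + 1) % n
            for k, v in zip(chunk(c), payload):
                data[dst][k] = v
    return data
```

Each worker sends and receives 2(n - 1) chunks of size L/n, i.e. 2(n - 1)/n · L elements, approaching 2L as n grows. Per‑worker traffic therefore stays roughly constant with cluster size, which is why link bandwidth and congestion, rather than worker count, tend to dominate AllReduce time at scale.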

Python Programming Learning Circle

Mar 18, 2023 · Fundamentals

Introduction to Parallel Programming and Python Parallel Libraries

This article introduces parallel programming concepts, memory architectures, execution models, Python threading versus multiprocessing performance, and reviews several Python parallel libraries such as Ray, Dask, Dispy, ipyparallel, and Joblib for building scalable concurrent applications.

Distributed Computing · Multiprocessing · Parallel Programming

0 likes · 10 min read

StarRing Big Data Open Lab

Feb 24, 2023 · Big Data

What Makes MPP Databases the Powerhouse Behind Modern Data Analytics?

MPP (Massively Parallel Processing) databases are built for large‑scale analytical workloads. They use distributed, shared‑nothing architectures with multiple control and compute nodes and offer high scalability, diverse data‑sharding strategies, and broad SQL compatibility, as illustrated by vendors such as Teradata, Vertica, and Greenplum and by emerging open‑source solutions.

Big Data · Distributed Computing · Greenplum

0 likes · 15 min read

NetEase Yanxuan Technology Product Team

Feb 20, 2023 · Big Data

Data Task Optimization Techniques and Practices

The article surveys unconventional offline data‑task optimizations, such as DISTRIBUTE BY, seeded random shuffling, explode‑based skew mitigation, hash bucketing, task‑parallelism tuning, and multi‑insert materialization, organized from point, line, and surface perspectives. It stresses that effective performance gains require both technical tricks and business‑driven pipeline adjustments.

Distributed Computing · Hive · SQL tuning

0 likes · 16 min read

Baidu Geek Talk

Feb 17, 2023 · Artificial Intelligence

How PGLBox Achieves 27× Faster GPU‑Powered Large‑Scale Graph Learning

PGLBox, Baidu’s GPU‑based large‑scale graph training framework, delivers up to 27× speedup over CPU clusters by fully GPU‑accelerating storage, sampling, and training, supporting billions of nodes, advanced GNN algorithms, multi‑level storage, and seamless integration of massive pretrained models.

Distributed Computing · GPU · Graph Neural Networks

0 likes · 7 min read

StarRing Big Data Open Lab

Feb 8, 2023 · Big Data

Why MapReduce and Spark Still Matter: A Deep Dive into Distributed Computing

Distributed computing splits massive tasks across multiple servers, and this article explains the classic MapReduce batch engine and the modern Spark framework, covering their architectures, strengths, limitations, and evolution, while highlighting key features like fault tolerance, in‑memory processing, and real‑time streaming capabilities.

Big Data · Distributed Computing · MapReduce

0 likes · 12 min read
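The classic illustration of the MapReduce model is word count. The single‑process Python sketch below mirrors its three phases: `map_phase` emits `(word, 1)` pairs, `shuffle` groups values by key (the step a real cluster performs over the network), and `reduce_phase` sums each group. The function names are illustrative; Hadoop and Spark express the same logic through their own APIs.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values by key (done over the network on a real cluster)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine every value that shares a key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

In Spark the equivalent pipeline is typically `flatMap`, then `map`, then `reduceByKey`; keeping intermediate data in memory between such stages is a key reason Spark outperforms disk‑based MapReduce on iterative jobs.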

Architect's Tech Stack

Dec 30, 2022 · Big Data

Distributed Computing Is Not a Panacea for Big Data: Prioritize Single‑Node Performance First

While distributed clusters are popular for big‑data processing, they are not a universal solution; tasks that are hard to partition or involve heavy cross‑node communication often perform better on a well‑optimized single machine, making a careful analysis of workload characteristics essential before scaling out.

Big Data · Distributed Computing · Performance Tuning

0 likes · 14 min read

DataFunSummit

Dec 21, 2022 · Big Data

Big Data Platform Architecture: Expert Insights on Components, Challenges, and Trends

An expert interview series examines the architecture of big data platforms, detailing core modules such as data integration, storage, computation, scheduling, and query analysis, while highlighting current challenges, best‑practice tools, and future trends like cloud‑native, object storage, and real‑time processing.

Big Data · Distributed Computing · Query Engines

0 likes · 12 min read

AntTech

Dec 14, 2022 · Artificial Intelligence

Privacy-Preserving Machine Learning for AI and Big Data Using Intel SGX, Occlum, and BigDL PPML

This article presents an end‑to‑end privacy‑preserving machine‑learning solution for AI and big‑data workloads built on Intel SGX, the open‑source TEE OS Occlum, and BigDL PPML, detailing its architecture, key features, deployment steps, and real‑world applications.

Distributed Computing · Privacy · SGX

0 likes · 15 min read

Architects Research Society

Nov 30, 2022 · Artificial Intelligence

A Comprehensive Overview of Machine Learning Tools and Libraries

An extensive survey ranks and compares a wide range of machine learning libraries and frameworks—both deep and shallow learning—detailing their languages, types, GPU acceleration, distributed computing capabilities, and typical academic and industrial applications, based on Google search popularity as of May.

Distributed Computing · GPU Acceleration · deep learning

0 likes · 20 min read

DataFunTalk

Nov 12, 2022 · Artificial Intelligence

Causal Inference Methods for Large‑Scale Game Analytics: Distributed Propensity Score Matching, Robust Double‑Robust Estimation, and Panel DID

This article introduces causal inference methods tailored to game scenarios, discusses the challenges of running offline inference on massive data, and presents three distributed solutions: low‑complexity propensity‑score matching, a robust variant of doubly robust estimation, and panel difference‑in‑differences, along with their implementation details and performance insights.

Distributed Computing · Game Analytics · Propensity Score Matching

0 likes · 12 min read
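The summary does not say how the low‑complexity matching works. One standard way to make nearest‑neighbour propensity‑score matching cheap is to sort the control group's scores once and binary‑search each treated unit's score, giving O((n + m) log m) instead of O(n·m). The Python below is a single‑machine sketch of that idea (with replacement, within a caliper); it assumes propensity scores are already estimated, and `match_nearest` and its caliper semantics are illustrative, not the article's method.

```python
import bisect
from typing import List, Optional, Sequence, Tuple


def match_nearest(
    treated: Sequence[float],
    control: Sequence[float],
    caliper: float,
) -> List[Tuple[int, Optional[int]]]:
    """1:1 nearest-neighbour matching with replacement on propensity scores.
    Returns (treated_index, control_index or None) pairs."""
    order = sorted(range(len(control)), key=lambda j: control[j])  # sort controls once
    sorted_scores = [control[j] for j in order]
    pairs: List[Tuple[int, Optional[int]]] = []
    for i, score in enumerate(treated):
        pos = bisect.bisect_left(sorted_scores, score)  # O(log m) lookup
        best: Optional[int] = None
        best_dist = caliper
        for cand in (pos - 1, pos):  # the nearest neighbour is on one side of pos
            if 0 <= cand < len(sorted_scores):
                dist = abs(sorted_scores[cand] - score)
                if dist <= best_dist:
                    best, best_dist = order[cand], dist
        pairs.append((i, best))  # None means no control within the caliper
    return pairs
```

A distributed version could, for example, range‑partition both groups by score so each partition is matched locally, with extra handling for units near partition boundaries; that is an assumption rather than the article's design.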

ITPUB

Oct 21, 2022 · Big Data

Hadoop Explained: Architecture, Core Components, and Real-World Applications

This article provides a comprehensive overview of Hadoop, covering its historical development, key characteristics, the HDFS storage framework, the MapReduce processing engine, YARN resource manager, and a wide range of real-world application scenarios, as well as the broader Hadoop ecosystem and its major components.

Big Data · Distributed Computing · HDFS

0 likes · 20 min read

Python Crawling & Data Mining

Oct 16, 2022 · Big Data

What Makes Hadoop the Backbone of Modern Big Data Processing?

This article provides a comprehensive overview of Hadoop, covering its history, core features, the HDFS storage framework, MapReduce computation engine, YARN resource manager, real‑world application scenarios, and the surrounding ecosystem of tools such as Hive, Spark and Kafka.

Distributed Computing · HDFS · Hadoop

0 likes · 20 min read

DataFunSummit

Oct 1, 2022 · Artificial Intelligence

GraphLearn: An Industrial‑Scale Distributed Graph Learning Platform and Its System Optimizations

This article introduces GraphLearn, a large‑scale distributed graph learning platform designed for industrial GNN workloads, details its architecture, sampling implementation, training pipeline, system optimizations such as GPU‑accelerated sampling, and showcases real‑world applications in recommendation and risk control.

Distributed Computing · Large-Scale Graph · Sampling Optimization

0 likes · 13 min read

Huawei Cloud Developer Alliance

Jun 22, 2022 · Cloud Native

How Distributed Cloud Native Powers Multi‑Region Apps: Insights from Huawei’s Summit

The Huawei Cloud HCDE summit highlighted the rise of distributed cloud‑native computing, covering global scheduling, SaaS tenant isolation, AI creation tool workflows, and the integration of Kubernetes and Dapr to address cross‑cloud and cross‑region application challenges.

AI · Cloud Native · Dapr

0 likes · 5 min read

ITPUB

May 27, 2022 · Databases

How HugeGraph’s Self‑Built Graph Computing Tackles Large‑Scale Graph Challenges

This article explains the fundamentals of graph computing, compares it with traditional processing, outlines industry challenges such as partitioning and load imbalance, and details HugeGraph’s self‑developed architecture, key technical solutions, and how developers can create and deploy graph algorithms.

Algorithm Development · Data Partitioning · Distributed Computing

0 likes · 14 min read

DataFunSummit

May 8, 2022 · Artificial Intelligence

Machine Learning‑Based Time Series Forecasting and Anomaly Detection System at JD Search

The article describes JD Search's machine‑learning alert system that combines offline and real‑time training, FFT‑based periodic detection, Prophet forecasting, and DBSCAN anomaly clustering, and explains architectural design, data preprocessing, model optimization, and distributed deployment to improve alert accuracy and response speed.

Anomaly Detection · DBSCAN · Distributed Computing

0 likes · 10 min read

DataFunTalk

Apr 2, 2022 · Big Data

SuperSQL: A High‑Performance Cross‑Engine, Cross‑Data‑Center SQL Middleware for Big Data

The article introduces SuperSQL, a federated SQL middleware that unifies heterogeneous data sources across multiple data centers, leverages Apache Calcite for cost‑based optimization, pushes down operators to various engines, manages metadata with a Trie model, and demonstrates significant performance gains over traditional solutions.

Big Data · Cost-Based Optimization · Cross‑Data‑Center

0 likes · 27 min read

Architects' Tech Alliance

Mar 13, 2022 · Industry Insights

Why RDMA Is Replacing TCP/IP in AI-Driven Data Centers

The article analyzes how the AI era’s demand for ultra‑low latency and high throughput exposes fundamental limits of the traditional TCP/IP stack, and explains why RDMA’s kernel‑bypass, zero‑copy design, and emerging congestion‑control algorithms are becoming the preferred network fabric for modern data‑center workloads.

AI Fabric · Data Center · Distributed Computing

0 likes · 12 min read

Python Programming Learning Circle

Jan 27, 2022 · Big Data

Using ipyparallel for Parallel and Distributed Computing in Python

This article explains how to work around Python's Global Interpreter Lock (GIL) with ipyparallel: installing it, configuring parallel profiles, and using engines, DirectView, and LoadBalancedView to run both synchronous and asynchronous tasks, with code examples and performance comparisons.

Distributed Computing · Python · ipyparallel

0 likes · 9 min read

JD Retail Technology

Jan 27, 2022 · Big Data

How JD’s Custom Spark Engine Tackles Data Skew for Massive Offline Jobs

This article explains JD’s self‑developed data‑skew mitigation solution for Spark, detailing the problem of uneven key distribution, the limitations of the open‑source AQE implementation, and JD’s OptimizeSkewedJoinV2 algorithm that dramatically reduces stage latency in large‑scale join workloads.

Adaptive Query Execution · Big Data · Data Skew

0 likes · 13 min read

dbaplus Community

Nov 27, 2021 · Big Data

How Vipshop’s Hera Data Service Boosts Big Data Access and Performance

The article details the design, architecture, core features, scheduling logic, and performance gains of Vipshop’s self‑built Hera data service, which unifies data‑warehouse access, supports multiple engines, adapts SQL execution, and dramatically improves SLA for both B‑to‑B and B‑to‑C workloads.

Big Data · Data Service · Distributed Computing

0 likes · 22 min read

Programmer DD

Nov 9, 2021 · Backend Development

Why RPC Is the Backbone of Modern Distributed Systems

This article explains the fundamentals of Remote Procedure Call (RPC), its historical evolution, core concepts such as remote procedures and calls, practical examples, analogies, and the challenges it introduces—like latency, address‑space isolation, partial failures, and concurrency—while highlighting RPC’s pivotal role in enabling scalable distributed architectures.

Backend Development · Distributed Computing · RPC

0 likes · 12 min read

iQIYI Technical Product Team

Oct 9, 2021 · Big Data

Exploring iQIYI’s Unified Big Data + AI Architecture: Challenges, Solutions, and Future Directions

iQIYI’s unified big‑data + AI platform combines a hybrid‑cloud model, storage‑compute separation via its QBFS virtual file system, a reusable feature‑store and operator DAGs, and multi‑tenant YARN scheduling to overcome legacy Hive/Spark bottlenecks, accelerate large‑scale model training, improve data quality, and prepare for future real‑time, privacy‑preserving AI workloads.

AI · Distributed Computing · Hybrid Cloud

0 likes · 10 min read

21CTO

Aug 15, 2021 · Artificial Intelligence

Explore Gorse: An Open‑Source Go Recommendation Engine for Scalable AI‑Driven Suggestions

Gorse is an open‑source Go recommendation system that automates model selection, supports distributed training and prediction, offers a RESTful API and dashboard, and stores data in MySQL/MongoDB with Redis caching, enabling fast integration of personalized suggestions into online services.

Distributed Computing · Go · RESTful API

0 likes · 3 min read

Alimama Tech

Aug 4, 2021 · Big Data

Fast Attribution Engine (FAE): High‑Performance Distributed Computing for User Behavior and Advertising Attribution

FAE is Alibaba's high‑performance distributed MPP engine. It stores billions of user‑behavior events in a time‑ordered AFile model and runs stateless master, importer, merger, and worker components, with metadata kept in Redis and MySQL. This delivers sub‑second ad‑attribution queries, 10‑100× faster, across ad‑hoc, offline, and near‑real‑time scenarios such as frequency, path, and funnel analysis.

Ad Attribution · Big Data · Distributed Computing

0 likes · 11 min read
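Funnel analysis over time‑ordered behavior data is one of the FAE scenarios listed above. As an illustration only, the Python below computes a windowed funnel in a single process: events are grouped and sorted per user, and a user counts toward step k if they performed steps 1..k in order within `window` time units of starting step 1. The event schema, `funnel`, and the window semantics are assumptions; FAE's AFile format and distributed execution are not modeled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Event = Tuple[str, int, str]  # (user_id, timestamp, action): hypothetical schema


def funnel(events: Iterable[Event], steps: Sequence[str], window: int) -> List[int]:
    """Return how many users reached each funnel step, in order, within `window`."""
    per_user: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for user, ts, action in events:
        per_user[user].append((ts, action))
    reached = [0] * len(steps)
    for timeline in per_user.values():
        timeline.sort()  # time order per user
        best = 0
        for i, (ts, action) in enumerate(timeline):
            if action != steps[0]:
                continue  # a funnel attempt must start with the first step
            depth = 1
            for ts2, action2 in timeline[i + 1:]:
                if ts2 - ts > window or depth == len(steps):
                    break  # outside the window, or funnel already complete
                if action2 == steps[depth]:
                    depth += 1
            best = max(best, depth)
        for d in range(best):  # reaching step k implies reaching steps 1..k-1
            reached[d] += 1
    return reached
```

Keeping each user's events contiguous and already time‑sorted removes the grouping and sorting work done at the top of this sketch, which is the kind of saving a time‑ordered storage layout is meant to provide.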

DataFunTalk

Aug 3, 2021 · Big Data

Fast Attribution Engine (FAE): A High‑Performance Distributed Computing Engine for User Behavior and Advertising Attribution

The article introduces Alibaba's Fast Attribution Engine (FAE), describing the technical challenges of user behavior and advertising attribution, its data model (AFile), system architecture, performance advantages over traditional OLAP solutions, and a range of application scenarios such as frequency analysis, crowd flow modeling, path, retention, funnel analysis, and visitor selection.

Distributed Computing · FAE · MPP engine

0 likes · 13 min read

Architecture Digest

Jul 25, 2021 · Big Data

Design and Architecture of Hera Data Service for Unified Data Access at Vipshop

The article details the background, architecture, core features, scheduling mechanisms, Lisp‑based query DSL, and Alluxio integration of Vipshop's self‑developed Hera data service, illustrating how it unifies multi‑engine data access, improves SLA, and accelerates large‑scale crowd computing tasks.

Alluxio · Big Data · Data Service

0 likes · 21 min read

DataFunTalk

Jul 7, 2021 · Big Data

Solving Data Island Challenges and Enabling Advanced OLAP Analysis on Heterogeneous Big Data Platforms – Kyligence Solution Overview

This article explains the growing analytical demands in the big‑data era, the limitations of traditional OLAP, and how Kyligence’s distributed OLAP engine addresses data‑island issues, multi‑dimensional and many‑to‑many analysis, unified security, and performance optimization with MDX on Spark, delivering a seamless Excel‑like experience.

Analytics · Big Data · Data Integration

0 likes · 9 min read

Python Crawling & Data Mining

Jun 14, 2021 · Big Data

Why Stanford’s Data Mining Tutorial Is the Ultimate Guide to Large‑Scale Data Mining

This article introduces the third edition of Stanford’s Data Mining Tutorial, highlighting its panoramic roadmap of data‑mining techniques for massive datasets, core features, comprehensive topic coverage, target audience, and supplementary resources while noting its popularity among students and professionals.

Distributed Computing · Stanford · algorithms

0 likes · 11 min read

Architects' Tech Alliance

May 31, 2021 · Fundamentals

Fundamentals of Parallel and Distributed Computing and Hardware Architectures

This article explains the evolution of cloud computing, the distinction between serial, parallel, and distributed computing models, and details the four classic computer architecture classifications (SISD, SIMD, MISD, MIMD) along with shared‑memory and distributed‑memory MIMD systems and their role in modern distributed system layers.

Distributed Computing · MIMD · SISD

0 likes · 11 min read

Architects Research Society

May 31, 2021 · Artificial Intelligence

Comprehensive Survey of Machine Learning Tools and Libraries

This article presents a detailed overview and ranking of numerous machine learning tools and libraries, distinguishing deep and shallow learning approaches, highlighting language support, GPU acceleration, and distributed computing capabilities, and provides insights into their academic and industrial usage.

Distributed Computing · GPU Acceleration · shallow learning

0 likes · 9 min read

NetEase Game Operations Platform

May 22, 2021 · Big Data

Comprehensive Overview and Source Code Analysis of NetEase Spark Kyuubi

This article systematically introduces NetEase Kyuubi, an open‑source high‑performance JDBC and SQL execution engine built on Apache Spark, covering its background, core architecture, service discovery, session and operation management, startup processes, and key source‑code implementations with detailed code examples.

Apache Thrift · Big Data · Distributed Computing

0 likes · 47 min read
