Tagged articles

Ray

58 articles · Page 1 of 1

Jun 22, 2026 · Artificial Intelligence

Building DataFlow: An Industrial‑Grade LLM Data Pipeline from Documents to Training

The article presents DataFlow, an open‑source, GPU‑centric data‑engineering framework that tackles LLM data‑preparation bottlenecks with a two‑level operator taxonomy, an LLM‑driven WebAgent for automatic crawling, MinerU for PDF‑to‑Markdown conversion, a Ray‑based distributed runtime, and extensive multimodal extensions. It validates the design with quantitative experiments showing significant quality gains across math, code, and reasoning benchmarks.

DataFlow · LLM · Multimodal

0 likes · 14 min read

Alibaba Cloud Infrastructure

May 26, 2026 · Cloud Native

How BYD and Alibaba Cloud Use Argo Workflows to Efficiently Schedule Millions of Autonomous Driving Tasks

Facing over 1 PB of daily sensor data, BYD replaced Airflow with a multi‑cluster Argo Workflows and Argo CD architecture, integrated Ray for GPU workloads, and achieved 20,000 to 40,000 concurrent workflows, an 11‑fold efficiency boost, a 30% cost reduction, and success rates near 99%.

Argo Workflows · Cloud Native · Ray

0 likes · 11 min read

DataFunSummit

May 11, 2026 · Artificial Intelligence

How Lance Powers Enterprise Multimodal AI Data Lakes

The article analyzes why 74% of AI projects fail due to feedback gaps and data silos, explains how the open‑source Lance format addresses these issues with unified multimodal storage, outlines a layered Lance‑on‑Ray architecture, and details three real‑world practices—implicit feedback loops, GPU‑accelerated self‑evolution, and semantic knowledge‑graph evolution—to boost R&D efficiency.

CAGRA · Daft · Data Lake

0 likes · 13 min read

Alibaba Cloud Big Data AI Platform

Apr 22, 2026 · Artificial Intelligence

How to Build an End‑to‑End Hand‑Video to VLA Data Pipeline on Alibaba Cloud PAI with Data‑Juicer

This article details a step‑by‑step, distributed pipeline built on Alibaba Cloud PAI using Data‑Juicer and Ray that transforms raw egocentric hand videos into LeRobot v2.0‑compatible Vision‑Language‑Action (VLA) training data, covering video splitting, frame extraction, camera calibration, 3D hand reconstruction, pose estimation, action captioning, and export, with code snippets, performance numbers, and references.

Data-Juicer · Distributed Computing · Embodied AI

0 likes · 29 min read

Ctrip Technology

Apr 16, 2026 · Big Data

How Ray + DuckDB Cut 900M-Row Attribution Queries from 40s to 15s

When attribution analysis on over 900 million rows slowed to more than 40 seconds and threatened cluster stability, Ctrip's smart attribution team rebuilt the architecture with Ray and DuckDB, achieving sub‑15‑second query times, a 160% performance gain, and complete resource isolation.

Attribution Analysis · Big Data · Distributed Computing

0 likes · 22 min read

Alibaba Cloud Big Data AI Platform

Apr 13, 2026 · Artificial Intelligence

How to Build a Scalable Multimodal Data Pipeline with Alibaba Cloud PAI and DataJuicer

This article details a step‑by‑step guide for constructing a high‑performance multimodal data pipeline—covering video segmentation, duration filtering, frame extraction, safety and aesthetic scoring, and caption generation—using Alibaba Cloud PAI, Paimon, DataJuicer, and distributed frameworks like Ray and Daft, with real‑world performance metrics.

AI · Alibaba Cloud · Daft

0 likes · 30 min read

Big Data Technology & Architecture

Apr 3, 2026 · Industry Insights

Why Daft, Ray, and Lance Are Redefining Multimodal Data Pipelines

This article analyzes how the Daft‑Ray‑Lance stack tackles the challenges of multimodal AI workloads by offering a high‑performance Rust engine, adaptive back‑pressure, seamless Ray‑based distributed scheduling, and a storage format optimized for random access, vector indexing, and zero‑copy schema evolution, complete with benchmark comparisons and practical deployment guidance.

Daft · Data Engineering · Lance

0 likes · 21 min read

Big Data Technology Tribe

Mar 15, 2026 · Databases

How to Build Distributed Scalar Indexes with Lance and Ray

This guide explains the end‑to‑end workflow for constructing a distributed scalar index in Lance by orchestrating validation, fragment sharding, worker‑level indexing via Ray, and final metadata merging, complete with code snippets and detailed step‑by‑step instructions.

Lance · Python · Ray

0 likes · 12 min read

DataFunSummit

Mar 3, 2026 · Backend Development

How Ant Group Supercharged AI Data Pipelines with Ray: Boosting Index Build Speed and Reliability

This article details Ant Group's use of the Ray distributed computing framework to accelerate massive data indexing, migrate a C++ engine to Ray, implement elastic resource scheduling, improve long‑tail task efficiency, and build a robust RAG operator system with comprehensive governance, achieving up to 2× speed gains and 99.9% success rates.

Backend Development · Distributed Computing · Ray

0 likes · 15 min read

DataFunSummit

Mar 2, 2026 · Artificial Intelligence

How Data-Juicer Powers Multi‑Modal Data Processing for Large Language Models

This article explains the evolution of Data‑Juicer from a pure‑text preprocessing tool to a full‑stack multi‑modal data engine, detailing its architecture, operator library, Ray‑based distributed execution, performance benchmarks, integration with AI agents, and roadmap for future AI‑centric data workflows.

Data-Juicer · Multi-modal · Ray

0 likes · 31 min read

Big Data Technology Tribe

Mar 1, 2026 · Backend Development

How Ray Data Turns Logical Plans into Executable Workflows – A Deep Dive

This article provides a comprehensive, step‑by‑step explanation of Ray Data's LogicalPlan architecture, covering its class hierarchy, core methods, logical operators, optimization rules, planning from logical to physical operators, execution binding, metadata inference, lineage serialization, and the full file/module index for developers building scalable data pipelines.

@Data · LogicalPlan · Optimization

0 likes · 35 min read

Big Data Technology Tribe

Feb 28, 2026 · Big Data

Unlocking Ray Data: A Deep Dive into the Dataset API and Its Powerful Transformations

This comprehensive guide explains Ray Data's Dataset core type, its distributed pipeline design, lazy execution model, API groups, transformation methods, column operations, I/O integrations, metadata utilities, and execution workflow, providing clear code examples and practical usage tips.

API · Distributed Data · Python

0 likes · 18 min read

Data STUDIO

Feb 21, 2026 · Big Data

Boost Python Performance Up to 50× Without Changing Your Code

Python’s reputation for slowness can be overcome by selecting the right tools—Numba, PyPy, CuPy, JAX, Ray, Joblib, async I/O, memory profilers, and big‑data frameworks—delivering speedups from 6× to over 50× with minimal or no code modifications.

GPU · Profiling · Ray

0 likes · 22 min read

JD Tech

Jan 31, 2026 · Artificial Intelligence

How JD's 9N‑LLM Engine Powers Scalable Generative Recommendation at Massive Scale

This article details JD Retail's 9N‑LLM unified training framework that tackles the massive data, hardware heterogeneity, and algorithmic challenges of generative recommendation by integrating TensorFlow and PyTorch, supporting GPU/NPU, and delivering high‑throughput sample processing, sparse/dense optimization, and flexible reinforcement‑learning capabilities.

GPU/NPU · Ray · large-scale AI

0 likes · 26 min read

DataFunSummit

Jan 18, 2026 · Big Data

How Ray Reinvents AI Data Pipelines for Massive Multimodal Inference

This article examines the shortcomings of traditional big‑data engines for AI workloads, presents a Ray‑based heterogeneous fusion architecture that unifies CPU/GPU scheduling, Python ecosystems, and streaming‑batch processing, and details fault‑tolerance, checkpointing, compute‑storage separation, resource‑utilization, scalability, and observability improvements that enable thousands of nodes and dramatically higher GPU efficiency.

Big Data · Cloud Native · Distributed Computing

0 likes · 31 min read

ByteDance Data Platform

Dec 23, 2025 · Artificial Intelligence

How Daft and Ray Supercharge Million‑Hour Video Processing for AI‑Powered Robotics

This article details a scalable, distributed pipeline that uses LAS AI Data Lake, Daft on Ray, and advanced video‑processing techniques—scene detection, splitting, frame sampling, filtering, and caption generation—to transform tens of millions of hours of robot‑captured video into high‑quality, searchable semantic data while dramatically boosting CPU and GPU utilization.

AI Pipeline · Daft · Distributed Computing

0 likes · 21 min read

Big Data Technology Tribe

Nov 21, 2025 · Fundamentals

Mastering Ray: Core Concepts of Tasks, Actors, and Objects for Distributed Computing

This guide explains Ray's fundamental building blocks—including Tasks, Actors, remote Objects, Placement Groups, and environment dependencies—showing how to define, schedule, and retrieve distributed workloads with code examples and command‑line utilities.

Actors · Distributed Computing · Object Store

0 likes · 8 min read

DataFunSummit

Sep 20, 2025 · Artificial Intelligence

How We Scaled WeChat AI Services with Ray: Lessons from Million‑Node Deployments

This article examines how WeChat’s Astra platform leverages the Ray distributed framework to manage million‑node AI workloads, addressing challenges of scale, heterogeneous GPU resources, operational complexity, and cost, and outlines the architecture that unifies Ray services across multiple Kubernetes clusters.

AI scaling · Astra Platform · Distributed Computing

0 likes · 5 min read

DataFunSummit

Sep 18, 2025 · Artificial Intelligence

How We Scaled WeChat AI Services with Ray: Lessons from Million‑Node Deployments

This article examines how Tencent's WeChat team leveraged the Ray distributed computing framework within the Astra platform to tackle massive AI workloads, addressing challenges of scale, GPU diversity, operational complexity, and cost while outlining their architecture and practical insights.

AI Infrastructure · Astra Platform · Distributed Computing

0 likes · 6 min read

DataFunSummit

Sep 13, 2025 · Artificial Intelligence

How Pinterest Scaled LLM Data Pipelines with Ray: Boosting Throughput and Cutting Costs

This article details how Pinterest’s senior staff engineer Dr. Luo leveraged the open‑source Ray framework to overcome LLM data‑preprocessing bottlenecks, describing the system’s architecture, key features such as map_batches, Carry‑Over Columns and Accumulators, and the dramatic performance and cost improvements achieved.

Data preprocessing · Distributed Computing · LLM

0 likes · 12 min read

DataFunSummit

Sep 11, 2025 · Artificial Intelligence

How Ray Powers Massive AI Computing on WeChat: Lessons from Tencent

This article examines how Tencent leverages the Ray distributed framework within the Astra platform to handle WeChat's massive AI workloads, addressing challenges of scale, heterogeneous GPU resources, operational complexity, and cost while outlining the architecture and practical benefits.

AI scaling · Astra Platform · Distributed Computing

0 likes · 5 min read

DataFunTalk

Sep 10, 2025 · Artificial Intelligence

How Ant Group’s Ray‑Powered Ragent Redefines LLM‑Based AI Agents

The article presents Ant Group’s Ray‑based Ragent framework, detailing its background, motivation behind unified AI serving, and the four core modules—Profile, Memory, Planning, and Action—that together enable large‑language‑model agents for financial applications.

AI Framework · Ant Group · LLM Agents

0 likes · 4 min read

DataFunSummit

Sep 9, 2025 · Artificial Intelligence

How Ant Group’s Ragent Redefines Distributed LLM Agents with Ray

This article introduces Ant Group’s Ragent, a Ray‑based distributed AI agent framework, covering its background, motivation in the large‑model era, and a four‑module design (Profile, Memory, Planning, Action) that enables scalable LLM‑driven agents.

AI Framework · Ant Group · LLM Agents

0 likes · 4 min read

DataFunSummit

Sep 8, 2025 · Artificial Intelligence

How Ant Group’s Ragent Redefines LLM‑Based AI Agents on Ray

This article introduces Ant Group’s new Ray‑based distributed agent framework Ragent, outlines its background and motivation, and details the four core modules—Profile, Memory, Planning, and Action—that together enable sophisticated LLM‑driven AI agents for large‑scale applications.

AI agents · Ant Group · LLM

0 likes · 4 min read

DataFunSummit

Sep 7, 2025 · Artificial Intelligence

Inside Ant Group’s Ragent: Building Scalable AI Agents on Ray

This article introduces Ant Group’s Ragent, a Ray‑based distributed AI‑agent framework, covering its background, motivation, design and implementation, and detailing the four core modules—Profile, Memory, Planning, and Action—that enable large‑language‑model agents at massive scale.

AI agents · Ant Group · Ragent

0 likes · 4 min read

DataFunTalk

Sep 5, 2025 · Artificial Intelligence

Inside Ant Group’s Ragent: Building Scalable AI Agents on Ray

This article introduces Ant Group’s Ragent, a Ray‑based distributed AI Agent framework, detailing its background, motivations, and design, and explains the four essential modules—Profile, Memory, Planning, and Action—that enable scalable large‑language‑model agents for real‑world applications.

Ant Group · Ray

0 likes · 5 min read

DataFunSummit

Sep 5, 2025 · Artificial Intelligence

How Ant Group’s Ragent Leverages Ray for Scalable AI Agents

This article introduces Ant Group’s Ragent, a Ray‑based distributed agent framework, covering its background, motivation, and design—including core modules like Profile, Memory, Planning, and Action—that enable large‑language‑model agents to operate at massive scale.

Ant Group · Ragent · Ray

0 likes · 5 min read

DataFunTalk

Sep 5, 2025 · Artificial Intelligence

Inside Ant Group’s Ragent: Building Scalable AI Agents on Ray

This article introduces Ant Group’s Ray‑based distributed agent framework Ragent, outlines its background, motivation, and design, and details the four essential modules—Profile, Memory, Planning, and Action—that power large‑language‑model agents in large‑scale AI serving.

AI agents · Ant Group · LLM

0 likes · 5 min read

DataFunSummit

Sep 3, 2025 · Artificial Intelligence

Inside Ant Group’s Ragent: Building Scalable AI Agents on Ray

This article explains how Ant Group’s Ragent framework leverages Ray to create scalable, multi‑tenant AI agents, detailing its background, motivation, and design while outlining the core modules—Profile, Memory, Planning, and Action—that power large‑language‑model agents.

Ant Group · Distributed Computing · Ray

0 likes · 5 min read

ByteDance Data Platform

Sep 3, 2025 · Artificial Intelligence

Revolutionizing AI Data Lakes: How Daft + Lance Enable Multimodal Processing

This article explores how the LAS team's AI‑driven data lake solution, built on Daft for lake computing and Lance for lake storage, tackles the emerging challenges of multimodal data handling, offering faster I/O, heterogeneous CPU‑GPU scheduling, and seamless integration for AI workloads.

AI · Daft · Distributed Computing

0 likes · 11 min read

DataFunSummit

Sep 2, 2025 · Artificial Intelligence

How Ant Group’s Ray‑Powered Ragent Redefines LLM‑Based AI Agents

This article introduces Ant Group’s Ray‑based distributed agent framework Ragent, outlines its background, motivation, and design, and breaks down the four essential modules—Profile, Memory, Planning, and Action—that enable large‑language‑model agents to operate in real‑world scenarios.

Ant Group · LLM · Ragent

0 likes · 5 min read

DataFunSummit

Aug 28, 2025 · Artificial Intelligence

How We Scaled AI Compute to Millions of Nodes with Ray on WeChat

This article explains how Tencent's WeChat team built the Astra platform on Ray to manage millions of AI compute nodes, addressing challenges of massive scale, heterogeneous GPU resources, low‑priority node instability, deployment complexity, and cost, while detailing architecture, scheduling strategies, and practical usage examples.

AI scaling · Distributed Computing · Ray

0 likes · 21 min read

Ops Development & AI Practice

Jul 29, 2025 · Artificial Intelligence

How Ray Transforms Distributed Training for Large Language Models

In the era of data‑driven AI, Ray offers an open‑source unified compute framework that abstracts away distributed‑system complexity, letting developers scale Python code from a laptop to large GPU clusters. It also provides the Ray AI Runtime (AIR), with libraries such as Ray Data, Train, Tune, and Serve, to accelerate LLM training, hyper‑parameter tuning, and model serving.

AI Runtime · Distributed Computing · LLM training

0 likes · 10 min read

Alibaba Cloud Infrastructure

Jul 18, 2025 · Information Security

Securing Ray Clusters on Alibaba Cloud ACK: Best Practices and Configurations

This guide details comprehensive security best practices for deploying Ray clusters on Alibaba Cloud ACK, covering TLS communication, namespace isolation, resource quotas, RBAC, security contexts, image scanning, resource limits, RRSA integration, multi‑cluster isolation, and recommendations for protecting dashboards and services from unauthorized access.

ACK · Ray · kubernetes

0 likes · 18 min read

Alibaba Cloud Big Data AI Platform

Jul 2, 2025 · Artificial Intelligence

Boost Your AI Model Training with Ray on Alibaba Cloud PAI – A Step‑by‑Step Guide

This article introduces the integration of the open‑source distributed AI framework Ray with Alibaba Cloud’s PAI platform, detailing its advantages, architecture, fault‑tolerance, resource management, and provides a comprehensive step‑by‑step tutorial—including configuration, command examples, and code snippets—to efficiently run Ray jobs on PAI‑DLC.

Alibaba Cloud · PAI · Ray

0 likes · 11 min read

Alibaba Cloud Infrastructure

Jun 3, 2025 · Artificial Intelligence

Deploying and Managing Ray on Alibaba Cloud ACK with KubeRay: Architecture, Code Samples, and Scheduling Strategies

This article explains how to build a flexible machine‑learning infrastructure on Alibaba Cloud ACK using Ray and KubeRay, covering Ray's core components, AI libraries, deployment options on VMs and Kubernetes, code examples for data processing, model serving, and advanced scheduling and quota management techniques.

AI · Alibaba Cloud · Distributed Computing

0 likes · 17 min read

Baobao Algorithm Notes

May 20, 2025 · Artificial Intelligence

Boosting RLHF Training Efficiency with Asynchronous vLLM and Ray Integration

This article explains how an asynchronous RLHF pipeline built on vLLM, Ray, and OpenRLHF dramatically reduces training bottlenecks by decoupling inference, environment interaction, and model updates, and provides detailed implementation code and design choices for scalable reinforcement learning.

OpenRLHF · RLHF · Ray

0 likes · 11 min read

Baobao Algorithm Notes

Apr 20, 2025 · Artificial Intelligence

Can Agentic RL Transform LLM Training? A Deep Dive into VeRL and Search‑R1

This article explores the emerging concept of agentic reinforcement learning for large language models, analyzes ByteDance's VeRL and the Search‑R1 frameworks, identifies practical challenges in tool integration and environment parallelism, and proposes a unified, Ray‑based architecture to enable scalable, high‑quality RL environments.

Ray · environment design · search-r1

0 likes · 11 min read

AntData

Apr 3, 2025 · Artificial Intelligence

Ray Flow Insight: Visualizing and Debugging Distributed AI Applications

Ray Flow Insight is an Ant Group open‑source tool that visualizes Ray's distributed programming primitives—Actors, Tasks, and Objects—to turn complex reinforcement‑learning systems from opaque "black boxes" into transparent, debuggable workflows, providing logical, physical, distributed stack, and flame‑graph views for performance analysis and optimization.

AI · Ray · Ray Flow Insight

0 likes · 32 min read

Volcano Engine Developer Services

Mar 5, 2025 · Artificial Intelligence

How DeepSeek Smallpond Powers AI Data Processing with Ray and DuckDB

This article introduces DeepSeek Smallpond, a lightweight yet high‑performance AI data‑processing engine built on Ray and DuckDB, explains its dual DataFrame and LogicalPlan APIs, showcases integration with Volcano Engine's AI Data Lake LAS, and provides practical code examples for distributed processing, multimodal storage, and RAG pipelines.

AI data processing · Data Lake · Distributed Computing

0 likes · 18 min read

DataFunTalk

Jan 11, 2025 · Artificial Intelligence

Ragent: Ant Group’s Ray‑Based Distributed Agent Framework

This article introduces Ragent, Ant Group’s Ray‑powered distributed agent framework, covering its background, motivation, design, implementation details, multi‑agent capabilities, and future directions for large‑model AI applications.

AI · Agent framework · Distributed Agents

0 likes · 14 min read

Rare Earth Juejin Tech Community

Nov 29, 2024 · Big Data

How ByteDance Builds Large-Scale Data Processing Pipelines for Multimodal Models with Ray

The article details ByteDance's use of Ray and Ray Data to construct scalable audio and video data processing pipelines for multimodal AI models, addressing challenges of massive data volume, resource constraints, and fault tolerance through pipeline design, Ray Core enhancements, and custom scheduling optimizations.

AI · Big Data · ByteDance

0 likes · 16 min read

Smart Era Software Development

Nov 4, 2024 · Artificial Intelligence

How eBay’s Data+AI Platform Leverages Ray for Faster Model Development and Deployment

eBay upgraded its AI infrastructure by adopting Ray, cutting model development and deployment time by roughly 50% and boosting GPU utilization from about 10% to over 75% through automated cluster scaling and high‑throughput batch inference.

AI Infrastructure · Data+AI · Distributed Computing

0 likes · 5 min read

WeChat Backend Team

Oct 23, 2024 · Artificial Intelligence

How We Scaled AI Computing in WeChat with Ray: From Challenges to AstraRay

This article details the AI computing challenges faced by WeChat, explains why the Ray distributed engine was chosen, and describes the design and large‑scale deployment of the AstraRay platform—including scheduling, resource management, and multi‑model support—to achieve low‑cost, high‑efficiency AI services.

AI platform · AstraRay · Distributed Computing

0 likes · 20 min read

Python Crawling & Data Mining

Sep 4, 2024 · Backend Development

How to Resolve Modin pandas Excel Import Errors: Engine Setup and File Naming Tips

This article walks through a common Modin pandas Excel import error, explains why installing an execution engine like Ray or Dask is required, and shows how renaming a conflicting local file resolves the ModuleNotFoundError, enabling successful data loading.

Ray · dask · modin

0 likes · 4 min read

Didi Tech

Jan 25, 2024 · Artificial Intelligence

Ray-native XGBoost Training Platform: Architecture, Performance, and Technical Challenges

Didi’s new Ray‑native XGBoost training platform replaces the fault‑prone Spark solution with a fully Pythonic, fault‑tolerant architecture that leverages Ray’s autoscaling and gang‑scheduling, delivering 2–6× speedups, reduced failure rates, efficient sparse‑vector handling, scalable hyper‑parameter search, and improved resource utilization for large‑scale machine‑learning workloads.

MLOps · Ray · XGBoost

0 likes · 20 min read

Volcano Engine Developer Services

Dec 21, 2023 · Artificial Intelligence

How ByteDance Scales AI Workloads with Ray, KubeRay, and Kueue

This article explains why Ray is popular among AI researchers, how ByteDance uses KubeRay to host Ray applications, and how Kueue manages and schedules RayJob workloads, covering Ray's architecture, KubeRay components, real-world use cases, and job scheduling strategies.

AI · Distributed Computing · KubeRay

0 likes · 12 min read

DataFunTalk

Aug 22, 2023 · Artificial Intelligence

Building Complex Distributed Systems with Ray: An AutoML Case Study and Cloud‑Native Deployment

This article explains how the Ray distributed computing engine simplifies the design, deployment, and operation of complex cloud‑native distributed systems—illustrated through an AutoML service example—by detailing system complexity, Ray’s core concepts, resource customization, runtime environments, monitoring, and ecosystem integrations.

AI · AutoML · Cloud Native

0 likes · 26 min read

AntTech

Jun 27, 2023 · Artificial Intelligence

Fanglue: An Interactive System for Decision Rule Crafting in Fraud Detection

Fanglue is an interactive, web‑based rule‑development platform that integrates expert domain knowledge with distributed AI algorithms to efficiently generate and evaluate decision rules for anti‑fraud scenarios, leveraging Ray for real‑time processing. The work was accepted at VLDB 2023.

AI · Distributed Computing · Ray

0 likes · 10 min read

Volcano Engine Developer Services

Jun 20, 2023 · Artificial Intelligence

Boosting Large-Model Offline Inference with Ray and Cloud-Native Architecture

Large-model offline (batch) inference, which processes massive data on billion-parameter models, faces GPU memory and distributed scheduling challenges; this article explains how Ray's cloud-native framework, model parallelism, and Ray Datasets pipelines address these issues, improve throughput, and enable elastic, efficient GPU utilization.

GPU Utilization · Ray · cloud-native

0 likes · 16 min read

ByteDance Cloud Native

Jun 13, 2023 · Artificial Intelligence

How Ray and Cloud‑Native Tech Supercharge Large‑Model Offline Inference

This article explains the challenges of large‑model offline (batch) inference, such as GPU memory limits and distributed scheduling, and shows how Ray’s cloud‑native architecture, model partitioning, and Ray Datasets can be used to build efficient, elastic inference frameworks deployed with KubeRay.

Distributed Computing · GPU memory · Ray

0 likes · 18 min read

AntTech

Jan 3, 2023 · Artificial Intelligence

Ray: The Distributed Framework Powering the Next Generation of Generative AI

Ray, an open‑source distributed computing framework originally created by UC Berkeley's RISELab and heavily contributed to by Ant Group, underpins many AI workloads, from privacy‑preserving federated learning to large‑scale model training for ChatGPT, making it a critical yet often overlooked engine of the generative AI revolution.

OpenAI · Ray · machine learning

0 likes · 7 min read

Code DAO

Dec 17, 2021 · Artificial Intelligence

How to Accelerate XGBoost Training with Tree Methods, Cloud Computing, and Ray

The article explains why XGBoost training can be slow despite the library's focus on speed and presents three acceleration techniques: choosing an optimal tree_method, leveraging cloud resources for larger memory, and using Ray for distributed training. It includes code examples and benchmark results.

Cloud Computing · Ray · XGBoost

0 likes · 5 min read

Code DAO

Dec 17, 2021 · Artificial Intelligence

How to Scale XGBoost with Ray for Distributed Multi‑GPU Training

XGBoost‑Ray provides a fault‑tolerant, multi‑node, multi‑GPU backend for XGBoost that integrates seamlessly with Ray Tune, supports distributed data loading, and can be enabled with only three code changes, enabling scalable training and inference on large clusters.

GPU · Ray · Ray Tune

0 likes · 8 min read

AntTech

Mar 23, 2021 · Big Data

From MapReduce to Ray: The Evolution of Big Data Computing Engines and Career Opportunities

This article traces the history of big‑data computing engines—from early MapReduce and Hadoop through Spark, Storm, Flink, and the newer Ray—explaining their technical advances, real‑world applications in AI and finance, and why graduates should consider a career in this rapidly evolving field.

AI · Big Data · Career

0 likes · 16 min read

AntTech

Mar 1, 2021 · Artificial Intelligence

Building a Fusion Engine with Ray: Ant Group’s Large‑Scale Distributed Computing Practices

The article explains how Ant Group tackles the challenge of tightly integrating multiple computing paradigms by building a Ray‑based fusion engine, detailing its architecture, features, large‑scale applications in online machine learning and parallel processing, and outlining future development and recruitment opportunities.

Ant Group · Fusion Engine · Ray

0 likes · 10 min read

AntTech

Dec 4, 2019 · Artificial Intelligence

Ant Financial’s Online Learning System Built on Ray: Architecture, Challenges, and Future Plans

The interview details how Ant Financial transitioned from offline to online machine learning by adopting the Ray distributed engine, describing their open architecture, fusion computing approach, technical advantages, encountered pitfalls, and plans to open‑source the system for broader AI and big‑data use.

AI · Ant Financial · Big Data

0 likes · 15 min read

AntTech

Nov 1, 2019 · Artificial Intelligence

Building a Unified Online Machine Learning Platform with Ray for Alipay’s “Collect Five Blessings” Campaign

The article describes how Alipay tackled the cold‑start, conversion, and user‑experience challenges of its time‑limited “Collect Five Blessings” activity by designing a unified online machine‑learning system based on the Ray distributed‑computing framework, emphasizing stability, efficiency, simplicity, multi‑language support, and fault‑tolerant scheduling.

Alipay · Ray · online machine learning

0 likes · 11 min read
