Tagged articles
55 articles
DataFunSummit
May 11, 2026 · Artificial Intelligence

How Lance Powers Enterprise Multimodal AI Data Lakes

The article analyzes why 74% of AI projects fail due to feedback gaps and data silos, explains how the open‑source Lance format addresses these issues with unified multimodal storage, outlines a layered Lance‑on‑Ray architecture, and details three real‑world practices—implicit feedback loops, GPU‑accelerated self‑evolution, and semantic knowledge‑graph evolution—to boost R&D efficiency.

CAGRA · Daft · Data Lake
0 likes · 13 min read
Alibaba Cloud Big Data AI Platform
Apr 22, 2026 · Artificial Intelligence

How to Build an End‑to‑End Hand‑Video to VLA Data Pipeline on Alibaba Cloud PAI with Data‑Juicer

This article details a step‑by‑step, distributed pipeline built on Alibaba Cloud PAI using Data‑Juicer and Ray that transforms raw egocentric hand videos into LeRobot v2.0‑compatible Vision‑Language‑Action (VLA) training data, covering video splitting, frame extraction, camera calibration, 3D hand reconstruction, pose estimation, action captioning, and export, with code snippets, performance numbers, and references.

Data-Juicer · Embodied AI · Lerobot
0 likes · 29 min read
Ctrip Technology
Apr 16, 2026 · Big Data

How Ray + DuckDB Cut 900M-Row Attribution Queries from 40s to 15s

When attribution analysis on over 900 million rows slowed to more than 40 seconds and threatened cluster stability, Ctrip's smart attribution team rebuilt the architecture with Ray and DuckDB, achieving sub‑15‑second query times, a 160% performance gain, and complete resource isolation.

Attribution Analysis · Big Data · DuckDB
0 likes · 22 min read
Alibaba Cloud Big Data AI Platform
Apr 13, 2026 · Artificial Intelligence

How to Build a Scalable Multimodal Data Pipeline with Alibaba Cloud PAI and DataJuicer

This article is a step‑by‑step guide to constructing a high‑performance multimodal data pipeline, covering video segmentation, duration filtering, frame extraction, safety and aesthetic scoring, and caption generation, using Alibaba Cloud PAI, Paimon, DataJuicer, and distributed frameworks like Ray and Daft, with real‑world performance metrics.

AI · Alibaba Cloud · Daft
0 likes · 30 min read
Big Data Technology & Architecture
Apr 3, 2026 · Industry Insights

Why Daft, Ray, and Lance Are Redefining Multimodal Data Pipelines

This article analyzes how the Daft‑Ray‑Lance stack tackles the challenges of multimodal AI workloads by offering a high‑performance Rust engine, adaptive back‑pressure, seamless Ray‑based distributed scheduling, and a storage format optimized for random access, vector indexing, and zero‑copy schema evolution, complete with benchmark comparisons and practical deployment guidance.

Benchmark · Daft · Lance
0 likes · 21 min read
Big Data Technology Tribe
Mar 15, 2026 · Databases

How to Build Distributed Scalar Indexes with Lance and Ray

This guide explains the end‑to‑end workflow for constructing a distributed scalar index in Lance by orchestrating validation, fragment sharding, worker‑level indexing via Ray, and final metadata merging, complete with code snippets and detailed step‑by‑step instructions.

Datasets · Lance · Python
0 likes · 12 min read
DataFunSummit
Mar 3, 2026 · Backend Development

How Ant Group Supercharged AI Data Pipelines with Ray: Boosting Index Build Speed and Reliability

This article details Ant Group's use of the Ray distributed computing framework to accelerate massive data indexing, migrate a C++ engine to Ray, implement elastic resource scheduling, improve long‑tail task efficiency, and build a robust RAG operator system with comprehensive governance, achieving up to 2× speed gains and 99.9% success rates.

Backend Development · Ray · ai data pipeline
0 likes · 15 min read
DataFunSummit
Mar 2, 2026 · Artificial Intelligence

How Data-Juicer Powers Multi‑Modal Data Processing for Large Language Models

This article explains the evolution of Data‑Juicer from a pure‑text preprocessing tool to a full‑stack multi‑modal data engine, detailing its architecture, operator library, Ray‑based distributed execution, performance benchmarks, integration with AI agents, and roadmap for future AI‑centric data workflows.

Data-Juicer · Ray · data-processing
0 likes · 31 min read
Big Data Technology Tribe
Mar 1, 2026 · Backend Development

How Ray Data Turns Logical Plans into Executable Workflows – A Deep Dive

This article provides a comprehensive, step‑by‑step explanation of Ray Data's LogicalPlan architecture, covering its class hierarchy, core methods, logical operators, optimization rules, planning from logical to physical operators, execution binding, metadata inference, lineage serialization, and the full file/module index for developers building scalable data pipelines.

@Data · Backend · LogicalPlan
0 likes · 35 min read
Data STUDIO
Feb 21, 2026 · Big Data

Boost Python Performance Up to 50× Without Changing Your Code

Python’s reputation for slowness can be overcome by selecting the right tools—Numba, PyPy, CuPy, JAX, Ray, Joblib, async I/O, memory profilers, and big‑data frameworks—delivering speedups from 6× to over 50× with minimal or no code modifications.

Async · GPU · Profiling
0 likes · 22 min read
JD Tech
Jan 31, 2026 · Artificial Intelligence

How JD's 9N‑LLM Engine Powers Scalable Generative Recommendation at Massive Scale

This article details JD Retail's 9N‑LLM unified training framework that tackles the massive data, hardware heterogeneity, and algorithmic challenges of generative recommendation by integrating TensorFlow and PyTorch, supporting GPU/NPU, and delivering high‑throughput sample processing, sparse/dense optimization, and flexible reinforcement‑learning capabilities.

GPU/NPU · Ray · large-scale AI
0 likes · 26 min read
DataFunSummit
Jan 18, 2026 · Big Data

How Ray Reinvents AI Data Pipelines for Massive Multimodal Inference

This article examines the shortcomings of traditional big‑data engines for AI workloads, presents a Ray‑based heterogeneous fusion architecture that unifies CPU/GPU scheduling, Python ecosystems, and streaming‑batch processing, and details fault‑tolerance, checkpointing, compute‑storage separation, resource‑utilization, scalability, and observability improvements that enable thousands of nodes and dramatically higher GPU efficiency.

Big Data · Cloud Native · Ray
0 likes · 31 min read
ByteDance Data Platform
Dec 23, 2025 · Artificial Intelligence

How Daft and Ray Supercharge Million‑Hour Video Processing for AI‑Powered Robotics

This article details a scalable, distributed pipeline that uses LAS AI Data Lake, Daft on Ray, and advanced video‑processing techniques—scene detection, splitting, frame sampling, filtering, and caption generation—to transform tens of millions of hours of robot‑captured video into high‑quality, searchable semantic data while dramatically boosting CPU and GPU utilization.

AI Pipeline · Daft · Ray
0 likes · 21 min read
DataFunSummit
Sep 20, 2025 · Artificial Intelligence

How We Scaled WeChat AI Services with Ray: Lessons from Million‑Node Deployments

This article examines how WeChat’s Astra platform leverages the Ray distributed framework to manage million‑node AI workloads, addressing challenges of scale, heterogeneous GPU resources, operational complexity, and cost, and outlines the architecture that unifies Ray services across multiple Kubernetes clusters.

AI scaling · Astra Platform · GPU Management
0 likes · 5 min read
DataFunSummit
Sep 18, 2025 · Artificial Intelligence

How We Scaled WeChat AI Services with Ray: Lessons from Million‑Node Deployments

This article examines how Tencent's WeChat team leveraged the Ray distributed computing framework within the Astra platform to tackle massive AI workloads, addressing challenges of scale, GPU diversity, operational complexity, and cost while outlining their architecture and practical insights.

AI Infrastructure · Astra Platform · Ray
0 likes · 6 min read
DataFunSummit
Sep 13, 2025 · Artificial Intelligence

How Pinterest Scaled LLM Data Pipelines with Ray: Boosting Throughput and Cutting Costs

This article details how Pinterest’s senior staff engineer Dr. Luo leveraged the open‑source Ray framework to overcome LLM data‑preprocessing bottlenecks, describing the system’s architecture, key features such as map_batches, Carry‑Over Columns and Accumulators, and the dramatic performance and cost improvements achieved.

LLM · Performance Optimization · Pinterest
0 likes · 12 min read
DataFunSummit
Sep 11, 2025 · Artificial Intelligence

How Ray Powers Massive AI Computing on WeChat: Lessons from Tencent

This article examines how Tencent leverages the Ray distributed framework within the Astra platform to handle WeChat's massive AI workloads, addressing challenges of scale, heterogeneous GPU resources, operational complexity, and cost while outlining the architecture and practical benefits.

AI scaling · Astra Platform · Ray
0 likes · 5 min read
DataFunTalk
Sep 10, 2025 · Artificial Intelligence

How Ant Group’s Ray‑Powered Ragent Redefines LLM‑Based AI Agents

The article presents Ant Group’s Ray‑based Ragent framework, detailing its background, motivation behind unified AI serving, and the four core modules—Profile, Memory, Planning, and Action—that together enable large‑language‑model agents for financial applications.

AI Framework · Ant Group · Distributed Systems
0 likes · 4 min read
DataFunSummit
Sep 9, 2025 · Artificial Intelligence

How Ant Group’s Ragent Redefines Distributed LLM Agents with Ray

This article introduces Ant Group’s Ragent, a Ray‑based distributed AI agent framework, covering its background, motivation in the large‑model era, and a four‑module design (Profile, Memory, Planning, Action) that enables scalable LLM‑driven agents.

AI Framework · Ant Group · Distributed Systems
0 likes · 4 min read
DataFunSummit
Sep 8, 2025 · Artificial Intelligence

How Ant Group’s Ragent Redefines LLM‑Based AI Agents on Ray

This article introduces Ant Group’s new Ray‑based distributed agent framework Ragent, outlines its background and motivation, and details the four core modules—Profile, Memory, Planning, and Action—that together enable sophisticated LLM‑driven AI agents for large‑scale applications.

AI agents · Ant Group · Distributed Systems
0 likes · 4 min read
DataFunSummit
Sep 7, 2025 · Artificial Intelligence

Inside Ant Group’s Ragent: Building Scalable AI Agents on Ray

This article introduces Ant Group’s Ragent, a Ray‑based distributed AI‑agent framework, covering its background, motivation, design and implementation, and detailing the four core modules—Profile, Memory, Planning, and Action—that enable large‑language‑model agents at massive scale.

AI agents · Ant Group · Distributed Systems
0 likes · 4 min read
DataFunTalk
Sep 5, 2025 · Artificial Intelligence

Inside Ant Group’s Ragent: Building Scalable AI Agents on Ray

This article introduces Ant Group’s Ragent, a Ray‑based distributed AI agent framework, detailing its background, motivations, and design, and explains the four essential modules—Profile, Memory, Planning, and Action—that enable scalable large‑language‑model agents for real‑world applications.

Ant Group · Ray
0 likes · 5 min read
DataFunSummit
Sep 5, 2025 · Artificial Intelligence

How Ant Group’s Ragent Leverages Ray for Scalable AI Agents

This article introduces Ant Group’s Ragent, a Ray‑based distributed agent framework, covering its background, motivation, and design—including core modules like Profile, Memory, Planning, and Action—that enable large‑language‑model agents to operate at massive scale.

Ant Group · Ragent · Ray
0 likes · 5 min read
DataFunTalk
Sep 5, 2025 · Artificial Intelligence

Inside Ant Group’s Ragent: Building Scalable AI Agents on Ray

This article introduces Ant Group’s Ray‑based distributed agent framework Ragent, outlines its background, motivation, and design, and details the four essential modules—Profile, Memory, Planning, and Action—that power large‑language‑model agents in large‑scale AI serving.

AI agents · Ant Group · Distributed Systems
0 likes · 5 min read
DataFunSummit
Sep 3, 2025 · Artificial Intelligence

Inside Ant Group’s Ragent: Building Scalable AI Agents on Ray

This article explains how Ant Group’s Ragent framework leverages Ray to create scalable, multi‑tenant AI agents, detailing its background, motivation, and design while outlining the core modules—Profile, Memory, Planning, and Action—that power large‑language‑model agents.

Ant Group · Ray · distributed computing
0 likes · 5 min read
DataFunSummit
Sep 2, 2025 · Artificial Intelligence

How Ant Group’s Ray‑Powered Ragent Redefines LLM‑Based AI Agents

This article introduces Ant Group’s Ray‑based distributed agent framework Ragent, outlines its background, motivation, and design, and breaks down the four essential modules—Profile, Memory, Planning, and Action—that enable large‑language‑model agents to operate in real‑world scenarios.

Ant Group · Distributed Systems · LLM
0 likes · 5 min read
DataFunSummit
Aug 28, 2025 · Artificial Intelligence

How We Scaled AI Compute to Millions of Nodes with Ray on WeChat

This article explains how Tencent's WeChat team built the Astra platform on Ray to manage millions of AI compute nodes, addressing challenges of massive scale, heterogeneous GPU resources, low‑priority node instability, deployment complexity, and cost, while detailing architecture, scheduling strategies, and practical usage examples.

AI scaling · Cluster Management · Ray
0 likes · 21 min read
Ops Development & AI Practice
Jul 29, 2025 · Artificial Intelligence

How Ray Transforms Distributed Training for Large Language Models

In the era of data‑driven AI, Ray offers an open‑source unified compute framework that abstracts distributed system complexity, enabling developers to seamlessly scale Python code from a laptop to large GPU clusters, and provides the Ray AI Runtime (AIR) with libraries such as Ray Data, Train, Tune, and Serve to accelerate LLM training, hyper‑parameter tuning, and model serving.

AI Runtime · LLM training · Python
0 likes · 10 min read
Alibaba Cloud Infrastructure
Jul 18, 2025 · Information Security

Securing Ray Clusters on Alibaba Cloud ACK: Best Practices and Configurations

This guide details comprehensive security best practices for deploying Ray clusters on Alibaba Cloud ACK, covering TLS communication, namespace isolation, resource quotas, RBAC, security contexts, image scanning, resource limits, RRSA integration, multi‑cluster isolation, and recommendations for protecting dashboards and services from unauthorized access.

ACK · Kubernetes · Ray
0 likes · 18 min read
Alibaba Cloud Big Data AI Platform
Jul 2, 2025 · Artificial Intelligence

Boost Your AI Model Training with Ray on Alibaba Cloud PAI – A Step‑by‑Step Guide

This article introduces the integration of the open‑source distributed AI framework Ray with Alibaba Cloud’s PAI platform, detailing its advantages, architecture, fault‑tolerance, resource management, and provides a comprehensive step‑by‑step tutorial—including configuration, command examples, and code snippets—to efficiently run Ray jobs on PAI‑DLC.

Alibaba Cloud · PAI · Ray
0 likes · 11 min read
Alibaba Cloud Infrastructure
Jun 3, 2025 · Artificial Intelligence

Deploying and Managing Ray on Alibaba Cloud ACK with KubeRay: Architecture, Code Samples, and Scheduling Strategies

This article explains how to build a flexible machine‑learning infrastructure on Alibaba Cloud ACK using Ray and KubeRay, covering Ray's core components, AI libraries, deployment options on VMs and Kubernetes, code examples for data processing, model serving, and advanced scheduling and quota management techniques.

AI · Alibaba Cloud · KubeRay
0 likes · 17 min read
Baobao Algorithm Notes
Apr 20, 2025 · Artificial Intelligence

Can Agentic RL Transform LLM Training? A Deep Dive into VeRL and Search‑R1

This article explores the emerging concept of agentic reinforcement learning for large language models, analyzes ByteDance's VeRL and the Search‑R1 framework, identifies practical challenges in tool integration and environment parallelism, and proposes a unified, Ray‑based architecture to enable scalable, high‑quality RL environments.

Ray · environment design · search-r1
0 likes · 11 min read
AntData
Apr 3, 2025 · Artificial Intelligence

Ray Flow Insight: Visualizing and Debugging Distributed AI Applications

Ray Flow Insight is an Ant Group open‑source tool that visualizes Ray's distributed programming primitives—Actors, Tasks, and Objects—to turn complex reinforcement‑learning systems from opaque "black boxes" into transparent, debuggable workflows, providing logical, physical, distributed stack, and flame‑graph views for performance analysis and optimization.

AI · Debugging · Distributed Systems
0 likes · 32 min read
Volcano Engine Developer Services
Mar 5, 2025 · Artificial Intelligence

How DeepSeek Smallpond Powers AI Data Processing with Ray and DuckDB

This article introduces DeepSeek Smallpond, a lightweight yet high‑performance AI data‑processing engine built on Ray and DuckDB, explains its dual DataFrame and LogicalPlan APIs, showcases integration with Volcano Engine's AI Data Lake LAS, and provides practical code examples for distributed processing, multimodal storage, and RAG pipelines.

AI data processing · Data Lake · DuckDB
0 likes · 18 min read
DataFunTalk
Jan 11, 2025 · Artificial Intelligence

Ragent: Ant Group’s Ray‑Based Distributed Agent Framework

This article introduces Ragent, Ant Group’s Ray‑powered distributed agent framework, covering its background, motivation, design, implementation details, multi‑agent capabilities, and future directions for large‑model AI applications.

AI · Agent Framework · Distributed Agents
0 likes · 14 min read
Rare Earth Juejin Tech Community
Nov 29, 2024 · Big Data

How ByteDance Builds Large-Scale Data Processing Pipelines for Multimodal Models with Ray

The article details ByteDance's use of Ray and Ray Data to construct scalable audio and video data processing pipelines for multimodal AI models, addressing challenges of massive data volume, resource constraints, and fault tolerance through pipeline design, Ray Core enhancements, and custom scheduling optimizations.

AI · Big Data · ByteDance
0 likes · 16 min read
WeChat Backend Team
Oct 23, 2024 · Artificial Intelligence

How We Scaled AI Computing in WeChat with Ray: From Challenges to AstraRay

This article details the AI computing challenges faced by WeChat, explains why the Ray distributed engine was chosen, and describes the design and large‑scale deployment of the AstraRay platform—including scheduling, resource management, and multi‑model support—to achieve low‑cost, high‑efficiency AI services.

AI Platform · AstraRay · Ray
0 likes · 20 min read
Didi Tech
Jan 25, 2024 · Artificial Intelligence

Ray-native XGBoost Training Platform: Architecture, Performance, and Technical Challenges

Didi’s new Ray‑native XGBoost training platform replaces the fault‑prone Spark solution with a fully Pythonic, fault‑tolerant architecture that leverages Ray’s autoscaling and gang‑scheduling, delivering 2–6× speedups, reduced failure rates, efficient sparse‑vector handling, scalable hyper‑parameter search, and improved resource utilization for large‑scale machine‑learning workloads.

MLOps · Ray · XGBoost
0 likes · 20 min read
DataFunTalk
Aug 22, 2023 · Artificial Intelligence

Building Complex Distributed Systems with Ray: An AutoML Case Study and Cloud‑Native Deployment

This article explains how the Ray distributed computing engine simplifies the design, deployment, and operation of complex cloud‑native distributed systems—illustrated through an AutoML service example—by detailing system complexity, Ray’s core concepts, resource customization, runtime environments, monitoring, and ecosystem integrations.

AI · AutoML · Cloud Native
0 likes · 26 min read
AntTech
Jun 27, 2023 · Artificial Intelligence

Fanglue: An Interactive System for Decision Rule Crafting in Fraud Detection

Fanglue is an interactive, web‑based rule‑development platform that integrates expert domain knowledge with distributed AI algorithms to efficiently generate and evaluate decision rules for anti‑fraud scenarios, leveraging Ray for real‑time processing; the work was accepted at VLDB 2023.

AI · Ray · VLDB2023
0 likes · 10 min read
Volcano Engine Developer Services
Jun 20, 2023 · Artificial Intelligence

Boosting Large-Model Offline Inference with Ray and Cloud-Native Architecture

Large-model offline (batch) inference, which processes massive data on billion-parameter models, faces GPU memory and distributed scheduling challenges; this article explains how Ray's cloud-native framework, model parallelism, and Ray Datasets pipelines address these issues, improve throughput, and enable elastic, efficient GPU utilization.

GPU utilization · Ray · cloud-native
0 likes · 16 min read
ByteDance Cloud Native
Jun 13, 2023 · Artificial Intelligence

How Ray and Cloud‑Native Tech Supercharge Large‑Model Offline Inference

This article explains the challenges of large‑model offline (batch) inference, such as GPU memory limits and distributed scheduling, and shows how Ray’s cloud‑native architecture, model partitioning, and Ray Datasets can be used to build efficient, elastic inference frameworks deployed with KubeRay.

GPU Memory · Large Model · Ray
0 likes · 18 min read
AntTech
Jan 3, 2023 · Artificial Intelligence

Ray: The Distributed Framework Powering the Next Generation of Generative AI

Ray, an open‑source distributed computing framework originally created by UC Berkeley's RISELab and heavily contributed to by Ant Group, underpins many AI workloads—from privacy‑preserving federated learning to large‑scale model training for ChatGPT—making it a critical yet often overlooked engine of the generative AI revolution.

OpenAI · Ray · machine learning
0 likes · 7 min read
Code DAO
Dec 17, 2021 · Artificial Intelligence

How to Accelerate XGBoost Training with Tree Methods, Cloud Computing, and Ray

The article explains why XGBoost training can be slow despite its speed focus and presents three acceleration techniques—choosing an optimal tree_method, leveraging cloud resources for larger memory, and using Ray for distributed training—complete with code examples and benchmark results.

Distributed Training · Ray · XGBoost
0 likes · 5 min read
Code DAO
Dec 17, 2021 · Artificial Intelligence

How to Scale XGBoost with Ray for Distributed Multi‑GPU Training

XGBoost‑Ray provides a fault‑tolerant, multi‑node, multi‑GPU backend for XGBoost that integrates seamlessly with Ray Tune, supports distributed data loading, and can be adopted with only three code changes, allowing scalable training and inference on large clusters.

Distributed Training · GPU · Ray
0 likes · 8 min read
AntTech
Mar 1, 2021 · Artificial Intelligence

Building a Fusion Engine with Ray: Ant Group’s Large‑Scale Distributed Computing Practices

The article explains how Ant Group tackles the challenge of tightly integrating multiple computing paradigms by building a Ray‑based fusion engine, detailing its architecture, features, large‑scale applications in online machine learning and parallel processing, and outlining future development and recruitment opportunities.

Ant Group · Fusion Engine · Ray
0 likes · 10 min read
AntTech
Nov 1, 2019 · Artificial Intelligence

Building a Unified Online Machine Learning Platform with Ray for Alipay’s “Collect Five Blessings” Campaign

The article describes how Alipay tackled the cold‑start, conversion, and user‑experience challenges of its time‑limited “Collect Five Blessings” activity by designing a unified online machine‑learning system based on the Ray distributed‑computing framework, emphasizing stability, efficiency, simplicity, multi‑language support, and fault‑tolerant scheduling.

Alipay · Ray · System Architecture
0 likes · 11 min read