Alibaba Cloud Big Data AI Platform
Alibaba Cloud Big Data AI Platform
Apr 22, 2026 · Artificial Intelligence

How to Build an End‑to‑End Hand‑Video to VLA Data Pipeline on Alibaba Cloud PAI with Data‑Juicer

This article details a step‑by‑step, distributed pipeline built on Alibaba Cloud PAI using Data‑Juicer and Ray that transforms raw egocentric hand videos into LeRobot v2.0‑compatible Vision‑Language‑Action (VLA) training data, covering video splitting, frame extraction, camera calibration, 3D hand reconstruction, pose estimation, action captioning, and export, with code snippets, performance numbers, and references.

Data-JuicerDistributed ComputingLerobot
0 likes · 29 min read
How to Build an End‑to‑End Hand‑Video to VLA Data Pipeline on Alibaba Cloud PAI with Data‑Juicer
DataFunSummit
DataFunSummit
Mar 2, 2026 · Artificial Intelligence

How Data-Juicer Powers Multi‑Modal Data Processing for Large Language Models

This article explains the evolution of Data‑Juicer from a pure‑text preprocessing tool to a full‑stack multi‑modal data engine, detailing its architecture, operator library, Ray‑based distributed execution, performance benchmarks, integration with AI agents, and roadmap for future AI‑centric data workflows.

Data-JuicerMulti-ModalRay
0 likes · 31 min read
How Data-Juicer Powers Multi‑Modal Data Processing for Large Language Models