Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba's leading cloud infrastructure, big‑data and AI engineering capabilities, scenario‑specific algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native suite of big‑data and AI capabilities. It improves AI development efficiency, enables large‑scale AI deployment across industries, and helps drive business value.

Recent Articles

Apr 28, 2026 · Artificial Intelligence

Zero‑Learning Video to Semantic Vector Pipeline with MaxFrame’s Distributed AI Engine

To address exploding video volumes and bottlenecks in frame extraction, labeling, and vector storage, MaxFrame offers a three‑step, end‑to‑end distributed pipeline that turns raw videos into searchable semantic vectors, with zero‑threshold scaling, transparent OSS mounting, row‑level fault tolerance, and elastic concurrency control.

MaxCompute · MaxFrame · OSS
6 min read
Apr 27, 2026 · Information Security

Real-Time Agentic Risk Detection with Flink, Fluss, and Large Language Models

The article presents a Flink‑Fluss‑LLM architecture that captures end‑to‑end agent events through a non‑intrusive hook, combines semantic LLM inference with deterministic CEP rules, and delivers millisecond‑level alerts for malicious‑user detection, tool‑result poisoning, and chained‑attack risks.

AI Function · Agent Security · Flink
41 min read
Apr 22, 2026 · Artificial Intelligence

How to Build an End‑to‑End Hand‑Video to VLA Data Pipeline on Alibaba Cloud PAI with Data‑Juicer

This article details a step‑by‑step, distributed pipeline built on Alibaba Cloud PAI using Data‑Juicer and Ray that transforms raw egocentric hand videos into LeRobot v2.0‑compatible Vision‑Language‑Action (VLA) training data, covering video splitting, frame extraction, camera calibration, 3D hand reconstruction, pose estimation, action captioning, and export, with code snippets, performance numbers, and references.

Data-Juicer · Distributed Computing · LeRobot
29 min read
Apr 20, 2026 · Cloud Computing

How Alibaba Cloud’s Agentic Search Redefines Enterprise AI Search

The article analyzes Alibaba Cloud Elasticsearch’s shift from keyword‑based to Agent‑native search, detailing the Agent Native architecture, hybrid retrieval 2.0, FalconSeek engine performance gains of up to 300%, cost reductions of 40‑70%, and the ecosystem of ES Skills, cloud‑native enhancements, and observability that together enable a scalable AI search platform for enterprises.

AI search · Agentic Architecture · Elasticsearch
13 min read
Apr 17, 2026 · Big Data

What Spark 4.0 Brings: VARIANT Type, Native SQL UDFs, and Serverless Enhancements

Apache Spark 4.0 introduces a high‑performance VARIANT data type for semi‑structured JSON, native SQL UDFs that avoid Python UDF bottlenecks, a richer Python DataSource API, a new SQL pipe syntax, and upgraded Structured Streaming state management. Combined with Alibaba Cloud EMR Serverless optimizations, these changes deliver up to 30% speed gains and a seamless migration path from Spark 3.x.

Apache Spark · Python API · SQL UDF
12 min read
Apr 16, 2026 · Artificial Intelligence

Build a Full End‑to‑End Embodied AI Workflow with Isaac Lab Arena

This notebook walks through a complete pipeline: configuring Isaac Lab Arena environments, downloading datasets, using Mimic for large‑scale data augmentation, fine‑tuning a GR00T‑N1.5 policy, and performing closed‑loop evaluation. It shows how to develop and validate embodied AI tasks on PAI‑DSW.

GR00T · Isaac Lab · Mimic
14 min read
Apr 13, 2026 · Artificial Intelligence

How to Build a Scalable Multimodal Data Pipeline with Alibaba Cloud PAI and DataJuicer

This article provides a step‑by‑step guide to constructing a high‑performance multimodal data pipeline, covering video segmentation, duration filtering, frame extraction, safety and aesthetic scoring, and caption generation. It uses Alibaba Cloud PAI, Paimon, DataJuicer, and distributed frameworks such as Ray and Daft, and reports real‑world performance metrics.

AI · Alibaba Cloud · Daft
30 min read
Apr 10, 2026 · Artificial Intelligence

How to Supercharge Small LLM Agents with ReAct Data Construction and EasyDistill

This guide explains how to build high‑quality agent training data from ReAct trajectories, synthesize difficult samples with a data flywheel, and distill the knowledge into small LLMs on Alibaba Cloud PAI. It covers teacher model deployment, EasyDistill installation, data generation, task solving, rubric filtering, and final model deployment.

Agent · Data Generation · EasyDistill
14 min read
Apr 9, 2026 · Artificial Intelligence

How Data Flywheels Accelerate Small Agentic Model Training

This article details a data‑flywheel framework for training compact agentic language models, describing synthetic task generation, mock environment simulation, rubric‑based reward design, iterative hard‑sample augmentation, and experimental results that show consistent performance gains across benchmarks.

Reward Design · Synthetic Environments · agentic models
17 min read
Apr 8, 2026 · Artificial Intelligence

Running Distributed Reinforcement Learning with Isaac Lab’s Newton Engine and Rerun Visualizer on PAI

This guide explains how to use the Newton physics engine and the lightweight Rerun visualizer with Isaac Lab on the PAI platform, covering environment setup, visualizer selection, single‑ and multi‑GPU reinforcement‑learning training, and performance analysis via TensorBoard.

Distributed Training · Isaac Lab · Newton Engine
9 min read