Tagged articles

18 articles

Page 1 of 1

May 10, 2026 · Big Data

How Lance File Format v2.2 Accelerates, Cuts Costs, and Governs Multimodal Data

Lance File Format v2.2 tackles the AI data explosion by delivering hundred‑fold random‑read performance, advanced two‑layer compression, zero‑cost schema evolution, Git‑style versioning, external blob handling, and a roadmap toward native media support and intelligent encoding, positioning it as a core infrastructure for large‑scale multimodal workloads.

Data GovernanceFile FormatIO performance

0 likes · 14 min read

How Lance File Format v2.2 Accelerates, Cuts Costs, and Governs Multimodal Data

DataFunTalk

May 8, 2026 · Big Data

How MaxCompute Evolves into a Data+AI Platform: Architecture, Core Capabilities, and Real-World Cases

The article explains how Alibaba Cloud's MaxCompute has been transformed into a cloud‑native Data+AI platform, detailing its layered architecture, multimodal storage, model management, hybrid compute scheduling, SQL AI functions, the MaxFrame Python framework, and several enterprise case studies that demonstrate performance gains and flexible resource orchestration.

AI integrationBig DataCloud Native

0 likes · 11 min read

How MaxCompute Evolves into a Data+AI Platform: Architecture, Core Capabilities, and Real-World Cases

DataFunTalk

Apr 21, 2026 · Industry Insights

How AI Agents Are Redefining Data Governance: 5 Key Shifts and 3 Strategic Solutions

In the AI era, data consumption moves from a few technical users to all business staff, forcing a fundamental redesign of data governance across five dimensions—resource consumption, frequency, semantics, knowledge base, and modality—and proposing three actionable strategies to make data semantically rich, fully multimodal, and AI‑consumable.

AIData GovernanceEnterprise Analytics

0 likes · 18 min read

How AI Agents Are Redefining Data Governance: 5 Key Shifts and 3 Strategic Solutions

Machine Heart

Apr 18, 2026 · Artificial Intelligence

Why Embodied Data Is the Biggest Gold Mine: Inside the World’s First Hundred‑Billion‑Scale Multimodal Data Cloud Mall

Paxini, together with JD Cloud, Tencent Cloud, and Baidu Intelligent Cloud, launches the world’s first hundred‑billion‑scale, full‑modal, high‑degree‑of‑freedom embodied AI data cloud mall, offering instant online data procurement, end‑to‑end model training pipelines, and validated performance gains in both lab and real‑world robot tasks.

Embodied AIModel TrainingMultimodal Data

0 likes · 13 min read

Why Embodied Data Is the Biggest Gold Mine: Inside the World’s First Hundred‑Billion‑Scale Multimodal Data Cloud Mall

Big Data Technology & Architecture

Apr 3, 2026 · Industry Insights

Why Daft, Ray, and Lance Are Redefining Multimodal Data Pipelines

This article analyzes how the Daft‑Ray‑Lance stack tackles the challenges of multimodal AI workloads by offering a high‑performance Rust engine, adaptive back‑pressure, seamless Ray‑based distributed scheduling, and a storage format optimized for random access, vector indexing, and zero‑copy schema evolution, complete with benchmark comparisons and practical deployment guidance.

BenchmarkDaftLance

0 likes · 21 min read

Why Daft, Ray, and Lance Are Redefining Multimodal Data Pipelines

Big Data Tech Team

Mar 3, 2026 · Artificial Intelligence

AI‑Powered DWD Layer: Boost Efficiency, Quality, and Multimodal Data

This article examines how large‑language models can reconstruct the data‑warehouse DWD layer by automating ETL script generation, data cleaning, standardization, and cross‑table association, presenting three high‑frequency scenarios—structured data cleaning, multimodal data parsing, and intelligent table linking—along with tool selections, step‑by‑step procedures, real‑world case studies, and practical pitfalls.

AICase StudyDWD

0 likes · 18 min read

AI‑Powered DWD Layer: Boost Efficiency, Quality, and Multimodal Data

DataFunTalk

Dec 26, 2025 · Cloud Native

How Haier Built a Cloud‑Native Multi‑Modal Data Lake for AI‑Ready Manufacturing

Haier’s digital transformation leverages a cloud‑native, open‑source‑based multi‑modal data lake that unifies structured and unstructured industrial data, uses metadata models and knowledge graphs for governance, and provides AI‑ready services that balance performance, cost, and real‑time requirements.

AIData LakeMultimodal Data

0 likes · 12 min read

How Haier Built a Cloud‑Native Multi‑Modal Data Lake for AI‑Ready Manufacturing

DataFunSummit

Dec 10, 2025 · Big Data

How Apache Hudi Powers the Next‑Gen AI‑Native Lakehouse: Insights from the Asia Meetup

The article recaps the Apache Hudi Asia Meetup hosted by JD, covering community updates, JD's data‑lake challenges, the upcoming Hudi 1.1 release, JD's architectural redesign, Kuaishou's real‑time lake adoption, and Huawei Cloud's deep optimizations, all aimed at building an AI‑native, real‑time lakehouse.

AI-nativeApache HudiData Lake

0 likes · 13 min read

How Apache Hudi Powers the Next‑Gen AI‑Native Lakehouse: Insights from the Asia Meetup

DataFunSummit

Nov 24, 2025 · Big Data

How Tencent Cloud Uses Iceberg, Gravitino and Multimodal Lakes for Unified Data Processing

This article series explores Tencent Cloud's Iceberg‑based batch‑stream integration, Apache Gravitino's unified metadata and lineage solution, Xiaohongshu's data‑architecture evolution for the Big AI Data era, and a practical Data+AI multimodal data‑lake implementation, highlighting challenges, architectural designs, and performance gains.

Big DataData LakeIceberg

0 likes · 7 min read

How Tencent Cloud Uses Iceberg, Gravitino and Multimodal Lakes for Unified Data Processing

DataFunTalk

Nov 22, 2025 · Big Data

How Modern Data Lakes and AI Governance Transform Enterprise Analytics

This article collection examines Tencent Cloud’s Iceberg batch‑stream integration, AI‑driven game data governance, Apache Gravitino unified metadata and lineage, Xiaohongshu’s multimodal data‑lake evolution, and Volcano Engine’s Data+AI multimodal lake, highlighting architectures, techniques, performance gains, and practical implementations.

AI GovernanceData LakeGravitino

0 likes · 7 min read

How Modern Data Lakes and AI Governance Transform Enterprise Analytics

Alibaba Cloud Big Data AI Platform

Nov 21, 2025 · Artificial Intelligence

Build a Multimodal AI Data Analysis and Retrieval System with Hologres 4.0

This guide walks through constructing an AI‑powered multimodal data analysis platform on Hologres 4.0, covering PDF ingestion, vector and full‑text indexing, AI functions, stored procedures, and hybrid retrieval queries to enable enterprise‑level search and insight extraction.

AIHologresMultimodal Data

0 likes · 41 min read

Build a Multimodal AI Data Analysis and Retrieval System with Hologres 4.0

Instant Consumer Technology Team

Nov 7, 2025 · Artificial Intelligence

How Game‑TARS Redefines Game AI with Human‑Native Interaction and Sparse Reasoning

Game‑TARS, a general‑purpose game AI from ByteDance's Seed team, replaces custom function calls with low‑level keyboard‑mouse actions, leverages massive multimodal data, sparse‑thinking and decaying‑loss algorithms, and achieves zero‑shot mastery across diverse games, surpassing top large models like GPT‑5 and Gemini‑2.5‑Pro.

AI trainingMultimodal Datagame AI

0 likes · 10 min read

How Game‑TARS Redefines Game AI with Human‑Native Interaction and Sparse Reasoning

DataFunSummit

Sep 24, 2024 · Artificial Intelligence

Streaming Data Pipelines and Scaling Laws for Efficient Large‑Model Training

The article discusses the challenges of training ever‑larger AI models on internet‑scale data, critiques traditional batch ETL pipelines, and proposes a streaming data‑flow architecture with dynamic data selection and a shared‑memory/Alluxio middle layer to decouple data processing from model training, improving efficiency and scalability.

AI InfrastructureMultimodal Datadata pipelines

0 likes · 20 min read

Streaming Data Pipelines and Scaling Laws for Efficient Large‑Model Training

AntTech

Sep 10, 2024 · Big Data

From DATA for AI to AI for DATA: Evolution of Ant Group’s Intelligent Data System

The talk reviews the rapid evolution of data technologies—from early database foundations and big‑data breakthroughs to the rise of generative AI—highlighting how Ant Group’s data platform is shifting from a cost‑efficiency focus to a value‑centric, multimodal, AI‑driven ecosystem.

Big DataData PlatformsMultimodal Data

0 likes · 17 min read

From DATA for AI to AI for DATA: Evolution of Ant Group’s Intelligent Data System

21CTO

Sep 7, 2024 · Artificial Intelligence

Why AI Databases Are the Next Big Leap for Vector Search and Multimodal Data

The article explains how AI databases combine structured, unstructured, and vector data, integrate machine‑learning, NLP, and generative models, and why platforms like Vespa are emerging as open‑source solutions to meet the performance and scalability demands of modern generative AI applications.

AI DatabaseMultimodal DataVespa

0 likes · 8 min read

Why AI Databases Are the Next Big Leap for Vector Search and Multimodal Data

DataFunSummit

Jun 15, 2024 · Artificial Intelligence

Large‑Model‑Driven Data Governance: Technical Outlook and Research Highlights

This article reviews the rising importance of data quality for large models, explores data‑centric AI, large‑model pre‑training data engineering, and presents recent Fudan University research on using large models to improve data governance across multiple domains such as attribute normalization, geographic cleaning, compliance checking, and multimodal retrieval.

AIData GovernanceKnowledge Graphs

0 likes · 19 min read

Large‑Model‑Driven Data Governance: Technical Outlook and Research Highlights

DataFunSummit

Aug 11, 2023 · Artificial Intelligence

Application of Knowledge Graphs in Risk Control at Wing Payment

This presentation details how Wing Payment leverages a large‑scale, multimodal knowledge graph and AI techniques—including computer vision, unsupervised and supervised learning, federated learning, and graph neural networks—to detect and mitigate fraud across payment, e‑commerce, and credit scenarios, while outlining system architecture, algorithmic approaches, case studies, and future research directions.

Financial ServicesMultimodal Datafraud detection

0 likes · 17 min read

Application of Knowledge Graphs in Risk Control at Wing Payment

58 Tech

Apr 16, 2021 · Artificial Intelligence

Graph Neural Network Based Anti‑Fraud Solution for Online Information Services

The article presents a comprehensive anti‑fraud framework that analyzes black‑market fraud characteristics, reviews conventional fraud‑mitigation methods, and proposes a multimodal graph‑neural‑network approach—leveraging device, behavior, and content similarity—to accurately identify fraudulent users on large‑scale internet platforms.

Multimodal Dataanti‑fraudfraud detection

0 likes · 18 min read

Graph Neural Network Based Anti‑Fraud Solution for Online Information Services