Tagged articles
18 articles
DataFunSummit
May 10, 2026 · Big Data

How Lance File Format v2.2 Accelerates, Cuts Costs, and Governs Multimodal Data

Lance File Format v2.2 tackles the AI data explosion by delivering a hundred‑fold improvement in random‑read performance, advanced two‑layer compression, zero‑cost schema evolution, Git‑style versioning, and external blob handling, along with a roadmap toward native media support and intelligent encoding, positioning it as core infrastructure for large‑scale multimodal workloads.

Data Governance · File Format · IO performance
0 likes · 14 min read
DataFunTalk
May 8, 2026 · Big Data

How MaxCompute Evolves into a Data+AI Platform: Architecture, Core Capabilities, and Real-World Cases

The article explains how Alibaba Cloud's MaxCompute has been transformed into a cloud‑native Data+AI platform, detailing its layered architecture, multimodal storage, model management, hybrid compute scheduling, SQL AI functions, the MaxFrame Python framework, and several enterprise case studies that demonstrate performance gains and flexible resource orchestration.

AI integration · Big Data · Cloud Native
0 likes · 11 min read
DataFunTalk
Apr 21, 2026 · Industry Insights

How AI Agents Are Redefining Data Governance: 5 Key Shifts and 3 Strategic Solutions

In the AI era, data consumption moves from a few technical users to all business staff, forcing a fundamental redesign of data governance across five dimensions—resource consumption, frequency, semantics, knowledge base, and modality—and proposing three actionable strategies to make data semantically rich, fully multimodal, and AI‑consumable.

AI · Data Governance · Enterprise Analytics
0 likes · 18 min read
Machine Heart
Apr 18, 2026 · Artificial Intelligence

Why Embodied Data Is the Biggest Gold Mine: Inside the World’s First Hundred‑Billion‑Scale Multimodal Data Cloud Mall

Paxini, together with JD Cloud, Tencent Cloud, and Baidu Intelligent Cloud, launches the world’s first hundred‑billion‑scale, full‑modal, high‑degree‑of‑freedom embodied AI data cloud mall, offering instant online data procurement, end‑to‑end model training pipelines, and validated performance gains in both lab and real‑world robot tasks.

Embodied AI · Model Training · Multimodal Data
0 likes · 13 min read
Big Data Technology & Architecture
Apr 3, 2026 · Industry Insights

Why Daft, Ray, and Lance Are Redefining Multimodal Data Pipelines

This article analyzes how the Daft‑Ray‑Lance stack tackles the challenges of multimodal AI workloads by offering a high‑performance Rust engine, adaptive back‑pressure, seamless Ray‑based distributed scheduling, and a storage format optimized for random access, vector indexing, and zero‑copy schema evolution, complete with benchmark comparisons and practical deployment guidance.

Benchmark · Daft · Lance
0 likes · 21 min read
Big Data Tech Team
Mar 3, 2026 · Artificial Intelligence

AI‑Powered DWD Layer: Boost Efficiency, Quality, and Multimodal Data

This article examines how large‑language models can reconstruct the data‑warehouse DWD layer by automating ETL script generation, data cleaning, standardization, and cross‑table association, presenting three high‑frequency scenarios—structured data cleaning, multimodal data parsing, and intelligent table linking—along with tool selections, step‑by‑step procedures, real‑world case studies, and practical pitfalls.

AI · Case Study · DWD
0 likes · 18 min read
DataFunTalk
Dec 26, 2025 · Cloud Native

How Haier Built a Cloud‑Native Multi‑Modal Data Lake for AI‑Ready Manufacturing

Haier’s digital transformation leverages a cloud‑native, open‑source‑based multi‑modal data lake that unifies structured and unstructured industrial data, uses metadata models and knowledge graphs for governance, and provides AI‑ready services that balance performance, cost, and real‑time requirements.

AI · Data Lake · Multimodal Data
0 likes · 12 min read
DataFunSummit
Nov 24, 2025 · Big Data

How Tencent Cloud Uses Iceberg, Gravitino and Multimodal Lakes for Unified Data Processing

This article series explores Tencent Cloud's Iceberg‑based batch‑stream integration, Apache Gravitino's unified metadata and lineage solution, Xiaohongshu's data‑architecture evolution for the Big AI Data era, and a practical Data+AI multimodal data‑lake implementation, highlighting challenges, architectural designs, and performance gains.

Big Data · Data Lake · Iceberg
0 likes · 7 min read
DataFunTalk
Nov 22, 2025 · Big Data

How Modern Data Lakes and AI Governance Transform Enterprise Analytics

This article collection examines Tencent Cloud’s Iceberg batch‑stream integration, AI‑driven game data governance, Apache Gravitino unified metadata and lineage, Xiaohongshu’s multimodal data‑lake evolution, and Volcano Engine’s Data+AI multimodal lake, highlighting architectures, techniques, performance gains, and practical implementations.

AI Governance · Data Lake · Gravitino
0 likes · 7 min read
Instant Consumer Technology Team
Nov 7, 2025 · Artificial Intelligence

How Game‑TARS Redefines Game AI with Human‑Native Interaction and Sparse Reasoning

Game‑TARS, a general‑purpose game AI from ByteDance's Seed team, replaces custom function calls with low‑level keyboard‑mouse actions, leverages massive multimodal data, sparse‑thinking and decaying‑loss algorithms, and achieves zero‑shot mastery across diverse games, surpassing top large models like GPT‑5 and Gemini‑2.5‑Pro.

AI training · Multimodal Data · game AI
0 likes · 10 min read
DataFunSummit
Sep 24, 2024 · Artificial Intelligence

Streaming Data Pipelines and Scaling Laws for Efficient Large‑Model Training

The article discusses the challenges of training ever‑larger AI models on internet‑scale data, critiques traditional batch ETL pipelines, and proposes a streaming data‑flow architecture with dynamic data selection and a shared‑memory/Alluxio middle layer to decouple data processing from model training, improving efficiency and scalability.

AI Infrastructure · Multimodal Data · data pipelines
0 likes · 20 min read
AntTech
Sep 10, 2024 · Big Data

From DATA for AI to AI for DATA: Evolution of Ant Group’s Intelligent Data System

The talk reviews the rapid evolution of data technologies—from early database foundations and big‑data breakthroughs to the rise of generative AI—highlighting how Ant Group’s data platform is shifting from a cost‑efficiency focus to a value‑centric, multimodal, AI‑driven ecosystem.

Big Data · Data Platforms · Multimodal Data
0 likes · 17 min read
21CTO
Sep 7, 2024 · Artificial Intelligence

Why AI Databases Are the Next Big Leap for Vector Search and Multimodal Data

The article explains how AI databases combine structured, unstructured, and vector data, integrate machine‑learning, NLP, and generative models, and why platforms like Vespa are emerging as open‑source solutions to meet the performance and scalability demands of modern generative AI applications.

AI Database · Multimodal Data · Vespa
0 likes · 8 min read
DataFunSummit
Jun 15, 2024 · Artificial Intelligence

Large‑Model‑Driven Data Governance: Technical Outlook and Research Highlights

This article reviews the rising importance of data quality for large models, explores data‑centric AI, large‑model pre‑training data engineering, and presents recent Fudan University research on using large models to improve data governance across multiple domains such as attribute normalization, geographic cleaning, compliance checking, and multimodal retrieval.

AI · Data Governance · Knowledge Graphs
0 likes · 19 min read
DataFunSummit
Aug 11, 2023 · Artificial Intelligence

Application of Knowledge Graphs in Risk Control at Wing Payment

This presentation details how Wing Payment leverages a large‑scale, multimodal knowledge graph and AI techniques—including computer vision, unsupervised and supervised learning, federated learning, and graph neural networks—to detect and mitigate fraud across payment, e‑commerce, and credit scenarios, while outlining system architecture, algorithmic approaches, case studies, and future research directions.

Financial Services · Multimodal Data · fraud detection
0 likes · 17 min read
58 Tech
Apr 16, 2021 · Artificial Intelligence

Graph Neural Network Based Anti‑Fraud Solution for Online Information Services

The article presents a comprehensive anti‑fraud framework that analyzes black‑market fraud characteristics, reviews conventional fraud‑mitigation methods, and proposes a multimodal graph‑neural‑network approach—leveraging device, behavior, and content similarity—to accurately identify fraudulent users on large‑scale internet platforms.

Multimodal Data · anti‑fraud · fraud detection
0 likes · 18 min read