Tagged articles

Multimodal Data

25 articles · Page 1 of 1
ITPUB
Jul 2, 2026 · Databases

Why OceanBase Built Lakebase: The Need for an AI‑Ready Lake‑Database

Lakebase, the core engine of OceanBase AI Database, unifies storage, management, computation and search for structured, unstructured and vector data, offering multimodal data capabilities, flexible deployment modes, and AI‑oriented use cases such as autonomous driving and financial document analytics.

AI Applications · AI Database · Data Management
0 likes · 9 min read
Old Zhang's AI Learning
Jul 1, 2026 · Databases

Why Enterprise AI Hits a Wall at the Data Layer Despite Powerful Large Models

The article argues that as AI agents replace human users, the real bottleneck for enterprise AI shifts from model performance to data infrastructure, and explains how OceanBase’s AI‑native database—Lakebase—addresses multimodal data, hybrid search, agent safety, and massive logical tables to enable production‑grade AI applications.

AI Database · Agent-friendly · Data Infrastructure
0 likes · 16 min read
Machine Heart
Jun 30, 2026 · Artificial Intelligence

Why Loop Engineering Is the Next Frontier: Two Young PhDs Target Human Closed‑Loop Data

Loop Engineering shifts AI from single prompts to continuous feedback loops, and by capturing human perception‑decision‑action‑feedback cycles with multimodal signals, the new Ego‑NeuroLoop paradigm promises far more data‑efficient embodied intelligence than existing ego‑centric video datasets.

Ego-NeuroLoop · Embodied AI · Loop Engineering
0 likes · 11 min read
DataFunTalk
Jun 29, 2026 · Big Data

How Agentic Streaming Is Redefining Real‑Time AI at Flink Forward Asia 2026

The Flink Forward Asia 2026 conference in Shenzhen showcased Apache Flink's evolution to Agentic Streaming for AI, introduced the multimodal Agentic Lake built on Apache Paimon 2.0, announced Fluss 1.0 as a real‑time context layer, and highlighted performance gains over competing stacks such as Ray and Daft.

Agentic Streaming · Apache Flink · Apache Fluss
0 likes · 13 min read
DataFunSummit
Jun 5, 2026 · Industry Insights

Why Enterprise Agents Need Real‑Time Fact Retrieval More Than Semantic Understanding

The article analyzes how enterprise‑level AI agents, when deployed in production, struggle with factual data retrieval despite semantic capabilities, and argues that real‑time, low‑latency, multimodal analytics—exemplified by systems like Apache Doris and SelectDB—are the essential data entry points for successful Agent deployments.

AI Agent · Apache Doris · Hybrid Search
0 likes · 9 min read
DataFunSummit
May 25, 2026 · Big Data

How Hisense Built an AI‑Ready Multimodal Data Platform: Storage, Governance, and Development

This article details Hisense's journey to create an AI‑ready multimodal data platform, covering the challenges of integrating diverse business systems, the shift from a Hadoop‑based architecture to a cloud‑native data lake, the JuData governance and development platform, and six practical scenarios that demonstrate unified ingestion, metadata management, rule‑based quality control, intelligent asset retrieval, and future AI‑driven DataOps capabilities.

AI platform · Cloud Native · Data Governance
0 likes · 23 min read
DataFunSummit
May 10, 2026 · Big Data

How Lance File Format v2.2 Accelerates, Cuts Costs, and Governs Multimodal Data

Lance File Format v2.2 tackles the AI data explosion by delivering hundred‑fold random‑read performance, advanced two‑layer compression, zero‑cost schema evolution, Git‑style versioning, external blob handling, and a roadmap toward native media support and intelligent encoding, positioning it as a core infrastructure for large‑scale multimodal workloads.

Data Governance · File Format · IO performance
0 likes · 14 min read
DataFunTalk
May 8, 2026 · Big Data

How MaxCompute Evolves into a Data+AI Platform: Architecture, Core Capabilities, and Real-World Cases

The article explains how Alibaba Cloud's MaxCompute has been transformed into a cloud‑native Data+AI platform, detailing its layered architecture, multimodal storage, model management, hybrid compute scheduling, SQL AI functions, the MaxFrame Python framework, and several enterprise case studies that demonstrate performance gains and flexible resource orchestration.

AI integration · Big Data · Cloud Native
0 likes · 11 min read
DataFunTalk
Apr 21, 2026 · Industry Insights

How AI Agents Are Redefining Data Governance: 5 Key Shifts and 3 Strategic Solutions

In the AI era, data consumption moves from a few technical users to all business staff, forcing a fundamental redesign of data governance across five dimensions—resource consumption, frequency, semantics, knowledge base, and modality—and proposing three actionable strategies to make data semantically rich, fully multimodal, and AI‑consumable.

AI · Data Governance · Enterprise Analytics
0 likes · 18 min read
Machine Heart
Apr 18, 2026 · Artificial Intelligence

Why Embodied Data Is the Biggest Gold Mine: Inside the World’s First Hundred‑Billion‑Scale Multimodal Data Cloud Mall

Paxini, together with JD Cloud, Tencent Cloud, and Baidu Intelligent Cloud, launches the world’s first hundred‑billion‑scale, full‑modal, high‑degree‑of‑freedom embodied AI data cloud mall, offering instant online data procurement, end‑to‑end model training pipelines, and validated performance gains in both lab and real‑world robot tasks.

Embodied AI · Large-Scale Data · Model Training
0 likes · 13 min read
Big Data Technology & Architecture
Apr 3, 2026 · Industry Insights

Why Daft, Ray, and Lance Are Redefining Multimodal Data Pipelines

This article analyzes how the Daft‑Ray‑Lance stack tackles the challenges of multimodal AI workloads by offering a high‑performance Rust engine, adaptive back‑pressure, seamless Ray‑based distributed scheduling, and a storage format optimized for random access, vector indexing, and zero‑copy schema evolution, complete with benchmark comparisons and practical deployment guidance.

Daft · Data Engineering · Lance
0 likes · 21 min read
Big Data Tech Team
Mar 3, 2026 · Artificial Intelligence

AI‑Powered DWD Layer: Boost Efficiency, Quality, and Multimodal Data

This article examines how large‑language models can reconstruct the data‑warehouse DWD layer by automating ETL script generation, data cleaning, standardization, and cross‑table association, presenting three high‑frequency scenarios—structured data cleaning, multimodal data parsing, and intelligent table linking—along with tool selections, step‑by‑step procedures, real‑world case studies, and practical pitfalls.

AI · Case Study · DWD
0 likes · 18 min read
DataFunTalk
Dec 26, 2025 · Cloud Native

How Haier Built a Cloud‑Native Multi‑Modal Data Lake for AI‑Ready Manufacturing

Haier’s digital transformation leverages a cloud‑native, open‑source‑based multi‑modal data lake that unifies structured and unstructured industrial data, uses metadata models and knowledge graphs for governance, and provides AI‑ready services that balance performance, cost, and real‑time requirements.

AI · Data Lake · Metadata
0 likes · 12 min read
DataFunSummit
Nov 24, 2025 · Big Data

How Tencent Cloud Uses Iceberg, Gravitino and Multimodal Lakes for Unified Data Processing

This article series explores Tencent Cloud's Iceberg‑based batch‑stream integration, Apache Gravitino's unified metadata and lineage solution, Xiaohongshu's data‑architecture evolution for the Big AI Data era, and a practical Data+AI multimodal data‑lake implementation, highlighting challenges, architectural designs, and performance gains.

Big Data · Data Lake · Iceberg
0 likes · 7 min read
DataFunTalk
Nov 22, 2025 · Big Data

How Modern Data Lakes and AI Governance Transform Enterprise Analytics

This article collection examines Tencent Cloud’s Iceberg batch‑stream integration, AI‑driven game data governance, Apache Gravitino unified metadata and lineage, Xiaohongshu’s multimodal data‑lake evolution, and Volcano Engine’s Data+AI multimodal lake, highlighting architectures, techniques, performance gains, and practical implementations.

AI Governance · Data Lake · Gravitino
0 likes · 7 min read
Instant Consumer Technology Team
Nov 7, 2025 · Artificial Intelligence

How Game‑TARS Redefines Game AI with Human‑Native Interaction and Sparse Reasoning

Game‑TARS, a general‑purpose game AI from ByteDance's Seed team, replaces custom function calls with low‑level keyboard‑mouse actions, leverages massive multimodal data, sparse‑thinking and decaying‑loss algorithms, and achieves zero‑shot mastery across diverse games, surpassing top large models like GPT‑5 and Gemini‑2.5‑Pro.

AI training · Multimodal Data · game AI
0 likes · 10 min read
DataFunSummit
Sep 24, 2024 · Artificial Intelligence

Streaming Data Pipelines and Scaling Laws for Efficient Large‑Model Training

The article discusses the challenges of training ever‑larger AI models on internet‑scale data, critiques traditional batch ETL pipelines, and proposes a streaming data‑flow architecture with dynamic data selection and a shared‑memory/Alluxio middle layer to decouple data processing from model training, improving efficiency and scalability.

AI Infrastructure · Multimodal Data · data pipelines
0 likes · 20 min read
AntTech
Sep 10, 2024 · Big Data

From DATA for AI to AI for DATA: Evolution of Ant Group’s Intelligent Data System

The talk reviews the rapid evolution of data technologies—from early database foundations and big‑data breakthroughs to the rise of generative AI—highlighting how Ant Group’s data platform is shifting from a cost‑efficiency focus to a value‑centric, multimodal, AI‑driven ecosystem.

Big Data · Data Engineering · Data Platforms
0 likes · 17 min read
21CTO
Sep 7, 2024 · Artificial Intelligence

Why AI Databases Are the Next Big Leap for Vector Search and Multimodal Data

The article explains how AI databases combine structured, unstructured, and vector data, integrate machine‑learning, NLP, and generative models, and why platforms like Vespa are emerging as open‑source solutions to meet the performance and scalability demands of modern generative AI applications.

AI Database · Generative AI · Multimodal Data
0 likes · 8 min read
DataFunSummit
Jun 15, 2024 · Artificial Intelligence

Large‑Model‑Driven Data Governance: Technical Outlook and Research Highlights

This article reviews the rising importance of data quality for large models, explores data‑centric AI, large‑model pre‑training data engineering, and presents recent Fudan University research on using large models to improve data governance across multiple domains such as attribute normalization, geographic cleaning, compliance checking, and multimodal retrieval.

AI · Data Engineering · Data Governance
0 likes · 19 min read
DataFunSummit
Aug 11, 2023 · Artificial Intelligence

Application of Knowledge Graphs in Risk Control at Wing Payment

This presentation details how Wing Payment leverages a large‑scale, multimodal knowledge graph and AI techniques—including computer vision, unsupervised and supervised learning, federated learning, and graph neural networks—to detect and mitigate fraud across payment, e‑commerce, and credit scenarios, while outlining system architecture, algorithmic approaches, case studies, and future research directions.

Multimodal Data · financial services · fraud detection
0 likes · 17 min read
58 Tech
Apr 16, 2021 · Artificial Intelligence

Graph Neural Network Based Anti‑Fraud Solution for Online Information Services

The article presents a comprehensive anti‑fraud framework that analyzes black‑market fraud characteristics, reviews conventional fraud‑mitigation methods, and proposes a multimodal graph‑neural‑network approach—leveraging device, behavior, and content similarity—to accurately identify fraudulent users on large‑scale internet platforms.

Graph Neural Networks · Multimodal Data · anti‑fraud
0 likes · 18 min read