
Why Unstructured Data Management Is the Next Frontier for Enterprises

This article explores the evolution, current state, and challenges of enterprise unstructured data management, reviews case studies from traditional firms, Huawei and Ant Group, proposes an ECM‑based reference framework, compares it with structured data governance, and outlines future integration strategies with AI and unified data platforms.

DataFunSummit

Introduction

At the 2025 Data+AI Governance Summit, Dai Guohui, Vice‑President of the International Data Management Association Greater China, presented "Enterprise‑Level Unstructured Data Management Practices and Reflections." The talk covered history, practical case studies, a reference framework, and future trends, with examples from Huawei, Ant Group, and other large enterprises.

1. History and Current Status of Unstructured Data

1.1 Human Data Development Timeline

Early era (c. 3000 BC to the 20th century): The Sumerians invent cuneiform, and paper later spreads as a data carrier.

Late 19th–early 20th century: Photography, radio, and television are introduced, expanding the range of data formats.

Computer era (post‑1946): First computers enable integration of unstructured data with new technologies.

1998: Google launches search, first large‑scale retrieval of unstructured data.

Big‑Data & AI era (2005‑present): "Big Data" concept emphasizes massive unstructured data; AI breakthroughs (e.g., GPT) usher in intelligent processing.

1.2 Current Situation

Unstructured data includes text, images, audio, video, and both electronic and paper documents. It accounts for over 80% of enterprise data volume and grows at more than 60% per year, far outpacing structured data. Existing data warehouses and big‑data platforms still lack integrated governance for unstructured assets.
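
To make the growth figure concrete, here is a minimal sketch of the compound-growth arithmetic. It assumes the 60% annual rate stays constant, which the talk does not claim, and the function names are illustrative only.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years for a data volume to double under constant compound growth."""
    if annual_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log(1 + annual_growth_rate)


def projected_volume(initial: float, annual_growth_rate: float, years: float) -> float:
    """Volume after `years` of constant compound growth."""
    return initial * (1 + annual_growth_rate) ** years


if __name__ == "__main__":
    # Assumed rate: the 60% annual growth quoted above, held constant.
    print(round(doubling_time_years(0.60), 2))        # 1.47
    print(round(projected_volume(100.0, 0.60, 3), 1))  # 409.6
```

Under that assumption, a store of unstructured data doubles roughly every year and a half, and 100 TB becomes about 410 TB within three years.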

2. Lack of a Unified Methodology for Unstructured Data Management

Most data‑management frameworks (DAMA, EDM Council, DCMM, GB/T 34960.5‑2018, industry‑specific guidelines) are structured‑data centric. Although they mention document or content management, the overall architecture remains oriented toward structured data, leaving unstructured data without a systematic theory or practice guide.

3. AI Applications of Unstructured Data

Leading internet companies (Baidu, Tencent, Alibaba, ByteDance) focus AI products on text processing, large‑model inference, and audio‑video handling—core unstructured‑data tasks—making effective management of such data critical for AI advancement.

4. Practical Case Studies

4.1 Traditional Enterprise

Driven by knowledge‑management trends and Sarbanes‑Oxley compliance requirements, the enterprise approached unstructured data by extracting explicit knowledge (reported as 20% of total knowledge but more than 80% of its value) and delivering it through ECM to improve employee efficiency and support managerial decisions.

4.2 Huawei

Huawei’s "Data Way" book focuses on structured‑data governance; unstructured data management is treated as a subsidiary activity, primarily through document management and a dedicated Archives & Official‑Document Center, illustrating a gradual integration of ECM into the enterprise IT architecture.

4.3 Ant Group

Ant Group progressed from basic ECM to cloud‑storage (CSP) and intelligent information management (IIM), building a multi‑phase platform: electronic document management → multimedia asset management → business‑process & archive integration, with middleware (ECI) enabling cross‑system content services.

5. Reference Framework for Unstructured Data Management

The dominant approach is Enterprise Content Management (ECM), which provides a holistic methodology distinct from structured‑data practices. A complete ECM system should include at least ten modules (document, digital‑asset, archive, permission, web‑content, portal, knowledge, BPM integration, etc.) to manage all unstructured formats.

5.1 Logical Framework (Seven Core Steps)

Strategic Planning & Governance

Infrastructure & Resource Construction

Data Architecture & Standards (naming, format, content, metadata)

Full Lifecycle Management

Quality & Security Controls

Data Operations

AI‑Driven Fusion (data as AI training material and AI optimizing governance)
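
Two of the steps above, standards (naming and metadata) and full lifecycle management, can be sketched in a few lines of Python. Everything in this example is a hypothetical illustration: the naming pattern, the metadata fields, the stage names, and the allowed transitions are assumptions, not the conventions presented in the talk.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Stage(Enum):
    CREATED = "created"
    ACTIVE = "active"
    ARCHIVED = "archived"
    DISPOSED = "disposed"


# Hypothetical lifecycle: each stage may only move to the listed next stages.
ALLOWED_TRANSITIONS = {
    Stage.CREATED: {Stage.ACTIVE},
    Stage.ACTIVE: {Stage.ARCHIVED},
    Stage.ARCHIVED: {Stage.DISPOSED},
    Stage.DISPOSED: set(),
}

# Hypothetical naming standard: <DEPT>_<doctype>_<YYYYMMDD>_v<N>.<ext>
NAME_PATTERN = re.compile(r"^[A-Z]{2,5}_[a-z]+_\d{8}_v\d+\.(pdf|docx|png|mp4)$")


@dataclass
class ContentItem:
    file_name: str
    owner: str
    category: str
    created_on: date
    stage: Stage = Stage.CREATED
    history: list = field(default_factory=list)

    def violations(self) -> list:
        """Return the standards this item breaks (empty list if none)."""
        problems = []
        if not NAME_PATTERN.match(self.file_name):
            problems.append("file name does not follow the naming standard")
        if not self.owner.strip():
            problems.append("missing owner")
        if not self.category.strip():
            problems.append("missing category")
        return problems

    def advance(self, target: Stage) -> None:
        """Move to the next lifecycle stage, rejecting skipped stages."""
        if target not in ALLOWED_TRANSITIONS[self.stage]:
            raise ValueError(f"cannot move from {self.stage.value} to {target.value}")
        self.history.append((self.stage, target))
        self.stage = target
```

For example, `ContentItem("FIN_contract_20250101_v1.pdf", "alice", "contract", date(2025, 1, 1))` passes all checks, while an item named `scan final.pdf` with no owner reports two violations. Keeping the transition table as data rather than code makes it easy to adjust when retention rules change.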

5.2 Differences from Structured‑Data Governance

Architecture: Unstructured data requires classification and template design distinct from relational schemas.

Standards: Separate naming, format, content, and metadata standards.

Quality: Encompasses object, template, process, and metadata dimensions.

AI Techniques: Leverage OCR, image generation, text generation for automated labeling and classification.
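
The four quality dimensions listed above (object, template, process, metadata) could be checked along the following lines. The talk summary does not define the dimensions, so the interpretation here is an assumption: "object" means the content is non-empty, "template" means required sections are present, "process" means an approval was recorded, and "metadata" means required fields are filled. The section and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    content: str
    metadata: dict = field(default_factory=dict)
    approved_by: str = ""


REQUIRED_SECTIONS = ("Purpose", "Scope")             # hypothetical template
REQUIRED_METADATA = ("title", "owner", "category")   # hypothetical fields


def quality_report(doc: Document) -> dict:
    """Return one pass/fail flag per assumed quality dimension."""
    missing_sections = [s for s in REQUIRED_SECTIONS if s not in doc.content]
    missing_metadata = [k for k in REQUIRED_METADATA if not doc.metadata.get(k)]
    return {
        "object": bool(doc.content.strip()),       # content exists
        "template": not missing_sections,          # required sections present
        "process": bool(doc.approved_by.strip()),  # approval recorded
        "metadata": not missing_metadata,          # required fields filled
    }


def quality_score(report: dict) -> float:
    """Fraction of dimensions that passed, between 0.0 and 1.0."""
    return sum(report.values()) / len(report)
```

A document containing both required sections, complete metadata, and an approver scores 1.0. One that only has body text passes the object check alone and scores 0.25. In practice each dimension would likely carry several weighted rules rather than a single flag.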

6. Future Thoughts

Unstructured and structured data are interdependent, yet management is currently fragmented across departments, which hampers efficiency. A unified data‑management framework should combine strategic governance, shared infrastructure, integrated data and content governance, a standardized lifecycle, AI‑enhanced processes, and closed‑loop feedback from knowledge back into business processes.

Conclusion

Effective enterprise unstructured data management requires a systematic ECM‑based methodology, clear standards, and AI integration to transform raw multimedia assets into valuable, governed knowledge assets.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
