Enterprise Large‑Model Deployment, Data Governance, and Cost Economics – Insights from Deepexi

The interview with Deepexi President Bai Haifeng explains how enterprises can lower costs and boost efficiency by adopting domain‑specific large models, outlines the technical pipeline from self‑supervised pre‑training to fine‑tuning, discusses data‑governance challenges for unstructured data, and describes the product ecosystem built to support agile, high‑performance AI solutions.

DataFunSummit

The current stage of large‑model adoption focuses on domain‑specific models, driven by two prerequisites: demand‑side needs for cost reduction, efficiency gains, and new applications, and supply‑side readiness with mature training technologies.

Expert Bai Haifeng, President of Deepexi’s product line, is responsible for planning enterprise‑level large‑model products, designing technical architectures, and delivering end‑to‑end AI solutions, drawing on experience from Huawei, Microsoft, IBM, and SaaS startups.

He notes that traditional machine‑learning projects require large, multidisciplinary teams (data engineers, BI engineers, analysts, data scientists, algorithm engineers), leading to high personnel costs that many mid‑size enterprises cannot afford.

Even small models demand diverse skill sets and fragmented toolchains; consequently, building effective pipelines remains complex and expensive, both technically and in terms of team collaboration.

Large models lower the technical barrier: a single business user working with a Copilot-style assistant can take on work that previously required a full data-science team, making AI directly accessible to business units.

The training workflow typically follows three steps: self‑supervised learning to create a general model, supervised fine‑tuning for specific tasks, and RLHF (reinforcement learning from human feedback) to align the model with human values.
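The three stages can be distinguished by the data each consumes. A minimal sketch with toy records (field names are illustrative, not any specific framework's schema):

```python
# Toy records showing what each training stage consumes (illustrative shapes only).
pretrain_sample = {"text": "Raw corpus text; the model learns to predict the next token."}

sft_sample = {  # supervised fine-tuning: labeled prompt-response pairs
    "prompt": "Explain what a Java thread is.",
    "response": "A thread is the smallest schedulable unit of execution in a process.",
}

rlhf_sample = {  # RLHF: human preference between two candidate answers
    "prompt": "Explain what a Java thread is.",
    "chosen": "A clear, accurate explanation.",
    "rejected": "A vague or misleading explanation.",
}
```

Pre-training needs only raw text; the later stages need progressively more expensive human-labeled signal, which is why they use far less data.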

Fine‑tuning reduces hallucinations and improves consistency and professionalism, requiring only on the order of one‑thousandth to one‑ten‑thousandth of the data volume used in the original pre‑training.
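As a back-of-the-envelope illustration of that ratio (the 2-trillion-token corpus size is an assumption for the example, not a figure from the interview):

```python
# Assumed pre-training corpus of 2 trillion tokens (illustrative only).
pretrain_tokens = 2_000_000_000_000

# Fine-tuning at 1/10,000 to 1/1,000 of the pre-training data volume:
ft_low = pretrain_tokens // 10_000   # 200 million tokens
ft_high = pretrain_tokens // 1_000   # 2 billion tokens
```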

Efficiency building hinges on data governance, which now emphasizes unstructured data for large‑model training rather than traditional structured data analytics.

Unstructured data governance faces high acquisition costs; instruction fine‑tuning (as used for ChatGPT and Llama 2‑Chat) relies on prompt‑response pairs generated by stronger models or by human annotators, while explanation tuning and techniques such as NEFTune further improve data quality.
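A prompt-response pair for instruction tuning is often serialized one record per line; the sketch below uses an Alpaca-style JSONL layout, which is a common convention and an assumption here, not Deepexi's schema:

```python
import json

def to_instruction_record(prompt: str, response: str) -> str:
    """Serialize one prompt-response pair as a JSONL line in an
    Alpaca-style layout (field names are a common convention, assumed here)."""
    return json.dumps(
        {"instruction": prompt, "input": "", "output": response},
        ensure_ascii=False,
    )

line = to_instruction_record(
    "Explain the difference between a process and a thread.",
    "A process owns its own address space; threads share one within a process.",
)
```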

Large models can also compress and extract knowledge from unstructured sources using services such as Claude 2, GPT‑4, or locally deployed Llama 2 and ChatGLM2 for privacy‑sensitive scenarios.
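Knowledge compression over a document set can be sketched as a two-pass map-reduce, where `complete` is any text-completion callable (wrapping a hosted API or a locally deployed model); the interface below is a hypothetical stand-in, not a real client library:

```python
def compress_knowledge(chunks, complete):
    """Map: extract facts from each chunk; reduce: merge the partial summaries.
    `complete(prompt) -> str` is an injected, hypothetical LLM call."""
    partials = [complete(f"Extract the key facts from:\n{chunk}") for chunk in chunks]
    return complete("Merge these fact lists into one summary:\n" + "\n".join(partials))

# Usage with a deterministic stub in place of a real model:
stub = lambda prompt: "facts(" + str(len(prompt)) + ")"
merged = compress_knowledge(["doc one text", "doc two text"], stub)
```

Injecting the completion callable keeps the pipeline identical whether the backend is GPT‑4 over an API or a local ChatGLM2 instance in a privacy-sensitive deployment.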

Dataset quality must balance flexibility, diversity, and accuracy. In practice, a mix of 30 % domain data and 70 % general data yields models that are both adaptable and precise while lowering overall data‑collection costs.
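The 30/70 mix can be realized with a simple probabilistic sampler; the scheme below is an illustrative sketch (real pipelines often weight at the batch or epoch level instead):

```python
import random

def sample_training_mix(domain, general, n, domain_ratio=0.3, seed=0):
    """Draw n examples, picking from the domain pool with probability
    domain_ratio (0.3 per the article) and the general pool otherwise."""
    rng = random.Random(seed)
    pick = lambda pool: pool[rng.randrange(len(pool))]
    return [pick(domain) if rng.random() < domain_ratio else pick(general)
            for _ in range(n)]

mix = sample_training_mix(["d1", "d2"], ["g1", "g2", "g3"], n=100)
```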

Fine‑tuning tasks fall into two types: representation‑heavy tasks (e.g., re‑phrasing an explanation of Java threads) and knowledge‑intensive QA tasks; the latter require full‑parameter fine‑tuning and substantial hardware (e.g., 80 GB A800 GPUs for Llama 2‑13B).
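A rough rule of thumb explains the hardware demand of full-parameter fine-tuning: mixed-precision Adam keeps fp16 weights and gradients plus fp32 master weights and two moment buffers, roughly 16 bytes per parameter (a common estimate, not an exact figure, and excluding activations):

```python
def full_finetune_memory_gb(params_billions, bytes_per_param=16):
    """~2 (fp16 weights) + 2 (fp16 grads) + 12 (fp32 master weights + Adam
    moments) bytes per parameter; 1e9 params * 1 byte is roughly 1 GB."""
    return params_billions * bytes_per_param

mem = full_finetune_memory_gb(13)  # a 13B model: ~208 GB, i.e., several 80 GB cards
```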

On the product side, Deepexi offers the Fast5000E training‑and‑inference appliance, which bundles hardware with model deployment, and the FastAGI platform for rapid agent development (Data Agent, Doc Agent, Plugin Agent), enabling enterprises to build AI‑enhanced applications with minimal effort.

Overall, Deepexi’s strategy combines agile data governance, cost‑effective model training, and a comprehensive product suite to accelerate large‑model adoption in the enterprise, positioning data governance as a technology‑driven, integrated discipline.

Tags: Artificial Intelligence, Model Fine‑tuning
Written by DataFunSummit, the official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.