How AI+Data Agents Are Transforming the Automotive Industry’s Digital Leap

In an interview, Di Xingxing of Autohome details their AI+Data framework—unified lake‑warehouse, intelligent engine, and agent services—that breaks data silos, blends traditional models with LLMs, leverages causal inference and RAG knowledge bases, and uses continuous feedback to build explainable, evolving data agents for accurate sales forecasting, competitive analysis, and end‑to‑end business automation in the automotive industry.

DataFunTalk

In the deep‑water stage of digital transformation for the automotive industry, data silos and shallow value extraction remain core pain points. Di Xingxing, head of the computing platform at Autohome, points out that the key to breaking this impasse lies in building an AI+Data system centered on “data agents.”

By constructing a three‑layer architecture—unified lake‑warehouse, intelligent engine, and agent services—Autohome has achieved precise implementation in scenarios such as sales forecasting and competitiveness analysis.

DataFun: You mentioned the automotive industry suffers from “data silos and shallow value mining.” How do you break data boundaries when building data agents, and can you share the key technical path for cross‑department data integration?

Di Xingxing: Full‑scale data ingestion is the foundation. Automated tools scan, tag, and intelligently recognize data assets, linking them to a standardized data catalog to form an enterprise‑level data asset map. This dramatically lowers the barriers to discovering, understanding, and reusing data, accelerating business innovation.
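The scan‑tag‑catalog flow described above can be sketched as follows. This is a minimal illustration, not Autohome's implementation: the `DataAsset`, `Catalog`, and `auto_tag` names and the rule table are all hypothetical, and a real system would add ML‑based classification and lineage extraction.

```python
from dataclasses import dataclass, field

@dataclass
class DataAsset:
    name: str
    owner: str
    columns: list
    tags: set = field(default_factory=set)

def auto_tag(asset: DataAsset) -> DataAsset:
    # Simple rule-based tagging keyed on column names; purely illustrative.
    rules = {"price": "finance", "vin": "vehicle", "user_id": "pii"}
    for col in asset.columns:
        for key, tag in rules.items():
            if key in col.lower():
                asset.tags.add(tag)
    return asset

class Catalog:
    """A searchable registry of tagged assets -- the 'data asset map'."""
    def __init__(self):
        self._assets = {}

    def register(self, asset: DataAsset):
        self._assets[asset.name] = auto_tag(asset)

    def search(self, tag: str):
        return [a.name for a in self._assets.values() if tag in a.tags]

catalog = Catalog()
catalog.register(DataAsset("dealer_sales", "sales_team", ["vin", "price", "date"]))
catalog.register(DataAsset("user_reviews", "ugc_team", ["user_id", "text"]))
print(catalog.search("vehicle"))  # → ['dealer_sales']
```

Once assets carry tags in a shared catalog, cross‑department discovery becomes a search problem rather than a negotiation, which is the point of the asset map.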

DataFun: The presentation highlighted that data agents need “perception, cognition, decision, and action” capabilities. How do you quantify the maturity of these four abilities, and which technologies (e.g., multimodal learning, causal inference) support their evolution?

Di Xingxing: A complete evaluation system is required, covering both end‑to‑end capabilities and sub‑capabilities. The maturity of a technology is judged by its ability to meet diverse application requirements, balancing accuracy, efficiency, and the need for human intervention.

Large‑model technology, especially reinforcement learning and large‑scale pre‑training, is central: it links multimodal perception, planning, and tool invocation into a single loop.

DataFun: In the data foundation layer, you chose Apache Paimon and StarRocks. Compared with Iceberg/Doris, how do they meet the real‑time and query performance needs of agents?

Di Xingxing: Paimon + StarRocks offers better streaming support, and we have extensive operational experience with StarRocks. The combination provides minute‑level data freshness and second‑level query performance, satisfying most analytical scenarios. For low‑latency, low‑volume cases, direct MySQL access is also supported to reduce processing costs.
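The dual serving path can be pictured as a small routing decision. This sketch is an assumption about how such routing might look; the thresholds and the idea of a single `route_query` function are illustrative, not Autohome's actual logic.

```python
def route_query(estimated_rows: int, needs_aggregation: bool,
                latency_budget_ms: int) -> str:
    """Pick a serving engine for a query (illustrative heuristic)."""
    # Small, latency-sensitive point lookups can skip the lakehouse
    # and hit operational MySQL directly, avoiding processing cost.
    if not needs_aggregation and estimated_rows < 10_000 and latency_budget_ms < 50:
        return "mysql"
    # Analytical scans and aggregations go to StarRocks over Paimon tables.
    return "starrocks"

print(route_query(100, False, 20))         # mysql
print(route_query(5_000_000, True, 2000))  # starrocks
```

The design trade‑off is freshness versus cost: the lakehouse path gives minute‑level freshness for analytics, while the MySQL path serves hot operational rows without an extra ingestion hop.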

DataFun: The “private knowledge” layer emphasizes RAG and domain knowledge integration. How do you address term ambiguity and long‑tail knowledge coverage? Do you have custom embedding optimization or fine‑tuning strategies?

Di Xingxing: We built a three‑dimensional knowledge base covering vehicle parameters, user reviews, and professional evaluations using our proprietary “Cangjie” large model, which ranked first in the SuperCLUE Chinese automotive knowledge benchmark. We continuously enrich the semantic layer with long‑tail knowledge and collect real‑world interaction data to refine terminology and knowledge.
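Retrieval over a knowledge base like this can be sketched in a few lines. The embedding below is a bag‑of‑words stand‑in, not the "Cangjie" model, and the documents are invented examples; a production RAG system would use a trained dense embedding and a vector index.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a learned embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Three knowledge "dimensions": specs, user reviews, expert evaluations.
knowledge_base = [
    ("specs",   "model x wheelbase 2900 mm battery 80 kwh"),
    ("reviews", "owners praise the ride comfort of model x"),
    ("expert",  "braking distance of model x measured at 35 m"),
]

def retrieve(query: str, k: int = 2):
    q = embed(query)
    scored = sorted(knowledge_base,
                    key=lambda d: cosine(q, embed(d[1])), reverse=True)
    return [d[0] for d in scored[:k]]

print(retrieve("battery capacity model X"))  # → ['specs', 'reviews']
```

Term ambiguity and long‑tail gaps show up exactly here: if the query vocabulary does not match the indexed vocabulary, retrieval degrades, which is why the semantic layer and real‑interaction feedback mentioned above matter.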

DataFun: When orchestrating agents with Dify, how do you balance low‑code efficiency with complex business logic flexibility? Which custom extensions are critical for automotive scenarios?

Di Xingxing: Dify’s low‑code capabilities enable rapid iteration, while plugin extensions and Python code provide flexibility. For high‑stability requirements, we develop custom Golang services to ensure robustness.
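The general shape of such a custom extension is a plain function behind a tool registry that a low‑code node can call. This is not Dify's actual plugin API; the decorator, registry, and `price_compare` tool are all hypothetical, shown only to illustrate where domain logic lives relative to the orchestrator.

```python
TOOLS = {}

def tool(name):
    """Register a function so an orchestrator node can invoke it by name."""
    def deco(fn):
        TOOLS[name] = fn
        return fn
    return deco

@tool("price_compare")
def price_compare(model_a_price: float, model_b_price: float) -> dict:
    # Domain-specific logic a low-code workflow would delegate to.
    diff = model_a_price - model_b_price
    return {"diff": diff, "cheaper": "A" if diff < 0 else "B"}

result = TOOLS["price_compare"](189_800, 205_900)
print(result)  # → {'diff': -16100, 'cheaper': 'A'}
```

Keeping the logic in ordinary functions is what preserves flexibility: the low‑code layer handles orchestration and iteration speed, while anything stability‑critical can be promoted to a dedicated service, as with the Golang services mentioned above.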

DataFun: Given that AI accuracy cannot reach 100%, how does Autohome set evaluation standards (e.g., confidence thresholds) for different scenarios, and which scenarios allow tolerance?

Di Xingxing: Short‑term 100% accuracy is unrealistic due to model limits and semantic layer quality. We define evaluation sets per scenario and reach consensus with business owners on launch standards.

In serious data‑driven contexts such as financial reports, direct deployment is still limited. The Data Agent acts as a junior analyst, capable of problem decomposition, planning, execution, and providing transparent analysis that users can judge, adopt, or refine.
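The "junior analyst" behavior (decompose, plan, execute, show your work) can be sketched as a loop that returns a transparent trace. Every function here is a stub standing in for real tool calls, SQL, or LLM steps; the names are illustrative.

```python
def decompose(question: str) -> list:
    # A real agent would plan with an LLM; this is a fixed illustration.
    return [
        f"pull relevant metrics for: {question}",
        "compare against the prior period",
        "summarize likely drivers with confidence notes",
    ]

def execute(step: str) -> str:
    # Placeholder for tool invocation / query execution.
    return f"done: {step}"

def analyze(question: str) -> dict:
    plan = decompose(question)
    trace = [execute(step) for step in plan]
    # The full plan and trace are returned so a human can judge,
    # adopt, or refine the analysis rather than trust it blindly.
    return {"question": question, "plan": plan, "trace": trace}

report = analyze("Why did SUV leads drop last week?")
for line in report["trace"]:
    print(line)
```

The key design point is that the output is the reasoning trace, not just an answer, which is what makes the agent's work auditable in high‑stakes contexts like financial reporting.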


Future multi‑agent collaboration is being planned. Ideally, a data analysis agent discovers issues, an attribution agent identifies key factors, a strategy agent proposes marketing actions, and an execution agent carries out the plan, with the cycle closing back to the analysis agent for impact assessment.
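The planned cycle can be sketched as a pipeline of stubbed agents. Everything here is an assumption about the envisioned design, not a built system: each agent is a placeholder function, and the metrics and thresholds are invented.

```python
def analysis_agent(metrics):
    # Flags an issue when a monitored metric breaches a threshold.
    return "conversion drop" if metrics["conversion"] < 0.02 else None

def attribution_agent(issue):
    # Would run causal/attribution analysis; stubbed here.
    return {"issue": issue, "factor": "weak landing-page match"}

def strategy_agent(finding):
    return f"A/B test a new landing page for: {finding['factor']}"

def execution_agent(action):
    return {"action": action, "status": "executed"}

def run_cycle(metrics):
    issue = analysis_agent(metrics)
    if issue is None:
        return {"status": "healthy"}
    finding = attribution_agent(issue)
    action = strategy_agent(finding)
    outcome = execution_agent(action)
    # Closing the loop: the analysis agent re-reads metrics on the
    # next cycle to assess the executed action's impact.
    return outcome

print(run_cycle({"conversion": 0.015}))
```

The closed loop is what distinguishes this from a one‑shot pipeline: impact assessment by the analysis agent feeds the next round of attribution and strategy.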

Tags: data engineering, AI, large language models, RAG, causal inference, automotive, data agents
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
