Interview: Didi AI’s DELTA – A Unified Framework for NLP and Speech Model Development
In this interview, Han Kun of Didi AI Labs explains how the DELTA platform unifies TensorFlow‑based NLP and speech models, covering tasks from text classification to speech emotion recognition, in a modular, easily deployable architecture. DELTA speeds up model development, powers several Didi products, and has been open‑sourced for broader AI collaboration.
Interview introduction: This article is part of CSDN’s “AI Technology Ecosystem” interview series, featuring Han Kun, the DELTA project lead at Didi AI Labs.
Han Kun has over ten years of experience in machine learning. He earned his Ph.D. at Ohio State University in 2014, worked at Facebook on speech recognition, natural language understanding, and recommendation systems, and joined Didi AI Labs in 2018 to lead a team focusing on NLP and speech research and product development.
Motivation: The team discovered that many different deep‑learning and NLP models were being used across projects. Unifying these models under a single framework would accelerate algorithm iteration and improve team collaboration.
Consequently, the technical team refactored the codebase, built DELTA as a unified system, and decided to open‑source it.
Version‑compatibility challenge: DELTA was originally built on TensorFlow 1.12. When TensorFlow upgraded to 1.14, the team quickly migrated core modules to leverage new features, but this caused incompatibility bugs for other developers and halted development. The experience taught the team to be cautious with upgrades. Later, when TensorFlow 2.0 arrived, DELTA supported both 1.14 and 2.0 before fully migrating.
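One common way to support two TensorFlow versions from a single codebase is to branch once on the parsed version string and keep the rest of the code version-agnostic. The sketch below is illustrative only: `parse_version` and `select_backend` are hypothetical helper names, not DELTA functions, and the version string is passed in as a plain argument (in a real project it would typically come from `tf.__version__`) so that the example runs without TensorFlow installed.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as '1.14.0' or '2.0.0-rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def select_backend(version: str) -> str:
    """Pick a code path for the given TensorFlow version.

    Hypothetical policy mirroring the interview: 1.14 and 2.x are supported,
    and anything older is rejected up front instead of failing later.
    """
    major, minor = parse_version(version)
    if major >= 2:
        return "tf2"
    if (major, minor) >= (1, 14):
        return "tf1-compat"
    raise RuntimeError(f"TensorFlow {version} is older than the supported 1.14")


if __name__ == "__main__":
    for v in ("1.14.0", "2.0.0"):
        print(v, "->", select_backend(v))
```

Rejecting unsupported versions explicitly keeps an accidental upgrade or downgrade from surfacing as an obscure failure deep inside model code.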
Technical architecture: DELTA is primarily built on TensorFlow and supports both NLP and speech tasks as well as numeric feature training. It integrates important algorithms such as text classification, named‑entity recognition, natural‑language inference, question answering, seq2seq generation, speech recognition, speaker verification, and speech emotion recognition, all organized under a consistent code structure with unified interfaces.
Training pipeline: Users provide training data and a configuration file. The pipeline processes the data, selects the appropriate task and model, conducts training, and automatically saves the model. The saved model follows a unified interface that can be directly deployed, enabling rapid productization from research to production.
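The flow described above (configuration in, saved model out) can be sketched with a small registry that maps a model name from the config to a class. This is a minimal illustration under stated assumptions, not DELTA's code: `MODEL_REGISTRY`, `register_model`, `MajorityClassifier`, `run_pipeline`, and the config keys are all hypothetical, and a toy majority-label classifier stands in for a real TensorFlow model.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple

# Maps a model name used in the config to the class that implements it.
MODEL_REGISTRY: Dict[str, type] = {}


def register_model(name: str):
    """Decorator that records a model class under a config-visible name."""
    def wrap(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap


@register_model("majority")
class MajorityClassifier:
    """Toy stand-in for a real network: predicts the most frequent label."""

    def __init__(self, config: dict):
        self.config = config
        self.label = None

    def fit(self, examples: List[Tuple[str, str]]) -> None:
        counts: Dict[str, int] = {}
        for _, label in examples:
            counts[label] = counts.get(label, 0) + 1
        self.label = max(counts, key=counts.get)

    def save(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"model": "majority", "label": self.label}, f)


def run_pipeline(config: dict, examples: List[Tuple[str, str]]) -> str:
    """Select the model named in the config, train it, and save it."""
    model_cls = MODEL_REGISTRY[config["model"]["name"]]
    model = model_cls(config["model"])
    model.fit(examples)
    out_path = os.path.join(config["output_dir"], "model.json")
    model.save(out_path)
    return out_path


if __name__ == "__main__":
    data = [("good ride", "pos"), ("great driver", "pos"), ("late", "neg")]
    with tempfile.TemporaryDirectory() as out_dir:
        cfg = {"model": {"name": "majority"}, "output_dir": out_dir}
        print("saved to", run_pipeline(cfg, data))
```

Because the pipeline only looks models up by name, adding a new model means registering one more class, with no change to the training entry point.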
Key advantages: 1) Convenient usage – out‑of‑the‑box support for common speech and text tasks, multimodal learning, and highly configurable parameters. 2) Seamless deployment – training and serving are tightly coupled; all feature extraction and preprocessing are encapsulated as TensorFlow Ops, forming a unified TF graph that bridges data, model, and deployment. 3) Rapid development – modular components (CNN, RNN, attention, etc.) are well‑tested and reusable, allowing developers to build complex models quickly.
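The deployment point in item 2 is that serving code never re-implements preprocessing, because preprocessing is exported together with the model. The following is a plain-Python analogy of that idea, not DELTA's TensorFlow implementation: `tokenize`, `Vocab`, and `ExportedModel` are hypothetical names, and ordinary functions stand in for the TensorFlow Ops mentioned above.

```python
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace; stands in for a preprocessing op."""
    return text.lower().split()


class Vocab:
    """Maps tokens to integer ids, reserving 0 for unknown tokens."""

    def __init__(self, tokens: List[str]):
        self.index: Dict[str, int] = {}
        for tok in tokens:
            # Repeated tokens keep the id they were first assigned.
            self.index.setdefault(tok, len(self.index) + 1)

    def encode(self, tokens: List[str]) -> List[int]:
        return [self.index.get(tok, 0) for tok in tokens]


class ExportedModel:
    """Bundles preprocessing and scoring behind one raw-text entry point."""

    def __init__(self, vocab: Vocab, weights: Dict[int, float]):
        self.vocab = vocab
        self.weights = weights

    def __call__(self, raw_text: str) -> float:
        # The caller passes raw text; tokenization and id lookup happen here,
        # so training and serving cannot drift apart on preprocessing.
        ids = self.vocab.encode(tokenize(raw_text))
        return sum(self.weights.get(i, 0.0) for i in ids)


if __name__ == "__main__":
    vocab = Vocab(["great", "late"])
    model = ExportedModel(vocab, {1: 1.0, 2: -1.0})
    print(model("Great driver, great car"))
```

The design choice being illustrated is a single entry point that accepts raw input, so a serving system only needs to load the exported artifact and call it.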
Applications: DELTA powers several Didi products, such as the “Didi Kua” praising system for drivers and the in‑car voice interaction system, both of which rely on DELTA’s natural‑language understanding modules.
Future plans: Reduce the usage barrier further, introduce AutoML for automated hyper‑parameter tuning, continue promoting DELTA in the ecosystem, and consider incubating top‑level open‑source projects.
Open‑source perspective: Didi has open‑sourced 39 projects covering AI, mini‑programs, smart transportation, middleware, front‑end frameworks, and development tools. DELTA started as an internal project and was open‑sourced in 2019 at the ACL conference, where it received positive feedback. The release also sparked discussion about the challenges of contributing to open source in China.
About DELTA: DELTA is a deep‑learning platform for speech and natural language understanding, developed in‑house by Didi AI Labs. It integrates key tasks including text classification, text sequence labeling, semantic understanding, sequence‑to‑sequence text generation, speech recognition, and speech feature analysis. DELTA also provides multimodal training for the multimodal data commonly encountered in industry. The whole platform follows a consistent code organization, wraps everything in unified interfaces, and includes a complete model deployment workflow, making it convenient to use and simple to develop with. Project page: https://github.com/didi/delta
