How Voice AI Is Powering Alibaba's IoT Revolution
In this keynote, Alibaba's chief scientist explains how voice AI serves as the natural interface for IoT, detailing the company's strategy to connect billions of devices through cloud infrastructure, AI-driven perception, and multimodal interaction across consumer and industrial applications.
At the 2018 International Smart Technology Summit in Shenzhen, Alibaba DAMO Academy’s chief scientist of voice interaction, Yan Zhijie, delivered a keynote titled “Voice Interaction Intelligence in the IoT Era.” He outlined Alibaba’s new strategic focus on the Internet of Things (IoT) as a core business track alongside e‑commerce, finance, logistics, and cloud computing.
Alibaba aims to build IoT infrastructure that will connect 10 billion devices within five years. The company views computing as the heart, AI as the brain, and IoT as the nervous system, emphasizing a full‑stack approach from cloud computing to AI algorithms that enable natural voice interaction.
Voice is presented as the most natural way for humans to interact with IoT devices. Yan argues that because people naturally use voice to communicate with each other, extending this modality to machines creates seamless, hands‑free experiences, such as controlling in‑car functions while driving or accessing services without touching a screen.
Advances in AI have moved voice interaction from merely “usable” to “delightfully usable,” enabling personalized services and bridging human‑machine interaction with intelligent, context‑aware responses.
Alibaba leverages its extensive internet content and services—e‑commerce platforms, payment systems, video, navigation, travel, and more—to reach consumers through a variety of IoT endpoints, including smart speakers (e.g., Tmall Genie), smart TVs, connected cars, and robots. Multimodal interfaces combine voice, computer vision, and other sensors to create rich interaction experiences.
Specific examples include the Tmall Genie smart speaker, which sold one million units in a single day during Double 11 and has accumulated over two million sales; the Zebra network joint venture with SAIC for connected cars, enabling voice‑controlled functions like opening sunroofs; and the AliOS‑based smart TV boxes that support voice navigation for families.
Alibaba also explores public‑service applications, such as voice‑enabled ticketing kiosks in Shanghai Metro that integrate map data, route planning, and Alipay for seamless purchases, while handling noisy environments through microphone arrays and multimodal processing.
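To illustrate why a microphone array helps in a noisy setting such as a metro station, here is a minimal sketch of delay-and-sum beamforming, a standard textbook technique. The keynote summary does not say which array algorithm Alibaba uses, so this is an assumption chosen for illustration only. The function names, the simulated signal, the integer sample delays, and the noise level are all hypothetical.

```python
import math
import random

def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay, then average across mics."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(n)
    ]

def simulate_array(signal, delays, noise_std, rng):
    """Hypothetical capture: each mic hears the signal d samples late plus its own noise."""
    channels = []
    for d in delays:
        delayed = [0.0] * d + list(signal)
        channels.append([s + rng.gauss(0.0, noise_std) for s in delayed])
    return channels

def mean_squared_error(a, b):
    """Average squared difference over the overlapping samples."""
    n = min(len(a), len(b))
    return sum((a[i] - b[i]) ** 2 for i in range(n)) / n

rng = random.Random(0)  # fixed seed so the run is repeatable
signal = [math.sin(2 * math.pi * 5 * t / 400) for t in range(400)]
delays = [0, 3, 6, 9]  # assumed arrival delays, in samples, for four mics

channels = simulate_array(signal, delays, noise_std=0.5, rng=rng)
enhanced = delay_and_sum(channels, delays)

single_mic_error = mean_squared_error(channels[0], signal)
beamformed_error = mean_squared_error(enhanced, signal)
print(f"single mic MSE: {single_mic_error:.3f}")
print(f"beamformed MSE: {beamformed_error:.3f}")
```

Because the speech component lines up after alignment while each microphone's noise is independent, averaging keeps the speech and reduces the noise power roughly in proportion to the number of microphones. A real kiosk would also have to estimate the delays from the speaker's direction and cope with reverberation and competing talkers, which this sketch leaves out.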
The company emphasizes end‑to‑end technology ownership—from microphone hardware and array design to signal processing, speech recognition, synthesis, and voiceprint verification—ensuring that core interaction technologies remain under Alibaba’s control and can be rapidly adapted to new products.
Alibaba’s broader vision is to create an open, low‑cost, easily replicable IoT solution stack that includes both hardware modules and software platforms, enabling third‑party devices to access Alibaba’s cloud services and content through natural interaction interfaces.
Finally, Yan reflects on strategic debates in the IoT space, such as centralized versus decentralized architectures, the role of IoT as a new internet entry point, and whether companies should build hardware themselves or partner with hardware leaders.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
