Alibaba Search Middle Platform DevOps Practices: Sophon, Bahamut, and AIOps
This article details Alibaba's three‑year journey building a search middle platform, describing how DevOps, goal‑driven operations, and AI‑assisted automation (Sophon, Bahamut, and AIOps) were introduced to improve scalability, stability, and efficiency for large‑scale search services.
In 2015 Alibaba launched a middle‑platform strategy, aiming to create a flexible "big middle, small front" architecture where the search middle platform would support agile front‑end businesses while consolidating digital operation and product technology capabilities.
Over three years, operations on the search middle platform evolved from manual, labor‑intensive work to scripted automation and then to an integrated DevOps model, a progression that exposed the need for end‑to‑end control over the full service chain.
The Sophon system was built to address these challenges, dividing responsibilities into OPS, Online, and Offline layers and providing a unified coordination framework for complex multi‑service search scenarios.
Key DevOps goals include providing an end‑to‑end experience, shifting from procedural to goal‑driven operations, abstracting operational and business concepts, and preserving iteration efficiency while maintaining service stability.
Sophon implements goal‑driven rolling upgrades, abstracts operational models into data‑relationship graphs, and adds a business‑level abstraction layer so that users interact only with simplified business concepts.
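The idea of a goal‑driven rolling upgrade can be illustrated with a small reconcile loop: the operator declares only the desired version, and a controller repeatedly upgrades a bounded batch of stale instances until the fleet matches that goal. This is a minimal sketch, not Sophon's implementation; `Instance`, `reconcile_step`, `converge`, and `max_unavailable` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    version: str


def reconcile_step(instances, target_version, max_unavailable):
    """Upgrade at most `max_unavailable` stale instances; return their names."""
    stale = [i for i in instances if i.version != target_version]
    batch = stale[:max_unavailable]
    for inst in batch:
        # Stand-in for a real upgrade call (restart, health check, and so on).
        inst.version = target_version
    return [i.name for i in batch]


def converge(instances, target_version, max_unavailable=1):
    """Repeat reconcile steps until every instance matches the target."""
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    rounds = []
    while True:
        upgraded = reconcile_step(instances, target_version, max_unavailable)
        if not upgraded:
            return rounds  # nothing left to change: the goal state is reached
        rounds.append(upgraded)


fleet = [Instance(f"searcher-{n}", "v1") for n in range(5)]
for batch in converge(fleet, "v2", max_unavailable=2):
    print("upgraded batch:", batch)
```

Because the loop compares current state with the goal on every pass, running it again after an interruption simply resumes from wherever the fleet currently stands, which is the main practical difference from a fixed step‑by‑step procedure.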
Stability is ensured through SLA support, unitized online/offline services, automatic disaster recovery, and a rigorous multi‑step release verification process that includes performance testing, gray releases, and rollback safeguards.
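A multi‑step release verification of this kind can be sketched as an ordered list of gates, where the first failing gate stops the release and triggers a rollback of whatever has already been applied. The gate names and the `run_release` helper below are assumptions made for illustration; the article does not describe the actual checks at this level of detail.

```python
def run_release(gates, rollback):
    """Run (name, check) gates in order; on the first failure, roll back and stop."""
    completed = []
    for name, check in gates:
        if not check():
            rollback(completed)  # undo the gates that already passed
            return {"released": False, "failed_at": name, "completed": completed}
        completed.append(name)
    return {"released": True, "failed_at": None, "completed": completed}


def log_rollback(completed):
    # Stand-in for restoring the previous version on the affected units.
    print("rolling back after:", completed)


# Hypothetical gates; each lambda stands in for a real verification job.
gates = [
    ("performance_test", lambda: True),
    ("gray_release", lambda: False),  # simulate a regression seen on gray traffic
    ("full_release", lambda: True),
]
print(run_release(gates, log_rollback))
```

Keeping the gates as data rather than hard‑coded calls makes it straightforward to add or reorder verification steps per service without touching the control flow.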
Expert knowledge is captured in DAG‑based execution flows, allowing the platform to hide complex operational expertise from end users while automatically handling configuration dependencies and execution branching.
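As a rough illustration of a DAG‑based execution flow, the sketch below orders steps by their dependencies with Python's standard `graphlib.TopologicalSorter` and uses a per‑step condition to model execution branching. The step names, the `context` keys, and `run_flow` are hypothetical; they are not taken from Sophon.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_flow(deps, steps, context):
    """Run steps in dependency order, skipping any whose condition is false."""
    executed = []
    for name in TopologicalSorter(deps).static_order():
        condition, action = steps[name]
        if condition(context):
            action(context)
            executed.append(name)
    return executed


def always(_context):
    return True


def record(label):
    """Build an action that appends its label to the context log."""
    return lambda context: context["log"].append(label)


# Each key depends on the steps in its value set.
deps = {
    "push_config": {"validate_config"},
    "rebuild_index": {"push_config"},
    "restart_service": {"push_config"},
}
steps = {
    "validate_config": (always, record("validated")),
    "push_config": (always, record("pushed")),
    # Branch: a rebuild is only needed when the schema actually changed.
    "rebuild_index": (lambda c: c["schema_changed"], record("rebuilt")),
    "restart_service": (always, record("restarted")),
}

context = {"schema_changed": False, "log": []}
print(run_flow(deps, steps, context))
```

The user of such a flow supplies only the context (here, whether the schema changed); the ordering and branching rules that encode the expert knowledge live in `deps` and `steps`.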
The offline component platform Bahamut translates user‑defined data‑source graphs into executable Blink jobs, handling merges, left‑joins, and incremental/full‑load pipelines to feed the search index.
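To make the graph‑to‑job translation more concrete, the following sketch evaluates a small user‑defined data‑source graph in memory, with merge and left‑join nodes. It deliberately does not use any Blink API: the graph format, the node names, and the `merge`, `left_join`, and `evaluate` helpers are assumptions for illustration, and a real system would compile such a graph into streaming jobs rather than Python lists.

```python
def merge(*tables):
    """Concatenate rows from tables that share a schema (e.g. sharded sources)."""
    return [dict(row) for table in tables for row in table]


def left_join(left, right, key):
    """Keep every left row; attach right-side fields when the key matches.

    If the right side has duplicate keys, the last row wins in this sketch.
    """
    index = {row[key]: row for row in right}
    return [{**row, **index.get(row[key], {})} for row in left]


def evaluate(graph, node, sources):
    """Recursively compute the rows produced by `node` in the graph."""
    spec = graph[node]
    if spec["op"] == "source":
        return sources[node]
    inputs = [evaluate(graph, name, sources) for name in spec["inputs"]]
    if spec["op"] == "merge":
        return merge(*inputs)
    if spec["op"] == "left_join":
        return left_join(inputs[0], inputs[1], spec["key"])
    raise ValueError(f"unknown op: {spec['op']}")


# Hypothetical graph: two item shards are merged, then enriched with prices.
graph = {
    "items_a": {"op": "source"},
    "items_b": {"op": "source"},
    "prices": {"op": "source"},
    "items": {"op": "merge", "inputs": ["items_a", "items_b"]},
    "docs": {"op": "left_join", "inputs": ["items", "prices"], "key": "id"},
}
sources = {
    "items_a": [{"id": 1, "title": "phone"}],
    "items_b": [{"id": 2, "title": "case"}],
    "prices": [{"id": 1, "price": 99}],
}
print(evaluate(graph, "docs", sources))
```

The left join keeps the second item even though it has no price row, which matches the usual requirement that a document still reaches the index when an auxiliary table has no matching record.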
Overall, the article credits the integrated DevOps practices, AI‑assisted operations (AIOps), and platform abstractions with improving iteration speed, cost efficiency, and reliability for Alibaba's large‑scale search services; future work focuses on deeper AI integration and broader platform unification.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.