Design Considerations for Next‑Generation AI Platforms: Programming Languages, Runtime Environment, Scheduler, and Model Deployment
This article examines three key design dimensions of modern AI platforms—programming-language choice, runtime-environment isolation, and scheduling/resource management—and discusses model-deployment challenges such as algorithm diversity, resource-usage patterns, and architectural generality, proposing Kubernetes-based isolation and Arrow-based data sharing as a path to efficient, scalable AI services.
Artificial intelligence has become a commonplace term since 2021, with many real‑world applications in fintech, big data, and other fields. As the underlying platform for AI algorithms, AI platform technology is receiving increasing attention.
AI platform technology is known by many names—machine‑learning platform, deep‑learning platform, AI operating system, algorithm platform, analysis platform, compute platform—reflecting different vendor emphases.
This article focuses on three design dimensions of next‑generation AI platforms.
01 Programming Language
Programming languages are the ultimate human‑machine interface. Historical shifts from assembly to C enabled modern OSes, and from C to Java powered internet and mobile services. Over the past decade, Python has risen to become one of the top three languages, especially dominant in AI, influencing platform design in two ways.
1. Runtime Environment
Python’s popularity stems from its rich ecosystem of third‑party libraries. AI research produces many models using diverse libraries, making it hard to standardize a single software environment across projects. Moreover, Python often mixes managed (Python) and unmanaged (C/C++) code, increasing environment‑management complexity. AI platforms therefore must support project‑level environment isolation and lifecycle management, which cannot be achieved by a globally shared configuration.
Solutions span three layers: user interaction, managed code, and unmanaged code. At the managed-code layer, tools such as conda or pip handle dynamic package distribution per project. For unmanaged code, container technology (orchestrated by Kubernetes) provides full isolation, allowing each container its own binary dependencies and enabling automatic cleanup when a project ends.
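The managed-code layer described above can be sketched with the standard library alone. The following is a minimal illustration, not the platform's actual implementation: it creates one isolated virtual environment per project and installs that project's Python dependencies into it, so no globally shared site-packages is ever touched (the function name `create_project_env` is hypothetical).

```python
import subprocess
import sys
import venv
from pathlib import Path

def create_project_env(project_dir: str, packages: list) -> Path:
    """Create an isolated virtual environment for one project and
    install its managed-code (Python) dependencies into it."""
    env_dir = Path(project_dir) / ".venv"
    # clear=True gives each project a fresh, reproducible environment.
    venv.EnvBuilder(with_pip=True, clear=True).create(env_dir)
    # The environment's own pip installs packages without touching
    # any globally shared site-packages.
    bin_dir = env_dir / ("Scripts" if sys.platform == "win32" else "bin")
    if packages:
        subprocess.run([str(bin_dir / "pip"), "install", *packages], check=True)
    return env_dir
```

A real platform would drive conda instead of venv for environments that mix Python and native dependencies, but the lifecycle idea—create, populate, and later delete an environment as a unit—is the same.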
2. Code Execution Efficiency
When the platform and the application share the same language, interaction cost is minimal (native language advantage). In AI, this advantage is lost: business logic is written in Python, while many platform components are Java‑based, and the underlying compute resources (CPU/GPU/FPGA/ASIC) are native to C/C++. This mismatch raises execution overhead.
Efficiency can be improved by using shared-memory mechanisms (e.g., Apache Arrow's Plasma Object Store) to enable zero-copy data exchange, and by leveraging GPU/ASIC acceleration through C/C++-based compute engines with Python bindings, rather than Java-based engines accessed through Py4J.
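The zero-copy idea can be demonstrated without a running Plasma store. The stdlib stand-in below (not Arrow itself) puts a payload into an OS-managed shared-memory segment that another process can attach to by name, so the bytes never pass through a socket or a serialization step; both function names are hypothetical illustrations.

```python
from multiprocessing import shared_memory

def share_payload(name: str, payload: bytes) -> shared_memory.SharedMemory:
    """Producer side: write a payload into a named shared-memory segment.
    Any other process on the host can attach to it by name."""
    shm = shared_memory.SharedMemory(name=name, create=True, size=len(payload))
    shm.buf[:len(payload)] = payload  # single write into the shared segment
    return shm  # caller keeps the handle so it can close/unlink later

def read_payload(name: str, size: int) -> bytes:
    """Consumer side: attach to the segment by name. Attaching maps the
    same physical pages; shm.buf is a view of them, not a copy. We copy
    out to bytes only to return a value after closing the handle—a real
    zero-copy consumer would operate on the memoryview directly."""
    shm = shared_memory.SharedMemory(name=name)
    data = bytes(shm.buf[:size])
    shm.close()
    return data
```

Plasma adds object IDs, reference counting, and cross-language (C++/Python/Java) access on top of this same mapped-memory principle.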
02 Scheduler
AI models rely on data, and model monitoring also requires data, making integration with big‑data platforms essential. Historically, data and models have been siloed, leading to costly data exchange.
Using a unified Kubernetes cluster for resource scheduling can reduce this siloing, but process-level scheduling still faces challenges. Four technical problems must be solved:
Data‑locality aware scheduling (e.g., Arrow’s Plasma to place compute where data resides).
Memory‑sharing mechanisms (distributed Arrow to enable zero‑copy IPC).
Data‑ownership transfer (allowing the original producer to release resources after moving data to shared memory).
Support for both stateless and stateful tasks, with stateful tasks being common in deep‑learning workloads.
03 Model Deployment
Deploying AI models remains complex due to three factors:
Algorithm Diversity: A wide range of model types (machine learning, deep learning, ensembles, optimization, reinforcement learning, graph neural networks) must be supported without the serving layer becoming a bottleneck.
Resource Usage Characteristics: Unlike typical web microservices (I/O-bound, each instance serving thousands of requests), AI model services are CPU-bound, serve only tens of requests per instance, and hold large models in memory, making load balancing harder.
Architectural Generality: External model-serving back ends (e.g., TensorFlow Serving) often lock users into a specific framework, increasing model-conversion costs and limiting flexibility.
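The resource-usage point has a direct consequence for load balancing: when each replica saturates at tens of concurrent requests, round-robin routing is a poor fit, and a least-outstanding-requests policy tracks real load better. The class below is a hedged sketch of that policy (the name `LeastOutstandingBalancer` is hypothetical, not a known library API).

```python
import heapq

class LeastOutstandingBalancer:
    """Route each request to the replica with the fewest in-flight
    requests—a better signal than round-robin for CPU-bound model
    servers that saturate at low concurrency."""
    def __init__(self, replicas):
        # Heap entries: (in_flight_count, insertion_order_tiebreak, name)
        self._heap = [(0, i, r) for i, r in enumerate(replicas)]
        heapq.heapify(self._heap)

    def acquire(self):
        """Pick the least-loaded replica and count one in-flight request."""
        in_flight, tie, replica = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (in_flight + 1, tie, replica))
        return replica

    def release(self, replica):
        """Mark one request on the given replica as finished."""
        for idx, (count, tie, name) in enumerate(self._heap):
            if name == replica and count > 0:
                self._heap[idx] = (count - 1, tie, name)
                heapq.heapify(self._heap)
                return
```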
Industry trends favor separating HTTP servers from model compute clusters (external micro‑service back‑ends) while maintaining a generic, framework‑agnostic serving layer.
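A framework-agnostic serving layer of the kind described can be reduced to a small contract plus adapters. The sketch below assumes invented names (`ModelBackend`, `CallableBackend`, `ServingLayer`); the point is that the platform routes by model name and never hard-codes one framework.

```python
from typing import Any, Callable, Dict, List, Protocol

class ModelBackend(Protocol):
    """Minimal contract every framework adapter must satisfy."""
    def predict(self, inputs: List[Any]) -> List[Any]: ...

class CallableBackend:
    """Wrap any plain callable (a scikit-learn predict, a TensorFlow or
    PyTorch forward pass, an optimization routine) behind the contract."""
    def __init__(self, fn: Callable[[List[Any]], List[Any]]):
        self._fn = fn

    def predict(self, inputs: List[Any]) -> List[Any]:
        return self._fn(inputs)

class ServingLayer:
    """Generic front end: dispatches named-model requests to whichever
    backend was registered, independent of the underlying framework."""
    def __init__(self):
        self._models: Dict[str, ModelBackend] = {}

    def register(self, name: str, backend: ModelBackend) -> None:
        self._models[name] = backend

    def infer(self, name: str, inputs: List[Any]) -> List[Any]:
        return self._models[name].predict(inputs)
```

Separating the HTTP server from the compute cluster then amounts to running `ServingLayer` instances as a back-end pool behind a thin, stateless HTTP front.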
04 Conclusion
AI development depends on seamless integration of data, algorithms, and compute. The article shares insights from the development of the “Two‑Yuan” AI platform at Xingyun Digital, highlighting the three design drivers discussed above and encouraging industry collaboration to democratize AI over the next decade.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.