
One‑Stop AI Platform for Cloud, Edge, Mobile, Flink, and Application Intelligence: Architecture, Challenges, and Solutions

This article presents a one‑stop AI platform that unifies training, model, feature, and decision services across cloud, edge, mobile, Flink, and application environments. It details the platform's architecture, the limitations of cloud‑centric inference, the advantages of localized inference, and the challenges and solutions around model and feature localization, SDK design, and future AutoML enhancements.


With the rise of big data and ever‑increasing computing power, artificial intelligence has made significant progress across many fields. However, the traditional cloud‑centric architecture, in which features are stored and models trained in the cloud, inference is performed online in the cloud, and results are displayed on the client, shows growing limitations in bandwidth, latency, data security, and cost.

Cloud Inference Mode

The typical cloud inference workflow includes:

Clients upload offline and online feature data to the cloud for storage.

The cloud feeds offline features to the model for training.

After training, the model is deployed or updated.

Clients send online inference requests to the cloud.

The cloud combines the request features with the model file to produce predictions.

The cloud returns sorted results to the client for display.
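The scoring-and-ranking tail of this workflow can be sketched in a few lines. This is a toy illustration only, assuming a linear model; the names (`CloudScorer`, `score`, `rank`) are illustrative and not the platform's actual API.

```java
import java.util.*;
import java.util.stream.Collectors;

// Toy sketch of the cloud-side scoring step: combine request features with
// the model (here just a weight vector) and return candidates sorted by score.
public class CloudScorer {
    private final double[] weights; // stands in for the trained model file

    public CloudScorer(double[] weights) { this.weights = weights; }

    // Combine one candidate's features with the model to produce a score.
    // Assumes features and weights have the same length.
    public double score(double[] features) {
        double s = 0.0;
        for (int i = 0; i < weights.length; i++) s += weights[i] * features[i];
        return s;
    }

    // Score every candidate and return ids in descending-score order:
    // the "sorted results" the client receives for display.
    public List<String> rank(Map<String, double[]> candidates) {
        return candidates.entrySet().stream()
                .sorted((a, b) -> Double.compare(score(b.getValue()), score(a.getValue())))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

In the real platform this step runs inside the model service and involves far richer feature preprocessing; the point here is only the request-features-plus-model-file shape of the computation.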

Problems of Cloud‑Centric Inference

Bandwidth and latency: every request must travel to the cloud, causing delay, especially when large feature payloads (e.g., images, video) are involved.

Data security: transmitting user data to the cloud introduces leakage risks.

Data privacy: some user data cannot be stored in the cloud.

Cost: massive online inference workloads require a large number of cloud servers.

Centralization: any network or cloud service outage disables inference.

Edge (Endpoint) Intelligence Mode

Edge intelligence moves computation, storage, and inference to edge devices (smartphones, IoT devices, etc.). The workflow is:

Clients still upload offline/online features to the cloud for storage.

The cloud continues to train models with the accumulated features.

When a model is ready, the cloud notifies edge devices to download the model file.

Edge devices periodically pull updated features from the cloud.

Edge devices perform local inference using the downloaded model and locally cached features.

Results are sorted and displayed locally, eliminating network latency.
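The edge loop above can be sketched as a small on-device runtime: the cloud pushes a model, periodic pulls refresh a local feature cache, and inference never leaves the device. A minimal sketch assuming a linear model; all names are illustrative.

```java
import java.util.*;

// Sketch of an edge inference runtime: model push, feature refresh, local inference.
public class EdgeRuntime {
    private double[] model = new double[0];                 // downloaded model file
    private final Map<String, double[]> featureCache = new HashMap<>();

    // The cloud notifies the device of a new model; the device swaps it in.
    public void onModelPushed(double[] newModel) { this.model = newModel; }

    // Periodic feature pull merges updated features into the local cache.
    public void refreshFeatures(Map<String, double[]> updated) { featureCache.putAll(updated); }

    // Inference runs entirely on-device: cached features plus the local model,
    // with no network round trip. Unknown keys fall back to zero features.
    public double infer(String key) {
        double[] f = featureCache.getOrDefault(key, new double[model.length]);
        double s = 0.0;
        for (int i = 0; i < model.length && i < f.length; i++) s += model[i] * f[i];
        return s;
    }
}
```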

Advantages of Edge Intelligence

No bandwidth or latency constraints for inference.

Data stays on the device, greatly reducing security and privacy risks.

Local inference continues even if the cloud or network fails.

Reduced cloud resource consumption and cost.

Current One‑Stop AI Platform Architecture

The platform is divided into four major sub‑systems:

Training Platform: supports offline model training, resource management, distributed training/prediction, and offline inference.

Model Platform: manages TF1/TF2, PMML, Python, and Python‑GPU models, providing online inference capabilities.

Feature Platform: offers offline/online feature storage, real‑time feature computation, and feature joining, cleaning, selection, and querying.

Decision Platform: orchestrates DAG‑based workflows, integrates Groovy scripts and multi‑model, multi‑feature pipelines, and exposes a unified online inference service.

Overall Process

1. Feature creation and change → online feature storage.

2. Model training, creation, and version management → model upload, online/offline inference.

3. Online inference: business services call the decision service, which performs Groovy rule evaluation, feature retrieval, preprocessing, and model inference, then returns results.
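The decision-service call chain in step 3 can be sketched with plain functional interfaces standing in for the Groovy rule check, feature retrieval, preprocessing, and model inference. This is a shape sketch only; the interfaces and names are illustrative, not the platform's API.

```java
import java.util.OptionalDouble;
import java.util.function.*;

// Sketch of the decision-service pipeline: rule -> features -> preprocess -> model.
public class DecisionPipeline {
    private final Predicate<String> rule;               // stands in for Groovy rule evaluation
    private final Function<String, double[]> features;  // feature retrieval
    private final UnaryOperator<double[]> preprocess;   // feature preprocessing
    private final ToDoubleFunction<double[]> model;     // model inference

    public DecisionPipeline(Predicate<String> rule, Function<String, double[]> features,
                            UnaryOperator<double[]> preprocess, ToDoubleFunction<double[]> model) {
        this.rule = rule; this.features = features;
        this.preprocess = preprocess; this.model = model;
    }

    // Returns an empty result when the rule rejects the request; otherwise
    // runs the full retrieve-preprocess-infer chain and returns the score.
    public OptionalDouble decide(String requestId) {
        if (!rule.test(requestId)) return OptionalDouble.empty();
        return OptionalDouble.of(model.applyAsDouble(preprocess.apply(features.apply(requestId))));
    }
}
```

In the real platform each stage is a node in the DAG workflow rather than a hard-wired method chain, which is what lets the same stages be re-packaged as an SDK for Flink and application endpoints.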

Mobile‑Side Intelligence

Mobile devices become inference endpoints. Key differences from cloud mode include:

Model version management moves to the mobile side; the app must handle model updates.

Model files are distributed to each device instead of a central ModelService.

Runtime environment shifts from ModelService to the mobile device (e.g., using MNN).

Features are stored locally; online inference no longer depends on cloud data.
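Since model version management moves to the app, the device must decide for itself when to fetch a new model file. A minimal sketch of that bookkeeping, with illustrative names; real apps would add download verification and rollback.

```java
// Sketch of mobile-side model version management: the app compares its
// installed model version with the version the cloud advertises.
public class MobileModelManager {
    private int installedVersion;

    public MobileModelManager(int installedVersion) { this.installedVersion = installedVersion; }

    // True when the cloud advertises a newer model than the one on disk.
    public boolean needsUpdate(int remoteVersion) { return remoteVersion > installedVersion; }

    // Called after the new model file has been downloaded and verified on-device.
    public void markInstalled(int version) { this.installedVersion = version; }

    public int installedVersion() { return installedVersion; }
}
```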

Flink‑Side Intelligence

Flink jobs act as inference nodes. Changes compared with cloud mode:

Inference is triggered by messages rather than SOA APIs.

Models are loaded in Flink via Apollo configuration changes.

Features are cached on local disks or in memory (RocksDB or in‑memory stores) and updated via Apollo.

Decision logic is packaged as an SDK and invoked inside Flink.
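The Flink-side changes can be sketched as a single handler object: inference fires per message rather than per SOA request, the model is reloaded through a config-change callback (Apollo in the real setup), and features come from a local cache (RocksDB or memory in the real job). The Flink operator plumbing is elided and all names are illustrative.

```java
import java.util.*;

// Sketch of message-triggered inference inside a Flink job.
public class FlinkInferenceFn {
    private double[] model = {1.0};                         // current model weights
    private final Map<String, double[]> localFeatures = new HashMap<>();

    // Config-change callback (Apollo-style): hot-reload the model
    // without restarting the job.
    public void onConfigChange(double[] newModel) { this.model = newModel; }

    // Stands in for the RocksDB / in-memory feature cache.
    public void putFeature(String key, double[] value) { localFeatures.put(key, value); }

    // Invoked once per incoming message instead of per SOA call.
    public double onMessage(String key) {
        double[] f = localFeatures.getOrDefault(key, new double[model.length]);
        double s = 0.0;
        for (int i = 0; i < model.length && i < f.length; i++) s += model[i] * f[i];
        return s;
    }
}
```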

Application‑Side Intelligence

Business applications embed the entire inference stack:

Models run inside the application process (e.g., TensorFlow‑Java).

Features are persisted locally, on disk or in memory.

The Decision SDK is integrated directly, eliminating remote service calls.
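Local feature persistence on the application side can be sketched as a tiny store that survives restarts by writing to disk. The one-line-per-feature text format used here is purely illustrative; a real deployment would use RocksDB or a binary format.

```java
import java.io.*;
import java.util.*;

// Sketch of app-side feature persistence: features live in-process and are
// flushed to / reloaded from local disk, so inference needs no remote store.
public class LocalFeatureStore {
    private final Map<String, double[]> cache = new HashMap<>();

    public void put(String key, double[] value) { cache.put(key, value); }
    public double[] get(String key) { return cache.get(key); }

    // Persist the cache as "key=v1,v2,..." lines (illustrative format).
    public void save(File file) {
        try (PrintWriter out = new PrintWriter(new FileWriter(file))) {
            for (Map.Entry<String, double[]> e : cache.entrySet()) {
                StringJoiner j = new StringJoiner(",");
                for (double d : e.getValue()) j.add(Double.toString(d));
                out.println(e.getKey() + "=" + j);
            }
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    // Reload persisted features at application startup.
    public void load(File file) {
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = in.readLine()) != null) {
                int eq = line.indexOf('=');
                String[] parts = line.substring(eq + 1).split(",");
                double[] v = new double[parts.length];
                for (int i = 0; i < parts.length; i++) v[i] = Double.parseDouble(parts[i]);
                cache.put(line.substring(0, eq), v);
            }
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }
}
```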

Challenges & Solutions Across All Endpoints

Model localization: replace SOA calls with local model loading; use dual‑model objects for seamless version transitions; pre‑heat new models based on recent latency statistics.

Feature localization: parallelize feature loading, leverage high‑throughput SSD/NAS storage, and use RocksDB or in‑memory caches to reduce startup latency.

Online feature pressure: unified connection‑pool management, isolation, and rate limiting protect the online feature stores.

SDK design: separate Model, Feature, and Decision SDKs simplify integration on the mobile, Flink, and application sides.

Model type compatibility: Java services currently support TF and PMML models; conversion tools or a unified runtime are planned for other model formats.
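The dual-model object mentioned above can be sketched with an atomic reference: requests always read the currently active model, and a candidate version is warmed up with a few dry-run inferences before being swapped in atomically. A minimal sketch; the names and the warm-up policy are illustrative.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.ToDoubleFunction;

// Sketch of a dual-model holder for seamless version transitions.
public class HotSwapModel {
    // Requests read the active model through this reference; swapping it is
    // atomic, so in-flight calls keep the model object they already obtained.
    private final AtomicReference<ToDoubleFunction<double[]>> active = new AtomicReference<>();

    public HotSwapModel(ToDoubleFunction<double[]> initial) { active.set(initial); }

    public double infer(double[] features) { return active.get().applyAsDouble(features); }

    // Pre-heat the candidate on sample inputs (warming caches/JIT), then
    // publish it; callers never observe a half-initialized model.
    public void deploy(ToDoubleFunction<double[]> candidate, double[][] warmupSamples) {
        for (double[] s : warmupSamples) candidate.applyAsDouble(s);
        active.set(candidate);
    }
}
```

The same pattern works for the Flink and application endpoints, where a model reload arrives via a config change rather than a service deployment.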

Overall, the one‑stop AI platform already supports multi‑endpoint online inference, but continuous optimization is needed. Future work will focus on AutoML, AutoFE, model‑as‑a‑service, distributed training/prediction, and alignment with industry‑leading AI platform architectures.

Tags: distributed systems, Flink, Edge Computing, Feature Engineering, mobile AI, Model Inference, AI Platform
Written by

HelloTech

Official Hello technology account, sharing tech insights and developments.
