Case Study: Integrating the AiFenxi BI Platform with Apache APISIX Gateway for Improved Performance and Stability
This case study describes how AiFenxi, a business intelligence platform, adopted Apache APISIX as a high-performance API gateway on Tencent Cloud TKE to address latency, scalability, and security problems. It covers the architectural changes, the deployment steps, and the resulting performance improvements.
Background
AiFenxi is the proprietary business intelligence (BI) analysis platform of the TAL Education Group, providing data analysis and decision‑support services for various business units within the group.
Platform Responsibilities
The platform handles massive data requests and complex business logic, with millions of OLAP queries per day. The traditional ingress architecture showed bottlenecks in performance, scalability, and security.
Core Pain Points
1. A loopback issue with the Tencent Cloud TKE CLB increased latency and caused login cache problems.
2. The ingress configuration was not flexible enough to support custom plugins.
3. The complex network topology required OpenResty to proxy services, which added performance overhead.
4. Higher SLA requirements demanded better stability.
Why Adopt Apache APISIX?
APISIX was chosen because it is backed by a professional gateway team, is highly available, has a rich plugin ecosystem, and allows easy migration from the existing ingress configuration.
Integration with Group APISIX Gateway
The original ingress architecture required four hops to reach a service, relied on OpenResty to split internal and external traffic, and suffered from loopback issues. After the switch to APISIX, the longest path dropped to three hops and the shortest to one, and zero-trust routing eliminated the loopbacks.
Stability Guarantees Required by AiFenxi
1. The gateway must ensure its own stability and provide alerting mechanisms.
2. The gateway must provide health-check interfaces for AiFenxi services.
3. Logs must be monitored for non-200 responses and high latency.
4. Configuration changes must trigger alerts to prevent drift.
5. An open API must be available for external health checks.
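The log-monitoring requirement (item 3) can be illustrated with a minimal sketch. It assumes access logs have already been parsed into records; the `AccessRecord` type, the `find_alerts` function, and the 1000 ms threshold are hypothetical and are not taken from the AiFenxi or APISIX configuration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AccessRecord:
    """One parsed gateway access-log entry (hypothetical shape)."""
    path: str
    status: int
    latency_ms: float


# Assumed threshold; the article does not state what counts as "high latency".
LATENCY_THRESHOLD_MS = 1000.0


def find_alerts(records: List[AccessRecord],
                threshold_ms: float = LATENCY_THRESHOLD_MS) -> List[Tuple[str, str]]:
    """Return (path, reason) pairs for non-200 responses and slow requests."""
    alerts = []
    for record in records:
        if record.status != 200:
            alerts.append((record.path, f"status {record.status}"))
        if record.latency_ms > threshold_ms:
            alerts.append((record.path, f"latency {record.latency_ms:.0f} ms"))
    return alerts


if __name__ == "__main__":
    sample = [
        AccessRecord("/api/login", 200, 120.0),
        AccessRecord("/api/query", 502, 80.0),
        AccessRecord("/api/report", 200, 2400.0),
    ]
    for path, reason in find_alerts(sample):
        print(path, reason)
```

A record can produce two alerts if it is both non-200 and slow, which keeps the two conditions independently visible to whoever is on call.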
Implementation Challenges
High security requirements demanded zero‑trust access: non‑office‑network users must pass through a zero‑trust platform, and internal users are whitelisted. APISIX was extended with plugins to dynamically identify office‑network IP ranges and route traffic accordingly.
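The routing decision described above can be sketched in a few lines. This is only an illustration of the idea, not the plugin the team wrote: the CIDR ranges, the function names, and the route labels are all hypothetical, and the real plugin identifies the office-network ranges dynamically rather than from a fixed list.

```python
import ipaddress

# Hypothetical office-network ranges; the actual ranges are not given in the article.
OFFICE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.10.0/24"),
]


def is_office_ip(client_ip: str) -> bool:
    """Return True if the client address falls inside a whitelisted office range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in OFFICE_NETWORKS)


def choose_route(client_ip: str) -> str:
    """Office-network clients go direct; everyone else goes through zero trust."""
    return "direct" if is_office_ip(client_ip) else "zero-trust"


if __name__ == "__main__":
    for ip in ("10.1.2.3", "192.168.10.5", "8.8.8.8"):
        print(ip, choose_route(ip))
```

In a dynamic setup, `OFFICE_NETWORKS` would be refreshed from whatever source of truth holds the office ranges, so that a change in the office network does not require a gateway redeploy.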
Implementation Plan
1. Resource preparation: machines, network, security, storage.
2. Cut-over to production: test and pre-release environments validated before production rollout.
3. Rollback plan: designated operators and acceptance criteria.
4. Gradual traffic shift: DNS-based gray release starting at 3% and ramping to 100%.
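The gradual traffic shift in step 4 can be modeled with a small sketch. The actual rollout was done with DNS weighting, which this code does not implement; it only shows how a percentage ramp assigns a growing, stable share of clients to the new gateway. The intermediate stages, the function names, and the gateway labels are hypothetical, since the article states only the 3% start and the 100% end.

```python
import hashlib

# Hypothetical ramp; only the 3% starting point and 100% end come from the article.
RAMP_STAGES = [3, 10, 30, 60, 100]


def bucket(client_id: str) -> int:
    """Map a client identifier to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def target_gateway(client_id: str, apisix_percent: int) -> str:
    """Send the client to APISIX if its bucket falls under the current percentage."""
    return "apisix" if bucket(client_id) < apisix_percent else "legacy-ingress"


if __name__ == "__main__":
    clients = [f"user-{i}" for i in range(1000)]
    for percent in RAMP_STAGES:
        shifted = sum(target_gateway(c, percent) == "apisix" for c in clients)
        print(f"stage {percent}%: {shifted} of {len(clients)} clients on apisix")
```

Because the bucket is derived from a hash of the client identifier, a client that has moved to the new gateway at one stage stays there at every later stage, which makes regressions easier to attribute.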
Results and Benefits
1. No loopback issues after cut-over, fully resolving latency spikes.
2. Login interface P90 latency improved by 30%.
3. Business API P90 latency improved by 11-15%.
4. SLA levels increased, laying groundwork for a dual-active architecture.
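For readers who want to reproduce this kind of comparison on their own data, the following sketch shows one common way to compute a P90 latency and the relative improvement between two measurements. It uses the nearest-rank percentile definition, which is an assumption; the article does not say how its P90 figures were computed, and the sample numbers below are invented for illustration.

```python
from typing import Sequence


def p90(samples: Sequence[float]) -> float:
    """Nearest-rank 90th percentile: the smallest value with >= 90% of samples at or below it."""
    if not samples:
        raise ValueError("p90 requires at least one sample")
    ordered = sorted(samples)
    rank = -(-9 * len(ordered) // 10)  # integer ceil(0.9 * n)
    return ordered[rank - 1]


def improvement_percent(before: float, after: float) -> float:
    """Relative latency reduction, as a percentage of the 'before' value."""
    return (before - after) * 100.0 / before


if __name__ == "__main__":
    # Invented sample latencies in milliseconds, not measurements from AiFenxi.
    before_ms = [100, 110, 120, 130, 140, 150, 160, 170, 200, 400]
    after_ms = [90, 95, 100, 105, 110, 115, 120, 130, 140, 300]
    b, a = p90(before_ms), p90(after_ms)
    print(f"P90 before={b} ms, after={a} ms, improvement={improvement_percent(b, a):.1f}%")
```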
Future Planning
Goal: Enhance system robustness so the platform can handle high concurrency, cluster or machine-room (data center) failures, and service-unit failures.
Plan: Implement a dual-active (active-active) architecture for AiFenxi.
TAL Education Technology
TAL Education is a technology-driven education company committed to the mission of 'making education better through love and technology'. The TAL technology team has always been dedicated to educational technology research and innovation. This is the external platform of the TAL technology team, sharing weekly curated technical articles and recruitment information.