Insights and Highlights from the 2018 Gdevops Global Agile Ops Summit
The 2018 Gdevops Global Agile Operations Summit in Chengdu gathered industry experts who shared practical insights on AIOps implementation, the sharding database ecosystem, DevOps adoption in traditional enterprises, large‑scale data management, Elasticsearch cluster management, AWS blue‑green deployments, cloud database operations, Alibaba's Double‑11 operations platform, the 58 Delivery mini‑program architecture, and scalable game service design.
Traditional Enterprise AIOps: Data, Algorithms, and Scenarios
Song Hui identified three essential factors for successful AIOps adoption:
Data: High‑quality, time‑series operational metrics, logs, and topology information must be collected, normalized, and stored in a scalable time‑series database.
Algorithms: Anomaly detection, root‑cause analysis, and predictive models are applied on the unified data set. Model training pipelines should be automated and continuously retrained with fresh data.
Scenarios: Concrete use cases (e.g., intelligent alert suppression, automated remediation, CMDB enrichment) drive value. Start with a limited, well‑defined scenario, validate the model, then expand to broader domains.
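The talk does not name specific algorithms, so as an illustration of the "algorithms" layer, here is a minimal sketch of one of the simplest anomaly detectors for time-series metrics: a rolling z-score. It is an assumed baseline, not the Xinju platform's method; the function name, window size, and threshold are hypothetical choices.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(series, window=5, threshold=3.0):
    """Return indices of points that deviate more than `threshold` standard
    deviations from the mean of the preceding `window` points.

    `window` must be at least 2 so that a sample standard deviation exists.
    """
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is treated as anomalous.
            if series[i] != mu:
                anomalies.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Example: a latency series (ms) with a single spike at index 6.
latency_ms = [100, 102, 98, 101, 99, 100, 350, 101, 100]
print(rolling_zscore_anomalies(latency_ms))
```

In line with the "start with a limited scenario" advice, a baseline like this is easy to validate against labeled incidents before moving to heavier models.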
The Xinju AIOps platform demonstrates these principles through intelligent monitoring, automated operations, CMDB integration, and a unified operations portal.
Sharding‑JDBC → Sharding‑Sphere Ecosystem
Zhang Liang described the challenges of high availability, massive storage, and high concurrency in internet‑scale databases. Sharding‑JDBC provides four core capabilities:
Data sharding across multiple physical nodes.
Read/write separation to offload read traffic.
Flexible (soft) transaction support across shards.
Data governance features such as encryption and masking.
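To make the first two capabilities concrete, here is a minimal sketch of the routing decision a sharding layer makes: modulo sharding on a key plus round-robin read/write splitting. It is not Sharding-JDBC's API (Sharding-JDBC expresses this declaratively through sharding and master-slave rules); the `ShardRouter` class, the `t_order_N` table naming, and the data-source names are hypothetical.

```python
class ShardRouter:
    """Hypothetical router: modulo sharding plus read/write splitting."""

    def __init__(self, primaries, replicas):
        # primaries[i] is the write data source for shard i;
        # replicas[i] lists read-only data sources for shard i (may be empty).
        self.primaries = primaries
        self.replicas = replicas
        self._next_replica = [0] * len(primaries)

    def shard_for(self, order_id):
        # Modulo sharding: the shard index is the key modulo the shard count.
        return order_id % len(self.primaries)

    def route(self, sql, order_id):
        """Return (data_source, physical_table) for a statement."""
        shard = self.shard_for(order_id)
        table = f"t_order_{shard}"
        is_read = sql.lstrip().upper().startswith("SELECT")
        candidates = self.replicas[shard]
        if is_read and candidates:
            # Round-robin reads across this shard's replicas.
            idx = self._next_replica[shard]
            self._next_replica[shard] = (idx + 1) % len(candidates)
            return candidates[idx], table
        # Writes, and reads on shards without replicas, go to the primary.
        return self.primaries[shard], table


router = ShardRouter(["ds0", "ds1"], [["ds0_r0", "ds0_r1"], []])
print(router.route("SELECT * FROM t_order WHERE order_id = 10", 10))
```

A real implementation must also keep reads that follow a write in the same transaction on the primary; this sketch ignores that.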
To address cloud‑native deployment, heterogeneous language environments, and diverse user contexts, the team built a Sharding‑Sphere ecosystem consisting of:
Sharding‑JDBC: JDBC driver‑level sharding, zero‑intrusion for existing Java applications.
Sharding‑Proxy: A lightweight proxy that enables language‑agnostic access (MySQL, PostgreSQL protocols) and centralizes routing logic.
Sharding‑Sidecar: Sidecar containers that run alongside services, providing automatic service discovery and configuration without code changes.
The goal is a cloud‑native, decentralized architecture with zero‑intrusion deployment.
DevOps Practices for Traditional Industries
Wang Qing presented three case studies (ING, Capital One, a large domestic bank) and extracted common pain points:
Fragmented toolchains and manual hand‑offs.
Lack of standardized CI/CD pipelines.
Insufficient visibility into production metrics.
Key recommendations:
Adopt a composite DevOps toolchain that integrates source control, automated testing, artifact repositories, and deployment orchestration.
Build a platform layer that provides self‑service pipelines, policy enforcement, and unified monitoring dashboards.
Start with a pilot project that automates a single end‑to‑end workflow, then scale the platform incrementally.
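The self-service pipeline with policy enforcement described above can be sketched as an ordered list of stages, each optionally guarded by a policy gate. This is a toy model under stated assumptions: the `Stage` type, `run_pipeline` function, stage names, and gate semantics are hypothetical; real toolchains (Jenkins, GitLab CI, and similar) express the same idea in their own configuration formats.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True if the stage succeeded


def run_pipeline(
    stages: List[Stage],
    gates: Optional[Dict[str, Callable[[List[str]], bool]]] = None,
) -> Tuple[List[str], str]:
    """Run stages in order. Before a stage starts, its policy gate (if any)
    is evaluated against the stages completed so far. Stops at the first
    failed stage or blocked gate."""
    gates = gates or {}
    completed: List[str] = []
    for stage in stages:
        gate = gates.get(stage.name)
        if gate is not None and not gate(completed):
            return completed, f"blocked by policy before {stage.name}"
        if not stage.run():
            return completed, f"failed at {stage.name}"
        completed.append(stage.name)
    return completed, "success"


ok = lambda: True
stages = [Stage("build", ok), Stage("test", ok),
          Stage("publish_artifact", ok), Stage("deploy", ok)]
# Policy: deployment is only allowed after automated tests have passed.
print(run_pipeline(stages, {"deploy": lambda done: "test" in done}))
```

Keeping gates separate from stages lets a platform team enforce policy centrally while product teams own their stage definitions.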
Enterprise‑Scale Data Management Framework
Fu Chengyong introduced the “one body, three data types, one integration, four platforms” model:
One‑body: A unified data governance framework covering data quality, security, and lifecycle policies.
Three‑numbers (three data types): Management of metadata, master data, and big data assets.
One‑integration: Centralized, policy‑driven data integration across business domains.
Four platforms:
Data Warehouse & Big‑Data Sharing Center.
Data Asset Management Platform.
Master Data Management (MDM) Platform.
Self‑Service Analytics Platform.
This architecture enables full‑lifecycle data stewardship, from ingestion to consumption.
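As one small illustration of the data-quality policies that the governance framework covers, rules can be expressed as named predicates and applied to each record. The `check_quality` function and the example rules for a customer master-data table are hypothetical; the talk does not describe how rules are implemented.

```python
def check_quality(records, rules):
    """Apply each named rule (a predicate) to every record.

    Returns {rule_name: [indices of records that fail the rule]}.
    """
    failures = {name: [] for name in rules}
    for i, record in enumerate(records):
        for name, predicate in rules.items():
            if not predicate(record):
                failures[name].append(i)
    return failures


def is_iso_alpha2(code):
    """True for a two-letter uppercase code such as 'CN'."""
    return len(code) == 2 and code.isalpha() and code.isupper()


# Hypothetical rules for a customer master-data table.
RULES = {
    "customer_id_present": lambda r: bool(r.get("customer_id")),
    "country_is_iso_alpha2": lambda r: is_iso_alpha2(r.get("country", "")),
}

customers = [
    {"customer_id": "C1", "country": "CN"},
    {"customer_id": "", "country": "CN"},
    {"customer_id": "C3", "country": "China"},
    {"customer_id": "C4"},
]
print(check_quality(customers, RULES))
```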
Elasticsearch Cluster Management
Xu Peng outlined a practical approach to operating large Elasticsearch clusters:
Cluster Planning: Define node roles (master‑eligible, data, ingest, coordinating) based on workload and fault‑tolerance requirements.
Key Configuration Parameters: cluster.initial_master_nodes, node.attr.box_type, indices.fielddata.cache.size, and shard allocation awareness settings.
Health Monitoring: Track cluster health status (green/yellow/red, via the _cluster/health API), JVM heap usage, disk watermarks, and thread‑pool rejection rates via Kibana or external monitoring tools.
Common Pitfalls: Over‑sharding, uneven shard allocation, and neglecting circuit‑breaker thresholds.
Implementing these practices helps maintain query latency and indexing throughput as data volume grows.
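The health-monitoring step can be sketched as a function that evaluates the JSON returned by `GET _cluster/health` (which includes `status` and `unassigned_shards`) together with per-node disk usage, such as the `disk.percent` column of `GET _cat/allocation`. The thresholds assume Elasticsearch's default disk watermarks (low 85%, high 90%, flood stage 95%); clusters that override `cluster.routing.allocation.disk.watermark.*` need different values. The `assess_cluster` function itself is hypothetical.

```python
# Default disk watermarks (percent of disk used); assumes the cluster has not
# overridden the cluster.routing.allocation.disk.watermark.* settings.
LOW, HIGH, FLOOD_STAGE = 85.0, 90.0, 95.0


def assess_cluster(health, disk_used_pct):
    """Return human-readable findings.

    health: dict parsed from the JSON body of GET _cluster/health.
    disk_used_pct: mapping of node name -> percent of disk used.
    """
    findings = []
    if health.get("status") != "green":
        findings.append(f"cluster status is {health.get('status')}")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        findings.append(f"{unassigned} unassigned shards")
    for node, used in sorted(disk_used_pct.items()):
        if used > FLOOD_STAGE:
            findings.append(f"{node}: above flood stage, indices with shards on it get a read-only block")
        elif used > HIGH:
            findings.append(f"{node}: above high watermark, shards will be relocated away")
        elif used > LOW:
            findings.append(f"{node}: above low watermark, no new shards allocated")
    return findings


health = {"status": "yellow", "unassigned_shards": 3}
for finding in assess_cluster(health, {"data-1": 70.0, "data-2": 92.5}):
    print(finding)
```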
AWS Blue‑Green Deployment Strategies
Mon Wei compared three AWS‑based blue‑green patterns:
Elastic Beanstalk Environment Swap: Deploy a new environment, run health checks, then swap CNAMEs. Risks include DNS propagation delay.
ALB Target Group Switching: Register the new ECS task set or EC2 instances with a separate target group, then shift traffic via weighted routing. Allows gradual rollout and instant rollback.
CodeDeploy Blue‑Green: Uses Lambda hooks to run pre‑ and post‑traffic validation, with automatic rollback on failure.
Recommendations include automating health‑check scripts, using canary percentages for gradual exposure, and preserving immutable infrastructure snapshots for rapid rollback.
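The canary-percentage and instant-rollback recommendations can be sketched as a loop over traffic weights. On AWS the weights would typically be applied to an ALB listener's weighted forward action across the blue and green target groups; here the apply step is only recorded so the logic can run standalone. The `shift_traffic` function, the step percentages, and the health-check callback are hypothetical.

```python
from typing import Callable, List, Tuple


def shift_traffic(
    steps: List[int], is_healthy: Callable[[int], bool]
) -> List[Tuple[int, int]]:
    """Canary-style blue-green cutover.

    steps: increasing percentages of traffic for the green target group,
    e.g. [10, 50, 100]. is_healthy(pct) runs a health check after green
    receives pct percent of traffic. Returns the (blue_weight, green_weight)
    pairs applied; if a check fails, a final (100, 0) pair records the
    rollback to blue.
    """
    applied: List[Tuple[int, int]] = []
    for pct in steps:
        applied.append((100 - pct, pct))  # in practice: update listener weights
        if not is_healthy(pct):
            applied.append((100, 0))  # instant rollback: all traffic to blue
            break
    return applied


# Example: green's health check fails once it receives half the traffic.
print(shift_traffic([10, 50, 100], lambda pct: pct < 50))
```

Because blue stays registered and untouched until the final step, rollback is a weight change rather than a redeployment, which is what makes it fast.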
Tencent Cloud Database Massive‑Ops Platform
Lu Yue described an automated operations platform that provides:
Resource provisioning and lifecycle management for MySQL, PostgreSQL, and NoSQL instances.
Unified operation APIs for schema changes, backup/restore, and scaling.
Real‑time monitoring dashboards with anomaly detection.
Self‑healing mechanisms that trigger automated diagnosis and remediation scripts.
The platform addresses challenges of customized service configurations and large‑scale diagnostic automation.
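The self-healing mechanism can be sketched as two steps: diagnose symptoms from raw metrics, then map each symptom to a remediation script, escalating anything without a safe automatic fix. The metric names, thresholds, symptom names, and script names below are all hypothetical illustrations, not Tencent Cloud's actual rules.

```python
# Hypothetical symptom -> remediation-script mapping.
REMEDIATIONS = {
    "replication_lag": "rebuild_replica",
    "disk_nearly_full": "expand_storage",
    "connections_exhausted": "kill_idle_sessions",
}


def diagnose(metrics):
    """Turn raw instance metrics into symptoms (thresholds are illustrative)."""
    symptoms = []
    if metrics.get("replication_lag_s", 0) > 60:
        symptoms.append("replication_lag")
    if metrics.get("disk_used_pct", 0) > 90:
        symptoms.append("disk_nearly_full")
    if metrics.get("connections", 0) >= metrics.get("max_connections", float("inf")):
        symptoms.append("connections_exhausted")
    if metrics.get("slow_queries_per_min", 0) > 100:
        symptoms.append("slow_queries")  # no safe automatic fix in this sketch
    return symptoms


def self_heal(instance_id, metrics):
    """Return (instance, symptom, action) tuples; symptoms without a known
    remediation are escalated to a human instead of being auto-fixed."""
    return [
        (instance_id, symptom, REMEDIATIONS.get(symptom, "escalate_to_dba"))
        for symptom in diagnose(metrics)
    ]


print(self_heal("db-1", {"replication_lag_s": 120, "slow_queries_per_min": 500}))
```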
Alibaba Intelligent Operations Platform for Double‑11
Ru Bo presented a scenario‑driven architecture consisting of:
Base‑level ops (resource provisioning, health checks).
Application‑level ops (service mesh, circuit breakers).
Scenario modules (traffic surge handling, auto‑scaling policies).
Key technical points include metric‑driven decision loops, point‑to‑area scaling (aggregating fine‑grained metrics into area‑level capacity plans), and a holistic design that integrates AIOps models for anomaly detection.
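Following the description of point-to-area scaling above, here is a minimal sketch that sums per-instance ("point") traffic into per-area totals and converts each total into an instance count for an expected peak. The formula (required instances = peak QPS / (per-instance capacity × target utilization), rounded up) is a common capacity-planning rule assumed for illustration; the function name, area names, and numbers are hypothetical.

```python
import math
from collections import defaultdict


def area_capacity_plan(points, per_instance_qps, target_utilization=0.5, forecast_factor=1.0):
    """Aggregate per-instance ("point") QPS samples into per-area instance counts.

    points: iterable of (area, observed_qps), one entry per instance.
    per_instance_qps: QPS one instance can sustain (e.g. from load testing).
    target_utilization: headroom to keep; 0.5 means plan for 50% load.
    forecast_factor: expected peak / current traffic, e.g. 3.0 for a promotion.
    """
    area_qps = defaultdict(float)
    for area, qps in points:
        area_qps[area] += qps
    plan = {}
    for area, qps in area_qps.items():
        needed = qps * forecast_factor / (per_instance_qps * target_utilization)
        plan[area] = max(1, math.ceil(needed))  # keep at least one instance
    return plan


samples = [("area-a", 800), ("area-a", 600), ("area-b", 300)]
print(area_capacity_plan(samples, per_instance_qps=1000, forecast_factor=2.0))
```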
58 Delivery Mini‑Program Architecture
Zhang Kai emphasized a stepwise approach:
Perform a detailed requirements analysis to identify core user flows.
Design incremental APIs and micro‑services that can be independently deployed.
Select a technology stack (e.g., Node.js + GraphQL for the driver app, Flutter for the user app) based on latency and concurrency needs.
Implement comprehensive monitoring (request tracing, error rates, business KPI dashboards) to ensure reliability during peak loads.
The case demonstrates how modular design and observability enable rapid scaling.
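As a small example of the error-rate monitoring mentioned above, the sketch below tracks the error rate over the most recent N requests and raises an alert when it exceeds a threshold. The `ErrorRateMonitor` class, window size, and threshold are hypothetical; production setups usually compute this in a metrics system rather than in application code.

```python
from collections import deque


class ErrorRateMonitor:
    """Error rate over the last `window` requests, alerting above `threshold`."""

    def __init__(self, window=100, threshold=0.05):
        # deque with maxlen drops the oldest sample once the window is full.
        self.samples = deque(maxlen=window)
        self.threshold = threshold

    def record(self, ok):
        self.samples.append(0 if ok else 1)

    def error_rate(self):
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def alerting(self):
        return self.error_rate() > self.threshold


monitor = ErrorRateMonitor(window=10, threshold=0.2)
for ok in [True] * 8 + [False] * 3:
    monitor.record(ok)
print(monitor.error_rate(), monitor.alerting())
```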
Scalable Game Service Architecture
Yang Biao compared traditional internet services with high‑throughput game back‑ends. He highlighted three balancing factors:
Data: Use sharded NoSQL stores for player state, with eventual consistency for non‑critical data.
Services: Deploy stateless game logic containers behind a service mesh that provides traffic routing and fault isolation.
Infrastructure: Leverage auto‑scaling groups and spot instances to handle bursty traffic during events.
Concrete examples from several high‑revenue games illustrate how to achieve horizontal scalability while maintaining low latency.
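The talk mentions sharded NoSQL stores for player state but not how players are assigned to shards. One common technique is consistent hashing, sketched below as an assumption rather than the speaker's design: when a shard is added, only about 1/N of players move, and every moved player lands on the new shard. The `ConsistentHashRing` class, shard names, and virtual-node count are hypothetical.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps player IDs to shards; each shard owns many virtual nodes."""

    def __init__(self, shards, vnodes=100):
        ring = []
        for shard in shards:
            for v in range(vnodes):
                ring.append((self._hash(f"{shard}#{v}"), shard))
        ring.sort()
        self._ring = ring
        self._keys = [h for h, _ in ring]

    @staticmethod
    def _hash(key):
        # Stable hash (Python's built-in hash() is randomized per process).
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def shard_for(self, player_id):
        h = self._hash(str(player_id))
        # First virtual node clockwise from the key, wrapping around the ring.
        idx = bisect.bisect(self._keys, h) % len(self._keys)
        return self._ring[idx][1]


ring = ConsistentHashRing(["s0", "s1", "s2"])
bigger = ConsistentHashRing(["s0", "s1", "s2", "s3"])
moved = [p for p in range(1000) if ring.shard_for(p) != bigger.shard_for(p)]
print(f"{len(moved)} of 1000 players move after adding s3")
```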
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
