Scaling Event Operations for Ten‑Million Online Securities Users

This article details how Ping An Securities built a technology‑first event‑handling team, created new reporting channels, developed a data‑construction platform, and implemented proactive monitoring to efficiently support over ten million internet securities users.

Efficient Ops

Jan 30, 2018

Introduction

The talk “Event Operations for Ten‑Million‑Level Internet Securities Users” describes the challenges and solutions of handling production incidents at Ping An Securities as user volume grew from 1.5 million in 2014 to 10 million by the end of 2016.

1. Background

1.1 Internet Transformation

Ping An Securities entered the securities business in 1991. Policy liberalization in 2013‑2014, combined with a market boom, drove rapid user growth and prompted the formation of an internet‑focused development team.

1.2 Impact of Rapid Growth

From 2014 onward, the number of user‑reported incidents rose sharply. The volume overwhelmed the existing ITSM process, leading to complaints and a degraded user experience.

1.3 Formation of the Event‑Handling Team

The team adopted a technology‑first, business‑assisted model, staffed by developers and testers capable of handling most issues without involving core developers.

1.4 Event‑Handling Process

Incidents are received from client managers, analyzed using ELK logs, app telemetry and databases, then escalated to testers, developers, or operations as needed; resolved cases are documented in a knowledge base.
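The escalation step above can be sketched as a small routing function. This is a minimal illustration, not the team's actual logic: the `Incident` flags, the `Route` names, and the order in which the checks run are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    KNOWLEDGE_BASE = "answer from knowledge base"
    EVENT_TEAM = "handle within the event team"
    TESTER = "escalate to testers"
    DEVELOPER = "escalate to developers"
    OPERATIONS = "escalate to operations"


@dataclass
class Incident:
    summary: str
    # All flags below are hypothetical outcomes of the log/telemetry analysis.
    known_issue: bool = False         # a resolved case already exists
    environment_fault: bool = False   # evidence points at servers or deployment
    code_defect: bool = False         # evidence points at a program bug
    needs_reproduction: bool = False  # cause unclear until reproduced


def route(incident: Incident) -> Route:
    """Return the first matching destination; the check order is an assumption."""
    if incident.known_issue:
        return Route.KNOWLEDGE_BASE
    if incident.environment_fault:
        return Route.OPERATIONS
    if incident.code_defect:
        return Route.DEVELOPER
    if incident.needs_reproduction:
        return Route.TESTER
    return Route.EVENT_TEAM
```

Checking the knowledge base first keeps already-solved cases away from testers and developers, which matches the goal of resolving most issues without involving core developers.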

2. Reporting Channels

2.1 Legacy ITSM Channel

Traditional ITSM integrated with the group OA system handled over 800 channels and millions of tickets annually, but struggled with the surge in incident volume.

2.2 WeChat Channel

A dedicated WeChat group of 500 client managers was created, but the reports it produced were brief and low in quality, which increased the cost of handling each one.

2.3 Mobile Feedback System (MSS)

MSS provides an H5 front‑end for self‑service reporting, capturing user details, problem description and screenshots; it also offers progress tracking and offline chat for follow‑up.
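A structured report of this kind might look like the sketch below. The field names, the status values, and the minimum description length are assumptions for illustration; the article does not describe the real MSS schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical progress states and quality threshold, not taken from MSS.
STATUSES = ("submitted", "in_progress", "resolved")
MIN_DESCRIPTION_LENGTH = 20


@dataclass
class FeedbackReport:
    user_id: str
    description: str
    screenshots: List[str] = field(default_factory=list)
    status: str = "submitted"

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the report is usable."""
        problems = []
        if not self.user_id.strip():
            problems.append("missing user id")
        if len(self.description.strip()) < MIN_DESCRIPTION_LENGTH:
            problems.append("description too short")
        if not self.screenshots:
            problems.append("no screenshot attached")
        return problems

    def advance(self) -> str:
        """Move to the next status for progress tracking; stays at the last one."""
        index = STATUSES.index(self.status)
        if index < len(STATUSES) - 1:
            self.status = STATUSES[index + 1]
        return self.status
```

Rejecting incomplete reports at submission time is one way to address the brief, low-quality reports that made the chat-based channel expensive to handle.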

2.4 Implementation Effect

After promotion, about 90 % of incidents are submitted through MSS, reducing noise and improving response time.

3. Data Construction

3.1 Reproducing Data

Testers need realistic data; manual SQL scripts were error‑prone and costly to maintain.

3.2 Platform Requirements

The platform must quickly generate both simple and complex test data via a visual interface, avoiding massive script maintenance.

3.3 UTA Data Construction Platform

A joint effort produced a platform that creates regular accounts, complex accounts with risk grades, and invokes dozens of APIs to assemble data.
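One way to picture "invoking dozens of APIs to assemble data" is a recipe of ordered steps, where each step stands in for one API call. The sketch below is an assumption about the shape of such a pipeline: the step functions, recipe names, and field names are hypothetical and are stubbed locally instead of calling any real service.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Step = Callable[[Record], Record]


def open_account(data: Record) -> Record:
    # Stand-in for an account-opening API call (hypothetical).
    return {**data, "account_id": "TEST-" + str(data["name"]).upper()}


def set_risk_grade(data: Record) -> Record:
    # Stand-in for a risk-assessment API call (hypothetical).
    return {**data, "risk_grade": data.get("requested_risk_grade", "unrated")}


def bind_custodian(data: Record) -> Record:
    # Stand-in for a third-party custodial binding API call (hypothetical).
    return {**data, "custodian_bound": True}


# A recipe is an ordered list of steps; complex accounts need more of them.
RECIPES: Dict[str, List[Step]] = {
    "regular": [open_account],
    "complex": [open_account, set_risk_grade, bind_custodian],
}


def build(kind: str, seed: Record) -> Record:
    """Run every step of the chosen recipe in order and return the result."""
    data = dict(seed)
    for step in RECIPES[kind]:
        data = step(data)
    return data
```

Keeping recipes as data means a new account type is added by composing existing steps, which avoids maintaining a separate script per scenario.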

3.4 Architecture

The platform supports three subsystems: account management, third‑party custodial binding, and transaction agreements.

3.5 Usage Statistics

Since June, the platform has created 80‑90 test records per day, with a 10 % failure rate attributed to deployment issues, and has improved data‑construction efficiency roughly eightfold.

4. Service Center

4.1 Data Analysis

A visual data‑analysis platform aggregates ITSM and MSS data for weekly and monthly trend reporting.
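The weekly aggregation behind such a report can be sketched with the standard library. The input shape, a list of `(source, created_date)` pairs, is an assumption; the real platform's data model is not described.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple


def weekly_counts(tickets: Iterable[Tuple[str, date]]) -> Counter:
    """Count tickets per (ISO year, ISO week, source)."""
    counts: Counter = Counter()
    for source, created in tickets:
        iso_year, iso_week, _weekday = created.isocalendar()
        counts[(iso_year, iso_week, source)] += 1
    return counts


# Made-up sample tickets, for illustration only.
sample = [
    ("ITSM", date(2017, 1, 2)),
    ("MSS", date(2017, 1, 3)),
    ("MSS", date(2017, 1, 4)),
    ("MSS", date(2017, 1, 9)),
]
trend = weekly_counts(sample)
```

Grouping by ISO year and week avoids splitting a week that spans a year boundary; a monthly report would key on `(created.year, created.month, source)` instead.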

4.2 Incident Analysis

Incidents are categorized into data, program, and consultation types; consultation issues dominate (>80 % of volume), prompting knowledge‑sharing initiatives.
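The share of each category can be computed as below. The three category labels come from the text; the function name and the sample list are illustrative and do not reproduce the team's data.

```python
from collections import Counter
from typing import Dict, Iterable


def category_shares(categories: Iterable[str]) -> Dict[str, float]:
    """Return each category's fraction of the total incident count."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


# Made-up sample: ten incidents, nine of them consultations.
shares = category_shares(["consultation"] * 9 + ["data"])
dominant = max(shares, key=shares.get)
```

A consistently dominant consultation share is the signal that knowledge sharing, not code changes, will remove the most tickets.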

4.3 Live Streaming

Weekly one‑hour live sessions on the corporate “ZhiNiao” platform share incident handling best practices.

4.4 QA Service Center

A self‑service portal provides searchable solutions for client managers, consolidating knowledge‑base articles and hot‑issue FAQs.

5. Monitoring

5.1 From Passive to Proactive

Proactive alerts were introduced to detect issues before users report them, so that problems can be fixed before they degrade the user experience.

5.2 Implementation

Monitoring focuses on incident volume and severity, with specific rules for account‑system and trading‑system inconsistencies.
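A rule for account-system and trading-system inconsistencies could take the form of a reconciliation check like the one below. It assumes each system can export a mapping from account id to a status string; that export format and the reason strings are hypothetical.

```python
from typing import Dict, List, Tuple


def find_inconsistencies(
    account_system: Dict[str, str],
    trading_system: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Compare two {account_id: status} snapshots and list every mismatch."""
    issues = []
    for account_id in sorted(set(account_system) | set(trading_system)):
        if account_id not in trading_system:
            issues.append((account_id, "missing in trading system"))
        elif account_id not in account_system:
            issues.append((account_id, "missing in account system"))
        elif account_system[account_id] != trading_system[account_id]:
            issues.append((account_id, "status mismatch"))
    return issues
```

Running a check like this on a schedule surfaces mismatches before a user hits them, which is the shift from passive to proactive handling.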

5.3 Effectiveness

Alerting via email enables rapid resolution; recent monitoring shows no new user complaints.
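An email alert of this kind can be assembled with Python's standard `email` package. The subject format and the addresses are placeholders, and the sketch only builds the message; sending it would require an SMTP host, which is not described in the source.

```python
from email.message import EmailMessage
from typing import List, Tuple


def build_alert(
    issues: List[Tuple[str, str]],
    sender: str,
    recipient: str,
) -> EmailMessage:
    """Build an alert email listing each (account_id, reason) pair on its own line."""
    message = EmailMessage()
    message["Subject"] = f"[monitoring] {len(issues)} inconsistencies detected"
    message["From"] = sender
    message["To"] = recipient
    lines = [f"{account_id}: {reason}" for account_id, reason in issues]
    message.set_content("\n".join(lines))
    return message


# Placeholder addresses; a real deployment would pass the message to
# smtplib.SMTP(...).send_message(alert) with its own mail host.
alert = build_alert(
    [("A2", "status mismatch")],
    sender="monitor@example.com",
    recipient="oncall@example.com",
)
```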

6. Conclusion

Address each incident‑handling step with a technology‑first approach.

Leverage platform tools to boost efficiency.

Continuously innovate methods to support frontline staff.
