
Design and Implementation of a Data Application Platform for Business Opportunity Selection, Tagging, and Scheduling

The article describes a data application platform that enables business users to configure custom data selection rules for opportunities, create scheduled tasks, perform large‑scale data comparison, handle task dispatch with Redis queues, and implement rate‑limiting using sliding windows to ensure reliable processing.

58 Tech

Business Background: The platform processes business opportunities collected from 58.com, enriching them with attributes such as industry, size, address, contact counts, and interaction counts, and lets operators tag, silence, or recycle opportunities based on custom rules.

Platform Architecture consists of five layers:

Application layer – user interaction for task and tag configuration.

Configuration layer – definition of selection metrics, dictionaries, data source indexes, and action parameters.

Interaction layer – creation of tags and tasks, and monitoring of execution.

Execution layer – assembles SQL from selection rules, calls tagging/cleaning APIs, and computes data differences.

Storage layer – uses Elasticsearch/MySQL for selection data, Spark/Hive for diff computation, Redis list for task queues, and MySQL for persisting configurations.

Core Functions and Workflow:

Data Selection

Operators define JSON-based selection criteria that the platform translates into SQL, e.g.:

(customer like '%装修%' or customer like '%餐饮%') and productLine = 56 and lastTime >= DATE_FORMAT(DATE_SUB(now(),INTERVAL 1 DAY),'%Y-%m-%d')

The platform builds a binary expression tree of AND/OR nodes from the configured criteria and walks it to generate the final query.
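The tree-walk described above can be sketched as follows. This is a minimal illustration, not the platform's actual code: the node shapes (`type`, `field`, `op`, `value`, `children`) are assumed for the example.

```python
# Minimal sketch (assumed schema): a JSON selection rule parsed into a
# binary AND/OR expression tree, rendered recursively into a SQL WHERE clause.
def render(node):
    """Recursively render a rule node into a SQL fragment."""
    if node["type"] == "cond":
        # Leaf condition: field, operator, and a pre-quoted value
        return f"{node['field']} {node['op']} {node['value']}"
    # Logical node: join rendered children with and/or, parenthesized
    joined = f" {node['type']} ".join(render(c) for c in node["children"])
    return f"({joined})"

rule = {
    "type": "and",
    "children": [
        {"type": "or", "children": [
            {"type": "cond", "field": "customer", "op": "like", "value": "'%装修%'"},
            {"type": "cond", "field": "customer", "op": "like", "value": "'%餐饮%'"},
        ]},
        {"type": "cond", "field": "productLine", "op": "=", "value": "56"},
    ],
}
print(render(rule))
```

Because rendering is a single recursive walk, nesting depth is unbounded: arbitrarily complex operator-configured rules reduce to the same two node kinds.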

Task Scheduling

Tasks can run at various frequencies (once, daily, weekly). Two approaches were considered: the shard-broadcast strategy of the company's wjob scheduler (built on xxl-job), or treating tasks as messages in a queue. The chosen solution stores due tasks in a Redis list; each worker node pops the next task from the list every minute and executes it.
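The queue-based dispatch can be sketched as below. A `deque` stands in for the Redis list (the comments note the corresponding Redis commands); `TaskQueue`, `dispatch_once`, and the key name are illustrative, not the platform's actual API.

```python
# In-memory stand-in for the Redis-list task queue described above.
from collections import deque

class TaskQueue:
    def __init__(self):
        self._q = deque()  # Redis: a list key, e.g. "task:queue" (name assumed)

    def push(self, task_id: str) -> None:
        self._q.appendleft(task_id)  # Redis: LPUSH task:queue <id>

    def pop(self):
        # Redis: RPOP task:queue; returns None when the queue is empty
        return self._q.pop() if self._q else None

def dispatch_once(queue: TaskQueue):
    """What each node does on its per-minute tick: pop one task and run it."""
    task_id = queue.pop()
    if task_id is not None:
        # execute_task(task_id)  # run selection + configured actions here
        return task_id
    return None
```

In the real system, LPUSH/RPOP are atomic in Redis, so multiple nodes can pop concurrently without two workers receiving the same task.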

Overall Process

Different frequencies are handled via strategy pattern.

Data source selection (ES, MySQL, etc.) is configured in tables and executed via a template method.

Actions are defined in tables and dispatched through a unified ext map parameter.
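The strategy-pattern handling of frequencies can be sketched as follows. The strategy classes, field names, and due-check semantics are assumptions for illustration; the article only states that frequencies are dispatched via the strategy pattern.

```python
# Hedged sketch: one strategy per task frequency, selected from a registry.
from datetime import datetime

class OnceStrategy:
    def is_due(self, task, now: datetime) -> bool:
        return not task.get("executed", False)  # run until first execution

class DailyStrategy:
    def is_due(self, task, now: datetime) -> bool:
        return now.hour == task["run_hour"]

class WeeklyStrategy:
    def is_due(self, task, now: datetime) -> bool:
        return now.weekday() == task["run_weekday"] and now.hour == task["run_hour"]

STRATEGIES = {"once": OnceStrategy(), "daily": DailyStrategy(), "weekly": WeeklyStrategy()}

def is_task_due(task, now: datetime) -> bool:
    """Dispatch to the strategy registered for the task's frequency."""
    return STRATEGIES[task["frequency"]].is_due(task, now)
```

Adding a new frequency (e.g. monthly) then means registering one new strategy class, with no changes to the dispatch loop.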

Challenges and Iterations:

Large-scale tagging comparison: for fewer than ~1 million records, an in-memory diff is feasible; beyond ~3 million, Spark-generated Hive SQL compares the opportunity tables and writes the differences to MySQL.
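The in-memory path reduces to a set difference between the previously tagged IDs and the newly selected IDs. The function name is illustrative:

```python
# Small-scale diff sketch: which records need a tag applied vs. removed.
def diff_tagged(previous_ids: set, selected_ids: set):
    to_tag = selected_ids - previous_ids    # newly selected -> apply tag
    to_untag = previous_ids - selected_ids  # no longer selected -> remove tag
    return to_tag, to_untag
```

The Spark/Hive path computes the same two sets, but via a full outer join of the old and new tables rather than in one process's memory.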

Rate-limiting downstream APIs: each downstream API is capped at 2,000 calls per minute. A Redis + Lua script counter implements a sliding-window algorithm (10-second windows, aggregating six windows for a 60-second view) to throttle requests and queue the excess calls.
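The windowing math can be sketched as below. The production version keeps the per-bucket counters in Redis and increments them atomically via a Lua script; this in-memory variant only illustrates the 6 × 10-second aggregation, and the class and method names are assumed.

```python
# Sliding-window counter sketch: 10-second buckets, summing the most recent
# six buckets for a rolling 60-second count against the per-minute limit.
import time

WINDOW_SEC = 10   # bucket width in seconds
BUCKETS = 6       # 6 x 10s = 60s rolling window
LIMIT = 2000      # max calls per 60s per downstream API

class SlidingWindowLimiter:
    def __init__(self, limit: int = LIMIT):
        self.limit = limit
        self.counts = {}  # bucket start timestamp -> call count

    def try_acquire(self, now: float = None) -> bool:
        now = time.time() if now is None else now
        bucket = int(now // WINDOW_SEC) * WINDOW_SEC
        # Drop buckets that have slid out of the 60-second window
        oldest = bucket - (BUCKETS - 1) * WINDOW_SEC
        self.counts = {b: c for b, c in self.counts.items() if b >= oldest}
        if sum(self.counts.values()) >= self.limit:
            return False  # over limit: caller queues the request for later
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return True
```

Compared with a single fixed one-minute counter, the six smaller buckets avoid the burst at window boundaries where nearly 2× the limit could pass in quick succession.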

Backend processing reads the Spark-generated MySQL tables, paginates the results, separates new-tag operations from untag operations, and uses the strategy pattern to generate appropriate WHERE clauses for complex map-type fields.
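The paginated readback can be sketched as below. `fetch_page`, the handlers, and the `op` column are placeholders standing in for the real data-access layer and diff-table schema.

```python
# Sketch: pull diff rows from the Spark-produced MySQL table page by page
# and route each row to the tag or untag handler.
PAGE_SIZE = 500  # page size is an assumption for illustration

def process_diff_table(fetch_page, tag_handler, untag_handler):
    offset = 0
    while True:
        rows = fetch_page(offset, PAGE_SIZE)  # e.g. SELECT ... LIMIT offset, size
        if not rows:
            break  # past the last page
        for row in rows:
            (tag_handler if row["op"] == "tag" else untag_handler)(row)
        offset += PAGE_SIZE
```

Paging keeps memory bounded even when the diff table itself holds millions of rows.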

Conclusion: The platform successfully enables configurable data selection, scheduled task execution, large-scale diff computation, and robust rate-limiting, meeting business requirements while remaining extensible for future data-driven applications.

Tags: Redis, Task Scheduling, Rate Limiting, Spark
Written by 58 Tech, the official tech channel of 58, a platform for tech innovation, sharing, and communication.
