Design and Implementation of a Data Application Platform for Business Opportunity Selection, Tagging, and Scheduling
The article describes a data application platform that enables business users to configure custom data selection rules for opportunities, create scheduled tasks, perform large‑scale data comparison, handle task dispatch with Redis queues, and implement rate‑limiting using sliding windows to ensure reliable processing.
Business Background: The platform processes business opportunities collected from 58.com, enriching them with attributes such as industry, company size, address, contact counts, and interaction counts, and allows operators to tag, silence, or recycle opportunities based on custom rules.
Platform Architecture consists of five layers:
Application layer – user interaction for task and tag configuration.
Configuration layer – definition of selection metrics, dictionaries, data source indexes, and action parameters.
Interaction layer – creation of tags and tasks, and monitoring of execution.
Execution layer – assembles SQL from selection rules, calls tagging/cleaning APIs, and computes data differences.
Storage layer – uses Elasticsearch/MySQL for selection data, Spark/Hive for diff computation, Redis list for task queues, and MySQL for persisting configurations.
Core Functions and Workflow:
Data Selection
Operators define JSON-based selection criteria that are translated into SQL statements, e.g.:
(customer like '%装修%' or customer like '%餐饮%') and productLine = 56 and lastTime >= DATE_FORMAT(DATE_SUB(now(), INTERVAL 1 DAY), '%Y-%m-%d')
The platform builds a binary expression tree of AND/OR nodes to generate the final query.
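The tree-to-SQL translation can be sketched as a recursive walk over the rule tree. The node shape used here ("op", "field", "children") is an assumption for illustration, not the platform's actual JSON schema:

```python
def to_sql(node):
    """Render a rule node as a parenthesized SQL condition (sketch)."""
    if node["op"] in ("and", "or"):
        # Internal node: join child conditions with AND/OR
        joined = f" {node['op'].upper()} ".join(to_sql(c) for c in node["children"])
        return f"({joined})"
    # Leaf node: field / operator / value
    return f"{node['field']} {node['op']} {node['value']!r}"

rule = {
    "op": "and",
    "children": [
        {"op": "or", "children": [
            {"op": "like", "field": "customer", "value": "%装修%"},
            {"op": "like", "field": "customer", "value": "%餐饮%"},
        ]},
        {"op": "=", "field": "productLine", "value": 56},
    ],
}
print(to_sql(rule))
# ((customer like '%装修%' OR customer like '%餐饮%') AND productLine = 56)
```

Parenthesizing at every internal node keeps operator precedence explicit, so nested OR groups inside an AND chain (as in the example rule above) render correctly without precedence reasoning.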
Task Scheduling
Tasks can run at various frequencies (once, daily, weekly). Two approaches were considered: the shard-broadcast strategy of the company's wjob scheduler (built on xxl-job), or treating tasks as messages in a queue. The latter was chosen: tasks are stored in a Redis list, and each node pops the next task once a minute for execution.
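The queue-based dispatch amounts to LPUSH on task creation and RPOP on each worker's minute tick. A minimal sketch, using an in-memory deque in place of the Redis list (function names are illustrative):

```python
from collections import deque

# Stand-in for the Redis list holding pending tasks.
task_queue = deque()

def enqueue(task_id):
    """Producer side: plays the role of LPUSH."""
    task_queue.appendleft(task_id)

def pop_next():
    """Each worker node calls this once a minute; plays the role of RPOP.
    Returns None when the queue is empty, so idle ticks are cheap."""
    try:
        return task_queue.pop()
    except IndexError:
        return None

enqueue("task-1")
enqueue("task-2")
assert pop_next() == "task-1"   # FIFO: oldest task first
```

Because RPOP is atomic in Redis, multiple nodes can share one list without a task being executed twice, which is what makes the "each node pops the next task" scheme safe.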
Overall Process
Different frequencies are handled via strategy pattern.
Data source selection (ES, MySQL, etc.) is configured in tables and executed via a template method.
Actions are defined in tables and dispatched through a unified ext map parameter.
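The frequency handling via a strategy pattern can be sketched as a map from frequency name to a "should this task run today?" predicate. The strategy names and signatures below are assumptions for illustration; the article does not show the real classes:

```python
from datetime import date, timedelta

# One strategy per frequency; each decides whether a task is due,
# given its last run date and today's date.
STRATEGIES = {
    "once":   lambda last_run, today: last_run is None,
    "daily":  lambda last_run, today: last_run is None or last_run < today,
    "weekly": lambda last_run, today: last_run is None
                                      or today - last_run >= timedelta(days=7),
}

def should_run(freq, last_run, today):
    """Dispatch to the configured frequency strategy."""
    return STRATEGIES[freq](last_run, today)

today = date(2023, 5, 8)
assert should_run("daily", date(2023, 5, 7), today)      # ran yesterday: due
assert not should_run("once", today, today)              # already ran: never again
assert not should_run("weekly", date(2023, 5, 3), today) # only 5 days ago
```

Adding a new frequency (say, monthly) then means adding one entry to the map, without touching the dispatch code, which is the point of the pattern here.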
Challenges and Iterations:
Large-scale tagging comparison: for fewer than ~1 million records, an in-memory diff is feasible; beyond ~3 million, Spark-generated Hive SQL compares the opportunity tables and stores the results in MySQL.
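For the small-scale path, the in-memory diff reduces to two set differences: opportunities that need the tag added, and those that need it removed. A minimal sketch under that assumption:

```python
def diff_tags(currently_tagged, should_be_tagged):
    """Return (to_tag, to_untag) between the current and target ID sets."""
    current, target = set(currently_tagged), set(should_be_tagged)
    to_tag = target - current    # selected now but not yet tagged
    to_untag = current - target  # tagged before but no longer selected
    return to_tag, to_untag

to_tag, to_untag = diff_tags({"opp-1", "opp-2"}, {"opp-2", "opp-3"})
assert to_tag == {"opp-3"} and to_untag == {"opp-1"}
```

The Spark/Hive path computes the same two differences, just expressed as joins between the old and new selection tables instead of in-process sets.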
Rate-limiting downstream APIs: each downstream API is limited to 2,000 calls per minute. A Redis + Lua script counter implements a sliding-window algorithm (10-second windows, six of which are aggregated for a rolling 60-second view) to throttle requests and queue excess calls.
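The sliding-window counter can be sketched as follows: each call increments the current 10-second bucket, and admission sums the latest six buckets against the per-minute limit. In production this logic lives in a Redis + Lua script for atomicity; here a plain dict stands in, and all names are illustrative:

```python
import time
from collections import defaultdict

WINDOW = 10   # seconds per bucket
BUCKETS = 6   # 6 x 10s = rolling 60-second view
LIMIT = 2000  # calls per minute per downstream API

counters = defaultdict(int)  # (api, bucket_index) -> call count

def try_acquire(api, now=None):
    """Admit a call if the rolling 60s count is under LIMIT."""
    now = time.time() if now is None else now
    bucket = int(now // WINDOW)
    # Sum the current bucket plus the previous five (the 60s window).
    used = sum(counters[(api, bucket - i)] for i in range(BUCKETS))
    if used >= LIMIT:
        return False  # caller should queue the request for later
    counters[(api, bucket)] += 1
    return True

# Fill the window, then the next call within 60s is rejected.
assert all(try_acquire("tag-api", now=0) for _ in range(LIMIT))
assert not try_acquire("tag-api", now=50)   # still inside the window
assert try_acquire("tag-api", now=60)       # oldest bucket has rotated out
```

Compared with a single fixed one-minute counter, the six-bucket window avoids the burst at window boundaries: capacity frees up gradually, 10 seconds at a time, rather than all at once.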
Backend processing reads the Spark-generated MySQL tables, paginates the results, separates new-tag and untag operations, and uses a strategy pattern to generate the appropriate where clauses for complex map-type fields.
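The pagination step keeps each tagging/untag API call bounded in size. A minimal sketch (the page size is an assumption; the article does not state one):

```python
def paginate(rows, page_size=500):
    """Yield fixed-size pages of the result set so each downstream
    tagging/untag call handles a bounded batch."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]

pages = list(paginate(list(range(1200)), page_size=500))
assert [len(p) for p in pages] == [500, 500, 200]
```

In practice the paging would be done in SQL (LIMIT/OFFSET or keyset pagination on the result table) rather than after loading all rows, but the batching structure is the same.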
Conclusion: The platform successfully enables configurable data selection, scheduled task execution, large-scale diff computation, and robust rate-limiting, meeting business requirements while remaining extensible for future data-driven applications.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.