
UData: Solving the Last Mile of Data Usage – Architecture, Query Engine Design, and Federated Query Enhancements

This article introduces the UData platform, explains its data‑integration architecture, details the StarRocks‑based query engine workflow from SQL parsing to distributed execution, and describes recent optimizations such as computation push‑down, support for JSF/HTTP/ClickHouse external tables, and a proxy‑based federated query framework.

JD Tech

The article begins with an overview of UData, a platform that bridges data assets and data applications. It aims to solve the "last mile" of data usage by managing data through four stages: ingestion, governance, discovery, and consumption.

It then presents the system’s functional architecture, illustrating how data flows from source to service modules, and shows the resource‑isolation design that separates physical and logical resources for multi‑tenant environments.

Section 2 describes the query engine, which is built on StarRocks and consists of a FrontEnd (FE) that handles client requests, metadata, and plan generation, and a BackEnd (BE) that executes scans, projections, aggregations, and joins on distributed fragments.

The end‑to‑end SQL execution process is detailed: parsing, binding metadata to generate a Relation, transforming the abstract syntax tree into a logical plan, applying cost‑based optimization (CBO), generating a distributed physical plan, and finally executing fragments on BE nodes before returning results to the client.
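The stages above can be traced with a toy pipeline. This is a minimal Python sketch, not StarRocks code: the catalog, the `Query` class, and every function name are hypothetical, the parser accepts only a single `SELECT ... FROM ... WHERE ...` form, and cost-based optimization is left out entirely.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical catalog standing in for the metadata the FE keeps.
CATALOG = {"orders": ["id", "city", "amount"]}


@dataclass
class Query:
    """Tiny stand-in for a parsed (and later bound) statement."""
    columns: List[str]
    table: str
    predicate: Optional[str]


_SELECT = re.compile(
    r"\s*SELECT\s+(.+?)\s+FROM\s+(\w+)(?:\s+WHERE\s+(.+?))?\s*;?\s*",
    re.IGNORECASE,
)


def parse(sql: str) -> Query:
    """Step 1: turn SQL text into a structured query (toy grammar only)."""
    m = _SELECT.fullmatch(sql)
    if m is None:
        raise ValueError("unsupported SQL in this sketch")
    columns = [c.strip() for c in m.group(1).split(",")]
    return Query(columns, m.group(2), m.group(3))


def bind(q: Query) -> Query:
    """Step 2: resolve names against metadata and expand '*'."""
    if q.table not in CATALOG:
        raise LookupError(f"unknown table {q.table}")
    schema = CATALOG[q.table]
    columns = list(schema) if q.columns == ["*"] else q.columns
    for c in columns:
        if c not in schema:
            raise LookupError(f"unknown column {c}")
    return Query(columns, q.table, q.predicate)


def to_logical_plan(q: Query):
    """Step 3: build a Project -> Filter -> Scan operator tree."""
    plan = ("Scan", q.table)
    if q.predicate:
        plan = ("Filter", q.predicate, plan)
    return ("Project", q.columns, plan)


def to_fragments(plan, num_backends: int):
    """Step 4: one worker fragment per BE plus a gathering fragment."""
    workers = [{"backend": i, "plan": plan} for i in range(num_backends)]
    gather = {"backend": "coordinator", "plan": ("Gather", num_backends)}
    return workers + [gather]
```

Calling `to_fragments(to_logical_plan(bind(parse(sql))), 3)` yields three identical worker fragments and one gathering fragment. A real engine would place its optimizer between the logical plan and the fragments, and would split the plan at data-exchange boundaries rather than copying it whole.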

Optimization techniques are introduced, focusing on computation push‑down for external tables (Elasticsearch, MySQL, ClickHouse, JSF, HTTP). The FE rewrites plans to match push‑down patterns, while the BE creates corresponding nodes to execute the pushed‑down operators, achieving several‑fold performance gains.
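To make the FE-side rewrite concrete, here is a minimal Python sketch of predicate push-down over a `Filter` on an external scan. The plan classes, the `PUSHABLE_OPS` capability table, and the helper names are assumptions made for illustration; the article does not show UData's actual rewrite rules.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Predicate:
    column: str
    op: str
    value: object


@dataclass
class ExternalScan:
    table: str
    source: str
    pushed: List[Predicate] = field(default_factory=list)


@dataclass
class Filter:
    predicates: List[Predicate]
    child: object


# Assumed per-source capabilities; real sources support different operators.
PUSHABLE_OPS = {"mysql": {"=", ">=", "<=", "LIKE"}, "http": {"="}}


def push_down(plan):
    """Move predicates the source can evaluate from the Filter into the scan."""
    if not (isinstance(plan, Filter) and isinstance(plan.child, ExternalScan)):
        return plan  # pattern does not match, leave the plan unchanged
    scan = plan.child
    supported = PUSHABLE_OPS.get(scan.source, set())
    pushed = [p for p in plan.predicates if p.op in supported]
    kept = [p for p in plan.predicates if p.op not in supported]
    new_scan = ExternalScan(scan.table, scan.source, scan.pushed + pushed)
    # Keep a Filter only for predicates the engine must still evaluate itself.
    return Filter(kept, new_scan) if kept else new_scan


def to_remote_sql(scan: ExternalScan) -> str:
    """Render the pushed predicates as the query sent to the source.

    Values are formatted with repr() for brevity; a real system would use
    proper escaping or bound parameters.
    """
    where = " AND ".join(f"{p.column} {p.op} {p.value!r}" for p in scan.pushed)
    return f"SELECT * FROM {scan.table}" + (f" WHERE {where}" if where else "")
```

With a filter on `recv_count >= 1000` plus an operator the source does not support, only the first predicate moves into the scan. The remote system then returns fewer rows, which is where the gain from push-down comes from.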

Support for new external data sources is added: JSF and HTTP tables are defined with CREATE EXTERNAL TABLE statements, and custom functions such as jsfparam, httpconfig, httpheader, and httpbody enable parameter push‑down and runtime filtering. ClickHouse is accessed via the MySQL wire protocol.
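One plausible reading of the httpconfig, httpheader, and httpbody functions is that the engine lifts them out of the WHERE clause and turns them into the outgoing request, leaving ordinary predicates to be filtered afterwards. The Python sketch below illustrates that idea only. The `split_where` helper, the GET default, and the naive split on AND are assumptions, not UData's implementation.

```python
import json
import re

# Matches e.g. httpheader(recv_count, '{"Content-Type":"text/plain"}')
_HTTP_FUNC = re.compile(
    r"(httpconfig|httpheader|httpbody)\(\s*\w+\s*,\s*'([^']*)'\s*\)",
    re.IGNORECASE,
)


def split_where(where: str, url: str):
    """Separate request-shaping functions from ordinary predicates.

    Returns (request, residual): a dict describing the HTTP call and the
    list of predicates the engine still has to apply to the returned rows.
    Splitting on AND is naive and breaks if a string literal contains it.
    """
    request = {"url": url, "method": "GET", "headers": {}, "body": None}
    residual = []
    for conjunct in re.split(r"\s+AND\s+", where, flags=re.IGNORECASE):
        conjunct = conjunct.strip()
        m = _HTTP_FUNC.fullmatch(conjunct)
        if m is None:
            residual.append(conjunct)  # ordinary predicate, filter later
            continue
        name, arg = m.group(1).lower(), m.group(2)
        if name == "httpconfig":
            # "httpmethod" is the key used in the article's example query.
            request["method"] = json.loads(arg).get("httpmethod", "GET").upper()
        elif name == "httpheader":
            request["headers"].update(json.loads(arg))
        else:  # httpbody: passed through as the raw request body
            request["body"] = arg
    return request, residual
```

The sketch only builds a description of the request and sends nothing. Whether UData applies the residual predicates before or after the call, and how the column argument of each function is used, is not stated in the article.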

Example SQL for a JSF external table:

    CREATE EXTERNAL TABLE `jsf_sample` (
      `recv_count` int(11) NULL COMMENT "",
      `timeout_count` float NULL COMMENT "",
      `time_rate` float NULL COMMENT ""
    ) ENGINE=jsf
    COMMENT "jsf table test"
    PROPERTIES (
      "api" = "com.jd.udata.query.xx",
      "jsf_alias" = "vx",
      "method" = "apiXxx",
      "token" = "xxx",
      "clazz" = "xxx",
      "mapping" = ""
    );

Querying the JSF table:

    SELECT * FROM jsf_sample
    WHERE recv_count >= 1000
      AND jsfparam(time_rate, '{"requestId":"requestId","appToken":"appToken","appId":"appId","erp":"http","apiGroupName":"81X","apiName":"618api","params":{}}');

Example for an HTTP external table:

    CREATE EXTERNAL TABLE `http_sample` (
      `recv_count` int(11) NULL COMMENT "",
      `timeout_count` float NULL COMMENT "",
      `time_rate` float NULL COMMENT ""
    ) ENGINE=http
    COMMENT "http table test"
    PROPERTIES (
      "url" = "https://udata.xxxxx.com/query?",
      "mapping" = ""
    );

Querying the HTTP table:

    SELECT * FROM http_sample
    WHERE httpconfig(recv_count, '{"httpmethod":"post"}')
      AND httpheader(recv_count, '{"Content-Type":"application/x-www-form-urlencoded"}')
      AND httpbody(recv_count, 'params1=beijing&param2=2022')
      AND recv_count >= 1000;

A proxy mechanism is introduced to enable federated queries over additional data sources (e.g., Hive, Iceberg, Hudi). The proxy can operate in batch or streaming mode, and logical‑read plugins can be hot‑plugged to extend support for new heterogeneous sources.

Finally, the article summarizes that UData, built on StarRocks, improves query performance, supports a wide range of federated data sources, and continues to explore lake‑house integration and advanced push‑down capabilities.
