
Integrating Heterogeneous Data Sources with openLooKeng and Upgrading the Apache Kylin Connector at AutoHome

This article describes how AutoHome tackled the complexity of managing multiple relational, NoSQL, and Hive data stores by adopting openLooKeng for unified, cross-source SQL queries. It outlines openLooKeng's key features, such as ANSI SQL support, diverse connectors, and query optimizations, and details the custom enhancements made to the Apache Kylin connector to better serve AutoHome's commercial data analysis workloads.

HomeTech

Background and Current Situation

AutoHome provides comprehensive automotive information and relies on a variety of data management systems, including relational databases, NoSQL systems such as document and key-value stores, and object storage. Because each system has its own SQL dialect or programming interface, integrating and analyzing data across them is difficult.

Solution

After evaluating several options, AutoHome selected openLooKeng because it can perform federated queries across RDBMS, NoSQL, Hive, and MPPDB warehouses, offering cross-source heterogeneous query capabilities that simplify large-scale data analysis and accelerate decision-making.

Key Features

1. ANSI SQL2003 Support – Users can write queries using standard ANSI SQL without adapting to each data source’s dialect, while openLooKeng’s connector framework abstracts the underlying systems.

2. Rich Connector Ecosystem – Provides connectors for Oracle, Hive, HBase, Elasticsearch, and more, enabling seamless data retrieval from diverse sources for high‑performance in‑memory federated computation.

3. High‑Performance Query Optimizations – Includes bitmap, bloom‑filter, and min‑max indexes, multiple cache layers (metadata, plan, ORC data), dynamic filtering, and operator push‑down to leverage the computational power of source systems.
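As a rough illustration of how a min-max index lets an engine skip data, the sketch below prunes stripes whose value range cannot satisfy a range predicate. `StripeStats` and `prune_stripes` are hypothetical names chosen for this example; they do not reflect openLooKeng's internal index format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripeStats:
    """Hypothetical per-stripe statistics for one indexed column."""
    stripe_id: int
    min_value: int
    max_value: int


def prune_stripes(stats, lower, upper):
    """Return ids of stripes whose [min, max] range overlaps [lower, upper].

    Stripes outside the predicate range cannot contain matching rows,
    so a reader can skip them without scanning any data.
    """
    return [
        s.stripe_id
        for s in stats
        if s.max_value >= lower and s.min_value <= upper
    ]


stats = [
    StripeStats(0, 1, 100),
    StripeStats(1, 101, 200),
    StripeStats(2, 201, 300),
]

# Predicate: column BETWEEN 150 AND 220. Stripe 0 is skipped entirely.
stripes_to_read = prune_stripes(stats, 150, 220)
print(stripes_to_read)
```

Bloom-filter and bitmap indexes serve the same purpose for equality predicates: they answer "can this block contain the value?" cheaply before any data is read.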

Commercial Data Analysis Application Practice

By exposing openLooKeng via JDBC or its web UI, developers and analysts can issue a single SQL statement to query across isolated data islands, reducing development effort, learning cost, and error rates while improving analysis efficiency.
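To make the "single SQL statement across data islands" idea concrete, here is a minimal runnable sketch. It uses SQLite's `ATTACH` purely as a stand-in for two separate sources; it is not openLooKeng. In openLooKeng the tables would instead be addressed as `catalog.schema.table` and the statement would be submitted over JDBC or the web UI. The database, table, and column names are invented for illustration.

```python
import sqlite3

# Each ATTACH of ':memory:' creates a separate, independent database,
# standing in here for two isolated data sources.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS rdbms")
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")

conn.execute("CREATE TABLE rdbms.advertisers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE warehouse.ad_clicks (advertiser_id INTEGER, clicks INTEGER)")
conn.executemany(
    "INSERT INTO rdbms.advertisers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.executemany(
    "INSERT INTO warehouse.ad_clicks VALUES (?, ?)",
    [(1, 10), (1, 5), (2, 7)],
)

# One statement joins data that lives in two different stores.
FEDERATED_SQL = """
SELECT a.name, SUM(c.clicks) AS total_clicks
FROM rdbms.advertisers AS a
JOIN warehouse.ad_clicks AS c ON c.advertiser_id = a.id
GROUP BY a.name
ORDER BY a.name
"""
rows = conn.execute(FEDERATED_SQL).fetchall()
print(rows)  # [('Acme', 15), ('Globex', 7)]
```

The point of the sketch is the shape of the query: the analyst writes one join and the engine, not the application code, resolves where each table lives.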

Upgrade of the Apache Kylin Connector

AutoHome identified gaps in Kylin support and contributed enhancements:

1. Metadata Handling – Adjusted metadata extraction to accommodate Kylin’s project, model, cube, and segment structures.

2. SQL Keyword Treatment – Implemented quoting and upper‑casing for reserved keywords and aggregate function aliases.

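3. SQL Optimizer Adjustments – Disabled the SingleDistinctAggregationToGroupBy rule for Kylin queries. This rule rewrites a single DISTINCT aggregation into nested GROUP BY aggregations, and the rewritten SQL may no longer match the measures precomputed in existing cubes.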

4. Custom Rule Framework – Added a blacklist configuration in the connector’s settings, loaded during server startup, and integrated rule filtering logic into OptimizerUtils to selectively disable incompatible optimizations.
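Items 2 and 4 above can be sketched in a few lines. This is a simplified illustration, not the connector's actual code: the reserved-keyword subset, the function names, and the comma-separated configuration format are all assumptions made for this example. The blacklisted rule is the single-distinct aggregation rule from item 3, spelled as in the Presto code base (`SingleDistinctAggregationToGroupBy`); the other rule names are only illustrative.

```python
# Hypothetical subset of reserved keywords; a real connector would use
# the full keyword list of the target SQL dialect.
RESERVED_KEYWORDS = frozenset({"DATE", "YEAR", "USER", "ORDER", "GROUP"})


def quote_identifier(name: str) -> str:
    """Upper-case an identifier and double-quote it if it is a reserved keyword."""
    upper = name.upper()
    return f'"{upper}"' if upper in RESERVED_KEYWORDS else upper


def quote_alias(alias: str) -> str:
    """Always upper-case and double-quote an aggregate alias."""
    return '"' + alias.upper().replace('"', '""') + '"'


def parse_blacklist(config_value: str) -> frozenset:
    """Parse a comma-separated list of optimizer rule names (assumed format)."""
    return frozenset(p.strip() for p in config_value.split(",") if p.strip())


def filter_rules(rule_names, blacklist):
    """Drop blacklisted rules while preserving the original rule order."""
    return [r for r in rule_names if r not in blacklist]


# Hypothetical config value, as it might be read once at server startup.
blacklist = parse_blacklist("SingleDistinctAggregationToGroupBy, ")

all_rules = [
    "PushLimitThroughProject",
    "SingleDistinctAggregationToGroupBy",
    "MergeFilters",
]
active_rules = filter_rules(all_rules, blacklist)

# Build a pushed-down SELECT item with a quoted column and alias.
select_item = f"SUM({quote_identifier('price')}) AS {quote_alias('total')}"
print(active_rules)
print(select_item)
```

Keeping the blacklist in configuration rather than in code means an incompatible rule can be switched off per connector without touching the optimizer itself.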

Summary and Planning

The adoption of openLooKeng has equipped AutoHome with cross-source query capabilities, streamlining data analysis across business lines. Upgrading the Kylin connector enriches the openLooKeng ecosystem and aligns with AutoHome's expanding analytical scenarios, with ongoing commitment to community collaboration and continuous improvement.

Instructor Introduction

Yu Haijun, currently working in the automotive manufacturer’s technology division, is responsible for the architecture and data development of AutoHome’s commercial advertising and data products.

Tags: big data, SQL, Query Optimization, data integration, Kylin, connectors, openLooKeng