
Boost Real‑Time Lakehouse Queries with Hologres in 5 Minutes

This guide walks you through a 5‑minute challenge that shows how to enable Hologres real‑time lakehouse capabilities, configure OSS, DLF, and Hologres services, create external and internal tables, run TPCH Q11 queries, and submit results for prizes.


Activity Overview

Participate in a 5‑minute challenge to quickly use Hologres real‑time lakehouse capabilities to accelerate analysis of data lake files (Hudi, Delta, Paimon, ORC) stored in OSS. The challenge runs from January 4 to January 30, 2024.

Prizes

Challenge prize: 15 Xiaomi power banks (complete SQL execution and speed comparison).

First prize: LAMY fountain pen (top‑liked submission with ≥20 likes).

Second prize: 20 Xiaomi backpacks (ranked 2‑21 with ≥10 likes).

Participation prize: 100 community points per participant.

Preparation

Enable the OSS, DLF, and Hologres services in the Shanghai region, then follow the steps below:

Open the OSS console, create a bucket, and upload the tpch_10g_orc_3.zip test data (nation_orc, supplier_orc, partsupp_orc). Delete any .DS_Store files after upload.

Enable DLF, create a metadata database (e.g., mydatabase), and import the OSS data via a metadata extraction task.

Purchase or apply for a free‑trial Hologres instance (32C configuration recommended for trial).

Step 1: Configure Environment

In the Hologres instance, enable the Data Lake Acceleration feature (the instance will restart). Then create the dlf_fdw extension (if not already present) and define an external server pointing to the Shanghai region:

CREATE EXTENSION IF NOT EXISTS dlf_fdw;
CREATE SERVER IF NOT EXISTS dlf_server FOREIGN DATA WRAPPER dlf_fdw OPTIONS (
    dlf_region 'cn-shanghai',
    dlf_endpoint 'dlf-share.cn-shanghai.aliyuncs.com',
    oss_endpoint 'oss-cn-shanghai-internal.aliyuncs.com'
);
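If the server creation succeeds silently, a quick catalog lookup can confirm it. Hologres is PostgreSQL‑compatible, so the standard pg_foreign_server catalog should apply (a sketch, assuming the server name defined above):

```sql
-- List registered foreign servers; dlf_server should appear in the output.
SELECT srvname, srvoptions FROM pg_foreign_server;
```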

Step 2: Query OSS Data via Hologres External Table

Create external tables that map OSS data to Hologres:

IMPORT FOREIGN SCHEMA mydatabase LIMIT TO (nation_orc, supplier_orc, partsupp_orc)
FROM SERVER dlf_server INTO public OPTIONS (if_table_exist 'update');
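Before running the full query, a quick row count against one external table verifies the OSS mapping works end to end (a sketch, using the table names imported above; the standard TPC-H nation table contains 25 rows at any scale factor):

```sql
-- Sanity check: confirms Hologres can read the ORC files through DLF.
SELECT COUNT(*) FROM nation_orc;
```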

Run a TPCH Q11 query against the external tables:

SELECT ps_partkey, SUM(ps_supplycost * ps_availqty) AS value
FROM partsupp_orc, supplier_orc, nation_orc
WHERE ps_suppkey = s_suppkey
  AND s_nationkey = n_nationkey
  AND RTRIM(n_name) = 'EGYPT'
GROUP BY ps_partkey HAVING SUM(ps_supplycost * ps_availqty) > (
    SELECT SUM(ps_supplycost * ps_availqty) * 0.000001
    FROM partsupp_orc, supplier_orc, nation_orc
    WHERE ps_suppkey = s_suppkey
      AND s_nationkey = n_nationkey
      AND RTRIM(n_name) = 'EGYPT'
)
ORDER BY value DESC;

Step 3: Query via Hologres Internal Table

Create internal tables with the same schema as the external tables and load data from the external tables:

DROP TABLE IF EXISTS NATION;
BEGIN;
CREATE TABLE NATION (
    N_NATIONKEY INT NOT NULL PRIMARY KEY,
    N_NAME TEXT NOT NULL,
    N_REGIONKEY INT NOT NULL,
    N_COMMENT TEXT NOT NULL
);
CALL set_table_property('NATION','distribution_key','N_NATIONKEY');
COMMIT;
... (similar statements for SUPPLIER and PARTSUPP) ...
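As an illustration of the elided statements, the SUPPLIER table might be defined as follows, assuming the standard TPC-H column definitions (a hypothetical sketch; verify types against your ORC schema, and define PARTSUPP the same way with a composite key on ps_partkey, ps_suppkey):

```sql
-- Hypothetical sketch using the standard TPC-H SUPPLIER columns.
DROP TABLE IF EXISTS SUPPLIER;
BEGIN;
CREATE TABLE SUPPLIER (
    S_SUPPKEY   INT NOT NULL PRIMARY KEY,
    S_NAME      TEXT NOT NULL,
    S_ADDRESS   TEXT NOT NULL,
    S_NATIONKEY INT NOT NULL,
    S_PHONE     TEXT NOT NULL,
    S_ACCTBAL   DECIMAL(15,2) NOT NULL,
    S_COMMENT   TEXT NOT NULL
);
CALL set_table_property('SUPPLIER','distribution_key','S_SUPPKEY');
COMMIT;
```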

Insert data from the external tables:

INSERT INTO nation SELECT * FROM nation_orc;
INSERT INTO supplier SELECT * FROM supplier_orc;
INSERT INTO partsupp SELECT * FROM partsupp_orc;
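After the bulk load, collecting table statistics usually improves plan quality for the comparison query; Hologres supports the PostgreSQL-style ANALYZE command (a sketch, using the internal table names above):

```sql
-- Refresh optimizer statistics so the internal-table query plans well.
ANALYZE nation;
ANALYZE supplier;
ANALYZE partsupp;
```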

Run the same TPCH Q11 query against the internal tables (SQL identical to the external‑table version).

Results

Submit screenshots of the query logs and execution times for both the external‑table and internal‑table queries.

Challenge Submission Rules

Submit the challenge from the account that activated the services; the backend verifies product activation and SQL execution.

Upload original data tables and screenshots; plagiarism or vote‑rigging leads to disqualification.

Provide both external‑table and internal‑table SQL, logs, and results in a single combined screenshot.

SQL must involve multi‑table joins; single‑table queries are not accepted.

Tags: SQL, real-time analytics, Hologres, data lake, TPCH
Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
