Doris 2.0.2 vs 1.2.3: Real‑World Query Performance Comparison
After upgrading a Doris cluster from version 1.2.3 to 2.0.2, the author ran seven SQL benchmarks (a PK lookup, a top-client query, distinct counts on low- and high-cardinality columns, minute-level session analysis, a location and operator listing, and full-table deduplication) and compared execution times. The results are mixed: some queries got faster, others regressed.
0. Preparation
The Doris cluster was upgraded from 1.2.3 to 2.0.2. The goal is to compare query efficiency between the two versions using the same data sets and SQL statements.
1. PK Test – Count domains where target_ip is empty (lower‑cased)
select domain, count(domain) as count
from (
    select lower(domain) as domain
    from logs_from_spark01
    where target_ip = '""'
) t
group by t.domain;
Result on 1.2.3: 0.21 s. 2.0.2 (average of multiple runs): 0.35 s. The upgraded version is slightly slower.
2. Top 100 client_ip by access count (with location)
select t1.client_ip, t2.nature, t2.province, t2.city, t1.count
from (
    select client_ip, count(client_ip) as count
    from logs_from_spark01
    group by client_ip
    order by count desc
    limit 100
) t1
inner join logs_from_spark01 t2 on t1.client_ip = t2.client_ip
-- Columns are qualified so that client_ip is not ambiguous between t1 and t2.
group by t1.client_ip, t2.nature, t2.province, t2.city, t1.count;
Result on 1.2.3: 2.67 s. 2.0.2 (worst of several runs): 2.12 s. The newer version is faster.
3. Count distinct low‑cardinality column (client_ip)
select count(distinct client_ip) from logs_from_spark01;
Result on 1.2.3: 0.25 s. 2.0.2: 0.24 s. No noticeable difference.
4. Count distinct high‑cardinality column (domain, case‑sensitive)
select count(distinct domain) from logs_from_spark01;
Result on 1.2.3: 3.40 s. 2.0.2 (worst of several runs): 3.26 s. The upgraded version is faster.
5. Top 100 client_ip by continuous minute‑level sessions
select client_ip, max(row_num2) as max
from (
    select client_ip, row_number() over (partition by client_ip, sub_date) as row_num2, date_min
    from (
        select client_ip, sub_date,
               row_number() over (partition by client_ip, sub_date order by date_min) as row_num,
               minutes_sub(to_date(date_min), row_num) as sub_date
        from (
            select client_ip, minute_floor(time) as date_min
            from logs_from_spark01
            where length(time)=14 and time like '20220730%'
        ) t
    ) A
    where A.row_num = 1
) B
group by client_ip
order by max desc
limit 100;
Result on 1.2.3: 17.57 s. 2.0.2 (worst of several runs): 13.26 s. The newer version is faster.
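Finding runs of consecutive minutes is usually done with the "gaps and islands" pattern: number each client's active minutes in order, subtract that number from the minute itself, and rows in the same unbroken run end up sharing one value. The following is a minimal sketch of that pattern, written independently of the benchmarked statement and not run on either Doris version. It reuses the minute_floor and minutes_sub calls and the filter from the query above; the aliases rn, island_key, streak_len, and max_streak are illustrative names only.

```sql
-- Sketch only: conventional gaps-and-islands streak count, not the benchmarked statement.
select client_ip, max(streak_len) as max_streak
from (
    select client_ip, island_key, count(*) as streak_len
    from (
        -- Consecutive minutes share the same (minute - row number) value.
        select client_ip, date_min,
               minutes_sub(date_min, rn) as island_key
        from (
            select client_ip, date_min,
                   row_number() over (partition by client_ip order by date_min) as rn
            from (
                -- One row per client and active minute.
                select distinct client_ip, minute_floor(time) as date_min
                from logs_from_spark01
                where length(time) = 14 and time like '20220730%'
            ) m
        ) n
    ) k
    group by client_ip, island_key
) s
group by client_ip
order by max_streak desc
limit 100;
```

Because this statement differs from the one that was timed, the 17.57 s and 13.26 s figures do not apply to it.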
6. List all distinct combinations of country (nature), province, city, and operator
select nature, province, city, operator
from logs_from_spark01
group by nature, province, city, operator;
Result on 1.2.3: 0.68 s. 2.0.2 (average of runs): 0.81 s. Slightly slower after upgrade.
7. Count total rows without duplicates (full‑row deduplication)
SELECT count(*)
FROM (
    SELECT *,
           row_number() over (partition by client_ip, nature, province, city, operator, domain, time, target_ip, rcode, query_type, authority_record, add_msg, dns_ip) as row_num
    FROM logs_from_spark01
) t
WHERE t.row_num = 1;
Result on 1.2.3: 33 s. 2.0.2 (average of runs): >40 s. The upgraded version is slower.
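Keeping only the first row of each partition and then counting is the same as counting the distinct combinations of the thirteen partition columns. As a hedged illustration of what the statement computes, the sketch below expresses that with select distinct. It was not part of the benchmark, and nothing in the source indicates how it would perform on either version.

```sql
-- Sketch only: counts distinct combinations of the same thirteen columns.
-- Not benchmarked; shown to illustrate what the window-function query computes.
select count(*)
from (
    select distinct client_ip, nature, province, city, operator, domain, time,
                    target_ip, rcode, query_type, authority_record, add_msg, dns_ip
    from logs_from_spark01
) t;
```

Both forms treat NULL values in a column as belonging to the same group, so the two counts should agree.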
Overall Conclusion
The author reports the worst-case execution time for queries that improved and the average time for queries that regressed, arguing that this fairly reflects the real impact of the upgrade. Across the seven benchmark scenarios, Doris 2.0.2 delivers mixed results: tests 2, 4, and 5 run faster, test 3 is effectively unchanged, tests 1 and 6 are slightly slower, and test 7 is significantly slower (33 s versus more than 40 s).
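To put the seven results on one scale, the relative change can be computed from the reported timings. The sketch below does that with literal values copied from the results above. The column names test_no, t_123, t_202, and pct_change are illustrative, and test 7 uses 40 s as a lower bound because the source reports only ">40 s".

```sql
-- Timings in seconds, copied from the seven results above.
-- Test 7 on 2.0.2 was reported only as ">40 s", so 40.00 is a lower bound.
select test_no,
       t_123,
       t_202,
       round((t_202 - t_123) / t_123 * 100, 1) as pct_change  -- negative = faster on 2.0.2
from (
    select 1 as test_no, 0.21 as t_123, 0.35 as t_202
    union all select 2, 2.67, 2.12
    union all select 3, 0.25, 0.24
    union all select 4, 3.40, 3.26
    union all select 5, 17.57, 13.26
    union all select 6, 0.68, 0.81
    union all select 7, 33.00, 40.00
) r
order by test_no;
```

In relative terms, test 1 is about 67% slower despite a gap of only 0.14 s, test 5 is about 25% faster, and test 7 is at least 21% slower.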
ITPUB