Tagged articles
5 articles
Big Data Technology Tribe
Jan 20, 2026 · Big Data

Extending Spark SQL with LanceSparkSessionExtensions: A Complete Guide

This article explains how to inject the LanceSpark plugin into Spark, covering the core LanceSparkSessionExtensions class, various ways to register extensions, the custom parser and planner strategy implementations, and the underlying Spark mechanisms such as injectParser, injectPlannerStrategy, and PredicateHelper.

DataSourceV2 · LanceSpark · PlannerStrategy
0 likes · 14 min read
DevOps Coach
Nov 13, 2025 · Databases

Explore ClickHouse 25.10: 20 JOIN Boosts, Vector Search & New SQL

ClickHouse 25.10 introduces 20 JOIN performance upgrades, lazy column replication, Bloom filter runtime filters, disjunction push-down, automatic column statistics, the QBit vector type, expanded SQL operators, negative LIMIT/OFFSET, Arrow Flight support, and delayed secondary index materialization. The article backs these changes with detailed benchmarks and contributor acknowledgments.

ClickHouse · JOIN optimization · SQL Extensions
0 likes · 23 min read
Bilibili Tech
Nov 4, 2022 · Big Data

Advancements and Optimizations of FlinkSQL at Bilibili

Bilibili’s FlinkSQL team has enhanced its Flink engine, which is now based on 1.11 with features back-ported from 1.15. The additions include Delay-Join, table-valued functions, projection push-down, UDF and object reuse, automatic mini-batch and two-phase aggregation, key-group skew fixes, connector slot groups, real-time projection with Hudi, and RocksDB state performance tweaks. The team is also planning remote state backends and deeper stream-batch integration.

FlinkSQL · Performance Optimization · Real-time Projection
0 likes · 29 min read
vivo Internet Technology
Apr 20, 2022 · Big Data

Implementing Field Lineage in Spark SQL: A Technical Deep Dive

The article details how to add field‑lineage tracking to Spark SQL by creating a custom SparkSessionExtension that injects a check‑analysis rule and a parser, which capture INSERT statements, analyze the physical plan, and generate a JSON mapping of source‑to‑target fields for data governance.

Data Governance · Data Quality · Field Lineage
0 likes · 9 min read