Tagged articles
29 articles
DataFunTalk
Dec 17, 2025 · Artificial Intelligence

How Large Language Models Unlock Field‑Level Data Lineage at Scale

This talk explains how a data platform tackled massive, heterogeneous enterprise data by using large language models and prompt engineering to automatically extract field‑level lineage from SQL scripts, achieve over 80% coverage, and raise accuracy above 95%, dramatically cutting impact‑analysis time.

AI for data engineering · Big Data · Data Lineage
6 min read
Aikesheng Open Source Community
Feb 27, 2025 · Information Security

Improving Data Export Workflows and Security: From 1.0 to 2.0 with Classification and Dynamic Approval

This article examines the security challenges of data export work orders in MySQL environments, outlines the shortcomings of the original 1.0 workflow, and presents a comprehensive 2.0 redesign that introduces dynamic approvals, data classification, execution‑plan analysis, and code‑level solutions to mitigate data leakage risks.

Data Export · Database Security · SQL parsing
15 min read
DataFunTalk
Mar 9, 2024 · Big Data

Construction and Application of Tencent Oula Data Lineage Platform

This article presents a comprehensive overview of Tencent Oula's data lineage system, detailing its background, goals, architecture, modular construction, key technologies such as graph databases and SQL parsing, and various internal application scenarios including data governance, cost insight, and baseline monitoring.

Data Lineage · Graph Database · SQL parsing
20 min read
Top Architecture Tech Stack
Nov 5, 2023 · Databases

Understanding MySQL Communication Protocols, Parsing, Optimizer, Storage Engines, and Execution Engine

This article explains how MySQL establishes connections, the supported communication protocols and message formats, the lexical and syntactic parsing process, query optimization and execution plan generation, the role of different storage engines, and how the execution engine uses the plan to operate on the storage layer.

Communication Protocol · Execution Engine · Query Optimizer
15 min read
DaTaobao Tech
Jun 21, 2023 · Databases

Data Space Architecture and Metadata Models

The article outlines a data‑space architecture that employs a wide‑table design with dynamic columns and dedicated metadata tables, a metadata execution engine for business‑logic mapping, upgraded SQL parsing via Druid, MySQL‑proxy protocol handling, and distributed flow control using Redis and Zookeeper to enable scalable, multi‑tenant, low‑code and cloud‑native data management.

Data Space · Database design · Open Platform
16 min read
DataFunSummit
May 10, 2023 · Big Data

Field-Level Data Lineage Extraction for FlinkSQL Using Apache Calcite

This article explains how to derive field‑level data lineage for FlinkSQL by leveraging Apache Calcite, covering the Calcite framework, FlinkSQL execution stages, the three‑step parsing approach, core source code details, practical Insert/Join examples, and extensions for lookup joins and UDTFs.

Apache Calcite · Data Lineage · FlinkSQL
12 min read
Architect's Guide
Jan 7, 2023 · Databases

MySQL Execution Process Overview

This article explains the complete MySQL execution flow, covering the connector, permission verification, caching, parser, optimizer, executor, process states, SQL execution order, and the impact of WHERE‑clause condition ordering on query performance.

Execution Process · Permissions · Query Optimizer
13 min read
Zhengcaiyun Technology
Dec 6, 2022 · Fundamentals

How to Use Antlr4 for Custom SQL Parsing in Spark Projects

This guide explains common business scenarios that require custom SQL parsing, walks through setting up Antlr4 in IntelliJ IDEA, configuring Maven dependencies, generating parser code, and provides Java examples for extracting table names from Spark SQL statements, including handling of prediction modes and execution results.

Antlr4 · Backend · Parser
11 min read
DeWu Technology
Nov 30, 2022 · Big Data

Fundamentals and Implementation of Data Lineage in Big Data Environments

Data lineage in big‑data environments tracks how data moves and transforms—from source tables through SQL processing to final storage—enabling management tasks such as domain segmentation, performance tuning, anomaly detection, and dependency verification, with implementations ranging from simple regex extraction to robust AST parsing and optimization, as used by tools like Alibaba DataWorks and Apache Atlas.

AST · Big Data · Data Lineage
7 min read
Big Data Technology & Architecture
Sep 22, 2022 · Databases

Apache Calcite Overview: Architecture, SQL Processing Flow, and Practical Example

This article introduces Apache Calcite as a modular data‑management framework, explains its architecture and SQL processing pipeline—from parsing and validation to relational‑algebra conversion, optimization, and execution—and demonstrates a complete CSV‑based query example with code snippets.

Apache Calcite · RelNode · Relational Algebra
13 min read
vivo Internet Technology
May 31, 2022 · Databases

Exploring Presto SQL Engine (3) - Implementing WHERE Condition Filtering with Antlr and Dynamic Code Generation

The third article in the Presto SQL Engine series demonstrates how to implement WHERE‑clause filtering with Antlr, contrasting a direct AST‑traversal visitor approach—hampered by branch prediction and JVM inlining issues—with runtime bytecode generation using airlift.bytecode, which yields roughly three‑fold speed gains but adds complexity.

Airlift Bytecode · Bytecode Generation · Dynamic Code Generation
24 min read
Bilibili Tech
May 24, 2022 · Big Data

Metadata Infrastructure and Governance in Bilibili Data Platform

Bilibili’s data platform consolidates scattered metadata into a unified URN‑based model stored across TiDB, Elasticsearch, and HugeGraph. It offers batch‑pull and embedded collection, flexible SQL‑like queries, and comprehensive lineage mapping; powers data‑map, lineage‑map, and impact‑analysis tools; and plans to expand quality assurance and self‑service dictionaries.

Data Governance · Data Lineage · Data Platform
21 min read
DataFunTalk
Feb 2, 2021 · Big Data

Metadata Management: Concepts, Architecture, and Applications in Data Warehousing

This article explains the fundamentals and value of metadata, describes a comprehensive metadata management system and its layered architecture, outlines key technologies such as automatic SQL metadata extraction, and showcases practical applications like metadata query, impact analysis, data lineage, and business‑driven data needs within modern data warehouses.

Data Lineage · SQL parsing · data-warehouse
17 min read
dbaplus Community
Jun 29, 2020 · Databases

How JDBC ResultSetType Settings Trigger SQL Parsing Errors and Performance Issues

The article examines a severe database performance slowdown caused by excessive library cache locks, traces it to improper JDBC ResultSetType settings that introduce unwanted ROWID columns during SQL parsing, and presents systematic experiments across various queries, ResultSetType values, JVM configurations, and database versions to recommend optimal parameter choices.

JDBC · Oracle · ResultSetType
11 min read
360 Tech Engineering
Jun 25, 2019 · Fundamentals

Building an LL(1) SQL Parser in Go

This tutorial explains how to implement a simple LL(1) parser in Go for SQL queries, covering lexical analysis, syntax analysis, finite‑state‑machine strategy, and testing, providing complete code snippets and practical guidance for developers interested in parser construction.

Go · LL(1) parser · SQL parsing
9 min read
dbaplus Community
Jul 25, 2018 · Big Data

How Ele.me Built a Scalable Metadata Governance System for Big Data

This article explains how Ele.me tackles big‑data challenges by designing a metadata governance platform that collects SQL execution data, parses lineage with Antlr, stores graph relationships in Neo4j, and enables table/column lineage queries, DAG scheduling, and hot‑data analysis.

Data Lineage · Ele.me · Graph Database
12 min read
ITPUB
Jul 7, 2018 · Databases

Unlocking MySQL: How SQL Parsing Works and Boosts DBA Efficiency

This article explains why protecting database systems is critical, reviews existing SQL‑analysis tools, and dives deep into MySQL's lexical and syntax parsing techniques—including Bison‑generated parsers, core data structures, and practical applications such as useless‑condition removal and SQL feature generation—to help DBAs automate and optimize their workflows.

Bison · DBA tools · SQL parsing
15 min read
Meituan Technology Team
May 17, 2018 · Databases

Understanding MySQL SQL Parsing and Optimization Techniques

The article explains how to extend MySQL’s built‑in lexer and Bison‑based parser to expose table names, query features, and optimization advice via a simple language‑agnostic service, illustrating core data structures, useless‑condition elimination, feature generation for slow‑query analysis, and practical learning tips.

DBA · SQL parsing · compiler
15 min read
Architects' Tech Alliance
Apr 1, 2018 · Databases

Understanding Oracle AWR Reports and Key Performance Metrics

The article explains how to generate and interpret Oracle Automatic Workload Repository (AWR) reports, detailing key sections such as DB Time, Cache Sizes, Load Profile, parsing behavior, instance efficiency percentages, shared pool statistics, and top wait events to diagnose database performance issues.

AWR · Database Performance · Instance Efficiency
16 min read
UCloud Tech
Jan 19, 2018 · Databases

From Middleware to Distributed Database: UDDB’s Evolution and Architecture

The article outlines UDDB’s roadmap from a MySQL‑compatible middleware to a full‑featured distributed database, detailing its three‑stage evolution, system architecture, SQL parsing and routing design, and innovative techniques that enable read‑write separation, vertical sharding, and future horizontal scaling.

MySQL compatibility · SQL parsing · distributed database
15 min read
Baidu Waimai Technology Team
Mar 23, 2017 · Databases

Design and Implementation of the "Little Boy" Greenplum Optimization and Operations Platform

This article introduces the architecture, key modules, and implementation details of the Little Boy platform, a Greenplum optimization and operations system that parses SQL, applies index and distribution‑key tuning, manages resources, and outlines future enhancements for large‑scale data warehouses.

Big Data · Database Optimization · Greenplum
15 min read
dbaplus Community
Aug 3, 2016 · Databases

How to Build a Minimal Relational Database from Scratch

This article explains the theoretical foundations of relational databases, outlines the essential storage, engine, and UI layers, and walks through a concrete minimal implementation using fixed‑length tables, B+‑tree indexes, simple SQL parsing with regular expressions, and a TCP‑based client interface.

B+Tree · Relational · SQL parsing
12 min read