Tagged articles
6 articles
Big Data Technology & Architecture
Sep 11, 2021 · Big Data

Deep Dive into Flink Table & SQL Window Functions, UDFs, and Hive Integration

This article is a comprehensive guide to window semantics in Flink's Table API and SQL. It covers group windows (tumbling, sliding, and session) and over windows, shows how to define windows in SQL, explains the built‑in functions, demonstrates how to implement scalar, table, aggregate, and table‑aggregate UDFs, and details Flink's integration with Hive, complete with Maven dependencies and runnable examples.

Flink · Hive Integration · SQL
27 min read
Big Data Technology & Architecture
Aug 15, 2021 · Big Data

Spark SQL Interview Guide: Concepts, APIs, Optimization and Common Pitfalls

This article provides a comprehensive overview of Spark SQL, covering its architecture, DataSet/DataFrame APIs, code examples for creating and querying datasets, join strategy selection, handling Hive tables, small‑file issues, inefficient NOT‑IN subqueries, Cartesian products, and a catalog of useful built‑in functions.

Dataset · Hive Integration · Performance Optimization
40 min read
DataFunTalk
Jun 29, 2021 · Big Data

In-depth Analysis of Flink SQL 1.13 Features and Improvements

This article provides a comprehensive overview of Apache Flink SQL 1.13, detailing new Window TVF support, cumulate windows, performance optimizations, time‑zone handling, enhanced Hive compatibility, SQL client upgrades, DataStream‑Table conversion improvements, and outlines the roadmap for the upcoming 1.14 release.

DataStream · Flink · Hive Integration
15 min read
Big Data Technology & Architecture
Aug 23, 2020 · Big Data

Apache Hudi Overview, Core Concepts, and Quick‑Start Guide

This article introduces Apache Hudi, explaining its storage types, query views, timeline feature, typical use cases such as near‑real‑time ingestion and incremental pipelines, and provides a step‑by‑step Scala/Spark quick‑start guide with code examples for compiling, inserting, updating, querying, and syncing data to Hive.

Apache Hudi · Big Data · Data Lake
18 min read
Big Data Technology Architecture
Feb 12, 2020 · Big Data

Apache Flink 1.10 Release: New Features, Optimizations, and Kubernetes Integration

Apache Flink 1.10 introduces major performance and stability improvements, unified memory configuration, native Kubernetes session mode, enhanced Table API/SQL with production‑ready Hive integration, expanded Python UDF support, and a host of important bug fixes and connector updates, marking the largest community‑driven release to date.

Apache Flink · Hive Integration · Kubernetes
17 min read