Tagged articles

UDF

29 articles

Aug 1, 2025 · Mobile Development

Revamping a Mobile Video Editor: MVVM + UDF Architecture and Redo/Undo Design

This article details the comprehensive redesign of a mobile video‑editing page, covering background challenges, requirement analysis, MVVM + UDF architectural choices, module decomposition, dependency‑injection implementation, and the design of a robust Redo/Undo system to improve maintainability and user experience.

Dependency Injection · MVVM · UDF

0 likes · 19 min read

DataFunTalk

Dec 25, 2024 · Databases

Applying RisingWave to Real-Time Feature Engineering: Architecture, Capabilities, and Use Cases

This article introduces RisingWave, an open‑source streaming database, and explains how its SQL‑based interface, compute‑storage separation, UDF support, and materialized views enable efficient real‑time feature engineering, state management, and diverse downstream applications, including the enhancements in RisingWave 2.0.

Materialized Views · Real-time Feature Engineering · RisingWave

0 likes · 17 min read

DaTaobao Tech

Jul 10, 2024 · Big Data

ODPS Development Guide: Parameters, Built‑in Functions, UDF Creation, and Performance Optimization

This comprehensive ODPS (MaxCompute) development guide serves as a mini‑encyclopedia, detailing common parameter tuning, built‑in SQL functions, step‑by‑step Java UDF creation, job lifecycle insights, and practical performance‑optimization techniques such as parallelism adjustment, map‑join hints, and small‑file mitigation.

MaxCompute · ODPS · SQL

0 likes · 26 min read

DataFunSummit

Mar 20, 2023 · Backend Development

Unified UDF Implementation on Cloud Platform: Architecture, Features, and Open‑Source Contributions

This article introduces a unified User‑Defined Function (UDF) solution on a cloud data platform, detailing its remote execution architecture, compatibility with Hive UDFs, resource isolation, hot‑update capabilities, internal platform implementation, open‑source contributions to PrestoDB, and future work plans.

Hive · Serverless · UDF

0 likes · 11 min read

Top Architect

Mar 20, 2023 · Databases

Two Approaches to Synchronize MySQL Data with Redis Cache

This article explains two methods for keeping MySQL data in sync with a Redis cache: using MySQL triggers with a UDF, and parsing MySQL binlog streams. It details their processes, advantages, limitations, and related open‑source tools like Canal.

Canal · UDF · cache synchronization

0 likes · 7 min read

Java Architect Essentials

Mar 13, 2023 · Databases

Two Approaches to Synchronize MySQL Data with Redis Cache

This article explains two technical solutions for keeping MySQL data in sync with a Redis cache: using a MySQL trigger with a UDF, and parsing MySQL binlog events. It also reviews the open‑source tool Canal and discusses the trade‑offs and implementation details of each approach.

Binlog · Canal · Redis

0 likes · 6 min read

Java High-Performance Architecture

Aug 18, 2022 · Databases

How to Sync MySQL Data to Redis: Triggers, UDFs, and Binlog Parsing with Canal

This article compares two approaches for keeping MySQL and Redis in sync—using MySQL triggers combined with a UDF to write directly to Redis, and parsing MySQL binlogs (e.g., with Alibaba's Canal) to propagate changes—detailing their workflows, advantages, limitations, and implementation details.

Binlog · Canal · Data synchronization

0 likes · 7 min read

ITPUB

Jul 1, 2022 · Databases

What’s New in Apache IoTDB? Exploring the Latest Features for Industrial IoT

This article introduces Apache IoTDB, an open‑source time‑series database for industrial IoT, outlines its recent feature releases, explains its data‑modeling and compression strategies, and discusses UDF, trigger, and quality‑control capabilities that guide technical selection and architecture design.

Apache IoTDB · Big Data · Industrial IoT

0 likes · 12 min read

DaTaobao Tech

Apr 27, 2022 · Big Data

Comparative Study of JSON Processing Methods in MaxCompute

The study compares MaxCompute JSON extraction functions—FROM_JSON, get_json_object, and custom JMESPath/JSONPath UDFs—showing simple field extraction with get_json_object is fastest, while complex queries benefit from FROM_JSON or JMESPath, and outlines corresponding JSON generation methods and best‑practice recommendations.

DataEngineering · JMESPath · MaxCompute

0 likes · 11 min read

Big Data Technology & Architecture

Jan 12, 2022 · Big Data

Common Production Issues and Troubleshooting Guide for Apache Flink

This article compiles a comprehensive list of common production problems encountered with Apache Flink, covering cluster sizing, checkpoint failures, backpressure analysis, resource allocation, deployment errors, UDF definitions, data skew, Kafka configurations, and provides detailed troubleshooting steps and best‑practice recommendations.

Apache Flink · Checkpoint · Production troubleshooting

0 likes · 39 min read

Big Data Technology & Architecture

Dec 28, 2021 · Big Data

Comprehensive Guide to Spark SQL: Concepts, DataSet/DataFrame, Functions, Optimization and Common Pitfalls

This article provides an in‑depth overview of Spark SQL, covering its architecture, DataSet/DataFrame creation, DSL and SQL usage, integration with Hive, custom UDF/UDAF/Aggregator implementations, handling of small files, Cartesian product detection, and a catalog of useful built‑in functions and window operations.

Big Data · Hive · Spark SQL

0 likes · 29 min read

Big Data Technology & Architecture

Sep 11, 2021 · Big Data

Deep Dive into Flink Table & SQL Window Functions, UDFs, and Hive Integration

This article provides a comprehensive guide to Flink Table and SQL window semantics—including group, tumbling, sliding, and session windows—covers over windows, demonstrates how to define windows in SQL, explains built‑in functions, shows how to implement scalar, table, aggregate and table‑aggregate UDFs, and details Flink's integration with Hive, complete with Maven dependencies and runnable examples.

Flink · Hive Integration · SQL

0 likes · 27 min read

Big Data Technology & Architecture

Aug 15, 2021 · Big Data

Spark SQL Interview Guide: Concepts, APIs, Optimization and Common Pitfalls

This article provides a comprehensive overview of Spark SQL, covering its architecture, DataSet/DataFrame APIs, code examples for creating and querying datasets, join strategy selection, handling Hive tables, small‑file issues, inefficient NOT‑IN subqueries, Cartesian products, and a catalog of useful built‑in functions.

Hive Integration · Performance Optimization · Spark SQL

0 likes · 40 min read

IT Xianyu

Jun 2, 2021 · Databases

Two Approaches to Synchronize MySQL Data to Redis Cache: UDF Trigger and Binlog Parsing (Canal)

This article explains two methods for keeping MySQL data in sync with a Redis cache: using MySQL triggers combined with a UDF, and parsing the MySQL binlog via Alibaba's Canal. It details their principles, implementation steps, advantages, limitations, and practical deployment considerations.

Binlog · Canal · UDF

0 likes · 6 min read

Architecture Digest

Mar 18, 2021 · Databases

Two Approaches to Synchronize MySQL Data with Redis Cache: Trigger + UDF and Binlog Parsing (Canal)

The article explains two technical methods for keeping MySQL and Redis in sync—using MySQL triggers with a custom UDF to write directly to Redis, and parsing MySQL binlog streams (or using Alibaba's Canal) to propagate changes, while discussing their suitable scenarios, challenges, and implementation details.

Binlog · Canal · UDF

0 likes · 5 min read

Top Architect

Mar 15, 2021 · Databases

Two Approaches to Synchronize MySQL with Redis Cache: UDF Trigger and Binlog Parsing (Canal)

This article explains two technical methods for keeping MySQL data in sync with a Redis cache: using MySQL triggers combined with a UDF, and parsing MySQL binary logs with Canal. It details their workflows, advantages, limitations, and implementation considerations.

Binlog · Canal · Redis

0 likes · 6 min read

Big Data Technology & Architecture

Feb 7, 2021 · Big Data

Building a Flink SQL Platform on Zeppelin: Installation, Configuration, and Advanced Use Cases

This guide walks through setting up Apache Zeppelin as a low‑cost, SQL‑centric development platform for Flink, covering environment preparation, installation, interpreter configuration, execution modes, verification, common pitfalls, dimension‑table joins, custom UDFs, Redis integration, and dual‑stream join techniques.

Flink · Redis · SQL

0 likes · 24 min read

Didi Tech

Jan 25, 2021 · Big Data

Migrating Hive SQL to Spark SQL: Design, Implementation, and Performance Evaluation at DiDi

DiDi migrated over 10,000 Hive SQL tasks to Spark SQL using a lightweight dual‑run pipeline that extracts, rewrites, compares, and switches tasks, fixing syntax and UDF differences while adding features such as small‑file merging and enhanced partition pruning, resulting in Spark handling 85 % of workloads with 40 % faster execution, 21 % lower CPU and 49 % lower memory usage.

DataMigration · Hive · SQLOptimization

0 likes · 18 min read

Huolala Tech

Aug 4, 2020 · Big Data

How to Accelerate Hive UDFs by Caching Large Geo Data: A 140× Speed Boost

To dramatically improve Hive UDF performance when converting coordinates to administrative districts, this article compares two implementation strategies, details the technical challenges of repeatedly loading a 157 MB Geo data file, and presents a static‑cached solution that reduces query time from seconds to milliseconds, achieving roughly a 140‑fold speed increase.

Hive · Performance Optimization · Static Caching

0 likes · 15 min read

Big Data Technology & Architecture

Jul 8, 2020 · Big Data

Using Spark SQL User-Defined Functions, Aggregate Functions, and Window Functions

This article demonstrates how to create and register custom scalar UDFs, untyped and type‑safe aggregate functions (UDAF and Aggregator) in Spark SQL, and how to apply window functions such as ROW_NUMBER, providing complete Scala code examples and execution results.

Aggregator · Big Data · SQL

0 likes · 16 min read

Python Programming Learning Circle

Apr 28, 2020 · Big Data

Multiple Ways to Create New Columns in PySpark DataFrames

This tutorial explains several techniques for adding new columns to PySpark DataFrames—including native Spark functions, user‑defined functions, RDD transformations, Pandas UDFs, and SQL queries—while demonstrating data loading, schema handling, and code examples for each method.

Big Data · Column Creation · PySpark

0 likes · 9 min read

Big Data Technology & Architecture

Jan 17, 2020 · Big Data

Overview and Design of Google’s F1 Query: A Scalable Enterprise Data Processing System

The article reviews Google’s F1 Query paper, describing its architecture, three execution modes, data source handling, extensibility features such as UDF servers and TVFs, and performance optimizations that enable a unified, enterprise‑wide SQL engine for heterogeneous big‑data workloads.

Data Partitioning · F1 Query · SQL

0 likes · 23 min read

DataFunTalk

Dec 24, 2019 · Big Data

Deep Dive into PySpark Implementation: Multi‑Process Architecture, Java Integration, RDD/SQL Interfaces, Executor Communication, and Pandas UDF

This article explains PySpark's multi‑process architecture, how the Python driver uses Py4J to call Java/Scala APIs, the implementation of RDD and DataFrame interfaces, executor‑side process communication and serialization with Arrow, and the design of Pandas UDFs, while also discussing current limitations and future directions.

Arrow · Big Data · Distributed Computing

0 likes · 13 min read

Big Data Technology & Architecture

Jul 20, 2019 · Big Data

Registering UDF, UDTF, and UDAF Functions in Apache Flink – Common Pitfalls and Solutions

This article explains how to register scalar UDFs, table‑valued UDTFs, and aggregate UDAFs in Apache Flink, illustrates typical compilation and runtime pitfalls with concrete Scala code examples, and provides corrected implementations and best‑practice tips for reliable function registration.

Apache Flink · Big Data · Scala

0 likes · 13 min read

Big Data Technology & Architecture

Jun 17, 2019 · Big Data

Understanding Spark SQL: Concepts, Queries, Data Sources, and Practical Examples

This article introduces Spark SQL fundamentals, including its architecture, DataFrame and Dataset abstractions, query methods, interoperability with RDDs, user-defined functions, Hive integration, and data source handling, and provides step‑by‑step Scala code examples for loading data, performing aggregations, and solving common analytical tasks.

DataFrames · Hive · SQL

0 likes · 15 min read

Aikesheng Open Source Community

May 22, 2019 · Databases

Understanding MySQL Group Replication Communication Protocol and New UDF Functions in 8.0.16

The article explains MySQL Group Replication 8.0.16's new segmented communication protocol, version‑compatibility rules for adding members, and introduces two UDFs that allow administrators to set and query the protocol version to maintain high‑availability across mixed‑version clusters.

Communication Protocol · Database Versioning · Group Replication

0 likes · 6 min read

dbaplus Community

Oct 11, 2017 · Databases

Master MySQL Advanced Features: Partitioning, Views, Stored Procedures, and More

This article explores MySQL’s advanced features—including partition tables, views, stored procedures, triggers, foreign key constraints, bind variables, user‑defined functions, and character set considerations—explaining their principles, usage patterns, performance implications, and practical tips for large‑scale data scenarios.

Character Set · Stored Procedures · Triggers

0 likes · 35 min read

dbaplus Community

Sep 26, 2017 · Big Data

How to Avoid Common Spark SQL Pitfalls and Boost Performance

This article shares a comprehensive set of practical tips and solutions for common Spark SQL issues—including out‑of‑memory errors, UDF‑induced GC, thread blocking, system‑property initialization, speculation side‑effects, accumulator traps, concurrent job scheduling, and excessive logging—helping engineers improve stability and efficiency of their Spark‑based financial systems.

Accumulator · Memory Management · Performance Tuning

0 likes · 15 min read

Liulishuo Tech Team

Sep 24, 2016 · Backend Development

Developing Custom Presto SQL Functions (UDF) with Java Plugins

This tutorial explains how to create, register, and deploy custom scalar, aggregation, and window functions for the Presto distributed query engine using Java annotations, the Presto plugin mechanism, and code examples that illustrate UDF development, plugin packaging, and state handling for aggregation functions.

Aggregation · Java · Plugin

0 likes · 11 min read
