Tagged articles
29 articles
Bilibili Tech
Aug 1, 2025 · Mobile Development

Revamping a Mobile Video Editor: MVVM + UDF Architecture and Redo/Undo Design

This article details the comprehensive redesign of a mobile video‑editing page, covering background challenges, requirement analysis, MVVM + UDF architectural choices, module decomposition, dependency‑injection implementation, and the design of a robust Redo/Undo system to improve maintainability and user experience.

MVVM · UDF · architecture
0 likes · 19 min read
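As an illustration of the kind of Redo/Undo design the summary above mentions, here is a minimal sketch of the common two-stack command pattern. It is not the article's implementation: `Command`, `History`, and `add_clip` are hypothetical names, and a plain list stands in for an editor timeline.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Command:
    """A reversible edit: `do` applies it, `undo` reverts it."""
    do: Callable[[], None]
    undo: Callable[[], None]


@dataclass
class History:
    """Two stacks: edits that can be undone and edits that can be redone."""
    undo_stack: List[Command] = field(default_factory=list)
    redo_stack: List[Command] = field(default_factory=list)

    def execute(self, cmd: Command) -> None:
        cmd.do()
        self.undo_stack.append(cmd)
        # A fresh edit invalidates anything that was previously undone.
        self.redo_stack.clear()

    def undo(self) -> bool:
        if not self.undo_stack:
            return False
        cmd = self.undo_stack.pop()
        cmd.undo()
        self.redo_stack.append(cmd)
        return True

    def redo(self) -> bool:
        if not self.redo_stack:
            return False
        cmd = self.redo_stack.pop()
        cmd.do()
        self.undo_stack.append(cmd)
        return True


def add_clip(timeline: List[str], clip: str) -> Command:
    """Hypothetical edit: append a clip name to a timeline list."""
    return Command(do=lambda: timeline.append(clip),
                   undo=lambda: timeline.pop())
```

Clearing the redo stack on every new edit is the usual design choice here: it keeps history linear, at the cost of discarding the undone branch.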
DataFunTalk
Dec 25, 2024 · Databases

Applying RisingWave to Real-Time Feature Engineering: Architecture, Capabilities, and Use Cases

This article introduces RisingWave, an open‑source streaming database, and explains how its SQL‑based interface, compute‑storage separation, UDF support, and materialized views enable efficient real‑time feature engineering, state management, and diverse downstream applications, including the enhancements in RisingWave 2.0.

Materialized Views · Real-time Feature Engineering · RisingWave
0 likes · 17 min read
DaTaobao Tech
Jul 10, 2024 · Big Data

ODPS Development Guide: Parameters, Built‑in Functions, UDF Creation, and Performance Optimization

This comprehensive ODPS (MaxCompute) development guide serves as a mini‑encyclopedia, detailing common parameter tuning, built‑in SQL functions, step‑by‑step Java UDF creation, job lifecycle insights, and practical performance‑optimization techniques such as parallelism adjustment, map‑join hints, and small‑file mitigation.

MaxCompute · ODPS · SQL
0 likes · 26 min read
DataFunSummit
Mar 20, 2023 · Backend Development

Unified UDF Implementation on Cloud Platform: Architecture, Features, and Open‑Source Contributions

This article introduces a unified User‑Defined Function (UDF) solution on a cloud data platform, detailing its remote execution architecture, compatibility with Hive UDFs, resource isolation, hot‑update capabilities, internal platform implementation, open‑source contributions to PrestoDB, and future work plans.

Hive · Presto · Serverless
0 likes · 11 min read
Top Architect
Mar 20, 2023 · Databases

Two Approaches to Synchronize MySQL Data with Redis Cache

This article explains two methods for keeping MySQL data in sync with a Redis cache: MySQL triggers that call a UDF, and parsing of MySQL binlog streams. It details their processes, advantages, and limitations, along with related open‑source tools such as Canal.

Canal · Database Replication · UDF
0 likes · 7 min read
Java Architect Essentials
Mar 13, 2023 · Databases

Two Approaches to Synchronize MySQL Data with Redis Cache

This article explains two technical solutions for keeping MySQL data in sync with a Redis cache: a MySQL trigger that calls a UDF, and parsing of MySQL binlog events. It also reviews the open‑source tool Canal and discusses the trade‑offs and implementation details of each approach.

Binlog · Canal · Database Replication
0 likes · 6 min read
ITPUB
Jul 1, 2022 · Databases

What’s New in Apache IoTDB? Exploring the Latest Features for Industrial IoT

This article introduces Apache IoTDB, an open‑source time‑series database for industrial IoT, outlines its recent feature releases, explains its data‑modeling and compression strategies, and discusses UDF, trigger, and quality‑control capabilities that guide technical selection and architecture design.

Apache IoTDB · Big Data · Industrial IoT
0 likes · 12 min read
DaTaobao Tech
Apr 27, 2022 · Big Data

Comparative Study of JSON Processing Methods in MaxCompute

The study compares MaxCompute JSON extraction functions—FROM_JSON, get_json_object, and custom JMESPath/JSONPath UDFs—showing simple field extraction with get_json_object is fastest, while complex queries benefit from FROM_JSON or JMESPath, and outlines corresponding JSON generation methods and best‑practice recommendations.

DataEngineering · JMESPath · JSON
0 likes · 11 min read
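To make the idea of simple path-based field extraction concrete, the following is a small sketch of a path evaluator that supports only `$`, `.key`, and `[index]`. It is a generic illustration under those assumptions and does not reproduce the semantics of MaxCompute's `get_json_object`, `FROM_JSON`, or any JMESPath library; `get_path` is a hypothetical helper name.

```python
import json
import re
from typing import Any, Optional

# One path step: either ".name" or "[123]".
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def get_path(json_text: str, path: str) -> Optional[Any]:
    """Return the value at `path`, or None if the JSON is invalid
    or the path does not resolve."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    try:
        node = json.loads(json_text)
    except json.JSONDecodeError:
        return None

    pos = 1  # skip the leading '$'
    while pos < len(path):
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at offset {pos}")
        key, index = match.group(1), match.group(2)
        if key is not None:
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
        else:
            i = int(index)
            if not isinstance(node, list) or i >= len(node):
                return None
            node = node[i]
        pos = match.end()
    return node
```

Note that every call re-parses the whole document, which is one reason extracting many fields from the same JSON string one path at a time can be slower than parsing it once into a structured value.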
Big Data Technology & Architecture
Jan 12, 2022 · Big Data

Common Production Issues and Troubleshooting Guide for Apache Flink

This article compiles a comprehensive list of common production problems encountered with Apache Flink, covering cluster sizing, checkpoint failures, backpressure analysis, resource allocation, deployment errors, UDF definitions, data skew, Kafka configurations, and provides detailed troubleshooting steps and best‑practice recommendations.

Apache Flink · Checkpoint · Kafka
0 likes · 39 min read
Big Data Technology & Architecture
Dec 28, 2021 · Big Data

Comprehensive Guide to Spark SQL: Concepts, DataSet/DataFrame, Functions, Optimization and Common Pitfalls

This article provides an in‑depth overview of Spark SQL, covering its architecture, DataSet/DataFrame creation, DSL and SQL usage, integration with Hive, custom UDF/UDAF/Aggregator implementations, handling of small files, Cartesian product detection, and a catalog of useful built‑in functions and window operations.

Big Data · Dataset · Hive
0 likes · 29 min read
Big Data Technology & Architecture
Sep 11, 2021 · Big Data

Deep Dive into Flink Table & SQL Window Functions, UDFs, and Hive Integration

This article provides a comprehensive guide to Flink Table and SQL window semantics—including group, tumbling, sliding, and session windows—covers over windows, demonstrates how to define windows in SQL, explains built‑in functions, shows how to implement scalar, table, aggregate and table‑aggregate UDFs, and details Flink's integration with Hive, complete with Maven dependencies and runnable examples.

Flink · Hive Integration · SQL
0 likes · 27 min read
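The tumbling and sliding window semantics named in the summary above can be sketched with plain integer arithmetic. This is only an illustration of how a timestamp maps to windows, not Flink's implementation: it assumes non-negative integer timestamps, no window offset, and half-open windows `[start, end)`; the function names are hypothetical.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # (start inclusive, end exclusive)


def tumbling_window(ts: int, size: int) -> Window:
    """Each timestamp belongs to exactly one fixed-size window."""
    start = ts - (ts % size)
    return (start, start + size)


def sliding_windows(ts: int, size: int, slide: int) -> List[Window]:
    """A timestamp belongs to every window of length `size` that
    starts on a multiple of `slide` and still covers `ts`."""
    windows: List[Window] = []
    start = ts - (ts % slide)  # latest window start at or before ts
    while start > ts - size:
        windows.append((start, start + size))
        start -= slide
    return windows
```

With `size=10` and `slide=5`, each timestamp falls into two overlapping windows, which is why sliding windows multiply state compared with tumbling ones. Session windows are not covered here because their boundaries depend on gaps between events rather than on fixed arithmetic.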
Big Data Technology & Architecture
Aug 15, 2021 · Big Data

Spark SQL Interview Guide: Concepts, APIs, Optimization and Common Pitfalls

This article provides a comprehensive overview of Spark SQL, covering its architecture, DataSet/DataFrame APIs, code examples for creating and querying datasets, join strategy selection, handling Hive tables, small‑file issues, inefficient NOT‑IN subqueries, Cartesian products, and a catalog of useful built‑in functions.

Dataset · Hive Integration · Performance Optimization
0 likes · 40 min read
Architecture Digest
Mar 18, 2021 · Databases

Two Approaches to Synchronize MySQL Data with Redis Cache: Trigger + UDF and Binlog Parsing (Canal)

The article explains two technical methods for keeping MySQL and Redis in sync—using MySQL triggers with a custom UDF to write directly to Redis, and parsing MySQL binlog streams (or using Alibaba's Canal) to propagate changes, while discussing their suitable scenarios, challenges, and implementation details.

Binlog · Canal · Database Replication
0 likes · 5 min read
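The binlog-based approach described above boils down to replaying row changes against the cache. The sketch below shows that replay step only, under loud assumptions: the event dictionaries are a hypothetical shape (not Canal's or MySQL's actual binlog format), and an in-memory `dict` stands in for Redis.

```python
from typing import Any, Dict, Iterable

Cache = Dict[str, Dict[str, Any]]


def cache_key(table: str, pk: Any) -> str:
    """Hypothetical key scheme: '<table>:<primary key>'."""
    return f"{table}:{pk}"


def apply_change(cache: Cache, event: Dict[str, Any]) -> None:
    """Apply one row-change event to the cache."""
    key = cache_key(event["table"], event["pk"])
    kind = event["type"]
    if kind in ("insert", "update"):
        # Store a copy so later mutation of the event cannot leak in.
        cache[key] = dict(event["row"])
    elif kind == "delete":
        cache.pop(key, None)
    else:
        raise ValueError(f"unknown event type: {kind!r}")


def replay(cache: Cache, events: Iterable[Dict[str, Any]]) -> None:
    """Apply events in order; ordering matters for correctness."""
    for event in events:
        apply_change(cache, event)
```

Because each event fully overwrites or removes the key, replaying the same event twice leaves the cache unchanged, which helps when a consumer restarts and re-reads part of the stream.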
Big Data Technology & Architecture
Feb 7, 2021 · Big Data

Building a Flink SQL Platform on Zeppelin: Installation, Configuration, and Advanced Use Cases

This guide walks through setting up Apache Zeppelin as a low‑cost, SQL‑centric development platform for Flink, covering environment preparation, installation, interpreter configuration, execution modes, verification, common pitfalls, dimension‑table joins, custom UDFs, Redis integration, and dual‑stream join techniques.

Flink · SQL · Streaming
0 likes · 24 min read
Didi Tech
Jan 25, 2021 · Big Data

Migrating Hive SQL to Spark SQL: Design, Implementation, and Performance Evaluation at DiDi

DiDi migrated over 10,000 Hive SQL tasks to Spark SQL using a lightweight dual‑run pipeline that extracts, rewrites, compares, and switches tasks, fixing syntax and UDF differences while adding features such as small‑file merging and enhanced partition pruning, resulting in Spark handling 85 % of workloads with 40 % faster execution, 21 % lower CPU and 49 % lower memory usage.

DataMigration · Hive · SQLOptimization
0 likes · 18 min read
Huolala Tech
Aug 4, 2020 · Big Data

How to Accelerate Hive UDFs by Caching Large Geo Data: A 140× Speed Boost

To dramatically improve Hive UDF performance when converting coordinates to administrative districts, this article compares two implementation strategies, details the technical challenges of repeatedly loading a 157 MB Geo data file, and presents a static‑cached solution that reduces query time from seconds to milliseconds, achieving roughly a 140‑fold speed increase.

Hive · Performance Optimization · Static Caching
0 likes · 15 min read
DataFunTalk
Dec 24, 2019 · Big Data

Deep Dive into PySpark Implementation: Multi‑Process Architecture, Java Integration, RDD/SQL Interfaces, Executor Communication, and Pandas UDF

This article explains PySpark's multi‑process architecture, how the Python driver uses Py4J to call Java/Scala APIs, the implementation of RDD and DataFrame interfaces, executor‑side process communication and serialization with Arrow, and the design of Pandas UDFs, while also discussing current limitations and future directions.

Arrow · Big Data · PySpark
0 likes · 13 min read
Big Data Technology & Architecture
Jun 17, 2019 · Big Data

Understanding Spark SQL: Concepts, Queries, Data Sources, and Practical Examples

This article introduces Spark SQL fundamentals, including its architecture, DataFrame and Dataset abstractions, query methods, interoperability with RDD, user-defined functions, integration with Hive, data source handling, and provides step‑by‑step Scala code examples for loading data, performing aggregations, and solving common analytical tasks.

DataFrames · Hive · SQL
0 likes · 15 min read
Aikesheng Open Source Community
May 22, 2019 · Databases

Understanding MySQL Group Replication Communication Protocol and New UDF Functions in 8.0.16

The article explains MySQL Group Replication 8.0.16's new segmented communication protocol, version‑compatibility rules for adding members, and introduces two UDFs that allow administrators to set and query the protocol version to maintain high‑availability across mixed‑version clusters.

Communication Protocol · Database Versioning · Group Replication
0 likes · 6 min read
dbaplus Community
Oct 11, 2017 · Databases

Master MySQL Advanced Features: Partitioning, Views, Stored Procedures, and More

This article explores MySQL’s advanced features—including partition tables, views, stored procedures, triggers, foreign key constraints, bind variables, user‑defined functions, and character set considerations—explaining their principles, usage patterns, performance implications, and practical tips for large‑scale data scenarios.

Character Set · Partitioning · Stored Procedures
0 likes · 35 min read
dbaplus Community
Sep 26, 2017 · Big Data

How to Avoid Common Spark SQL Pitfalls and Boost Performance

This article shares a comprehensive set of practical tips and solutions for common Spark SQL issues—including out‑of‑memory errors, UDF‑induced GC, thread blocking, system‑property initialization, speculation side‑effects, accumulator traps, concurrent job scheduling, and excessive logging—helping engineers improve stability and efficiency of their Spark‑based financial systems.

Accumulator · Memory Management · Spark
0 likes · 15 min read
Liulishuo Tech Team
Sep 24, 2016 · Backend Development

Developing Custom Presto SQL Functions (UDF) with Java Plugins

This tutorial explains how to create, register, and deploy custom scalar, aggregation, and window functions for the Presto distributed query engine using Java annotations, the Presto plugin mechanism, and code examples that illustrate UDF development, plugin packaging, and state handling for aggregation functions.

Java · Presto · SQL
0 likes · 11 min read