Tag

UDF

DataFunTalk
Dec 25, 2024 · Databases

Applying RisingWave to Real-Time Feature Engineering: Architecture, Capabilities, and Use Cases

This article introduces RisingWave, an open‑source streaming database, and explains how its SQL‑based interface, compute‑storage separation, UDF support, and materialized views enable efficient real‑time feature engineering, state management, and diverse downstream applications, including the enhancements in RisingWave 2.0.

Materialized Views, Real-time Feature Engineering, RisingWave
17 min read

DaTaobao Tech
Jul 10, 2024 · Big Data

ODPS Development Guide: Parameters, Built‑in Functions, UDF Creation, and Performance Optimization

This comprehensive ODPS (MaxCompute) development guide serves as a mini‑encyclopedia, detailing common parameter tuning, built‑in SQL functions, step‑by‑step Java UDF creation, job lifecycle insights, and practical performance‑optimization techniques such as parallelism adjustment, map‑join hints, and small‑file mitigation.

Big Data, MaxCompute, ODPS
26 min read

DataFunSummit
Mar 20, 2023 · Backend Development

Unified UDF Implementation on Cloud Platform: Architecture, Features, and Open‑Source Contributions

This article introduces a unified User‑Defined Function (UDF) solution on a cloud data platform, detailing its remote execution architecture, compatibility with Hive UDFs, resource isolation, hot‑update capabilities, internal platform implementation, open‑source contributions to PrestoDB, and future work plans.

Cloud Platform, Hive, Remote Execution
11 min read

Top Architect
Mar 20, 2023 · Databases

Two Approaches to Synchronize MySQL Data with Redis Cache

This article explains two methods for keeping MySQL data in sync with a Redis cache: calling a UDF from MySQL triggers, and parsing the MySQL binlog stream. It covers the process, advantages, and limitations of each, along with related open‑source tools such as Canal.

Cache Synchronization, Canal, Database Replication
7 min read

Java Architect Essentials
Mar 13, 2023 · Databases

Two Approaches to Synchronize MySQL Data with Redis Cache

This article explains two technical solutions for keeping MySQL data in sync with a Redis cache: a MySQL trigger that calls a UDF, and parsing MySQL binlog events. It also reviews the open‑source Canal tool and discusses the trade‑offs and implementation details of each approach.

Cache Synchronization, Canal, Database Replication
6 min read

DaTaobao Tech
Apr 27, 2022 · Big Data

Comparative Study of JSON Processing Methods in MaxCompute

The study compares JSON extraction methods in MaxCompute: FROM_JSON, get_json_object, and custom JMESPath/JSONPath UDFs. It finds that get_json_object is fastest for simple field extraction, while complex queries benefit from FROM_JSON or JMESPath. It also outlines the corresponding JSON generation methods and best‑practice recommendations.

DataEngineering, JMESPath, JSON
11 min read

IT Xianyu
Jun 2, 2021 · Databases

Two Approaches to Synchronize MySQL Data to Redis Cache: UDF Trigger and Binlog Parsing (Canal)

This article explains two methods for keeping MySQL data in sync with a Redis cache: MySQL triggers combined with a UDF, and parsing the MySQL binlog via Alibaba's Canal. It details their principles, implementation steps, advantages, limitations, and practical deployment considerations.

Cache Synchronization, Canal, Database Replication
6 min read

Architecture Digest
Mar 18, 2021 · Databases

Two Approaches to Synchronize MySQL Data with Redis Cache: Trigger + UDF and Binlog Parsing (Canal)

The article explains two technical methods for keeping MySQL and Redis in sync: MySQL triggers that call a custom UDF to write directly to Redis, and parsing MySQL binlog streams (for example with Alibaba's Canal) to propagate changes. It also discusses the scenarios each method suits, the challenges involved, and implementation details.

Cache Synchronization, Canal, Database Replication
5 min read

Top Architect
Mar 15, 2021 · Databases

Two Approaches to Synchronize MySQL with Redis Cache: UDF Trigger and Binlog Parsing (Canal)

This article explains two technical methods for keeping MySQL data in sync with a Redis cache: MySQL triggers combined with a UDF, and parsing MySQL binary logs with Canal. It details their workflows, advantages, limitations, and implementation considerations.

Cache Synchronization, Canal, Database Replication
6 min read

Didi Tech
Jan 25, 2021 · Big Data

Migrating Hive SQL to Spark SQL: Design, Implementation, and Performance Evaluation at DiDi

DiDi migrated over 10,000 Hive SQL tasks to Spark SQL using a lightweight dual‑run pipeline that extracts, rewrites, compares, and switches tasks. The migration fixed syntax and UDF differences and added features such as small‑file merging and enhanced partition pruning. As a result, Spark handles 85% of workloads with 40% faster execution, 21% lower CPU usage, and 49% lower memory usage.

BigData, DataMigration, Hive
18 min read

Python Programming Learning Circle
Apr 28, 2020 · Big Data

Multiple Ways to Create New Columns in PySpark DataFrames

This tutorial explains several techniques for adding new columns to PySpark DataFrames—including native Spark functions, user‑defined functions, RDD transformations, Pandas UDFs, and SQL queries—while demonstrating data loading, schema handling, and code examples for each method.

Big Data, Column Creation, PySpark
9 min read

DataFunTalk
Dec 24, 2019 · Big Data

Deep Dive into PySpark Implementation: Multi‑Process Architecture, Java Integration, RDD/SQL Interfaces, Executor Communication, and Pandas UDF

This article explains PySpark's multi‑process architecture, how the Python driver uses Py4J to call Java/Scala APIs, the implementation of RDD and DataFrame interfaces, executor‑side process communication and serialization with Arrow, and the design of Pandas UDFs, while also discussing current limitations and future directions.

ARROW, Big Data, PySpark
13 min read

Aikesheng Open Source Community
May 22, 2019 · Databases

Understanding MySQL Group Replication Communication Protocol and New UDF Functions in 8.0.16

The article explains MySQL Group Replication 8.0.16's new segmented communication protocol, version‑compatibility rules for adding members, and introduces two UDFs that allow administrators to set and query the protocol version to maintain high‑availability across mixed‑version clusters.

Database Versioning, Group Replication, MySQL
6 min read

Liulishuo Tech Team
Sep 24, 2016 · Backend Development

Developing Custom Presto SQL Functions (UDF) with Java Plugins

This tutorial explains how to create, register, and deploy custom scalar, aggregation, and window functions for the Presto distributed query engine using Java annotations, the Presto plugin mechanism, and code examples that illustrate UDF development, plugin packaging, and state handling for aggregation functions.

Aggregation, Java, Plugin
11 min read