Big Data 13 min read

Registering UDF, UDTF, and UDAF Functions in Apache Flink – Common Pitfalls and Solutions

This article explains how to register scalar UDFs, table‑valued UDTFs, and aggregate UDAFs in Apache Flink, illustrates typical compilation and runtime pitfalls with concrete Scala code examples, and provides corrected implementations and best‑practice tips for reliable function registration.

Big Data Technology & Architecture

Jul 20, 2019

Registering UDF, UDTF, and UDAF Functions in Apache Flink – Common Pitfalls and Solutions

The author discovered several subtle but frequent problems when updating user‑defined functions (UDFs) in Apache Flink and shares a detailed guide to avoid them.

1. Registering scalar UDFs – Flink’s TableEnvironment.registerFunction is used to register a ScalarFunction. The method signature is shown below:

/**
 * Registers a [[ScalarFunction]] under a unique name. Replaces already existing
 * user-defined functions under this name.
 */
 def registerFunction(name: String, function: ScalarFunction): Unit = {
   // check if class could be instantiated
   checkForInstantiation(function.getClass)
   // register in Table API
   functionCatalog.registerFunction(name, function.getClass)
   // register in SQL API
   functionCatalog.registerSqlFunction(
     createScalarSqlFunction(name, name, function, typeFactory)
   )
 }

Example usage:

tableEnv.registerFunction("hashCode", new HashCode())
myTable.select("item,item.hashCode(),hashCode(item)")
val hcTest = tableEnv.sqlQuery("select item,hashCode(item) from myTable")

2. Registering UDTF functions – Table‑valued functions require a generic type signature. The registration methods in StreamTableEnvironment and TableEnvironment are illustrated, and a sample Split UDTF is provided:

class Split(separator: String) extends TableFunction[(String, Int)] {
  def eval(str: String): Unit = {
    str.split(separator).foreach(x => collect(x, x.length))
  }
}

Registration example:

// register udtf
val split = new Split(",")
val dslTable = mySplit.join(split('a) as ('word,'length)).select('a,'word,'length)
val sqlJoin = tableEnv.sqlQuery("select a,item,counts from mySplit, LATERAL TABLE(split(a)) as T(item,counts)")

3. Registering UDAF functions – The article defines an accumulator class WeightedAvgAccum and an aggregate function WeightedAvg. Initial implementations caused runtime NullFieldException because of mismatched generic types and tuple definitions.

class WeightedAvgAccum extends JTuple2[JLong, JInteger] {
  var sum = 0L
  var count = 0
}

class WeightedAvg extends AggregateFunction[JLong, WeightedAvgAccum] {
  override def createAccumulator(): WeightedAvgAccum = new WeightedAvgAccum
  override def getValue(acc: WeightedAvgAccum): JLong = {
    if (acc.count == 0) null else acc.sum / acc.count
  }
  def accumulate(acc: WeightedAvgAccum, iValue: JLong, iWeight: JInteger): Unit = {
    acc.sum += iValue * iWeight
    acc.count += iWeight
  }
  // other methods omitted for brevity
}

After correcting the accumulator to a plain class and aligning the generic parameters, the function registers and runs correctly.

4. Common error analysis – The article shows a Flink job failure log caused by a null field in a tuple, explains that the mismatch between the declared generic types ( Object, Object) and the actual accumulator fields leads to serialization errors, and demonstrates the proper way to specify concrete types.

5. Summary – Understanding Flink’s registration APIs, respecting generic type constraints, and providing accurate accumulator definitions are essential to avoid compilation and runtime issues when working with UDF, UDTF, and UDAF in production pipelines.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Big Data stream processing Apache Flink UDF Scala UDAF UDTF

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.