Hive UDFs

Learn how to create and use a Hive UDF for Databricks.

Written by Adam Pavlacka

Last published at: May 31st, 2022

This article shows how to create a Hive UDF, register it in Spark, and use it in a Spark SQL query.

Here is a Hive UDF that takes a long as an argument and returns its hexadecimal representation as a string prefixed with 0x. A null argument returns an empty string.

%scala

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.LongWritable

// This UDF takes a long integer and converts it to a hexadecimal string.
class ToHex extends UDF {
  def evaluate(n: LongWritable): String = {
    // Wrapping n in Option guards against a null argument.
    Option(n)
      .map { num =>
        // Use Scala string interpolation. It's the easiest way, and it's
        // type-safe, unlike String.format().
        f"0x${num.get}%x"
      }
      .getOrElse("")
  }
}
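To check what the UDF should return before packaging it, the conversion can be sketched in plain Scala without the Hive classes. This is only an illustration of the same logic: the names `HexSketch` and `toHex` are hypothetical and are not part of the sample project, and `java.lang.Long` stands in for `LongWritable` so that a null argument can still be passed.

```scala
// Plain-Scala sketch of the conversion done inside ToHex.evaluate.
// HexSketch and toHex are hypothetical names used only for illustration.
object HexSketch {
  // java.lang.Long stands in for LongWritable so the argument can be null.
  def toHex(n: java.lang.Long): String =
    Option(n)                                // a null argument becomes None
      .map(num => f"0x${num.longValue}%x")   // lowercase hex with a 0x prefix
      .getOrElse("")                         // null maps to an empty string
}
```

For instance, `HexSketch.toHex(255L)` evaluates to `0xff`, and `HexSketch.toHex(null)` evaluates to an empty string. Negative values are formatted as unsigned 64-bit two's complement, so `-1L` becomes `0xffffffffffffffff`.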

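Register the function. The class name must be fully qualified, so this statement assumes that ToHex is compiled in the com.ardentex.spark.hiveudf package and that the JAR containing it is available on the cluster: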

%scala

spark.sql("CREATE TEMPORARY FUNCTION to_hex AS 'com.ardentex.spark.hiveudf.ToHex'")

Use your function like any other registered function:

%scala

spark.sql("SELECT first_name, to_hex(code) AS hex_code FROM people")
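If you don't have a people table to hand, one way to try the query is against a small temporary view. The following is a minimal sketch, not part of the original example: the view name `people_sample` and its two rows are made up, and it assumes `to_hex` has already been registered in the current session as shown above.

```scala
import org.apache.spark.sql.SparkSession

// In a notebook, getOrCreate() returns the session that already exists.
val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Hypothetical sample rows. `code` is a Scala Long, which Spark stores as a bigint.
Seq(("Ada", 255L), ("Grace", 4096L))
  .toDF("first_name", "code")
  .createOrReplaceTempView("people_sample")

// Assumes to_hex was registered earlier with CREATE TEMPORARY FUNCTION.
val result = spark.sql("SELECT first_name, to_hex(code) AS hex_code FROM people_sample")
result.show()
```

With the UDF defined in this article, the hex_code column should contain 0xff for Ada and 0x1000 for Grace.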

You can find more examples and compilable code in the Sample Hive UDF project.