This article shows how to create a Hive UDF, register it in Spark, and use it in a Spark SQL query.
Here is a Hive UDF that takes a long as an argument and returns its hexadecimal representation.
%scala

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.LongWritable

// This UDF takes a long integer and converts it to a hexadecimal string.
class ToHex extends UDF {
  def evaluate(n: LongWritable): String = {
    Option(n)
      .map { num =>
        // Use Scala string interpolation. It's the easiest way, and it's
        // type-safe, unlike String.format().
        f"0x${num.get}%x"
      }
      .getOrElse("")
  }
}
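The null handling and hex formatting can be exercised outside Hive with plain Scala. Here is a minimal sketch that uses `java.lang.Long` in place of Hadoop's `LongWritable` so it runs without any Spark or Hadoop dependencies:

```scala
// Standalone sketch of the UDF's core logic. A SQL NULL reaches the
// UDF as a Java null, which Option() turns into None.
def toHex(n: java.lang.Long): String =
  Option(n)
    .map(num => f"0x${num.toLong}%x") // %x formats the value as lowercase hex
    .getOrElse("")                    // NULL input becomes an empty string

println(toHex(255L))  // 0xff
println(toHex(null))  // prints an empty line
```

This mirrors the `evaluate` method above: the `Option` wrapper is what keeps a NULL `code` column from throwing a `NullPointerException`.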
Register the function:
%scala

spark.sql("CREATE TEMPORARY FUNCTION to_hex AS 'com.ardentex.spark.hiveudf.ToHex'")
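If you do not need Hive compatibility, Spark can also register an equivalent native Scala function directly, with no Hadoop writable types involved. A sketch, assuming a local session for illustration (the name to_hex mirrors the Hive registration above):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local[1]")       // local session purely for demonstration
  .appName("to_hex-demo")
  .getOrCreate()

// Register a plain Scala function under the same SQL name. Spark maps
// SQL BIGINT to Scala Long and handles NULL inputs itself.
spark.udf.register("to_hex", (n: Long) => f"0x$n%x")

spark.sql("SELECT to_hex(255) AS hex").show()
```

The trade-off is that this registration lives only in the current Spark application, whereas a packaged Hive UDF can be shared as a JAR across Hive and Spark.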
Use your function as you would any other registered function:
%scala

spark.sql("SELECT first_name, to_hex(code) AS hex_code FROM people")
You can find more examples and compilable code at the Sample Hive UDF project.