Problem
You use the `melt()` function with a wildcard as the `values` argument in Databricks Runtime 14.3 LTS. The following code provides an example.
```python
from pyspark.sql.types import IntegerType, LongType, StructField, StructType

dfschema = StructType([
    StructField("id", LongType(), False),
    StructField("json_column", StructType([
        StructField("a", IntegerType(), True),
        StructField("b", IntegerType(), True),
        StructField("c", IntegerType(), True),
    ]), True),
])

df = spark.createDataFrame([
    (1, {"a": -1, "b": -2, "c": -3}),
    (2, {"a": -4, "c": 6}),
    (3, {"b": 1, "c": 2}),
], schema=dfschema)

melted_df = df.melt("id", "json_column.*", "Key", "Value")
display(melted_df)
```
You then receive the following error.

```
[FIELD_NOT_FOUND] No such struct field `*` in `a`, `b`, `c`. SQLSTATE: 42704
```
A wildcard should mean you do not need to list columns individually, but in Databricks Runtime 14.3 LTS you must explicitly list every field of the JSON column instead. You may also remember that this issue does not occur in Databricks Runtime 13.3 LTS or below.
Cause
The implementation of the `melt()` function changed between Databricks Runtime 13.3 LTS and 14.3 LTS.
Solution
To resolve this issue, use Databricks Runtime version 15.4 LTS or above.
If you need to continue using Databricks Runtime 14.3 LTS, list all the columns explicitly when using the `melt()` function. The following Python code provides an example.

```python
melted_df = df.melt("id", ["json_column.a", "json_column.b", "json_column.c"], "Key", "Value")
```
For more information on the `melt()` function, review the Apache Spark pyspark.sql.DataFrame.melt documentation.