Problem
When you use the SKEW hint in Apache Spark, the query fails with the following casting error.
Py4JJavaError: An error occurred while calling t.addCustomDisplayData.
: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
If you remove the SKEW hint, the query runs successfully.
Cause
The SKEW hint contains a mix of different numeric types (for example, 32-bit INT and 64-bit LONG values).
Internally, Spark builds an IN list from the skew values. First, it infers the type from the first value in the list. Then it casts all subsequent values to that type. If a wider type (LONG) is forced into a narrower type (INT), Spark throws a ClassCastException.
Example
A source column (user_id) has type DECIMAL(38,0). Some values are small enough to fit in an INT, while others require a LONG.
- 32-bit only (works):
729873934, 29445284, 448594252
- 64-bit only (works):
1125900220373621, 1125900222752227
- Mixed (fails):
729873934, 1125900220373621
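The three cases above can be checked before the query is written. The following is a minimal Python sketch, not part of Spark: it assumes an integer literal without a suffix is treated as INT when it fits the signed 32-bit range and as LONG otherwise. The helper names literal_type and is_mixed are hypothetical.

```python
# Signed 32-bit range; values outside it are assumed to need a 64-bit LONG.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def literal_type(value: int) -> str:
    """Classify a skew value as 'INT' or 'LONG' by its magnitude (hypothetical helper)."""
    return "INT" if INT_MIN <= value <= INT_MAX else "LONG"


def is_mixed(values) -> bool:
    """Return True when the list contains both INT-sized and LONG-sized values."""
    return len({literal_type(v) for v in values}) > 1


# The value lists from the example above.
examples = {
    "32-bit only": [729873934, 29445284, 448594252],
    "64-bit only": [1125900220373621, 1125900222752227],
    "Mixed": [729873934, 1125900220373621],
}

for label, values in examples.items():
    status = "mixed types" if is_mixed(values) else "uniform type"
    print(f"{label}: {status}")
```

Only the mixed list is flagged, which matches the case that fails in the example.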
Solution
Ensure all skew hint values are of the same type. You have two options.
Option 1
Add an L suffix to every value so that all values are LONG (64-bit). The following code provides an example.
SELECT /*+ SKEW(user_id, 729873934L, 1125900218435903L) */ ...
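If the skew values are generated in code, the suffix can be applied uniformly when the hint text is assembled. The following Python sketch assumes the hint has the same shape as the example above; build_skew_hint is a hypothetical helper, not a Spark API.

```python
def build_skew_hint(column: str, values) -> str:
    """Format a SKEW hint in which every value carries the L (LONG) suffix.

    Hypothetical helper: it only builds the hint text and does not call Spark.
    """
    literals = ", ".join(f"{int(v)}L" for v in values)
    return f"/*+ SKEW({column}, {literals}) */"


hint = build_skew_hint("user_id", [729873934, 1125900218435903])
print(hint)
# /*+ SKEW(user_id, 729873934L, 1125900218435903L) */
```

Suffixing every value, including those that would fit in 32 bits, keeps the list uniform regardless of which value comes first.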
Option 2
Keep all values as INT if every value fits within the signed 32-bit range (-2,147,483,648 to 2,147,483,647).