Problem
When you try to decode a protocol buffer (protobuf) message containing timestamp data using the from_protobuf() Apache Spark SQL built-in function, you encounter an error message.
[PROTOBUF_DEPENDENCY_NOT_FOUND] Could not find dependency: google/protobuf/timestamp.proto
Cause
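The from_protobuf() function cannot find the required protobuf dependency google/protobuf/timestamp.proto because the protobuf descriptor file was created without it. By default, protoc writes only the descriptors of the .proto files passed on the command line into the descriptor set, not the files those .proto files import.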
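To make the cause concrete, here is a toy model in plain Python. It is not Spark's or protobuf's actual resolution code. It represents a descriptor set as a hypothetical mapping from each .proto file name to the imports that file declares, and reports any import the set does not itself contain. Without --include_imports only sample.proto is recorded, so google/protobuf/timestamp.proto is reported as missing.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for a descriptor set: .proto file name -> declared imports.
DescriptorSet = Dict[str, List[str]]


def find_missing_dependencies(descriptor_set: DescriptorSet) -> Set[str]:
    """Return every declared import that is not itself present in the set."""
    present = set(descriptor_set)
    missing: Set[str] = set()
    for deps in descriptor_set.values():
        for dep in deps:
            if dep not in present:
                missing.add(dep)
    return missing


# Without --include_imports: only the file passed to protoc is recorded.
without_imports: DescriptorSet = {
    "sample.proto": ["google/protobuf/timestamp.proto"],
}

# With --include_imports: the imported file is recorded as well.
with_imports: DescriptorSet = {
    "google/protobuf/timestamp.proto": [],
    "sample.proto": ["google/protobuf/timestamp.proto"],
}

print(find_missing_dependencies(without_imports))  # {'google/protobuf/timestamp.proto'}
print(find_missing_dependencies(with_imports))     # set()
```

The self-contained set is what lets a reader of the descriptor file resolve every type a message refers to without looking anywhere else.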
Solution
Use the --include_imports option when you create the protobuf descriptor file, so that imported files such as google/protobuf/timestamp.proto are written into the descriptor set. Then pass this descriptor file to the from_protobuf() function.
Example
protoc --descriptor_set_out=sample.desc --include_imports sample.proto
from pyspark.sql.protobuf.functions import from_protobuf
df.select(from_protobuf("value", "AppEvent", "sample.desc").alias("event"))
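If you generate the descriptor file from a Python script or notebook rather than a shell, a minimal sketch could wrap the same protoc command with subprocess. It assumes protoc is installed and on PATH; the helper names build_protoc_command and generate_descriptor are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_protoc_command(proto_file: str, descriptor_out: str) -> List[str]:
    """Build the protoc invocation; --include_imports makes the set self-contained."""
    return [
        "protoc",
        f"--descriptor_set_out={descriptor_out}",
        "--include_imports",
        proto_file,
    ]


def generate_descriptor(proto_file: str, descriptor_out: str) -> None:
    """Run protoc if it is on PATH; raises CalledProcessError if protoc fails."""
    if shutil.which("protoc") is None:
        raise FileNotFoundError("protoc was not found on PATH")
    subprocess.run(build_protoc_command(proto_file, descriptor_out), check=True)


print(" ".join(build_protoc_command("sample.proto", "sample.desc")))
# protoc --descriptor_set_out=sample.desc --include_imports sample.proto
```

After generate_descriptor("sample.proto", "sample.desc") succeeds, the resulting sample.desc path is what you pass as the descriptor file argument to from_protobuf(). Passing the arguments as a list avoids shell quoting issues.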
Note
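You only need an explicit import for the protobuf types that map to TimestampType and DayTimeIntervalType in Spark SQL: google.protobuf.Timestamp (defined in google/protobuf/timestamp.proto) and google.protobuf.Duration (defined in google/protobuf/duration.proto).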
Timestamp is represented as {seconds: Long, nanos: Int} and maps to TimestampType in Spark SQL. Duration maps to DayTimeIntervalType in Spark SQL.
For more information on data type mapping, see the Spark Protobuf Data Source Guide.