Trying to decode a protocol buffer and getting error [PROTOBUF_DEPENDENCY_NOT_FOUND]

Use the option --include_imports while creating the protobuf descriptor file, and then use this descriptor file in the from_protobuf() function.

Written by saikrishna.pujari

Last published at: November 4th, 2024

Problem

When you try to decode a protocol buffer (protobuf) message containing timestamp data using the from_protobuf() Apache Spark SQL built-in function, the call fails with the following error.

[PROTOBUF_DEPENDENCY_NOT_FOUND] Could not find dependency: google/protobuf/timestamp.proto

 

Cause

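The from_protobuf() function cannot find the required protobuf dependency google/protobuf/timestamp.proto because the protobuf descriptor file does not contain it. By default, protoc writes only the .proto files named on the command line into the descriptor file, so imported files such as google/protobuf/timestamp.proto are left out.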
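To confirm this cause for a given descriptor file, one option is to list which imported files are referenced but not embedded. The following is a minimal sketch using only the Python standard library. It assumes the descriptor file is a serialized FileDescriptorSet (field 1 holds the embedded files; within each file, field 1 is its name and field 3 its dependencies). The helper names are hypothetical and are not part of Spark or protobuf.

```python
from typing import Iterator, Set, Tuple


def _read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def _length_delimited_fields(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (field_number, payload) for length-delimited fields, skipping others."""
    pos = 0
    while pos < len(buf):
        key, pos = _read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:    # varint
            _, pos = _read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            pos += 8
        elif wire_type == 2:  # length-delimited
            length, pos = _read_varint(buf, pos)
            yield field_number, buf[pos:pos + length]
            pos += length
        elif wire_type == 5:  # fixed 32-bit
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")


def missing_dependencies(descriptor_set: bytes) -> Set[str]:
    """Return imported file names that are referenced but not embedded."""
    embedded, referenced = set(), set()
    for number, file_proto in _length_delimited_fields(descriptor_set):
        if number != 1:  # FileDescriptorSet.file
            continue
        for sub_number, payload in _length_delimited_fields(file_proto):
            if sub_number == 1:    # FileDescriptorProto.name
                embedded.add(payload.decode("utf-8"))
            elif sub_number == 3:  # FileDescriptorProto.dependency
                referenced.add(payload.decode("utf-8"))
    return referenced - embedded


# Example usage (assumes sample.desc exists in the working directory):
# with open("sample.desc", "rb") as f:
#     print(missing_dependencies(f.read()))
```

If the result contains google/protobuf/timestamp.proto, the descriptor file was most likely created without --include_imports.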

 

Solution

Use the option --include_imports while creating the protobuf descriptor file, and then use this descriptor file in the from_protobuf() function.

 

Example

protoc --descriptor_set_out=sample.desc --include_imports sample.proto

 

from pyspark.sql.protobuf.functions import from_protobuf

df.select(from_protobuf("value", "AppEvent", "sample.desc").alias("event"))
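If the descriptor file is produced by a Python build script rather than by hand, the protoc call could be wrapped as in the sketch below. It assumes protoc is installed and on the PATH. The function names and the proto_path default are hypothetical. The flags themselves (--proto_path, --descriptor_set_out, --include_imports) are standard protoc options.

```python
import subprocess
from typing import List


def build_protoc_command(
    proto_file: str, descriptor_out: str, proto_path: str = "."
) -> List[str]:
    """Build a protoc command that embeds all imported .proto files."""
    return [
        "protoc",
        f"--proto_path={proto_path}",
        f"--descriptor_set_out={descriptor_out}",
        "--include_imports",  # embeds imports such as google/protobuf/timestamp.proto
        proto_file,
    ]


def compile_descriptor(
    proto_file: str, descriptor_out: str, proto_path: str = "."
) -> None:
    """Run protoc and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(
        build_protoc_command(proto_file, descriptor_out, proto_path), check=True
    )


# Example usage (requires protoc to be installed):
# compile_descriptor("sample.proto", "sample.desc")
```

Keeping the command construction separate from the subprocess call makes it possible to check that --include_imports is present without running protoc.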

 

Note

An explicit import is only needed for the protobuf types that map to TimestampType and DayTimeIntervalType in Spark SQL, that is, Timestamp and Duration.

Timestamp is represented as {seconds: Long, nanos: Int} and maps to the TimestampType in Spark SQL. Duration maps to DayTimeIntervalType in Spark SQL.
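To make the {seconds, nanos} representation concrete, the sketch below converts such a pair into a Python datetime. It is an illustration only, not Spark's implementation. It assumes microsecond precision (the precision of TimestampType), so digits below one microsecond are dropped, and it assumes nanos is non-negative. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_to_datetime(seconds: int, nanos: int) -> datetime:
    """Convert a {seconds, nanos} pair since the Unix epoch to a UTC datetime.

    Nanoseconds are truncated to whole microseconds.
    """
    return EPOCH + timedelta(seconds=seconds, microseconds=nanos // 1000)


# Example usage:
# timestamp_to_datetime(86400, 1500) is one day and one microsecond after the epoch.
```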

 

 

For more information on data type mapping, refer to the Spark Protobuf Data Source Guide.