Problem
You are selecting columns from a DataFrame and you get an error message.
ERROR: AttributeError: 'function' object has no attribute '_get_object_id' in job
Cause
The DataFrame API contains a small number of protected keywords.
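If a column in your DataFrame uses a protected keyword as the column name and you reference that column with dot notation, you will get an error message.
For example, summary is a protected keyword because DataFrame already has a summary() method. With dot notation, df1.summary returns that method instead of the column, so any expression that expects a column fails with the error message.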
This sample code uses summary as a column name and generates the error message when run.
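%python
from pyspark.sql.types import StructType, StructField, IntegerType

# DataFrame with a single integer column named "id".
df = spark.createDataFrame([1, 2], "int").toDF("id")
df.show()

# DataFrame whose only column is named "summary", a protected keyword.
df1 = spark.createDataFrame(
    [(10,), (11,), (13,)],
    StructType([StructField("summary", IntegerType(), True)]))
df1.show()

# Fails: df1.summary is the DataFrame.summary method, not the column.
ResultDf = df1.join(df, df1.summary == df.id, "inner").select(df.id, df1.summary)
ResultDf.show()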
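The failure comes down to Python's attribute lookup order: a class's `__getattr__` hook runs only when normal lookup finds nothing, so an existing method always wins over a column of the same name. The sketch below illustrates this with a hypothetical `ToyFrame` class, which is not part of PySpark and uses plain strings in place of real column objects. It runs without a Spark session.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; not part of PySpark."""

    def __init__(self, columns):
        self.columns = list(columns)

    def summary(self):
        # A regular method, playing the role of DataFrame.summary().
        return "summary statistics"

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails,
        # so it is never reached for the name "summary".
        if name in self.columns:
            return f"Column<{name}>"
        raise AttributeError(name)

    def __getitem__(self, name):
        # Bracket access goes straight to the column list.
        if name in self.columns:
            return f"Column<{name}>"
        raise KeyError(name)


frame = ToyFrame(["id", "summary"])
print(frame.id)                 # Column<id>
print(callable(frame.summary))  # True: the method, not the column
print(frame["summary"])         # Column<summary>
```

Dot notation works for `id` because no method has that name, while `summary` resolves to the method. Bracket access returns the column in both cases.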
Solution
You should not use DataFrame API protected keywords as column names.
If you must use protected keywords, use bracket-based column access when selecting columns from a DataFrame. Do not use dot notation to select columns whose names are protected keywords.
%python
ResultDf = df1.join(df, df1["summary"] == df.id, "inner").select(df.id, df1["summary"])
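To spot risky column names before they cause an error, one option is to compare the column names against the attributes of the DataFrame class. The helper below is a minimal sketch: `find_shadowed_columns` and `StubFrame` are hypothetical names introduced here for illustration, and the stub class exists only so the example runs without Spark. The check looks at class-level attributes only, so it may miss names that are set on individual instances.

```python
def find_shadowed_columns(column_names, frame_class):
    """Return the column names that are also attributes of frame_class."""
    return [name for name in column_names if hasattr(frame_class, name)]


class StubFrame:
    """Hypothetical stand-in for the DataFrame class; not part of PySpark."""

    def summary(self):
        return None

    def count(self):
        return None


# "summary" and "count" collide with methods; "id" and "price" do not.
print(find_shadowed_columns(["id", "summary", "count", "price"], StubFrame))
# ['summary', 'count']
```

In a notebook, a call such as find_shadowed_columns(df1.columns, type(df1)) would apply the same check to a real DataFrame. Any name it returns should be accessed with brackets.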