Column values are assigned in the order they are passed into Row() as arguments, not to the column name indicated

Create the DataFrame from a list of dictionaries or use the row.asDict() method.

Written by Raghavan Vaidhyaraman

Last published at: April 28th, 2025

Problem

When creating a DataFrame from Row() objects, you pass named arguments for the first and second columns in a different order in each row. In the output, you notice the values are assigned to columns in the order they were passed in, not to the column names you indicated.

 

Example code

from pyspark.sql import Row

row1 = Row(FirstColumn=1, SecondColumn=2)
row2 = Row(SecondColumn=3, FirstColumn=4)

df = spark.createDataFrame([row1, row2])
df.show()

 

Expected output

+-----------+------------+
|FirstColumn|SecondColumn|
+-----------+------------+
|          1|           2|
|          4|           3|
+-----------+------------+

 

Actual output

+-----------+------------+
|FirstColumn|SecondColumn|
+-----------+------------+
|          1|           2|
|          3|           4|
+-----------+------------+

 

Cause

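Row is a subclass of tuple, not a dictionary, so a Row() created with named arguments stores its values by position. When you create a DataFrame from a list of Row objects, the values in each row are matched to columns by position rather than by argument name, so no name-to-column mapping occurs.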

 

Solution

Create the DataFrame from a list of dictionaries, or use the row.asDict() method.

 

To create the DataFrame from a list of dictionaries, adapt the following example code.

data = [
    {"FirstColumn": 1, "SecondColumn": 2},
    {"SecondColumn": 3, "FirstColumn": 4}
]

df2 = spark.createDataFrame(data)
df2.show()

 

Alternatively, to use the row.asDict() method, adapt the following example code.

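from pyspark.sql import Row

row1 = Row(FirstColumn=1, SecondColumn=2)
row2 = Row(SecondColumn=3, FirstColumn=4)

# Convert each Row to a dictionary so values stay tied to their column names.
row1_dict = row1.asDict()
row2_dict = row2.asDict()

df = spark.createDataFrame([row1_dict, row2_dict])
df.show()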