Apache Spark job output contains only the first JSON object instead of all records

Add a line break between JSON objects, or use Photon.

Last published at: January 31st, 2025

Problem

In your Apache Spark jobs, you notice that some JSON files are processed incorrectly: the output contains only the first JSON object instead of all the records in the file.

Cause

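Your JSON files are missing the newline characters that separate one record from the next. By default, Spark reads JSON files in JSON Lines format, which expects exactly one JSON object per line. When all the records in a file are concatenated on a single line without newline separators, Spark reads only the first JSON record from that file.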

Example of single-line JSON records without newline separators

{"col_1":"us-east-1","col_3":"prod"}{"col_1":"us-east-2","col_3":"dev"}{"col_1":"us-east-3","col_3":"stage"}

Example of JSON records with newline separators

{"col_1":"us-east-1","col_3":"prod"}
{"col_1":"us-east-2","col_3":"dev"}
{"col_1":"us-east-3","col_3":"stage"}
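To check whether a file has this problem before running a Spark job, you can count how many JSON objects appear on each line. The following is a minimal Python sketch that uses only the standard library; the function name `count_objects_per_line` is hypothetical and is not part of Spark or Databricks. In a correctly formatted JSON Lines file, every non-empty line reports a count of 1.

```python
import json


def count_objects_per_line(text):
    """Return the number of top-level JSON values on each non-empty line.

    Hypothetical diagnostic helper: a count greater than 1 means several
    records share one line without a newline separator.
    """
    decoder = json.JSONDecoder()
    counts = []
    for line in text.splitlines():
        rest = line.strip()
        if not rest:
            continue  # Ignore blank lines.
        n = 0
        while rest:
            # raw_decode parses one JSON value from the start of the string
            # and returns (value, number_of_characters_consumed).
            _, consumed = decoder.raw_decode(rest)
            n += 1
            # Drop the parsed value and any whitespace before the next one.
            rest = rest[consumed:].lstrip()
        counts.append(n)
    return counts
```

For the single-line example above, the function returns `[3]`; for the newline-separated example, it returns `[1, 1, 1]`. A malformed line raises `json.JSONDecodeError`.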

Solution

Add a newline character after each JSON object in your source files so that every record is on its own line and Spark can read all of them.
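If you control the source files, one way to add the missing newlines is to parse the concatenated objects and write them back out one per line. The sketch below assumes each file fits in memory and contains only JSON values, optionally separated by whitespace; `to_json_lines` is a hypothetical helper, not a Spark or Databricks API.

```python
import json


def to_json_lines(text):
    """Rewrite concatenated JSON objects as JSON Lines (one object per line).

    Hypothetical helper: assumes the whole input fits in memory.
    """
    decoder = json.JSONDecoder()
    records = []
    rest = text.lstrip()
    while rest:
        # Parse one object from the front of the remaining text.
        obj, consumed = decoder.raw_decode(rest)
        # Re-serialize compactly so each record occupies exactly one line,
        # even if the original object was pretty-printed across lines.
        records.append(json.dumps(obj, separators=(",", ":"), ensure_ascii=False))
        rest = rest[consumed:].lstrip()
    # End the file with a newline after the last record.
    return ("\n".join(records) + "\n") if records else ""


if __name__ == "__main__":
    example = (
        '{"col_1":"us-east-1","col_3":"prod"}'
        '{"col_1":"us-east-2","col_3":"dev"}'
        '{"col_1":"us-east-3","col_3":"stage"}'
    )
    print(to_json_lines(example), end="")
```

Write the returned string to a new file and point your Spark job at that file. Because the sketch re-serializes each object, insignificant whitespace inside records is removed and some number formats may be normalized. For very large files, a streaming approach would be more appropriate than reading the whole file into memory.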

Alternatively, enable Photon, which can read JSON objects that appear on a single line without newline separators.

Databricks Help Center
