Problem
When your Apache Spark jobs read certain JSON files, the output contains only the first JSON object from the file instead of all of its records.
Cause
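The JSON records in the file are not separated by newline characters. By default, Spark's JSON reader expects JSON Lines format, with one complete record per line. When several records are concatenated on a single line, Spark reads only the first record on that line.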
Example of JSON records concatenated on a single line without newline separators
{"col_1":"us-east-1","col_3":"prod"}{"col_1":"us-east-2","col_3":"dev"}{"col_1":"us-east-3","col_3":"stage"}
Example of JSON records with newline separators
{"col_1":"us-east-1","col_3":"prod"}
{"col_1":"us-east-2","col_3":"dev"}
{"col_1":"us-east-3","col_3":"stage"}
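To make the difference between the two examples concrete, here is a plain-Python illustration built from the records above. The helper `read_first_object_per_line` is hypothetical. It only mimics a line-oriented reader that decodes the first JSON value on each line and ignores the rest. It is not Spark's actual parser.

```python
import json


def read_first_object_per_line(text: str) -> list:
    """Hypothetical illustration: keep only the first JSON value on each line."""
    decoder = json.JSONDecoder()
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # raw_decode parses the first JSON value and reports where it ended;
        # everything after that point on the same line is silently dropped.
        obj, _end = decoder.raw_decode(line)
        records.append(obj)
    return records


concatenated = (
    '{"col_1":"us-east-1","col_3":"prod"}'
    '{"col_1":"us-east-2","col_3":"dev"}'
    '{"col_1":"us-east-3","col_3":"stage"}'
)
newline_delimited = (
    '{"col_1":"us-east-1","col_3":"prod"}\n'
    '{"col_1":"us-east-2","col_3":"dev"}\n'
    '{"col_1":"us-east-3","col_3":"stage"}\n'
)

print(len(read_first_object_per_line(concatenated)))       # 1 record
print(len(read_first_object_per_line(newline_delimited)))  # 3 records
```

The concatenated input yields a single record, while the newline-delimited input yields all three.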
Solution
Add a newline character after each JSON object in your source files so that every record is on its own line. Spark can then read all of the records correctly.
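If you cannot change how the files are produced, a small preprocessing step can insert the newlines. The following is a minimal sketch. It assumes each file fits in memory and contains only concatenated JSON objects, optionally separated by whitespace. The function names `to_json_lines` and `convert_file` are hypothetical. The sketch uses the standard library's `json.JSONDecoder.raw_decode` to find where each object ends.

```python
import json


def to_json_lines(text: str) -> str:
    """Rewrite concatenated JSON objects as JSON Lines (one object per line)."""
    decoder = json.JSONDecoder()
    lines = []
    pos = 0
    while pos < len(text):
        # Skip whitespace between objects; raw_decode rejects leading whitespace.
        if text[pos].isspace():
            pos += 1
            continue
        # raw_decode returns the parsed object and the index just past its end.
        obj, pos = decoder.raw_decode(text, pos)
        # Re-serialize compactly; json.dumps escapes embedded newlines,
        # so each record occupies exactly one line.
        lines.append(json.dumps(obj, separators=(",", ":"), ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


def convert_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: convert one source file and write the result."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(to_json_lines(text))


concatenated = (
    '{"col_1":"us-east-1","col_3":"prod"}'
    '{"col_1":"us-east-2","col_3":"dev"}'
    '{"col_1":"us-east-3","col_3":"stage"}'
)
print(to_json_lines(concatenated), end="")
```

After conversion, Spark's default JSON reader (for example, `spark.read.json(...)` in PySpark) should return one row per object. For very large files, a streaming approach would be preferable to reading the whole file at once.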
Alternatively, enable the Photon runtime, which can read multiple JSON objects on a single line without newline separators.