Create tables on JSON datasets

Create tables on JSON datasets; requires SerDe JAR.

Written by ram.sankarasubramanian

Last published at: May 31st, 2022

In this article we cover how to create a table on JSON datasets using SerDe.

Download the JSON SerDe JAR

Open the hive-json-serde 1.3.8 download page.
Click on json-serde-1.3.8-jar-with-dependencies.jar to download the file json-serde-1.3.8-jar-with-dependencies.jar.

Delete

Info

You can review the Hive-JSON-Serde GitHub repo for more information on the JAR, including source code.

Install the JSON SerDe JAR on your cluster

Select your cluster in the workspace.
Click the Libraries tab.
Click Install new.
In the Library Source button list, select Upload.
In the Library Type button list, select JAR.
Click Drop JAR here.
Select the json-serde-1.3.8-jar-with-dependencies.jar file.
Click Install.

Configure SerDe properties in the create table statement

%sql

ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '<path-to-json-files>'

For example:

%sql

create table <name-of-table> (timestamp_unix string, comments string, start_date string, end_date string)
partitioned by (yyyy string, mm string, dd string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '<path-to-json-files>'
This example creates a table that is partitioned by the columns yyyy, mm, and dd.

Run a repair table statement after the table is created