Create tables on JSON datasets
In this article we cover how to create a table on JSON datasets using SerDe.
Download the JSON SerDe JAR
- Open the hive-json-serde 1.3.8 download page.
- Click on json-serde-1.3.8-jar-with-dependencies.jar to download the file
json-serde-1.3.8-jar-with-dependencies.jar
.
Note
You can review the Hive-JSON-Serde GitHub repo for more information on the JAR, including source code.
Install the JSON SerDe JAR on your cluster
- Select your cluster in the workspace.
- Click the Libraries tab.
- Click Install new.
- In the Library Source button list, select Upload.
- In the Library Type button list, select JAR.
- Click Drop JAR here.
- Select the
json-serde-1.3.8-jar-with-dependencies.jar
file. - Click Install.
Configure SerDe properties in the create table statement
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '<path-to-json-files>'
For example:
create table <name-of-table> (timestamp_unix string, comments string, start_date string, end_date string)
partitioned by (yyyy string, mm string, dd string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '<path-to-json-files>'
This example creates a table that is partitioned by the columns yyyy, mm, and dd.