Create tables on JSON datasets

In this article we cover how to create a table on JSON datasets using SerDe.

Download the JSON SerDe JAR

  1. Open the hive-json-serde 1.3.8 download page.
  2. Click on json-serde-1.3.8-jar-with-dependencies.jar to download the file json-serde-1.3.8-jar-with-dependencies.jar.

Note

You can review the Hive-JSON-Serde GitHub repo for more information on the JAR, including source code.

Install the JSON SerDe JAR on your cluster

  1. Select your cluster in the workspace.
  2. Click the Libraries tab.
  3. Click Install new.
  4. In the Library Source button list, select Upload.
  5. In the Library Type button list, select JAR.
  6. Click Drop JAR here.
  7. Select the json-serde-1.3.8-jar-with-dependencies.jar file.
  8. Click Install.

Configure SerDe properties in the create table statement

ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '<path-to-json-files>'

For example:

create table <name-of-table> (timestamp_unix string, comments string, start_date string, end_date string)
partitioned by (yyyy string, mm string, dd string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '<path-to-json-files>'

This example creates a table that is partitioned by the columns yyyy, mm, and dd.

Run a repair table statement after the table is created

For example:

msck repair table <name-of-table>