In this article we cover how to create a table on JSON datasets using SerDe.
Download the JSON SerDe JAR
- Open the hive-json-serde 1.3.8 download page.
- Click on json-serde-1.3.8-jar-with-dependencies.jar to download the file json-serde-1.3.8-jar-with-dependencies.jar.
Install the JSON SerDe JAR on your cluster
- Select your cluster in the workspace.
- Click the Libraries tab.
- Click Install new.
- In the Library Source button list, select Upload.
- In the Library Type button list, select JAR.
- Click Drop JAR here.
- Select the json-serde-1.3.8-jar-with-dependencies.jar file.
- Click Install.
Configure SerDe properties in the create table statement
%sql ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '<path-to-json-files>'
For example:
%sql create table <name-of-table> (timestamp_unix string, comments string, start_date string, end_date string) partitioned by (yyyy string, mm string, dd string) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '<path-to-json-files>' This example creates a table that is partitioned by the columns yyyy, mm, and dd.
Run a repair table statement after the table is created
For example:
%sql msck repair table <name-of-table>