Hive Tutorial 20 : Hive Serde

July 21, 2017

Apache Hive uses a SerDe (together with a FileFormat) to read data from and write data to tables. SerDe is short for Serializer/Deserializer.

An important concept behind Hive is that it does not own the format in which data is stored on the Hadoop Distributed File System (HDFS). Users can write files to HDFS with whatever tools or mechanisms they like, and then use Hive to parse that file format into rows and columns that Hive can query.

So when data is selected from a Hive table, the SerDe's deserialize() method is called, and when data is inserted, its serialize() method is called.
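As a rough illustration of where each method comes into play, here is a minimal sketch that copies rows from a text table into an ORC table. The table and column names (raw_events, events_orc, id, msg) are made up for this example.

```sql
-- Hypothetical source table: comma-delimited text files.
-- ROW FORMAT DELIMITED uses Hive's default text SerDe (LazySimpleSerDe).
CREATE TABLE raw_events (id INT, msg STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE;

-- Hypothetical target table stored as ORC.
CREATE TABLE events_orc (id INT, msg STRING)
  STORED AS ORC;

-- Reading raw_events: Hive calls deserialize() on that table's SerDe
-- to turn each stored line into a row.
-- Writing events_orc: Hive calls serialize() on the ORC SerDe
-- to turn each row into the stored format.
INSERT INTO TABLE events_orc
SELECT id, msg FROM raw_events;
```

A single statement like this exercises both directions: the SerDe of the table being read deserializes, and the SerDe of the table being written serializes.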

Built-in SerDes

Avro (Hive 0.9.1 and later)
ORC (Hive 0.11 and later)
RegEx
Thrift
Parquet (Hive 0.13 and later)
CSV (Hive 0.14 and later)
JsonSerDe (Hive 0.12 and later in hcatalog-core)
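To give a feel for how one of these built-in SerDes is selected, here is a small sketch using the CSV SerDe (Hive 0.14 and later). The table name and columns are assumptions for illustration; the SerDe class and the three property names are the ones the CSV SerDe documents.

```sql
-- Hypothetical table backed by quoted CSV files.
-- Note: the CSV SerDe treats every column as a string,
-- so the columns are declared as STRING here.
CREATE TABLE people_csv (id STRING, name STRING, city STRING)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
  WITH SERDEPROPERTIES (
    "separatorChar" = ",",   -- field delimiter
    "quoteChar"     = "\"",  -- character that wraps quoted fields
    "escapeChar"    = "\\"   -- escape character inside fields
  )
  STORED AS TEXTFILE;
```

If numeric types are needed, one common approach is to cast the string columns in a view or in the query on top of this table.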

We can attach a SerDe to a Hive table by using the "ROW FORMAT SERDE" clause:

CREATE TABLE test
  PARTITIONED BY (ds string)
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT...
  ....
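The statement above is abbreviated. For reference, here is one way a complete Avro-backed table might look, following the pattern in the Hive AvroSerDe documentation. The table name test_avro and the inline schema (a record with id and name fields) are assumptions made up for this sketch, not part of the original example.

```sql
-- Hypothetical Avro-backed table. The column list is omitted because
-- the AvroSerDe derives the columns from the Avro schema.
CREATE TABLE test_avro
  PARTITIONED BY (ds string)
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    -- Illustrative schema only; replace with the real record definition.
    'avro.schema.literal' = '{
      "namespace": "example",
      "name": "test_record",
      "type": "record",
      "fields": [
        {"name": "id",   "type": "int"},
        {"name": "name", "type": "string"}
      ]
    }'
  );
```

On Hive 0.14 and later, the shorthand STORED AS AVRO can stand in for the explicit SERDE, INPUTFORMAT and OUTPUTFORMAT clauses.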
