Hive Tutorial 20: Hive SerDe
Apache Hive uses a SerDe (together with a file format, i.e. an InputFormat/OutputFormat pair) to read data from and write data to tables. SerDe is short for Serializer/Deserializer.
An important concept behind Hive is that it DOES NOT own the format in which data is stored in the Hadoop Distributed File System (HDFS). Users can write files to HDFS with whatever tool or mechanism they like, and then use Hive, through a suitable SerDe, to "parse" those files into rows that Hive can query.
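So when you select data from a Hive table, the SerDe's deserialize() method is called to turn each stored record into a row, and when you insert data, its serialize() method is called to turn each row back into the stored format. Together with the file format, the two paths look like this:
- Read: HDFS files -> InputFormat -> <key, value> -> Deserializer -> row object
- Write: row object -> Serializer -> <key, value> -> OutputFormat -> HDFS files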
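To make the read and write paths concrete, here is a minimal sketch of an external table laid over comma-separated files that some other tool has already written to HDFS. The table name, columns, and LOCATION path are hypothetical. For a plain text table declared with ROW FORMAT DELIMITED, Hive falls back to its default SerDe, LazySimpleSerDe; INSERT ... VALUES needs Hive 0.14 or later.

```sql
-- Hypothetical table and HDFS directory, for illustration only.
-- The files in LOCATION were produced outside Hive. Because the table is
-- EXTERNAL, DROP TABLE removes only the metadata and leaves the files in place.
CREATE EXTERNAL TABLE page_views (
  view_time STRING,
  user_id   BIGINT,
  page_url  STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/raw/page_views';

-- Read path: the input format hands each line to the SerDe, and
-- deserialize() turns it into a row with the three declared columns.
SELECT user_id, page_url FROM page_views LIMIT 10;

-- Write path: serialize() turns the row into a comma-separated line,
-- which the output format writes as a new file under LOCATION.
INSERT INTO TABLE page_views VALUES ('2024-01-01 10:00:00', 42, '/home');
```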
Built-in SerDes
- Avro (Hive 0.9.1 and later)
- ORC (Hive 0.11 and later)
- RegEx
- Thrift
- Parquet (Hive 0.13 and later)
- CSV (Hive 0.14 and later)
- JsonSerDe (Hive 0.12 and later in hcatalog-core)
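Most of the built-in SerDes listed above are selected either with a STORED AS shorthand or with an explicit ROW FORMAT SERDE clause. The sketch below uses hypothetical table names and columns. The STORED AS AVRO shorthand needs Hive 0.14 or later, and JsonSerDe needs the hcatalog-core jar on the classpath.

```sql
-- Hypothetical tables, for illustration only.

-- ORC (Hive 0.11+): STORED AS ORC selects the ORC SerDe and file formats.
CREATE TABLE sales_orc (id BIGINT, amount DOUBLE)
STORED AS ORC;

-- Parquet (Hive 0.13+).
CREATE TABLE sales_parquet (id BIGINT, amount DOUBLE)
STORED AS PARQUET;

-- Avro: with the STORED AS AVRO shorthand (Hive 0.14+),
-- Hive derives the Avro schema from the column list.
CREATE TABLE sales_avro (id BIGINT, amount DOUBLE)
STORED AS AVRO;

-- JSON: one JSON object per line; requires hcatalog-core (e.g. via ADD JAR).
CREATE TABLE events_json (id BIGINT, kind STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;

-- RegEx: each capturing group in input.regex becomes one column, and the
-- whole line must match. This SerDe can read but not write, so the table
-- is only usable with SELECT.
CREATE TABLE access_log (host STRING, request STRING, status STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) \"([^\"]*)\" ([0-9]*)"
)
STORED AS TEXTFILE;
```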
We can attach a SerDe to a Hive table with the ROW FORMAT SERDE clause. For example, to create a partitioned table that uses the Avro SerDe:
CREATE TABLE test
PARTITIONED BY (ds string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT...
....
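The statement above elides its storage clauses. As a complete statement that runs as-is, here is a minimal sketch using the built-in CSV SerDe (OpenCSVSerde, Hive 0.14 and later), configured through WITH SERDEPROPERTIES. The table name, columns, and the tab/single-quote settings are hypothetical choices for illustration.

```sql
-- Hypothetical table, for illustration only.
-- OpenCSVSerde treats every column as STRING, so all columns are declared
-- as STRING. The properties below read tab-separated fields, with a single
-- quote as the quote character and a backslash ("\\") as the escape character.
CREATE TABLE csv_people (
  id   STRING,
  name STRING,
  city STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = "\t",
  "quoteChar"     = "'",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;

-- Shows the SerDe class ("SerDe Library") and its properties.
DESCRIBE FORMATTED csv_people;
```

DESCRIBE FORMATTED is a quick way to confirm which SerDe a table actually uses, including tables created without an explicit ROW FORMAT SERDE clause.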