Hive Tutorial 14 : Static partition vs Dynamic Partition
Consider a sample employee data.
1,Anne,Admin,50000,A
2,Gokul,Admin,50000,B
3,Janet,Sales,60000,A
4,Hari,Admin,50000,C
5,Sanker,Admin,50000,C
6,Margaret,Tech,12000,A
7,Nirmal,Tech,12000,B
8,jinju,Engineer,45000,B
9,Nancy,Admin,50000,A
10,Andrew,Manager,40000,A
Create a non-partitioned table:
create external table emp (EmployeeID Int,FirstName String,Designation String,Salary Int,Department String) row format delimited fields terminated by "," location '/user/prabhanj/HiveTrail';
Create a partitioned table:
create table emp_partition (EmployeeID Int,FirstName String,Designation String,Salary Int) PARTITIONED BY (Department String) row format delimited fields terminated by ","
There are 2 ways to insert data into partition table.
1. Static Partition - Using individual insert:
INSERT INTO TABLE emp_partition PARTITION(department='A')
SELECT EmployeeID, FirstName,Designation,Salary FROM emp WHERE department='A';
INSERT INTO TABLE emp_partition PARTITION (department='B')
SELECT EmployeeID, FirstName,Designation,Salary FROM emp WHERE department='B';
INSERT INTO TABLE emp_partition PARTITION (department='C')
SELECT EmployeeID, FirstName,Designation,Salary FROM emp WHERE department='C';
If we go for the above approach , if we have 50 partitions we need to do the insert statement 50 times. That is a tedeous task and it is known as Static Partition.
2. Dynamic Partition – Single insert to partition table
Instead of loading each partition with single SQL statement as shown above, which will result in writing lot of SQL statements for huge no of partitions, Hive supports dynamic partitioning with which we can add any number of partitions with single SQL execution. Hive will automatically splits our data into separate partition files based on the values of partition keys present in the input files.It gives the advantages of easy coding and no need of manual identification of partitions.
Inorder to achieve the same we need to set 4 things,1. set hive.exec.dynamic.partition=true
This enable dynamic partitions, by default it is false.
2. set hive.exec.dynamic.partition.mode=nonstrict
We are using the dynamic partition without a static
partition (A table can be partitioned based
on multiple columns in hive) in such case we have to
enable the non strict mode. In strict mode we can use
dynamic partition only with a Static Partition.
3. set hive.exec.max.dynamic.partitions.pernode=3
The default value is 100, we have to modify the
same according to the possible no of partitions
4. hive.exec.max.created.files=150000
The default values is 100000 but for larger tables
it can exceed the default, so we may have to update the same.
INSERT OVERWRITE TABLE emp_partition PARTITION(department) SELECT EmployeeID, FirstName,Designation,Salary,department FROM emp;
We can also load data by using LOAD statement:
hive> LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.txt'
INTO TABLE emp_partition
PARTITION (Department='A');
Comments
Post a Comment