Hive Tutorial 14 : Static partition vs Dynamic Partition

Consider  a sample employee data.


1,Anne,Admin,50000,A
2,Gokul,Admin,50000,B
3,Janet,Sales,60000,A
4,Hari,Admin,50000,C
5,Sanker,Admin,50000,C
6,Margaret,Tech,12000,A
7,Nirmal,Tech,12000,B
8,jinju,Engineer,45000,B
9,Nancy,Admin,50000,A
10,Andrew,Manager,40000,A


Create a non-partitioned table:
create external table emp (EmployeeID Int,FirstName String,Designation  String,Salary Int,Department String) row format delimited fields terminated by "," location '/user/prabhanj/HiveTrail';

Create a partitioned table:
create  table emp_partition (EmployeeID Int,FirstName String,Designation  String,Salary Int) PARTITIONED BY (Department String) row format delimited fields terminated by ","

There are 2 ways to insert data into partition table.

 1. Static Partition - Using individual insert:


INSERT INTO TABLE emp_partition PARTITION(department='A') 
SELECT EmployeeID, FirstName,Designation,Salary FROM emp WHERE department='A'; 

INSERT INTO TABLE emp_partition PARTITION (department='B') 
SELECT EmployeeID, FirstName,Designation,Salary FROM emp WHERE department='B'; 

INSERT INTO TABLE emp_partition PARTITION (department='C') 

SELECT EmployeeID, FirstName,Designation,Salary FROM emp WHERE department='C';

 If we go for the above approach , if we have 50 partitions we need to do the insert statement 50 times. That is a tedeous task and it is known as Static Partition.


 2. Dynamic Partition – Single insert to partition table


Instead of loading each partition with single SQL statement as shown above, which will result in writing lot of SQL statements for huge no of partitions, Hive supports dynamic partitioning with which we can add any number of partitions with single SQL execution. Hive will automatically splits our data into separate partition files based on the values of partition keys present in the input files.It gives the advantages of easy coding and no need of manual identification of partitions.
Inorder to achieve the same we need to set 4 things,

1. set hive.exec.dynamic.partition=true
     This enable dynamic partitions, by default it is false.
2. set hive.exec.dynamic.partition.mode=nonstrict
     We are using the dynamic partition without a static
     partition (A table can be partitioned based    
     on multiple columns in hive) in such case we have to           
     enable the non strict mode. In strict mode we can use             
     dynamic partition  only with a Static Partition.
3. set hive.exec.max.dynamic.partitions.pernode=3
     The default value is 100, we have to modify the   
     same according to the possible no of partitions
4. hive.exec.max.created.files=150000
     The default values is 100000 but for larger tables  

     it can exceed the default, so we may have to update the same.  

INSERT OVERWRITE TABLE emp_partition PARTITION(department) SELECT EmployeeID, FirstName,Designation,Salary,department FROM emp;

We can also load data by using LOAD statement:

hive> LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.txt'
      INTO TABLE emp_partition

      PARTITION (Department='A');

Comments

Popular posts from this blog

Hive Tutorial 31 : Analytic Functions

Hive Tutorial 37 : Performance Tuning