Hadoop Hive analytic functions compute an aggregate value that is based on a group of rows. A Hadoop Hive HQL analytic function works on the group of rows and ignores the NULL in the data if you specify Hadoop Hive COUNT Analytic Function Returns number of rows in query or group of rows. Syntax: COUNT(column reference | value expression | *) over(window_spec) For Example; select pat_id, dept_id, count(*) over (partition by dept_id order by dept_id asc) as pat_cnt from patient; at_id dept_id pat_cnt 6 111 4 2 111 4 5 111 4 1 111 4 4 222 3 5 222 3 3 222 3 7 333 1 8 444 1 Hadoop Hive SUM Analytic Function Just like count function, sum Hive analytic function is used to compute the sum of columns or expression. Sum analytic function is used to compute the sum of all rows of table or rows within the groups. Syntax: SUM(column | expression) OVER( window_spec) For example: Calculate sum insured amount of all patients within each department...
Performance plays key role in big data related projects as they deals which huge amount of data. So when you are using Hive if you keep few things in mind then we can see dramatic change in the performance. Performance tuning in hive: Partitions Bucketing File formats Compression Sampling Tez Vectorization Parallel execution CBO Partitions : The concept of partitioning in Hive is very similar to what we have in RDBMS. A table can be partitioned by one or more keys. This will determine how the data will be stored in the table. For example, if a table has two columns, id, name and age; and is partitioned by age, all the rows having same age will be stored together. So when we try to query based on age range, then hive will retrieve the data by going into particular folders instead of parsing through whole data. /hdfs/user/tablename/age/10 /hdfs/user/tablename/age/11 Bucketing : Bucketing is more efficient for sampling,data will be segre...
Comments
Post a Comment