Hadoop Hive analytic functions compute an aggregate value based on a group of rows. A Hive HQL analytic function works on the group of rows and ignores NULL values in the data if you specify so.

Hadoop Hive COUNT Analytic Function

Returns the number of rows in the query or in a group of rows.

Syntax: COUNT(column reference | value expression | *) OVER (window_spec)

For example:

select pat_id, dept_id, count(*) over (partition by dept_id order by dept_id asc) as pat_cnt from patient;

pat_id  dept_id  pat_cnt
6       111      4
2       111      4
5       111      4
1       111      4
4       222      3
5       222      3
3       222      3
7       333      1
8       444      1

Hadoop Hive SUM Analytic Function

Just like the COUNT function, the SUM analytic function computes the sum of a column or expression, either over all rows of a table or over the rows within each group.

Syntax: SUM(column | expression) OVER (window_spec)

For example: Calculate sum insured amount of all patients within each department...
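To make the COUNT ... OVER (PARTITION BY dept_id) behaviour above concrete, here is a small Python emulation. It is only a sketch and does not run Hive: the `patients` list is hypothetical sample data chosen to mirror the output in the excerpt, and `count_over_partition` is an illustrative helper name, not a Hive API.

```python
from collections import Counter

# Hypothetical sample rows as (pat_id, dept_id), mirroring the output shown above.
patients = [
    (6, 111), (2, 111), (5, 111), (1, 111),
    (4, 222), (5, 222), (3, 222),
    (7, 333), (8, 444),
]


def count_over_partition(rows):
    """Emulate COUNT(*) OVER (PARTITION BY dept_id).

    Every input row is kept, and the size of its partition is appended.
    """
    # Count how many rows fall into each dept_id partition.
    sizes = Counter(dept_id for _, dept_id in rows)
    # Attach the partition size to each row without collapsing the rows.
    return [(pat_id, dept_id, sizes[dept_id]) for pat_id, dept_id in rows]


result = count_over_partition(patients)
for row in result:
    print(row)
```

The point of the sketch is the difference from a plain GROUP BY: an analytic function keeps one output row per input row, and each row carries the aggregate of its own partition.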
Performance plays a key role in big data projects, because they deal with huge amounts of data. Keeping a few things in mind when using Hive can change performance dramatically.

Performance tuning in Hive: partitions, bucketing, file formats, compression, sampling, Tez, vectorization, parallel execution, CBO.

Partitions: The concept of partitioning in Hive is very similar to partitioning in an RDBMS. A table can be partitioned by one or more keys, which determines how the data is stored in the table. For example, if a table has three columns, id, name and age, and is partitioned by age, all rows with the same age are stored together. When we query on an age range, Hive retrieves the data from the matching folders instead of parsing through the whole data set.

/hdfs/user/tablename/age/10
/hdfs/user/tablename/age/11

Bucketing: Bucketing is more efficient for sampling,data will be segre...
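The partitioning idea described above, reading only the folders that match the filter, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Hive is implemented: the directory layout copies the example paths from the excerpt (Hive itself typically names partition directories in key=value form, such as age=10), and `partition_path`, `prune_partitions` and the `available` set are hypothetical names and data.

```python
# Hypothetical base directory, following the /hdfs/user/tablename/age/<value> paths above.
BASE = "/hdfs/user/tablename/age"


def partition_path(age):
    """Build the folder path that would hold all rows with the given age."""
    return f"{BASE}/{age}"


def prune_partitions(available_ages, low, high):
    """Return only the partition folders whose age lies in [low, high].

    Folders outside the range are never listed, so they would never be read.
    """
    return [partition_path(a) for a in sorted(available_ages) if low <= a <= high]


# Hypothetical set of ages for which a partition folder exists.
available = {10, 11, 12, 25, 40}

# A query filtering on age between 10 and 11 only needs two folders.
to_scan = prune_partitions(available, 10, 11)
print(to_scan)
```

With five partitions on disk, the range filter touches two of them. The saving grows with the number of partitions that the filter excludes.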
1. The Art of R Programming by Norm Matloff. Link: https://bit.ly/1ydgTrJ
2. Data Mining with Rattle and R by Graham Williams. Link: https://bit.ly/2h7YgGm
3. ggplot2 by Hadley Wickham. Link: https://bit.ly/2Dt8AmP
4. R for Data Science by Garrett Grolemund, Hadley Wickham. Link: https://bit.ly/2tfmalX
5. R in Action by Robert Kabacoff. Link: https://bit.ly/2tLgOQp
6. Machine Learning with R by Brett Lantz. Link: https://bit.ly/2KtNPaT
7. R and Data Mining: Examples and Case Studies by Yanchang Zhao. Link: https://bit.ly/2sD4QtW
8. The R Book by Michael J. Crawley. Link: https://bit.ly/2BPobcO
9. An Introduction to Statistical Learning in R by Gareth James, Daniela Witten. Link: https://bit.ly/1iUJso0
10. R through Excel. Link: https://bit.ly/2lF0sVN

NOTE: PDF links are not a copyright infringement, they are f...