Hadoop Hive Analytic Functions

Hadoop Hive analytic functions compute an aggregate value based on a group of rows. A Hive HQL analytic function works on a group of rows and ignores NULL values in the data.

Hadoop Hive COUNT Analytic Function

Returns the number of rows in the query or group of rows.

Syntax:

    COUNT(column reference | value expression | *) OVER (window_spec)

For example:

    select pat_id,
           dept_id,
           count(*) over (partition by dept_id order by dept_id asc) as pat_cnt
    from patient;

    pat_id  dept_id  pat_cnt
    6       111      4
    2       111      4
    5       111      4
    1       111      4
    4       222      3
    5       222      3
    3       222      3
    7       333      1
    8       444      1

Hadoop Hive SUM Analytic Function

Just like the COUNT function, the SUM Hive analytic function is used to compute the sum of a column or expression, either across all rows of a table or within groups.

Syntax:

    SUM(column | expression) OVER (window_spec)

For example: calculate the sum insured amount of all patients within each department...
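A minimal sketch of that SUM example, assuming the patient table carries an insured_amount column (the column name is an assumption; the original example is truncated above):

    -- Assumed schema: patient(pat_id INT, dept_id INT, insured_amount DECIMAL(10,2))
    select pat_id,
           dept_id,
           insured_amount,
           -- total insured amount across all rows in the same department
           sum(insured_amount) over (partition by dept_id) as dept_insured_total
    from patient;

Because the OVER clause has no ORDER BY, every row in a department gets the same department-wide total rather than a running sum.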
Hive Performance Tuning

Performance plays a key role in big data projects because they deal with huge amounts of data. If you keep a few things in mind when using Hive, you can see a dramatic change in performance.

Performance tuning in Hive:

Partitions
Bucketing
File formats
Compression
Sampling
Tez
Vectorization
Parallel execution
CBO

Partitions: The concept of partitioning in Hive is very similar to what we have in an RDBMS. A table can be partitioned by one or more keys, and this determines how the data is stored in the table. For example, if a table has three columns, id, name and age, and is partitioned by age, all the rows having the same age are stored together. When we query on an age range, Hive retrieves the data by reading only the matching folders instead of parsing the whole data set (see the sketch after this section):

    /hdfs/user/tablename/age/10
    /hdfs/user/tablename/age/11

Bucketing: Bucketing is more efficient for sampling; data will be segre...
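A minimal sketch of partitioned and bucketed table definitions, assuming a simple person data set; the table and column names are illustrative:

    -- Partitioned table: Hive creates one HDFS subdirectory per age value
    CREATE TABLE person (
      id   INT,
      name STRING
    )
    PARTITIONED BY (age INT);

    -- Bucketed variant: rows are hashed on id into 4 files per partition,
    -- which also makes efficient sampling possible
    CREATE TABLE person_bucketed (
      id   INT,
      name STRING
    )
    PARTITIONED BY (age INT)
    CLUSTERED BY (id) INTO 4 BUCKETS;

    -- A query filtering on the partition key reads only the matching directory
    SELECT name FROM person WHERE age = 10;

Note that the partition column is declared in PARTITIONED BY, not in the main column list, because Hive stores it in the directory path rather than in the data files.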
Resetting incremental.last.value in a Sqoop Job

At the initial stage I thought it was impossible to reset the incremental.last.value in a Sqoop job. After creating a Sqoop job for incremental import of data from SQL Server to Hadoop, a request came in to re-dump the data. Although flushing the previously ingested data in Hadoop is effortless using the command

    hdfs dfs -rm /location/to/dir/part*

the tough part comes when trying to execute the Sqoop job again. The job executes without pulling any records because it sees no increment in the source. So, instead of removing the Sqoop job and creating a new one afresh, the way to go is to reset incremental.last.value to 0. This can be done by changing the value of the last record in the Sqoop metastore. The steps involved (a consolidated sketch follows below):

Navigate to the home directory using cd ~
Locate the Sqoop metastore using ls -a
You'll see a dir named .sqoop; cd into it with cd .sqoop
vi metastore.db.script or nano metastore.db.script
Scroll all the way down to s...
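A sketch of the same walk-through as shell commands, assuming a local metastore in ~/.sqoop and a job named myjob (the job name and the exact stored line are illustrative; the metastore file layout can differ between Sqoop versions):

    # remove the previously ingested data from HDFS
    hdfs dfs -rm /location/to/dir/part*

    # open the local Sqoop metastore script
    cd ~/.sqoop
    vi metastore.db.script

    # inside metastore.db.script, near the bottom, look for the row that
    # stores the job's incremental.last.value, e.g. (illustrative):
    #   INSERT INTO SQOOP_SESSIONS VALUES('myjob','incremental.last.value','12345',...)
    # and change the stored value to 0:
    #   INSERT INTO SQOOP_SESSIONS VALUES('myjob','incremental.last.value','0',...)

    # re-run the job; it now re-imports everything from the beginning
    sqoop job --exec myjob

You can confirm the reset before re-running by inspecting the job with sqoop job --show myjob, which prints the stored incremental.last.value.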