Hive Tutorial 25 : Hive Vectorization
Vectorization allows Hive to process a batch of rows together instead of one row at a time. Each batch consists of column vectors, which are usually arrays of primitive types. Operations are performed on an entire column vector at once, which makes better use of CPU instruction pipelines and caches. Vectorization was introduced in Hive 0.13 and improves the performance of operations such as scans, aggregations, filters and joins.
Standard query execution
A standard query execution system processes one row at a time. This involves long code paths and significant metadata interpretation in the inner loop of execution.
Row-at-a-time execution is slow for the following reasons:
- Hive uses Object Inspectors to work on a row. They provide a level of abstraction, but at a major performance cost, and lazy SerDes make the overhead worse.
- The inner loop contains many method calls, new() allocations and if-then-else branches, so Hive needs a lot of CPU instructions to execute it.
Vectorized query execution
Vectorized query execution streamlines operations by processing a block of 1024 rows at a time. Within the block, each column is stored as a vector (an array of a primitive data type). Simple operations like arithmetic and comparisons are done by quickly iterating through the vectors in a tight loop, with no or very few function calls or conditional branches inside the loop.
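The difference between the two execution styles can be sketched in a few lines. The example below is written in Python purely as an illustration and is not how Hive implements it internally; the function names, the column names price and qty, and the sample data are all hypothetical. Only the batch size of 1024 comes from the text above.

```python
from array import array

BATCH_SIZE = 1024  # block size quoted in the text above


def row_at_a_time(rows):
    """Row-oriented style: each row is a generic record, so every value
    costs a field lookup before the arithmetic can happen."""
    out = []
    for row in rows:
        out.append(row["price"] * row["qty"])
    return out


def vectorized(price_col, qty_col):
    """Column-oriented style: each column is an array of one primitive type,
    and the work is done batch by batch in a tight loop."""
    out = array("q")  # signed 64-bit integers
    for start in range(0, len(price_col), BATCH_SIZE):
        prices = price_col[start:start + BATCH_SIZE]
        qtys = qty_col[start:start + BATCH_SIZE]
        # Inner loop: plain arithmetic over two primitive vectors.
        out.extend(p * q for p, q in zip(prices, qtys))
    return out


# Hypothetical sample data: 3000 rows, so the last batch is only partly full.
rows = [{"price": i, "qty": 2} for i in range(3000)]
price_col = array("q", (r["price"] for r in rows))
qty_col = array("q", (r["qty"] for r in rows))

print(list(vectorized(price_col, qty_col)) == row_at_a_time(rows))
```

Both functions compute the same result. The point of the sketch is the shape of the inner loop: in the vectorized version it touches only primitive values from two arrays, which is the property that lets a real engine keep the loop free of function calls and branches.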
Enabling Vectorized execution
Vectorized execution is off by default, so queries use it only when it is enabled. To enable it, set the following property to true:
set hive.vectorized.execution.enabled = true;
To disable vectorized execution and go back to standard execution, set the property to false again.
Note:
1. To use vectorized query execution, you must store your data in ORC format.
2. Timestamps only work correctly with vectorized execution if the timestamp value is between 1677-09-20 and 2262-04-11. The limitation exists because a vectorized timestamp value is stored as a long value representing nanoseconds before or after the Unix Epoch time of 1970-01-01 00:00:00 UTC.
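The timestamp range in the second note follows from the width of a long. The sketch below is in Python for illustration only and is not Hive code; it assumes the value is a signed 64-bit count of nanoseconds relative to the epoch in UTC, as the note describes, and the names are hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # Unix epoch, treated as UTC
LONG_MIN = -2**63             # smallest signed 64-bit value
LONG_MAX = 2**63 - 1          # largest signed 64-bit value


def nanos_to_datetime(nanos):
    # datetime has microsecond resolution, so nanoseconds are floored
    # to whole microseconds before the conversion.
    return EPOCH + timedelta(microseconds=nanos // 1000)


earliest = nanos_to_datetime(LONG_MIN)
latest = nanos_to_datetime(LONG_MAX)

print("earliest representable:", earliest)
print("latest representable:  ", latest)
```

Evaluated in UTC, the upper bound falls on 2262-04-11, and the lower bound falls within a day of the 1677-09-20 date quoted above; the exact calendar date depends on the time zone applied.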