Hive Tutorial 12 : Hive External table vs Internal/Managed table difference
1. For Managed table , the life cycle of table will be maintained by Hive.For external hive wont manage the lifecycle.
2. For Managed table ,data will be stored in /user/hive/warehouse directory, for external table we can specify other path.
3. By default hive will create managed table, we have to provide External key word in create query to create external table.
4. If we delete managed table, hive will delete schema & data as well, but if we delete external table only schema will be deleted from Metadata.
5. So if we want our data to be available for other tools in hadoop ecosystem like pig,mapreduce etc,, we have to create external tables.
Use EXTERNAL tables when:
The data is also used outside of Hive. For example, the data files are read and processed by an existing program that doesn't lock the files.
Data needs to remain in the underlying location even after a DROP TABLE. This can apply if you are pointing multiple schemas (tables or views) at a single data set or if you are iterating through various possible schemas.
You want to use a custom location such as ASV.
Hive should not own data and control settings, dirs, etc., you have another program or process that will do those things.
You are not creating table based on existing table (AS SELECT).
Use INTERNAL tables when:
The data is temporary.
You want Hive to completely manage the lifecycle of the table and data.
2. For Managed table ,data will be stored in /user/hive/warehouse directory, for external table we can specify other path.
3. By default hive will create managed table, we have to provide External key word in create query to create external table.
4. If we delete managed table, hive will delete schema & data as well, but if we delete external table only schema will be deleted from Metadata.
5. So if we want our data to be available for other tools in hadoop ecosystem like pig,mapreduce etc,, we have to create external tables.
Use EXTERNAL tables when:
The data is also used outside of Hive. For example, the data files are read and processed by an existing program that doesn't lock the files.
Data needs to remain in the underlying location even after a DROP TABLE. This can apply if you are pointing multiple schemas (tables or views) at a single data set or if you are iterating through various possible schemas.
You want to use a custom location such as ASV.
Hive should not own data and control settings, dirs, etc., you have another program or process that will do those things.
You are not creating table based on existing table (AS SELECT).
Use INTERNAL tables when:
The data is temporary.
You want Hive to completely manage the lifecycle of the table and data.
Comments
Post a Comment