Sqoop tutorial 12 : Importing All Your Tables

Rather than using the import tool for one table, you can use the import-all-tables
tool.
For example, to import all tables from our example database, you would use the following Sqoop command:

sqoop import-all-tables \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop

When using the import-all-tables tool, Sqoop will use catalog queries to retrieve a list of all tables in your database. Subsequently for each table, Sqoop will internally call the import tool to import your data to Hadoop. Tables will be imported in sequential order to avoid any unnecessary burden on the database server that would be created by importing several tables at the same time. With this one command, Sqoop will import the entire content of a given database to Hadoop.

If you need to import all but a few tables, you can use the parameter --excludetables that accepts a comma-separated list of table names that should be excluded from the bulk import.

For example, if you need to import all tables from the database except cities and countries, you would use the following command:

sqoop import-all-tables \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--exclude-tables cities,countries

Importing all tables will internally call the usual import tool for transferring each table. You’ll find that many of the import parameters can’t be used in conjunction with the
import-all-tables tool.

For example, you can’t use the parameter --target-dir, as that would instruct Sqoop to import all tables into the same directory, resulting in a total file mess on HDFS. Using the --warehouse-dir parameter is fine, as this parameter can be easily used for all imported tables. You can take advantage of the parameter --exclude-tables to skip importing tables that need special parameters; you can then import them separately using the import tool, which allows you to specify additional parameters.

Comments

Popular posts from this blog

Hive Tutorial 31 : Analytic Functions

Hive Tutorial 37 : Performance Tuning

How to change sqoop saved job parameters