Sqoop tutorial 5 : Specifying a Target Directory

Sqoop offers two parameters for specifying custom output directories: --target-dir
and --warehouse-dir. 

Sqoop refuses to import data if the final output directory already exists.

Use the --target-dir parameter to specify the directory on HDFS where Sqoop should import your data. For example, use the following command to import the table cities into the directory /etl/input/cities:
sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--table cities \
--target-dir /etl/input/cities

To specify the parent directory for all your Sqoop jobs, use the --warehouse-dir
parameter instead:
sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--table cities \
--warehouse-dir /etl/input/

By default, Sqoop creates a directory with the same name as the imported table inside your home directory on HDFS and imports all data there. For example, when the user jarcec imports the table cities, the data is stored in /user/jarcec/cities. You can change this to any directory on HDFS with the --target-dir parameter. The only requirement is that the directory must not exist before you run the Sqoop command.
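Because the import fails when the target directory already exists, a wrapper script can check for it before calling Sqoop. Below is a minimal sketch, assuming the Hadoop 2+ `hdfs` command-line client is on your PATH; the function name target_dir_is_free is hypothetical and not part of Sqoop or Hadoop.

```shell
#!/usr/bin/env bash
# Pre-flight check before a Sqoop import (sketch).
# Assumes the Hadoop 2+ `hdfs` client is on PATH; target_dir_is_free is a
# hypothetical helper name, not part of Sqoop or Hadoop.

target_dir_is_free() {
  local dir="$1"
  # `hdfs dfs -test -e PATH` exits 0 if PATH exists (file or directory).
  # Any non-zero exit, including a connection error, is treated as "free".
  if hdfs dfs -test -e "$dir"; then
    echo "Target directory $dir already exists; Sqoop would reject the import." >&2
    return 1
  fi
  return 0
}

# Example usage (commented out so the sketch runs without a cluster):
# target_dir_is_free /etl/input/cities && sqoop import \
#   --connect jdbc:mysql://mysql.example.com/sqoop \
#   --username sqoop \
#   --password sqoop \
#   --table cities \
#   --target-dir /etl/input/cities
```

Chaining the check with && means Sqoop only runs when the directory is absent, so the script fails early with a clear message instead of partway through the job.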

If you want to run multiple Sqoop jobs for multiple tables, you will need to change the --target-dir parameter with every invocation. As an alternative, Sqoop offers another parameter for selecting the output directory. Instead of specifying the final directory directly, --warehouse-dir lets you specify only the parent directory. Rather than writing data into the warehouse directory itself, Sqoop creates a directory with the same name as the table inside the warehouse directory and imports the data there. This is similar to the default case, where Sqoop imports data into your home directory on HDFS, except that --warehouse-dir lets you use a directory other than your home directory. Note that this parameter does not need to change with every table import unless you are importing tables with the same name.
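The three placement rules described above (an explicit --target-dir, the --warehouse-dir plus the table name, or the HDFS home directory by default) can be summarized in a small shell function. This is an illustrative sketch, not part of Sqoop: the function name sqoop_output_dir and its argument order are hypothetical, and countries is a hypothetical second table.

```shell
#!/usr/bin/env bash
# Sketch of where Sqoop places imported data, per the rules in this post:
#   --target-dir DIR     -> DIR
#   --warehouse-dir DIR  -> DIR/<table>
#   neither              -> /user/<user>/<table>  (HDFS home directory)
# sqoop_output_dir is a hypothetical helper for illustration, not part of Sqoop.

sqoop_output_dir() {
  local table="$1" target_dir="$2" warehouse_dir="$3" user="$4"
  if [ -n "$target_dir" ] && [ -n "$warehouse_dir" ]; then
    # The two options are alternatives; this sketch refuses to guess.
    echo "use either --target-dir or --warehouse-dir, not both" >&2
    return 1
  elif [ -n "$target_dir" ]; then
    echo "$target_dir"
  elif [ -n "$warehouse_dir" ]; then
    # Drop a single trailing slash so /etl/input/ and /etl/input agree.
    echo "${warehouse_dir%/}/$table"
  else
    echo "/user/$user/$table"
  fi
}

# One warehouse directory serves several tables without changes;
# "countries" is a hypothetical second table.
for table in cities countries; do
  echo "$table -> $(sqoop_output_dir "$table" "" /etl/input/ jarcec)"
done
```

The loop only prints where each table would land; it shows why the same --warehouse-dir value can be reused across imports as long as the table names differ.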
