Sqoop Tutorial 5: Specifying a Target Directory
Sqoop offers two parameters for specifying custom output directories:
--target-dir and --warehouse-dir.
Sqoop will refuse to import data when the final output directory already
exists.
Use the
--target-dir parameter to specify the directory on HDFS where Sqoop should
import your data. For example, use the following command to import the table
cities into the directory /etl/input/cities:
sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--table cities \
--target-dir /etl/input/cities
To specify the parent directory for all your Sqoop jobs, use the
--warehouse-dir parameter instead:
sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--table cities \
--warehouse-dir /etl/input/
By default, Sqoop will create a directory with the same name as
the imported table inside your home directory on HDFS and import all data
there. For example, when the user jarcec imports the table cities, the data
will be stored in /user/jarcec/cities. You can change this to any other
directory on HDFS using the --target-dir parameter. The only
requirement is that this directory must not exist prior to running the Sqoop
command.
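Because the import fails when the directory already exists, it can be useful to check for it before starting a job. The following is a minimal sketch, not part of Sqoop itself: the function name target_dir_is_free and the HDFS_BIN variable are made up for illustration, and it assumes the hdfs command-line client is on your PATH.

```shell
#!/bin/sh
# Sketch: check that an HDFS path is unused before running a Sqoop import.
# HDFS_BIN is a hypothetical override for the HDFS client binary.
HDFS_BIN="${HDFS_BIN:-hdfs}"

target_dir_is_free() {
  # `hdfs dfs -test -e PATH` exits with status 0 when PATH exists.
  # Caveat: any non-zero status (including an unreachable cluster)
  # is treated here as "path does not exist".
  if "$HDFS_BIN" dfs -test -e "$1"; then
    echo "refusing to import: $1 already exists" >&2
    return 1
  fi
  return 0
}

# Example call, using the path from the command above:
# target_dir_is_free /etl/input/cities && echo "safe to import"
```

The check only reports the conflict. Whether to remove the existing directory or pick a different one is left to you.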
If you want to run multiple Sqoop jobs for multiple tables, you
will need to change the --target-dir parameter with every invocation.
As an alternative, Sqoop offers another parameter by which to select the
output directory. Instead of directly specifying the final directory, the
parameter --warehouse-dir allows you to specify only the parent directory.
Rather than writing data into the warehouse directory, Sqoop will create a
directory with the same name as the table inside the warehouse directory and
import data there. This is similar to the default case where Sqoop imports
data to your home directory on HDFS, with the notable exception that the
--warehouse-dir parameter allows you to use a directory other than the home
directory. Note that this parameter does not need to change with every table
import unless you are importing tables with the same name.
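To illustrate why --warehouse-dir suits multi-table jobs, here is a hedged sketch of a loop that imports several tables with one unchanged parent directory. The function name import_tables and the SQOOP_BIN variable are assumptions added for illustration, as is the table name countries. The connection settings are copied from the examples above.

```shell
#!/bin/sh
# Sketch: import several tables under a single parent directory.
# SQOOP_BIN is a hypothetical override; set it to `echo` for a dry run.
SQOOP_BIN="${SQOOP_BIN:-sqoop}"

import_tables() {
  for table in "$@"; do
    # Each table is written to /etl/input/<table>;
    # the --warehouse-dir value stays the same for every table.
    "$SQOOP_BIN" import \
      --connect jdbc:mysql://mysql.example.com/sqoop \
      --username sqoop \
      --password sqoop \
      --table "$table" \
      --warehouse-dir /etl/input/ || return 1
  done
}

# Example call (the table name countries is made up):
# import_tables cities countries
```

With --target-dir, the same loop would have to build a different path for every table. Here only the --table value changes.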