Apache Pig Tutorial 2: Execution Modes


We can run Apache Pig in several different execution modes, selected with the -x (exectype) flag.

Pig has six execution modes or exectypes:
  • Local Mode - To run Pig in local mode, you need access to a single machine; all files are installed and run using your local host and file system. Specify local mode using the -x flag (pig -x local).
  • Tez Local Mode - Similar to local mode, except that internally Pig invokes the Tez runtime engine. Specify Tez local mode using the -x flag (pig -x tez_local).
    Note: Tez local mode is experimental. Some queries simply error out on bigger data in local mode.
  • Spark Local Mode - Similar to local mode, except that internally Pig invokes the Spark runtime engine. Specify Spark local mode using the -x flag (pig -x spark_local).
    Note: Spark local mode is experimental. Some queries simply error out on bigger data in local mode.
  • Mapreduce Mode - To run Pig in mapreduce mode, you need access to a Hadoop cluster and an HDFS installation. Mapreduce mode is the default mode; you can, but don't need to, specify it using the -x flag (pig OR pig -x mapreduce).
  • Tez Mode - To run Pig in Tez mode, you need access to a Hadoop cluster and an HDFS installation. Specify Tez mode using the -x flag (pig -x tez).
  • Spark Mode - To run Pig in Spark mode, you need access to a Spark, YARN, or Mesos cluster and an HDFS installation. Specify Spark mode using the -x flag (pig -x spark). In Spark execution mode you must set the SPARK_MASTER environment variable to an appropriate value: local for local mode, yarn-client for yarn-client mode, mesos://host:port for Spark on Mesos, or spark://host:port for a standalone Spark cluster (see the Spark documentation on Master URLs; yarn-cluster mode is currently not supported). Pig scripts run on Spark can take advantage of the dynamic allocation feature; it is enabled simply by setting spark.dynamicAllocation.enabled (refer to the Spark configuration documentation for additional details). In general, all properties in the Pig script prefixed with spark. are copied to the Spark application configuration. A minimal setup sketch follows the command-line examples below.
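
Local mode is a convenient way to sanity-check a script, because it reads and writes the ordinary local file system. A rough sketch of such a run (the file /tmp/sample.tsv, the relation name users, and the schema are placeholders, not part of the official docs):

/* create a tiny tab-separated input file and run one Pig statement against it */
$ printf 'alice\t1\nbob\t2\n' > /tmp/sample.tsv
$ pig -x local -e "users = LOAD '/tmp/sample.tsv' AS (name:chararray, id:int); DUMP users;"

The remaining modes are selected the same way on the command line, as the following examples show: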

/* local mode */
$ pig -x local ...
 
/* Tez local mode */
$ pig -x tez_local ...
 
/* Spark local mode */
$ pig -x spark_local ...

/* mapreduce mode */
$ pig ...
or
$ pig -x mapreduce ...

/* Tez mode */
$ pig -x tez ...

/* Spark mode */
$ pig -x spark ...
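
For Spark mode specifically, SPARK_MASTER has to be exported before Pig is started, and spark.-prefixed properties can be set inside the script itself. A minimal sketch, assuming a YARN cluster in yarn-client mode (the script name wordcount.pig is a placeholder):

/* point Pig's Spark backend at the cluster, then run the script */
$ export SPARK_MASTER=yarn-client
$ pig -x spark wordcount.pig

/* inside wordcount.pig: any property prefixed with spark. is copied to the
   Spark application configuration, e.g. to enable dynamic allocation
   (the cluster typically also needs Spark's external shuffle service) */
set spark.dynamicAllocation.enabled true;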
