Apache pig Tutorial 4: Pig Latin Statements


Pig Latin statements are the basic constructs you use to process data using Pig. A Pig Latin statement is an operator that takes a relation as input and produces another relation as output. (This definition applies to all Pig Latin operators except LOAD and STORE which read data from and write data to the file system.) Pig Latin statements may include expressions and schemas. Pig Latin statements can span multiple lines and must end with a semi-colon ( ; ). By default, Pig Latin statements are processed using multi-query execution.
Pig Latin statements are generally organized as follows:
  • A LOAD statement to read data from the file system.
  • A series of "transformation" statements to process the data.
  • A DUMP statement to view results or a STORE statement to save the results.
Note that a DUMP or STORE statement is required to generate output.
  • In this example Pig will validate, but not execute, the LOAD and FOREACH statements.
    A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);
    B = FOREACH A GENERATE name;
    
  • In this example, Pig will validate and then execute the LOAD, FOREACH, and DUMP statements.
    A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);
    B = FOREACH A GENERATE name;
    DUMP B;
    (John)
    (Mary)
    (Bill)
    (Joe)

Comments

Popular posts from this blog

Hive Tutorial 31 : Analytic Functions

Hive Tutorial 37 : Performance Tuning

How to change sqoop saved job parameters