How to change sqoop saved job parameters
At the initial stage I thought it was impossible to reset the incremental.last.value in a sqoop job. After creating a sqoop job for incremental import of data from sqlserver to hadoop, there comes a request to re-dump the data.
Although, flushing the previously ingested data in hadoop seems to be effortless using the command hdfs dfs -rm /location/to/dir/part* the tough one comes after trying to execute the sqoop job again. The job executes without pulling any record because it sees no increment in the number of records. So, instead of removing the sqoop job and creating another one afresh, the way to go is to reset the incremental.last.value to 0. And this can be done by changing the value of the last record in the sqoop metastore. The steps involves:
Navigate to the home directory using cd ~
Locate the sqoop metastore using ls -a
You'll see a dir named .sqoop, cd into it with cd .sqoop
vi metastore.db.script or nano metastore.db.script
Scroll all the way down to see the details of the most recent job executed. Then locate the line with 'incremental.last.value','xxxxxx','SqoopOptions') where xxxxxx represent the last record pulled. Then change the value to 0 or whatever number you want you next job execution to start with.
Save the file and execute the sqoop job again
Although, flushing the previously ingested data in hadoop seems to be effortless using the command hdfs dfs -rm /location/to/dir/part* the tough one comes after trying to execute the sqoop job again. The job executes without pulling any record because it sees no increment in the number of records. So, instead of removing the sqoop job and creating another one afresh, the way to go is to reset the incremental.last.value to 0. And this can be done by changing the value of the last record in the sqoop metastore. The steps involves:
Navigate to the home directory using cd ~
Locate the sqoop metastore using ls -a
You'll see a dir named .sqoop, cd into it with cd .sqoop
vi metastore.db.script or nano metastore.db.script
Scroll all the way down to see the details of the most recent job executed. Then locate the line with 'incremental.last.value','xxxxxx','SqoopOptions') where xxxxxx represent the last record pulled. Then change the value to 0 or whatever number you want you next job execution to start with.
Save the file and execute the sqoop job again
Comments
Post a Comment