hadoop - Unable to run Pig script in Pentaho
I'm using Hadoop in distributed mode and want to execute a Pig script on the Hadoop cluster from a remote machine. To achieve this, I'm using Pentaho and its Pig Script Executor utility. I set the following parameters:

HDFS hostname: Hadoop master name
HDFS port: 8020
Job tracker hostname: slave machine name
Job tracker port: 8021
Pig script path
I followed a linked guide, but the Pig script fails with the error log below:
2015/03/27 16:10:20 – repositoriesmeta – reading repositories xml file: c:\users\vijay.shinde\.kettle\repositories.xml
2015/03/27 16:10:21 – version checker – ok
2015/03/27 16:10:45 – spoon – connected metastore : pentaho, added delegating metastore
2015/03/27 16:11:03 – spoon – spoon
2015/03/27 16:11:28 – spoon – starting job…
**2015/03/27 16:11:28 – job_pig – start of job execution**
2015/03/27 16:11:28 – job_pig – starting entry [pig script executor]
2015/03/27 16:11:29 – pig script executor – 2015/03/27 16:11:29 – connecting hadoop file system at: hdfs://server_name:8020
2015/03/27 16:11:31 – pig script executor – 2015/03/27 16:11:31 – connecting map-reduce job tracker at:job_tracker:8021
2015/03/27 16:11:32 – pig script executor – 2015/03/27 16:11:32 – pig features used in script: group_by,filter
2015/03/27 16:11:33 – pig script executor – 2015/03/27 16:11:33 – {rules_enabled=[addforeach, columnmapkeyprune, duplicateforeachcolumnrewrite, filterlogicexpressionsimplifier, groupbyconstparallelsetter, implicitsplitinserter, limitoptimizer, loadtypecastinserter, mergefilter, mergeforeach, newpartitionfilteroptimizer, partitionfilteroptimizer, pushdownforeachflatten, pushupfilter, splitfilter, streamtypecastinserter]}
2015/03/27 16:11:33 – pig script executor – 2015/03/27 16:11:33 – file concatenation threshold: 100 optimistic? false
2015/03/27 16:11:33 – pig script executor – 2015/03/27 16:11:33 – choosing move algebraic foreach combiner
2015/03/27 16:11:33 – pig script executor – 2015/03/27 16:11:33 – mr plan size before optimization: 1
2015/03/27 16:11:33 – pig script executor – 2015/03/27 16:11:33 – mr plan size after optimization: 1
2015/03/27 16:11:33 – pig script executor – 2015/03/27 16:11:33 – pig script settings added job
2015/03/27 16:11:33 – pig script executor – 2015/03/27 16:11:33 – mapred.job.reduce.markreset.buffer.percent not set, set default 0.3
2015/03/27 16:11:33 – pig script executor – 2015/03/27 16:11:33 – reduce phase detected, estimating # of required reducers.
2015/03/27 16:11:33 – pig script executor – 2015/03/27 16:11:33 – using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapreducelayer.inputsizereducerestimator
2015/03/27 16:11:33 – pig script executor – 2015/03/27 16:11:33 – bytesperreducer=1000000000 maxreducers=999 totalinputfilesize=110
2015/03/27 16:11:33 – pig script executor – 2015/03/27 16:11:33 – setting parallelism 1
2015/03/27 16:11:33 – pig script executor – 2015/03/27 16:11:33 – creating jar file job9065727596293143224.jar
2015/03/27 16:11:38 – pig script executor – 2015/03/27 16:11:38 – jar file job9065727596293143224.jar created
2015/03/27 16:11:38 – pig script executor – 2015/03/27 16:11:38 – setting single store job
2015/03/27 16:11:38 – pig script executor – 2015/03/27 16:11:38 – key [pig.schematuple] false, not generate code.
2015/03/27 16:11:38 – pig script executor – 2015/03/27 16:11:38 – starting process move generated code distributed cache
2015/03/27 16:11:38 – pig script executor – 2015/03/27 16:11:38 – setting key [pig.schematuple.classes] classes deserialize []
2015/03/27 16:11:39 – pig script executor – 2015/03/27 16:11:39 – 1 map-reduce job(s) waiting submission.
**2015/03/27 16:37:31 – pig script executor – 2015/03/27 16:37:31 – 0% complete
2015/03/27 16:37:36 – pig script executor – 2015/03/27 16:37:36 – ooops! job has failed! specify -stop_on_failure if want pig stop on failure.**
2015/03/27 16:37:36 – pig script executor – 2015/03/27 16:37:36 – job null has failed! stop running dependent jobs
2015/03/27 16:37:36 – pig script executor – 2015/03/27 16:37:36 – 100% complete
2015/03/27 16:37:36 – pig script executor – 2015/03/27 16:37:36 – there no log file write to.
2015/03/27 16:37:36 – pig script executor – 2015/03/27 16:37:36 – backend error message during job submission n/a filtered_records,grouped_records,max_temp,records group_by,combiner message: java.net.connectexception: call server_name/ip_address server_name:8050 failed on connection exception: java.net.connectexception: connection refused: no further information; more details see: http://wiki.apache.org/hadoop/connectionrefused
2015/03/27 16:37:36 – pig script executor – @ sun.reflect.generatedconstructoraccessor26.newinstance(unknown source)
2015/03/27 16:37:36 – pig script executor – @ sun.reflect.delegatingconstructoraccessorimpl.newinstance(delegatingconstructoraccessorimpl.java:45)
2015/03/27 16:37:36 – pig script executor – @ java.lang.reflect.constructor.newinstance(constructor.java:408)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.net.netutils.wrapwithmessage(netutils.java:783)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.net.netutils.wrapexception(netutils.java:730)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.ipc.client.call(client.java:1351)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.ipc.client.call(client.java:1300)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.ipc.protobufrpcengine$invoker.invoke(protobufrpcengine.java:206)
2015/03/27 16:37:36 – pig script executor – @ com.sun.proxy.$proxy21.getnewapplication(unknown source)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.yarn.api.impl.pb.client.applicationclientprotocolpbclientimpl.getnewapplication(applicationclientprotocolpbclientimpl.java:167)
2015/03/27 16:37:36 – pig script executor – @ sun.reflect.generatedmethodaccessor20.invoke(unknown source)
2015/03/27 16:37:36 – pig script executor – @ sun.reflect.delegatingmethodaccessorimpl.invoke(delegatingmethodaccessorimpl.java:43)
2015/03/27 16:37:36 – pig script executor – @ java.lang.reflect.method.invoke(method.java:483)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.io.retry.retryinvocationhandler.invokemethod(retryinvocationhandler.java:186)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.io.retry.retryinvocationhandler.invoke(retryinvocationhandler.java:102)
2015/03/27 16:37:36 – pig script executor – @ com.sun.proxy.$proxy22.getnewapplication(unknown source)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.yarn.client.api.impl.yarnclientimpl.getnewapplication(yarnclientimpl.java:127)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.yarn.client.api.impl.yarnclientimpl.createapplication(yarnclientimpl.java:135)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.mapred.resourcemgrdelegate.getnewjobid(resourcemgrdelegate.java:175)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.mapred.yarnrunner.getnewjobid(yarnrunner.java:229)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.mapreduce.jobsubmitter.submitjobinternal(jobsubmitter.java:355)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.mapreduce.job$10.run(job.java:1268)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.mapreduce.job$10.run(job.java:1265)
2015/03/27 16:37:36 – pig script executor – @ java.security.accesscontroller.doprivileged(native method)
2015/03/27 16:37:36 – pig script executor – @ javax.security.auth.subject.doas(subject.java:422)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.security.usergroupinformation.doas(usergroupinformation.java:1491)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.mapreduce.job.submit(job.java:1265)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.mapreduce.lib.jobcontrol.controlledjob.submit(controlledjob.java:335)
2015/03/27 16:37:36 – pig script executor – @ sun.reflect.nativemethodaccessorimpl.invoke0(native method)
2015/03/27 16:37:36 – pig script executor – @ sun.reflect.nativemethodaccessorimpl.invoke(nativemethodaccessorimpl.java:62)
2015/03/27 16:37:36 – pig script executor – @ sun.reflect.delegatingmethodaccessorimpl.invoke(delegatingmethodaccessorimpl.java:43)
2015/03/27 16:37:36 – pig script executor – @ java.lang.reflect.method.invoke(method.java:483)
2015/03/27 16:37:36 – pig script executor – @ org.apache.pig.backend.hadoop23.pigjobcontrol.submit(pigjobcontrol.java:128)
2015/03/27 16:37:36 – pig script executor – @ org.apache.pig.backend.hadoop23.pigjobcontrol.run(pigjobcontrol.java:191)
2015/03/27 16:37:36 – pig script executor – @ java.lang.thread.run(thread.java:745)
2015/03/27 16:37:36 – pig script executor – @ org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher$1.run(mapreducelauncher.java:270)
2015/03/27 16:37:36 – pig script executor – caused by: java.net.connectexception: connection refused: no further information
2015/03/27 16:37:36 – pig script executor – @ sun.nio.ch.socketchannelimpl.checkconnect(native method)
2015/03/27 16:37:36 – pig script executor – @ sun.nio.ch.socketchannelimpl.finishconnect(socketchannelimpl.java:716)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.net.socketiowithtimeout.connect(socketiowithtimeout.java:206)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.net.netutils.connect(netutils.java:529)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.net.netutils.connect(netutils.java:493)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.ipc.client$connection.setupconnection(client.java:547)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.ipc.client$connection.setupiostreams(client.java:642)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.ipc.client$connection.access$2600(client.java:314)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.ipc.client.getconnection(client.java:1399)
2015/03/27 16:37:36 – pig script executor – @ org.apache.hadoop.ipc.client.call(client.java:1318)
2015/03/27 16:37:36 – pig script executor – … 30 more
2015/03/27 16:37:36 – pig script executor – /data/pig/input/test1, **input(s): failed read data "/data/pig/input/pigtest.txt" output(s): failed produce result in "/data/pig/input/test1" counters: total records written : 0 total bytes written : 0 spillable memory manager spill count : 0 total bags proactively spilled: 0 total records proactively spilled: 0 job dag:** null
2015/03/27 16:37:36 – pig script executor – 2015/03/27 16:37:36 – failed!
2015/03/27 16:37:36 – pig script executor – 2015/03/27 16:37:36 – error 2244: job failed, hadoop not return error message
2015/03/27 16:37:36 – pig script executor – 2015/03/27 16:37:36 – there no log file write to.
2015/03/27 16:37:36 – pig script executor – 2015/03/27 16:37:36 – org.apache.pig.backend.executionengine.execexception: error 2244: job failed, hadoop not return error message
2015/03/27 16:37:36 – pig script executor – @ org.apache.pig.tools.grunt.gruntparser.executebatch(gruntparser.java:148)
2015/03/27 16:37:36 – pig script executor – @ org.apache.pig.tools.grunt.gruntparser.parsestoponerror(gruntparser.java:202)
2015/03/27 16:37:36 – pig script executor – @ org.pentaho.hadoop.shim.common.commonpigshim.executescript(commonpigshim.java:105)
2015/03/27 16:37:36 – pig script executor – @ org.pentaho.di.job.entries.pig.jobentrypigscriptexecutor.execute(jobentrypigscriptexecutor.java:492)
2015/03/27 16:37:36 – pig script executor – @ org.pentaho.di.job.job.execute(job.java:678)
2015/03/27 16:37:36 – pig script executor – @ org.pentaho.di.job.job.execute(job.java:815)
2015/03/27 16:37:36 – pig script executor – @ org.pentaho.di.job.job.execute(job.java:500)
2015/03/27 16:37:36 – pig script executor – @ org.pentaho.di.job.job.run(job.java:407)
2015/03/27 16:37:36 – pig script executor – num successful jobs: 0 num failed jobs: 1
2015/03/27 16:37:36 – job_pig – finished job entry [pig script executor] (result=[false])
2015/03/27 16:37:36 – job_pig – job execution finished
2015/03/27 16:37:36 – spoon – job has ended.
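The log reports a refused connection to server_name:8050, a port that is not among the ones set in the job entry (8020 and 8021), and the stack trace passes through yarnclientimpl.getnewapplication, which suggests the client is trying to reach a YARN address. One way to narrow this down from the remote machine is to test whether each port accepts TCP connections at all. The following is only a minimal sketch in Python: `server_name` and `job_tracker` are the placeholder hostnames from the log, and the helper names `port_is_open` and `check_endpoints` are made up for this example.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unresolvable hostnames.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map 'host:port' to a reachability flag for each (host, port) pair."""
    return {
        f"{host}:{port}": port_is_open(host, port, timeout)
        for host, port in endpoints
    }


if __name__ == "__main__":
    # Placeholder hostnames taken from the log; replace with real ones.
    endpoints = [
        ("server_name", 8020),   # HDFS port set in the job entry
        ("job_tracker", 8021),   # job tracker port set in the job entry
        ("server_name", 8050),   # port named in the connection-refused error
    ]
    for endpoint, reachable in check_endpoints(endpoints).items():
        print(endpoint, "reachable" if reachable else "refused or unreachable")
```

If 8020 is reachable but 8050 is not, that would be consistent with the error above, though it does not say by itself whether the service is down, listening on another port, or blocked by a firewall.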
This is the Pig script:

records = LOAD '/data/pig/input/pigtest.txt' USING PigStorage(',') AS (year:chararray, temperature:int, quality:int);
filtered_records = FILTER records BY quality == 1;
grouped_records = GROUP filtered_records BY year;
max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature);
STORE max_temp INTO '/data/pig/input/test1';
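To check the script's logic separately from the cluster connection problem, here is a rough Python sketch of what the script is meant to compute: the maximum temperature per year over records whose quality equals 1. The function name `max_temp_by_year` and the sample rows are invented for illustration and are not the contents of pigtest.txt. Unlike Pig, which turns values that fail the int cast into nulls, this sketch assumes every row has three well-formed comma-separated fields.

```python
import csv


def max_temp_by_year(lines):
    """Return {year: max temperature} over rows where quality == 1.

    Each line is expected to look like 'year,temperature,quality'.
    """
    result = {}
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        year, temperature, quality = row[0], int(row[1]), int(row[2])
        if quality != 1:
            continue  # mirrors the filter on quality == 1
        # Mirrors grouping by year followed by MAX(temperature).
        if year not in result or temperature > result[year]:
            result[year] = temperature
    return result


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the original input file.
    sample = ["1950,0,1", "1950,22,1", "1950,-11,1", "1949,111,1", "1949,78,2"]
    for year, temperature in max_temp_by_year(sample).items():
        print(year, temperature)
```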
Thanks.