mongodb - Mongo-Hadoop streaming -


I'm new to MongoDB and Hadoop. I'm trying to use MongoDB data as the input to a Hadoop MapReduce job, but I don't quite know how to specify which collection to read the data from. I tried:

hadoop jar /usr/local/cellar/hadoop/2.6.0/libexec/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -input user/test/input/ -output user/test/output/ -inputformat com.mongodb.hadoop.mapred.MongoInputFormat -outputformat com.mongodb.hadoop.mapred.MongoOutputFormat -io mongodb -D mongo.input.uri=mongodb://localhost/my_dbs.collectionname -D stream.io.identifier.resolver.class=com.mongodb.hadoop.streaming.io.MongoIdentifierResolver -mapper /users/wordcountmapper.py -reducer /users/wordcountreducer.py -libjars /usr/local/cellar/hadoop/2.6.0/libexec/share/hadoop/tools/lib/mongo-hadoop-streaming.jar

But I get the following error:

ERROR streaming.StreamJob: Unrecognized option: -D
Usage: $HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar [options]

When I replaced -D with -jobconf:

hadoop jar /usr/local/cellar/hadoop/2.6.0/libexec/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -input user/input/ -output user/test/output -inputformat com.mongodb.hadoop.mapred.MongoInputFormat -outputformat com.mongodb.hadoop.mapred.MongoOutputFormat -io mongodb -jobconf mongo.input.uri=mongodb://localhost/my_dbs.collectionname -jobconf stream.io.identifier.resolver.class=com.mongodb.hadoop.streaming.io.MongoIdentifierResolver -mapper /users/wordcountmapper.py -reducer /users/wordcountreducer.py -libjars /usr/local/cellar/hadoop/2.6.0/libexec/share/hadoop/tools/lib/mongo-hadoop-streaming.jar

I got this error instead:

ERROR streaming.StreamJob: Unrecognized option: -libjars
Usage: $HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar [options]

Please help.

Please check this link for a better idea of how to connect MongoDB to Hadoop.

Edit:

Or, instead of passing the jar with the -libjars option on the command line, you can add it directly in the driver program:

args.add("-libjars"); args.add("/some/path/to/your/jar"); 
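To make the driver-side approach a bit more concrete, here is a minimal sketch of building the whole argument list in Java. It relies on one point from the Hadoop Streaming documentation: generic options such as -libjars and -D have to come before the streaming options such as -input and -mapper, which is also the likely reason both commands in the question were rejected. The class name StreamingArgs and the build helper are hypothetical names made up for this example; the paths and the MongoDB URI are simply the ones from the question, and the jar path is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop or mongo-hadoop.
class StreamingArgs {

    // Builds the streaming arguments with generic options first.
    static List<String> build(String libJar, String inputUri,
                              String input, String output,
                              String mapper, String reducer) {
        List<String> args = new ArrayList<>();

        // Generic options (-libjars, -D) must precede the streaming options.
        args.add("-libjars");
        args.add(libJar);
        args.add("-D");
        args.add("mongo.input.uri=" + inputUri);
        args.add("-D");
        args.add("stream.io.identifier.resolver.class="
                + "com.mongodb.hadoop.streaming.io.MongoIdentifierResolver");

        // Streaming options follow.
        args.add("-input");
        args.add(input);
        args.add("-output");
        args.add(output);
        args.add("-inputformat");
        args.add("com.mongodb.hadoop.mapred.MongoInputFormat");
        args.add("-outputformat");
        args.add("com.mongodb.hadoop.mapred.MongoOutputFormat");
        args.add("-io");
        args.add("mongodb");
        args.add("-mapper");
        args.add(mapper);
        args.add("-reducer");
        args.add(reducer);
        return args;
    }

    public static void main(String[] unused) {
        List<String> args = build(
                "/some/path/to/your/jar",
                "mongodb://localhost/my_dbs.collectionname",
                "user/test/input/",
                "user/test/output/",
                "/users/wordcountmapper.py",
                "/users/wordcountreducer.py");

        // Print the arguments so the ordering can be checked before running a job.
        // In a real driver (assumption: hadoop-streaming is on the classpath) the
        // array could be handed to something like
        // ToolRunner.run(new StreamJob(), args.toArray(new String[0])).
        System.out.println(String.join(" ", args));
    }
}
```

The same ordering should apply on the command line: moving -libjars and the -D settings to sit directly after the streaming jar path, ahead of -input, is worth trying before changing anything else.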
