mongodb - Mongo-Hadoop streaming
I'm new to MongoDB and Hadoop. I'm trying to use MongoDB data as the input to a Hadoop MapReduce job, but I don't quite know how to specify which collection to read the data from. I tried:
hadoop jar /usr/local/cellar/hadoop/2.6.0/libexec/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -input user/test/input/ -output user/test/output/ -inputformat com.mongodb.hadoop.mapred.MongoInputFormat -outputformat com.mongodb.hadoop.mapred.MongoOutputFormat -io mongodb -D mongo.input.uri=mongodb://localhost/my_dbs.collectionname -D stream.io.identifier.resolver.class=com.mongodb.hadoop.streaming.io.MongoIdentifierResolver -mapper /users/wordcountmapper.py -reducer /users/wordcountreducer.py -libjars /usr/local/cellar/hadoop/2.6.0/libexec/share/hadoop/tools/lib/mongo-hadoop-streaming.jar
But I get the following error:
ERROR streaming.StreamJob: Unrecognized option: -D
Usage: $HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar [options]
And when I tried the following, with -jobconf in place of -D, I got a different error:
hadoop jar /usr/local/cellar/hadoop/2.6.0/libexec/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -input user/input/ -output user/test/output -inputformat com.mongodb.hadoop.mapred.MongoInputFormat -outputformat com.mongodb.hadoop.mapred.MongoOutputFormat -io mongodb -jobconf mongo.input.uri=mongodb://localhost/my_dbs.collectionname -jobconf stream.io.identifier.resolver.class=com.mongodb.hadoop.streaming.io.MongoIdentifierResolver -mapper /users/wordcountmapper.py -reducer /users/wordcountreducer.py -libjars /usr/local/cellar/hadoop/2.6.0/libexec/share/hadoop/tools/lib/mongo-hadoop-streaming.jar
`ERROR streaming.StreamJob: Unrecognized option: -libjars Usage: $HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar [options]`
Please help.
Please check this link for a better idea of how to connect MongoDB to Hadoop.
Edit:
Or, instead of passing the jar with the -libjars option on the command line, you can add it directly in the driver program:
args.add("-libjars"); args.add("/some/path/to/your/jar");
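For what it's worth, here is a slightly fuller sketch of that driver-side argument list. It also reflects something the Hadoop streaming documentation points out: generic options such as -libjars and -D have to come before the streaming options (-input, -output, -mapper and so on), which may well be why both commands in the question fail with "Unrecognized option". The class name StreamingArgs and the method build are hypothetical names made up for this illustration, the paths and URI are simply copied from the question, and the sketch only assembles and prints the list rather than submitting a job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop or mongo-hadoop.
class StreamingArgs {

    // Assembles the streaming arguments with the generic options
    // (-libjars, -D) placed ahead of the streaming-specific options.
    static List<String> build(String libJar, String mongoInputUri) {
        List<String> args = new ArrayList<>();

        // Generic options go first.
        args.add("-libjars"); args.add(libJar);
        args.add("-D"); args.add("mongo.input.uri=" + mongoInputUri);
        args.add("-D"); args.add("stream.io.identifier.resolver.class="
                + "com.mongodb.hadoop.streaming.io.MongoIdentifierResolver");

        // Streaming options follow (values taken from the question).
        args.add("-input"); args.add("user/test/input/");
        args.add("-output"); args.add("user/test/output/");
        args.add("-inputformat"); args.add("com.mongodb.hadoop.mapred.MongoInputFormat");
        args.add("-outputformat"); args.add("com.mongodb.hadoop.mapred.MongoOutputFormat");
        args.add("-io"); args.add("mongodb");
        args.add("-mapper"); args.add("/users/wordcountmapper.py");
        args.add("-reducer"); args.add("/users/wordcountreducer.py");
        return args;
    }

    public static void main(String[] unused) {
        List<String> args = build("/some/path/to/your/jar",
                "mongodb://localhost/my_dbs.collectionname");
        // A real driver would pass args.toArray(new String[0]) on to the
        // streaming job; here the list is only printed for inspection.
        System.out.println(String.join(" ", args));
    }
}
```

The same ordering should apply on the command line: move -libjars and the -D settings so they sit directly after the streaming jar and before -input.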