hadoop - How can I read from one HBase instance but write to another?
I currently have two HBase tables (let's call them tableA and tableB). Using a single-stage MapReduce job, the data in tableA is read, processed, and saved to tableB. Both tables currently reside on the same HBase cluster; however, I need to relocate tableB onto its own cluster.
Is it possible to configure a single-stage MapReduce job in Hadoop to read from and write to separate instances of HBase?
It is possible. HBase's CopyTable MapReduce job does exactly this: TableMapReduceUtil.initTableReducerJob()
allows you to set an alternative quorumAddress in case you need to write to a remote cluster:
public static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner, String quorumAddress, String serverClass, String serverImpl)
quorumAddress - Distant cluster to write to; default is null for output to the cluster that is designated in hbase-site.xml. Set this String to the ZooKeeper ensemble of an alternate remote cluster when you would have the reduce write to a cluster other than the default; e.g. copying tables between clusters, the source would be designated by hbase-site.xml and this param would have the ensemble address of the remote cluster. The format to pass is particular. Pass <hbase.zookeeper.quorum>:<hbase.zookeeper.client.port>:<zookeeper.znode.parent> such as server,server2,server3:2181:/hbase.
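For completeness, here is a minimal driver sketch of that approach, modeled on what CopyTable does (the class name CrossClusterCopy, the mapper, and the remote ZooKeeper host names are placeholders, not anything from your setup): the mapper reads tableA from the default cluster and emits Puts, the reduce phase is skipped, and the quorumAddress argument points the output at the remote cluster.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class CrossClusterCopy {

    // Reads rows from tableA and turns them into Puts destined for tableB
    public static class CopyMapper extends TableMapper<ImmutableBytesWritable, Put> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result result, Context context)
                throws IOException, InterruptedException {
            Put put = new Put(rowKey.get());
            for (Cell cell : result.rawCells()) {
                put.add(cell);   // copy every cell; real processing would transform here
            }
            context.write(rowKey, put);
        }
    }

    public static void main(String[] args) throws Exception {
        // The source cluster is whatever hbase-site.xml on the classpath points to
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "tableA -> remote tableB");
        job.setJarByClass(CrossClusterCopy.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // bigger scanner caching for MapReduce scans
        scan.setCacheBlocks(false);  // don't fill the block cache with a one-off scan

        // Read tableA from the local (default) cluster
        TableMapReduceUtil.initTableMapperJob(
                "tableA", scan, CopyMapper.class,
                ImmutableBytesWritable.class, Put.class, job);

        // Write tableB on the remote cluster; quorumAddress format is
        // <hbase.zookeeper.quorum>:<hbase.zookeeper.client.port>:<zookeeper.znode.parent>
        TableMapReduceUtil.initTableReducerJob(
                "tableB", null, job, null,
                "remote-zk1,remote-zk2,remote-zk3:2181:/hbase", null, null);
        job.setNumReduceTasks(0);    // map-only, like CopyTable

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}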
Another option is to implement your own custom reducer that writes directly to the remote table instead of writing to the context, similar to this:
public static class MyReducer extends Reducer<Text, Result, Text, Text> {

    protected Connection connection;
    protected BufferedMutator remoteTable;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
        // Clone the job configuration and point the ZooKeeper quorum at the remote cluster
        Configuration config = HBaseConfiguration.create(context.getConfiguration());
        config.set("hbase.zookeeper.quorum", "quorum1,quorum2,quorum3");
        connection = ConnectionFactory.createConnection(config);     // HBase 0.99+
        //connection = HConnectionManager.createConnection(config);  // HBase <0.99
        // Buffered writes with a 10 MB client-side buffer; on 1.0+ this replaces the old
        // HTable setAutoFlush(false) / setWriteBufferSize() / flushCommits() calls
        remoteTable = connection.getBufferedMutator(
                new BufferedMutatorParams(TableName.valueOf("myTable"))
                        .writeBufferSize(10L * 1024L * 1024L));
    }

    @Override
    public void reduce(Text boardKey, Iterable<Result> results, Context context)
            throws IOException, InterruptedException {
        /* Build Puts from the results and write them with remoteTable.mutate(put) */
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        super.cleanup(context);
        if (remoteTable != null) {
            remoteTable.flush();   // push any buffered mutations to the remote cluster
            remoteTable.close();
        }
        if (connection != null) {
            connection.close();
        }
    }
}
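Wiring this reducer into the job is then plain MapReduce: the mapper still reads tableA via initTableMapperJob, but since MyReducer opens its own connection to the remote cluster, the job's own output can simply be discarded. A rough sketch under the same placeholder names as above, assuming a mapper that emits the <Text, Result> pairs MyReducer expects:

// Additional imports beyond the earlier sketch:
// import org.apache.hadoop.io.Text;
// import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

// Mapper variant matching MyReducer's <Text, Result> input types
public static class GroupingMapper extends TableMapper<Text, Result> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result result, Context context)
            throws IOException, InterruptedException {
        // Group rows however the processing requires; here simply by row key
        context.write(new Text(rowKey.get()), result);
    }
}

public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();   // source cluster from hbase-site.xml
    Job job = Job.getInstance(conf, "tableA -> remote tableB via custom reducer");
    job.setJarByClass(MyReducer.class);

    Scan scan = new Scan();
    scan.setCaching(500);
    scan.setCacheBlocks(false);

    // Read tableA from the local cluster; initTableMapperJob also registers the HBase
    // result/mutation serializations so Result can be shuffled as a map output value
    TableMapReduceUtil.initTableMapperJob(
            "tableA", scan, GroupingMapper.class, Text.class, Result.class, job);

    job.setReducerClass(MyReducer.class);
    // MyReducer writes through its own connection, so the MapReduce output is discarded
    job.setOutputFormatClass(NullOutputFormat.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
}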