Spark with custom Hadoop FileSystem


I have a YARN cluster that is configured to use a custom Hadoop FileSystem in core-site.xml:

<property>
    <name>fs.custom.impl</name>
    <value>package.of.custom.class.CustomFileSystem</value>
</property>
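(For context, Hadoop maps a URI scheme to its implementation class through the fs.<scheme>.impl key, so the property above makes any custom:// URI resolve to CustomFileSystem. A minimal sketch to verify the mapping outside of Spark, reusing the placeholder class and path names from the question:)

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class SchemeCheck {
        public static void main(String[] args) throws Exception {
            // Loads core-site.xml from the classpath (or from HADOOP_CONF_DIR when set).
            Configuration conf = new Configuration();
            // Resolves the "custom" scheme via the fs.custom.impl property.
            FileSystem fs = FileSystem.get(URI.create("custom://path/to/directory"), conf);
            // Expect: package.of.custom.class.CustomFileSystem
            System.out.println(fs.getClass().getName());
        }
    }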

I want to run a Spark job on the YARN cluster that reads an input RDD from the CustomFileSystem:

final JavaPairRDD<String, String> files =
        sparkContext.wholeTextFiles("custom://path/to/directory");

Is there a way to do this without re-configuring Spark? I.e., can I point Spark at the existing core-site.xml, and what is the best way to do that?

Set HADOOP_CONF_DIR to the directory that contains core-site.xml. (This is documented in Running Spark on YARN.)
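If you want to confirm from the driver that Spark actually picked up the property, or set it explicitly as a fallback, a rough sketch along these lines should work (the property name and class come from the question; "custom-fs-check" is just a placeholder app name):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class ConfCheck {
        public static void main(String[] args) {
            JavaSparkContext sparkContext =
                    new JavaSparkContext(new SparkConf().setAppName("custom-fs-check"));

            // With HADOOP_CONF_DIR exported, core-site.xml is merged into this Hadoop Configuration.
            String impl = sparkContext.hadoopConfiguration().get("fs.custom.impl");
            System.out.println("fs.custom.impl = " + impl);

            // Fallback: set the property programmatically if the XML is not being picked up.
            // sparkContext.hadoopConfiguration().set(
            //         "fs.custom.impl", "package.of.custom.class.CustomFileSystem");

            sparkContext.stop();
        }
    }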

You still need to make sure package.of.custom.class.CustomFileSystem is on the classpath.
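One way to get the class onto the executors is to ship its jar with the job, comparable to passing --jars to spark-submit. A sketch, assuming the filesystem is packaged in a jar at /path/to/custom-fs.jar (a placeholder path):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class CustomFsJob {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("custom-fs-job")
                    // Ships the jar containing CustomFileSystem with the job,
                    // comparable to --jars /path/to/custom-fs.jar on spark-submit.
                    .setJars(new String[] {"/path/to/custom-fs.jar"});

            JavaSparkContext sparkContext = new JavaSparkContext(conf);

            // Reads from the custom scheme defined by fs.custom.impl in core-site.xml.
            JavaPairRDD<String, String> files =
                    sparkContext.wholeTextFiles("custom://path/to/directory");
            System.out.println("Read " + files.count() + " files");

            sparkContext.stop();
        }
    }

(In client mode the jar must also already be on the driver's own classpath, since the driver resolves the custom:// scheme as well.)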

