Spark with a custom Hadoop FileSystem
I have a YARN cluster that is configured to use a custom Hadoop FileSystem in core-site.xml:
    <property>
      <name>fs.custom.impl</name>
      <value>package.of.custom.class.CustomFileSystem</value>
    </property>
I want to run a Spark job on the YARN cluster that reads its input RDD from the CustomFileSystem:
    final JavaPairRDD<String, String> files = sparkContext.wholeTextFiles("custom://path/to/directory");
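For completeness, the surrounding driver is roughly the following (the class and app names are placeholders, and the master is assumed to be supplied via spark-submit rather than set in code):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public final class CustomFsJob {
        public static void main(String[] args) {
            // The custom:// scheme is resolved through the Hadoop configuration
            // that the context picks up (fs.custom.impl from core-site.xml).
            SparkConf conf = new SparkConf().setAppName("custom-fs-job");
            JavaSparkContext sparkContext = new JavaSparkContext(conf);

            // Each pair is (file path, full file contents) for every file in the directory.
            final JavaPairRDD<String, String> files =
                    sparkContext.wholeTextFiles("custom://path/to/directory");

            System.out.println("files read: " + files.count());
            sparkContext.stop();
        }
    }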
Is there a way to do this without re-configuring Spark? I.e., can I point Spark at the existing core-site.xml, and what is the best way to do that?
Set HADOOP_CONF_DIR to the directory that contains core-site.xml. (This is documented in Running Spark on YARN.)
You still need to make sure package.of.custom.class.CustomFileSystem is on the classpath, for example by shipping its jar with spark-submit, as in the sketch below.
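Concretely, the submission could look like this (the configuration directory, jar paths, and class name here are hypothetical placeholders):

    # Point Spark at the directory that holds your existing core-site.xml
    export HADOOP_CONF_DIR=/etc/hadoop/conf

    # Ship the jar containing package.of.custom.class.CustomFileSystem so it is
    # available on both the driver and executor classpaths
    spark-submit \
      --master yarn \
      --jars /path/to/custom-filesystem.jar \
      --class com.example.CustomFsJob \
      my-job.jar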