Spark with a custom Hadoop FileSystem
I have a YARN cluster that is configured to use a custom Hadoop FileSystem in core-site.xml:

    <property>
      <name>fs.custom.impl</name>
      <value>package.of.custom.class.CustomFileSystem</value>
    </property>

I want to run a Spark job on the YARN cluster that reads an input RDD from the CustomFileSystem:

    final JavaPairRDD<String, String> files = sparkContext.wholeTextFiles("custom://path/to/directory");

Is there a way to do this without re-configuring Spark? I.e. can I point Spark at the existing core-site.xml, and what is the best way to do that?
Set HADOOP_CONF_DIR to the directory that contains core-site.xml. (This is documented in Running Spark on YARN.)
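For example, a minimal invocation might look like the sketch below; the config directory, main class, and jar name are placeholders for your environment:

    # Point Spark at the directory containing core-site.xml (and the rest of the Hadoop config)
    export HADOOP_CONF_DIR=/etc/hadoop/conf

    # Spark builds its Hadoop Configuration from that directory, so fs.custom.impl is picked up
    spark-submit --master yarn --class com.example.MyJob my-job.jar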
You still need to make sure package.of.custom.class.CustomFileSystem is on the classpath.
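One way to do that is to ship the jar with the job; per the spark-submit documentation, jars passed via --jars are included on both the driver and executor classpaths. A sketch, with the jar path and job details as placeholders:

    # Distribute the jar containing the custom FileSystem to the driver and executors
    spark-submit --master yarn \
      --jars /path/to/custom-filesystem.jar \
      --class com.example.MyJob my-job.jar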