hdfs - What is the difference between Block, chunk and file split in Hadoop?


Please clarify: 1) What is the difference between a chunk, a block, and a file split in Hadoop? 2) What is the internal process of the $hadoop fs -put command?

Block: HDFS stores data in terms of blocks. For example, if you have a file of 256 MB and have configured a block size of 128 MB, 2 blocks get created for that 256 MB file.

The block size is configurable cluster-wide, and it can also be set on a per-file basis.
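For illustration, here is a minimal Java sketch of setting the block size for a single file through the FileSystem API (cluster-wide it is the dfs.blocksize property in hdfs-site.xml). The path /user/demo/input.dat and the class name are made up for the example; this is a sketch, not a full program.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PerFileBlockSize {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();      // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);

            Path target = new Path("/user/demo/input.dat"); // hypothetical path
            long blockSize = 128L * 1024 * 1024;            // 128 MB for this file only
            short replication = fs.getDefaultReplication(target);
            int bufferSize = conf.getInt("io.file.buffer.size", 4096);

            // create(path, overwrite, bufferSize, replication, blockSize)
            FSDataOutputStream out = fs.create(target, true, bufferSize, replication, blockSize);
            out.writeBytes("example payload");              // write your data here
            out.close();
            fs.close();
        }
    }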

Split: a split is related to MapReduce. You have the option to change the split size, meaning you can make the split size greater than the block size or smaller than the block size. By default, if you don't configure anything, the split size is approximately equal to the block size.

In MapReduce processing, the number of mappers spawned equals the number of splits: if a file has 10 splits, 10 mappers are spawned.
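As a sketch of how the split size can be changed on the MapReduce side, using the standard FileInputFormat helpers (the job name, class name, and input path below are made up for the example; mapper/reducer setup and job submission are omitted):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class SplitSizeDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "split-size-demo");
            job.setInputFormatClass(TextInputFormat.class);
            FileInputFormat.addInputPath(job, new Path("/user/demo/input")); // hypothetical input

            // Force splits larger than the block size (e.g. 256 MB) ...
            FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024);
            // ... or smaller than the block size (e.g. 64 MB):
            // FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);

            // If neither is set, the split size is roughly the block size,
            // and one map task runs per split.
        }
    }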

When the put command is fired, the request goes to the namenode. The namenode asks the client (in this case the hadoop fs utility is behaving as the client) to break the file into blocks, as per the block size defined in hdfs-site.xml, and then tells the client to write the different blocks to different data nodes.
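For reference, a minimal sketch of what hadoop fs -put does through the same client API (the local file /tmp/data.txt and the HDFS target are hypothetical): the FileSystem client contacts the namenode for block allocations and then streams the blocks to the data nodes.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PutEquivalent {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);

            // Roughly what "hadoop fs -put /tmp/data.txt /user/demo/data.txt" does:
            // the client gets block allocations from the namenode and writes
            // each block (sized per dfs.blocksize) to the chosen data nodes.
            fs.copyFromLocalFile(new Path("/tmp/data.txt"),
                                 new Path("/user/demo/data.txt"));
            fs.close();
        }
    }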

The actual data is stored on the data nodes, while the metadata, that is, the file's block locations and file attributes, is stored on the name node.
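A small sketch of querying that metadata from the client side (the file path is hypothetical): the namenode answers with the block locations, while the bytes themselves stay on the data nodes.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockMetadata {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus status = fs.getFileStatus(new Path("/user/demo/data.txt")); // hypothetical

            // The namenode returns per-block metadata: offsets, lengths, and host lists.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation b : blocks) {
                System.out.println("offset=" + b.getOffset()
                        + " length=" + b.getLength()
                        + " hosts=" + String.join(",", b.getHosts()));
            }
            fs.close();
        }
    }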

The client first establishes a connection to the name node; once it gets confirmation about where to store a block, it directly makes a TCP connection to the data nodes and writes the data.
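Seen from the application, that write path looks like the sketch below (the target path is hypothetical): the create() call registers the file with the namenode, and the returned stream is what the DFS client uses to push the bytes out to the data nodes.

    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WritePath {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // create() talks to the namenode to register the file and get block targets.
            FSDataOutputStream out = fs.create(new Path("/user/demo/notes.txt")); // hypothetical

            // write()/close() stream the bytes; the DFS client opens the connections
            // to the chosen data nodes and pushes each block to them.
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            out.close();
            fs.close();
        }
    }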

Based on the replication factor, other copies of each block are maintained in the Hadoop cluster, and the block information is stored on the namenode.

However, in no scenario will a data node hold duplicate copies of a block; that is, the same block is not replicated on the same node.
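A sketch of checking this from the client (same hypothetical file as above): the replication factor is part of the file status, each block lists the hosts holding its replicas, and because every replica lives on a different data node the host list for a block contains no duplicates. The replication factor can also be changed per file.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationCheck {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/user/demo/data.txt");       // hypothetical path

            FileStatus status = fs.getFileStatus(file);
            System.out.println("replication factor = " + status.getReplication());

            // Each block lists the data nodes holding its replicas; since a node
            // never stores two copies of the same block, the hosts are distinct.
            for (BlockLocation b : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println("block@" + b.getOffset() + " -> "
                        + String.join(",", b.getHosts()));
            }

            // Change the replication factor for this one file:
            fs.setReplication(file, (short) 3);
            fs.close();
        }
    }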

