Hadoop large file does not split
I have an input file of size 136 MB. I ran some WordCount tests and saw only a single mapper. I then set dfs.blocksize to 64 MB in hdfs-site.xml, and I still get a single mapper. Am I doing something wrong?
dfs.block.size is not the only thing at play, and changing it is not recommended because it applies globally to HDFS; note also that a new value in hdfs-site.xml only affects files written after the change, so your existing 136 MB file keeps the block size it was written with. The split size in MapReduce is calculated by this formula:
max(mapred.min.split.size, min(mapred.max.split.size, dfs.block.size))
So you can set these properties in the driver class:
conf.setLong("mapred.max.split.size", maxSplitSize); // upper bound on a split
conf.setLong("mapred.min.split.size", minSplitSize); // lower bound on a split
Or in a configuration file (mapred-site.xml); 134217728 bytes is 128 MB:
<property>
  <name>mapred.max.split.size</name>
  <value>134217728</value>
</property>
<property>
  <name>mapred.min.split.size</name>
  <value>134217728</value>
</property>
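To sanity-check the arithmetic against the 136 MB file from the question: with both properties at 134217728 bytes (128 MB), the formula gives max(128 MB, min(128 MB, dfs.block.size)) = 128 MB. One caveat worth knowing: the stock FileInputFormat only opens a new split while the remaining bytes exceed about 1.1 times the split size, so 136 MB / 128 MB ≈ 1.06 would still come out as a single split, i.e. one mapper. A max split size of 64 MB, as in your test, yields three splits (64 + 64 + 8 MB) and therefore three mappers.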