Hive: GC Overhead or Heap space error - dynamic partitioned table
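Could you help me resolve this GC overhead / Java heap space error?
I am trying to insert into a partitioned table from another table, using dynamic partitioning, with the following query: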
INSERT OVERWRITE table tbl_part PARTITION(county)
SELECT col1, col2.... col47, county FROM tbl;
I am running with the following parameters:
export HADOOP_CLIENT_OPTS=" -Xmx2048m"
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=2048;
SET hive.exec.max.dynamic.partitions.pernode=256;
set mapreduce.map.memory.mb=2048;
set yarn.scheduler.minimum-allocation-mb=2048;
set hive.exec.max.created.files=250000;
set hive.vectorized.execution.enabled=true;
set hive.merge.smallfiles.avgsize=283115520;
set hive.merge.size.per.task=209715200;
I also added the following to yarn-site.xml:
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
<description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>
Output of free -m:
             total       used       free     shared    buffers     cached
Mem:         15347      11090       4256          0        174       6051
-/+ buffers/cache:       4864      10483
Swap:        15670         18      15652
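It is a standalone cluster with 1 core. I am preparing test data to run my unit test cases in Spark.
Could you suggest what else I can do?
The source table has the following properties: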
Table Parameters:
COLUMN_STATS_ACCURATE true
numFiles 13
numRows 10509065
rawDataSize 3718599422
totalSize 3729108487
transient_lastDdlTime 1470909228
Thanks.
Add DISTRIBUTE BY county to your query:
INSERT OVERWRITE table tbl_part PARTITION(county) SELECT col1, col2.... col47, county FROM tbl DISTRIBUTE BY county;
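Some background on why this tends to help, plus a sketch of how the full session might look. Without DISTRIBUTE BY, each map task can receive rows for many counties and keeps one open file writer (with its own buffers) per partition it encounters, which is a common cause of heap exhaustion in dynamic-partition inserts. With DISTRIBUTE BY county, all rows for a given county are shuffled to the same reducer, so each task writes far fewer partitions at once. The reducer memory values below are assumptions for illustration, not settings from the question; tune them to your cluster.

```sql
-- Dynamic partitioning settings, as in the question.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Assumption: DISTRIBUTE BY adds a reduce stage, so the reducers
-- (not only the mappers) need enough container memory and heap.
-- These values are illustrative only.
SET mapreduce.reduce.memory.mb=2048;
SET mapreduce.reduce.java.opts=-Xmx1638m;

-- Route all rows for a given county to the same reducer.
-- "col1, col2.... col47" is the elided column list from the question.
INSERT OVERWRITE TABLE tbl_part PARTITION(county)
SELECT col1, col2.... col47, county
FROM tbl
DISTRIBUTE BY county;
```

If the error persists, hive.optimize.sort.dynamic.partition=true may also be worth trying (available in Hive 0.13 and later, as far as I know). It sorts rows by the partition column so that a reducer keeps only one writer open at a time.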