Looping using HiveQL

Question

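I am trying to merge two datasets, say A and B. Dataset A has a variable "Flag" that takes two values. Rather than merging the two datasets in a single pass, I am trying to merge them separately for each value of the "Flag" variable.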

The merging code is the following:

create table new_data as
select a.*,b.y
from A as a left join B as b
on a.x=b.x

Since I run the Hive code through the CLI, I call it with the following command:

hive -f new_data.hql

The looping part of the code, which I call to merge the data based on the "Flag" variable, is the following:

for flag in 1 2;
do
  hive -hivevar flag=$flag -f new_data.hql
done

I put the above code in another ".hql" file and call it:

hive -f loop_data.hql

But it throws an error:

cannot recognize input near 'for' 'flag' 'in'

Can anyone tell me where I am going wrong?

Thanks!

Answer 1

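The "for" loop is shell syntax, not HiveQL, so "hive -f" cannot parse it. Put the loop in a shell script instead.

File name: loop_data.sh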

for flag in 1 2;
do
  hive -hivevar flag=$flag -f new_data.hql
done

And execute the script like this:

sh loop_data.sh

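Note that running "create table new_data as ..." inside the loop would fail on the second pass, because new_data already exists by then. Create the empty table once (the "1 = 0" condition returns no rows, so only the schema is created), then insert the rows for one flag value at a time.

DDL: create_new_data.hql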

create table new_data as
select 
  a.*,
  b.y
from 
  A as a left join 
  B as b on 
  a.x = b.x
where 
  1 = 0;

DML: insert_new_data.hql

insert into new_data 
select 
  a.*,
  b.y
from 
  A as a left join 
  B as b on 
  a.x = b.x
where
  a.flag = ${hiveconf:flag};
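
As a rough illustration of what happens to ${hiveconf:flag}: Hive replaces the placeholder with the value passed on the command line before it runs the query. The sketch below imitates that replacement with sed, so the resulting text can be previewed without running Hive. The helper render_flag is hypothetical (it is not part of Hive), and it only handles plain values such as the integers used here.

```shell
#!/bin/sh
# Hypothetical helper, not part of Hive: read a query template on standard
# input and print it with every ${hiveconf:flag} replaced by the value in $1.
# This only imitates the textual substitution; it does not run any query.
render_flag() {
  sed "s/\${hiveconf:flag}/$1/g"
}

# Preview the where clause for each flag value used in the loop.
for flag in 1 2; do
  printf '%s\n' 'where a.flag = ${hiveconf:flag}' | render_flag "$flag"
done
```

This prints "where a.flag = 1" and then "where a.flag = 2", which is the condition each pass of the loop ends up running.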

And update your shell script like this:

File name: loop_new_data.sh

# Create table
hive -f create_new_data.hql

# Insert data
for flag in 1 2;
do
  hive -hiveconf flag=$flag -f insert_new_data.hql
done

And execute it like this:

sh loop_new_data.sh
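
If one of the inserts fails, the plain loop simply carries on with the next flag value. A possible variant, sketched here under a few assumptions, stops at the first failure. The HIVE_CMD variable and the insert_for_flags function are names made up for this sketch; HIVE_CMD defaults to the real hive CLI but can be overridden for a dry run.

```shell
#!/bin/sh
# HIVE_CMD is an assumption of this sketch: it defaults to the hive CLI and
# can be set to another command (for example echo) to try the loop out.
HIVE_CMD="${HIVE_CMD:-hive}"

# Run insert_new_data.hql once per flag value given as an argument.
# Returns non-zero as soon as one invocation fails.
insert_for_flags() {
  for flag in "$@"; do
    $HIVE_CMD -hiveconf flag="$flag" -f insert_new_data.hql || {
      echo "insert failed for flag=$flag" >&2
      return 1
    }
  done
}
```

In the script, the table would still be created first, for example with "$HIVE_CMD -f create_new_data.hql && insert_for_flags 1 2". Running with HIVE_CMD=echo prints the arguments each call would receive instead of starting Hive.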

Please let me know if you need more information.
