How to md5 an entire row in Hive?
Using Hive, I want to hash an entire row of a query.
I tried the following (never mind the ${xxx}, the query is built from a bash script):
SELECT md5(*) FROM ${DATABASE_NAME_SUFFIXE}.${DATABASE_PREFIXE}_${TABLE_NAME} WHERE
${TABLE_DATE_FIELD} <= '${LIMIT_DATE}' ORDER BY ${CREATION_DATE_FIELD} DESC LIMIT 1
This returns the following error:
Line 1:7 Wrong arguments 'md5': No matching method for class org.apache.hadoop.hive.ql.udf.UDFMd5 with (bigint, int, varchar(128), timestamp, timestamp, varchar(64), varchar(64), varchar(64), int, bigint, int, varchar(50), varchar(255), bigint, timestamp, timestamp, varchar(64), bigint, timestamp, timestamp, varchar(64), int, int, char(38), varchar(40), varchar(1)). Possible choices: FUNC(binary) FUNC(string)
If I understand the error and the documentation of the md5 function correctly, I need to pass it a binary or a string. How can I do that?
Edit: I also tried:
SELECT md5(SELECT * FROM ${DATABASE_NAME_SUFFIXE}.${DATABASE_PREFIXE}_${TABLE_NAME} WHERE ${TABLE_DATE_FIELD} <= '${LIMIT_DATE}' ORDER BY ${CREATION_DATE_FIELD} DESC LIMIT 1)
which returns:
cannot recognize input near 'SELECT' '*' 'FROM' in function specification
Use concat to concatenate all the columns, then apply md5() to the concatenated value:
select md5(concat(a,b)) as md5 from (select string("abc")a,int("2")b)e;
+--------------------------------+
|md5 |
+--------------------------------+
|63872b5565b2179bd72ea9c339192543|
+--------------------------------+
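To make concrete what md5(concat(a, b)) computes, here is a minimal Python sketch that emulates the Hive behaviour rather than running Hive. Assumptions: each value is rendered as its plain string form (the int 2 becomes "2"), the string is hashed as UTF-8 into a lowercase hex digest, and, as with Hive's concat, a single NULL (None) argument makes the whole result NULL. The helper names hive_concat, hive_md5 and row_md5 are hypothetical.

```python
import hashlib
from typing import Optional, Sequence


def hive_concat(values: Sequence[Optional[object]]) -> Optional[str]:
    """Emulate concat(): NULL if any argument is NULL, otherwise a plain join."""
    if any(v is None for v in values):
        return None
    # Assumption: every value is rendered as its plain string form (2 -> "2").
    return "".join(str(v) for v in values)


def hive_md5(s: Optional[str]) -> Optional[str]:
    """Emulate md5(string): 32-char lowercase hex digest; NULL in, NULL out."""
    if s is None:
        return None
    return hashlib.md5(s.encode("utf-8")).hexdigest()


def row_md5(values: Sequence[Optional[object]]) -> Optional[str]:
    """Hash one row the way md5(concat(col1, col2, ...)) would."""
    return hive_md5(hive_concat(values))


if __name__ == "__main__":
    print(row_md5(["abc", 2]))                           # the md5 of "abc2"
    print(row_md5(["abc", None]))                        # None: one NULL hides the row
    print(row_md5(["ab", "c"]) == row_md5(["a", "bc"]))  # True: no separator
```

The last two lines show two things to keep in mind with this approach: one NULL column gives a NULL hash for the whole row, and without a separator, different rows such as ('ab', 'c') and ('a', 'bc') produce the same hash.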
We can also list all the column names explicitly and pass them to the concat function.
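Since the query is already built by a script, the explicit column list can be generated too. The sketch below is written in Python for illustration, and its helper row_md5_expr is hypothetical. It emits a HiveQL expression that uses only cast, coalesce, concat and md5. Because concat returns NULL as soon as one argument is NULL, the sketch replaces each NULL with a marker. It also places a separator literal between columns, so that ('ab', 'c') and ('a', 'bc') no longer hash identically. The '|' separator and the '#NULL#' marker are arbitrary assumptions; choose values that cannot occur in your data.

```python
from typing import Sequence


def row_md5_expr(columns: Sequence[str], sep: str = "|",
                 null_marker: str = "#NULL#") -> str:
    """Build a HiveQL md5(concat(...)) expression over an explicit column list.

    Hypothetical helper. Column names are used verbatim (no backtick quoting),
    and sep / null_marker must not contain single quotes.
    """
    if not columns:
        raise ValueError("need at least one column")
    parts = []
    for i, col in enumerate(columns):
        if i > 0:
            parts.append(f"'{sep}'")  # separator literal between two columns
        # cast to string so every column type is accepted; coalesce so that a
        # NULL column does not turn the whole concat() into NULL
        parts.append(f"coalesce(cast({col} AS string), '{null_marker}')")
    return f"md5(concat({', '.join(parts)}))"


if __name__ == "__main__":
    print(row_md5_expr(["a", "b"]))
```

For example, row_md5_expr(["a", "b"]) returns md5(concat(coalesce(cast(a AS string), '#NULL#'), '|', coalesce(cast(b AS string), '#NULL#'))), which can be substituted for md5(*) in the original query.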
Try with concat(*):
select md5(concat(*)) as md5 from (select string("abc")a,int("2")b)e;
+--------------------------------+
|md5 |
+--------------------------------+
|63872b5565b2179bd72ea9c339192543|
+--------------------------------+