In Impala/Hive, how can I extract the words before and after a specific keyword in a string?

Question

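I have a string column named text in Impala that contains descriptions. I want to get the words immediately before and after a specific keyword.

Example:

text= This is a great property right in front of the beach. The 50 m2 apartment is divided into a bedroom....
keyword=m2

Desired result: two columns, word before = 50 and word after = apartment

Any ideas?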

Answer 1

You can use regexp_extract to match the word before m2 and the word after it, extracting each one separately.

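-- t is a one-row stand-in for the real table.
-- Capture group 1 holds the word before m2 (first call) or after m2 (second call).
-- Backslashes are doubled because Hive and impala-shell consume one level of escaping in string literals.
with t as ( select "This is a great property right in front of the beach. The 50 m2 apartment is divided into a bedroom" as text)
select 
    regexp_extract(t.text , "(\\w+)\\s+m2", 1) as word_before,
    regexp_extract(t.text , "m2\\s+(\\w+)", 1) as word_after
from t ;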

+--------------+-------------+--+
| word_before  | word_after  |
+--------------+-------------+--+
| 50           | apartment   |
+--------------+-------------+--+
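
If you want to sanity-check the two patterns before running them against a large table, a small Python sketch like the one below may help. It is only an approximation: Python's re module is not the engine Hive or Impala uses, although \w and \s should behave the same way for plain ASCII text like this. The helper name words_around and the choice to return empty strings when nothing matches are assumptions made for this illustration, not part of either engine.

```python
import re


def words_around(text, keyword):
    """Return (word_before, word_after) around the first occurrence of keyword.

    Hypothetical helper for local testing only; returns "" for a side
    that has no match.
    """
    kw = re.escape(keyword)  # treat the keyword as literal text
    before = re.search(r"(\w+)\s+" + kw, text)
    after = re.search(kw + r"\s+(\w+)", text)
    return (
        before.group(1) if before else "",
        after.group(1) if after else "",
    )


text = (
    "This is a great property right in front of the beach. "
    "The 50 m2 apartment is divided into a bedroom"
)
print(words_around(text, "m2"))  # ('50', 'apartment')
```

Both patterns require at least one whitespace character between the keyword and its neighbour, so a description written as "50m2" would yield no word before.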
