In Impala/Hive, how can I extract the word before and after a specific keyword in a string?

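I have a string column named text in Impala that contains descriptions. I want to get the word immediately before and the word immediately after a specific keyword.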

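Example: the text is "This is a great property right in front of the beach. The 50 m2 apartment is divided into a bedroom" and the keyword is m2.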

Desired result: two columns, word before = 50 and word after = apartment.

Any ideas?

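You can use regexp_extract with a capturing group: one pattern captures the word immediately before m2, a second pattern captures the word immediately after it, and the index argument 1 returns the captured group.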

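with t as (select "This is a great property right in front of the beach. The 50 m2 apartment is divided into a bedroom" as text)
select
    -- Backslashes are doubled because the string literal consumes one level of escaping
    -- (a single "\s" would reach the regex engine as a plain "s").
    regexp_extract(t.text, "(\\w+)\\s+m2", 1) as word_before,  -- word right before m2
    regexp_extract(t.text, "m2\\s+(\\w+)", 1) as word_after    -- word right after m2
from t;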

+--------------+-------------+--+
| word_before  | word_after  |
+--------------+-------------+--+
| 50           | apartment   |
+--------------+-------------+--+
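To run the same extraction against a real table instead of a literal CTE, the patterns can be applied directly to the column. This is a minimal sketch, assuming a hypothetical table named listings with a STRING column text. The \\b word boundaries are an addition not present in the answer above; they keep m2 from matching inside longer tokens such as "km2" or "m2s".

```sql
-- Hypothetical table "listings" with a STRING column "text".
-- Backslashes are doubled so that \w, \s and \b reach the regex engine intact.
select
    text,
    -- group 1: the word immediately before the standalone token m2
    regexp_extract(text, "(\\w+)\\s+m2\\b", 1) as word_before,
    -- group 1: the word immediately after the standalone token m2
    regexp_extract(text, "\\bm2\\s+(\\w+)", 1) as word_after
from listings;
```

Hive evaluates these patterns with Java regular expressions and Impala uses RE2; both support \w, \s and \b, so the same query should work in either engine.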