In Impala/Hive, how can I extract the words before and after a specific keyword in a string?
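I have a string column named text in Impala that contains descriptions. I want to get the words immediately before and after a specific keyword.
Example:
text = This is a great property right in front of the beach. The 50 m2 apartment is divided into a bedroom....
keyword = m2
Desired result: two columns, word before = 50 and word after = apartment
Any ideas?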
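You can use regexp_extract to match the word before m2 and the word after m2, and extract each one separately. The third argument, 1, selects the first capture group of the pattern.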
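with t as (
  select "This is a great property right in front of the beach. The 50 m2 apartment is divided into a bedroom" as text
)
select
  -- capture group 1: the word followed by whitespace and the keyword m2
  regexp_extract(t.text, "(\w+)\s+m2", 1) as word_before,
  -- capture group 1: the word preceded by the keyword m2 and whitespace
  regexp_extract(t.text, "m2\s+(\w+)", 1) as word_after
from t;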
+--------------+-------------+--+
| word_before  | word_after  |
+--------------+-------------+--+
| 50           | apartment   |
+--------------+-------------+--+
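To sanity-check the two patterns outside the cluster, here is a minimal Python sketch that applies the same regular expressions to the sample sentence. The helper name words_around is hypothetical, and Python's re module is only assumed to behave like the Hive/Impala regex engines for the simple \w and \s classes used here; it is an illustration, not part of the original answer.

```python
import re


def words_around(text, keyword):
    """Return (word_before, word_after) for the first match of keyword.

    A side with no match is returned as None (a convention chosen for this
    sketch, not necessarily what regexp_extract returns).
    """
    kw = re.escape(keyword)  # treat the keyword as literal text, not as a regex
    before = re.search(r"(\w+)\s+" + kw, text)  # same shape as "(\w+)\s+m2"
    after = re.search(kw + r"\s+(\w+)", text)   # same shape as "m2\s+(\w+)"
    return (
        before.group(1) if before else None,
        after.group(1) if after else None,
    )


text = (
    "This is a great property right in front of the beach. "
    "The 50 m2 apartment is divided into a bedroom"
)
print(words_around(text, "m2"))  # ('50', 'apartment')
```

Because the patterns require at least one whitespace character (\s+), a description written as "50m2" would not yield a word before the keyword; relaxing that part to \s* is one way to accept both spellings. Also note that Python raw strings sidestep escaping, whereas in Hive and impala-shell string literals the backslash is itself an escape character, so depending on the client the patterns may need doubled backslashes, for example "(\\w+)\\s+m2".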