Regex: extract multiple URL strings from a cell of arrays

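What is a clean regex pattern for matching a URL string that stops at the first comma? I am trying to extract values from an array stored in a Google Sheets cell.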

Cell A1

{https://www.myshop.com/shop/the_first_shop,Marcus. White's. Shop.,ACTIVE,US};{https://www.myshop.com/shop/a-second-shop,The first! Shop,CLOSED,UK};{EMPTY,ClosedShop,CLOSED,IN}

Desired output (cell B1)

https://www.myshop.com/shop/the_first_shop,https://www.myshop.com/shop/a-second-shop

I have figured out how to get a clean array of matching values in my desired output cell:

=trim(regexreplace(regexreplace(regexreplace(REGEXREPLACE(A2,"/(https?:\/\/[^ ]*)/"," "),";"," "),"}"," "),"{"," "))

But I can't find a regex pattern that ends at the comma. For example, this solution:

"/(https?:\/\/[^ ]*)/" 

matches the first URL, but returns:

https://www.myshop.com/shop/the_first_shop,Marcus. White's. Shop.,ACTIVE,US https://www.myshop.com/shop/a-second-shop,The first! Shop,CLOSED,UK EMPTY,ClosedShop,CLOSED,IN

regex pattern that stops at a comma

=REGEXEXTRACT(A1, "(https?:\/\/[^,]*)")
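
To illustrate why the negated character class `[^,]*` stops at the first comma, here is a minimal sketch in Python rather than Google Sheets. Python's `re` module is not the RE2 engine that Sheets uses, but this particular pattern should behave the same way in both. The helper name `extract_urls` and the variable `cell_a1` are made up for illustration; the sample string is the value of cell A1 from the question.

```python
import re

# Sample value copied from cell A1 in the question.
cell_a1 = (
    "{https://www.myshop.com/shop/the_first_shop,Marcus. White's. Shop.,ACTIVE,US};"
    "{https://www.myshop.com/shop/a-second-shop,The first! Shop,CLOSED,UK};"
    "{EMPTY,ClosedShop,CLOSED,IN}"
)


def extract_urls(text):
    """Return every URL found in text, each cut off at the next comma."""
    # [^,]* consumes any run of characters that are not a comma,
    # so each match ends right before the first comma after the URL.
    return re.findall(r"https?://[^,]*", text)


# Join the matches with commas to get the layout wanted in cell B1.
print(",".join(extract_urls(cell_a1)))
```

Note that `re.findall` collects every match, whereas REGEXEXTRACT in Sheets, as far as I know, returns only the first one, so the formula above yields just the first URL.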

I would opt for REGEXREPLACE and use:

=REGEXREPLACE(A1,".*?(?:(https.*?,)|$)","$1")

That just leaves a trailing comma to deal with...

=REGEXREPLACE(REGEXREPLACE(A1,".*?(?:(https.*?(,))|$)","$1"),",$","")

A longer alternative to REGEXREPLACE might be:

=TEXTJOIN(",",,QUERY(TRANSPOSE(SPLIT(SUBSTITUTE(SUBSTITUTE(A1,"{","}"),"}",","),",")),"Select Col1 where Col1 like 'http%'"))
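
The split, filter and join idea behind that formula can be sketched outside Sheets as well. The following is a rough Python analogue, not a translation of how Sheets evaluates it: the function name `join_http_fields` and the variable `cell_a1` are hypothetical, and `startswith("http")` stands in for the `like 'http%'` condition in the QUERY.

```python
# Sample value copied from cell A1 in the question.
cell_a1 = (
    "{https://www.myshop.com/shop/the_first_shop,Marcus. White's. Shop.,ACTIVE,US};"
    "{https://www.myshop.com/shop/a-second-shop,The first! Shop,CLOSED,UK};"
    "{EMPTY,ClosedShop,CLOSED,IN}"
)


def join_http_fields(text):
    """Split text on commas and keep only the fields that start with 'http'."""
    # The nested SUBSTITUTE calls end up turning both "{" and "}" into commas,
    # so do the same here before splitting.
    fields = text.replace("{", ",").replace("}", ",").split(",")
    # Keep only fields beginning with "http", then join them back with commas.
    return ",".join(field for field in fields if field.startswith("http"))


print(join_http_fields(cell_a1))
```

Because the braces become commas before the split, each URL ends up as its own field, and leftover fields such as ";" or the empty strings are dropped by the filter.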