Match a pattern multiple times using regex

Question

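I am trying to match multiple occurrences of the same pattern in a string. Unfortunately, ustrregexs and ustrregexm return only the first match. Moreover, I don't know how many matches there will be, so hard-coding n matches is not an option. Is there a way to find all matches natively in Stata?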

Example:

clear all

input x str250 y
1 "123 12"
2 "345 678"
3 "000 000 000"
4 "111"
5 "00"
6 "000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000000 000 000"
end

* Returns only the first match
gen match = ustrregexs(0) if ustrregexm(y, "(\d{3})+")

Answer 1

moss from SSC is aimed precisely at this problem. If "native" excludes community-contributed commands, then you need to write your own code.

clear all

input x str20 y
1 "123 12"
2 "345 678"
3 "000 000 000"
4 "111"
5 "00"
end

moss y, match("([0-9][0-9][0-9])") regex 

list 

     +--------------------------------------------------------------------------------+
     | x             y   _count   _match1   _pos1   _match2   _pos2   _match3   _pos3 |
     |--------------------------------------------------------------------------------|
  1. | 1        123 12        1       123       1                 .                 . |
  2. | 2       345 678        2       345       1       678       5                 . |
  3. | 3   000 000 000        3       000       1       000       5       000       9 |
  4. | 4           111        1       111       1                 .                 . |
  5. | 5            00        0                 .                 .                 . |
     +--------------------------------------------------------------------------------+
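If community-contributed commands are ruled out, one way to write your own code is to loop: take the leftmost match, then cut the string just after it and search again until nothing is left. The block below is a minimal sketch of that idea, not the moss algorithm. It assumes Stata 14 or later (for ustrregexm, ustrregexs and ustrregexrf), a pattern that cannot match the empty string (otherwise the loop never ends), and a pattern without anchors, lookbehind or \b that would behave differently once the string is cut. The variable names rest, n_match, match1, match2, ... are illustrative, and unlike moss it does not record match positions.

```stata
* Sketch: extract all non-overlapping matches using built-in functions only.
* Assumes the pattern never matches an empty string and uses no anchors,
* lookbehind or \b (the string is truncated after each match).
clear all
input x str20 y
1 "123 12"
2 "345 678"
3 "000 000 000"
4 "111"
5 "00"
end

local pattern "[0-9]{3}"

* Working copy that shrinks as matches are consumed
generate rest = y
generate n_match = 0

local i = 0
quietly count if ustrregexm(rest, "`pattern'")
while r(N) > 0 {
    local ++i
    * Store the leftmost match of what is left of the string
    quietly generate match`i' = ustrregexs(0) if ustrregexm(rest, "`pattern'")
    quietly replace n_match = `i' if ustrregexm(rest, "`pattern'")
    * Remove everything up to and including that match; the lazy .*?
    * makes the pattern start at the same leftmost position as above
    quietly replace rest = ustrregexrf(rest, "^.*?(?:`pattern')", "") ///
        if ustrregexm(rest, "`pattern'")
    quietly count if ustrregexm(rest, "`pattern'")
}
drop rest
list
```

Cutting the string after each match, rather than deleting only the matched text with ustrregexrf(rest, "`pattern'", ""), avoids a subtle problem: deleting the match can join the text on either side of it into a new match that did not exist in the original string. On the example data the n_match and match1 to match3 columns should agree with _count and _match1 to _match3 from moss above.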
