Reg Ex is getting more digits than expected

Question

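~~Don't suggest any links to me, I have looked at them a million times.~~
I have looked at many suggestions, e.g. Regex credit card number tests. However, I am not primarily concerned with validating potential credit card numbers.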

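I want to locate (potential) credit card numbers in documents by identifying sequences of 12 to 19 digits (plus some common separators between them). This is discussed, for example, in Finding or Verifying Credit Card Numbers, as pointed out by @TimBiegeleisen. But the suggested solutions lead to some false negatives. (See the "Problems..." section below.)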

Example input:

[ '232625427', 'please stop check 220 2000000 that was sent 6/10 reg mail and reissu fedex. Please charge to credit card 4610 0000 0000 0000 exp 05/99...thanks, Sxxx' ]
[ '232653042', 'MARKET PLACE: Exxxx or Bxxxx-Please set husband and wife up on monthly credit card payments. Name on the credit card is Hxxxx-Jxxxx Lxxxx (Maiden name, name on policy is different) Master card number 5424 0000 0000 0000 Exp 11-30-00. Thanks so much.' ]

My RegEx101.com attempt, with more example input.

My regex is

[1-9](\d[ ]?[ ]*?[-]?[-]*?[:]*?[:]?){11,18}\b

Problems with my RegEx

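It does not match when the 12-19 digits are immediately followed by a string. It fails, for example, for 4554-4545-4545-4545Visa.
For longer runs of digits, the match is taken at the end rather than the beginning: for 999999999999994190000000000000 I get 9994190000000000000 instead of 9999999999999941900.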
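
Both problems can be reproduced outside RegEx101 with a short Python sketch. It assumes that Python's `re` module treats this pattern the same way as the flavor selected on RegEx101; the names `ORIGINAL` and `long_run` are illustrative only.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"[1-9](\d[ ]?[ ]*?[-]?[-]*?[:]*?[:]?){11,18}\b")

# Problem 1: the number is directly followed by letters, so the trailing \b
# cannot be satisfied right after the 16th digit and the full number is lost.
m = ORIGINAL.search("4554-4545-4545-4545Visa")
print(repr(m.group(0)) if m else None)  # not the full 16-digit number

# Problem 2: inside a long run of digits the only word boundary is at the very
# end, so the first successful match is the one ending there.
long_run = "999999999999994190000000000000"
m2 = ORIGINAL.search(long_run)
print(m2.group(0) == long_run[-19:])  # True: the last 19 digits are returned
```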

I am testing it at RegEx101.com.

Answer 1

To address the problem in the title, "Reg Ex is getting more digits than expected" (reading "digits" as "characters"), try:

[1-9]([- :]*\d){11,18}\b

This way, you no longer match the trailing whitespace in your sample input. See it in action at RegEx101.com.
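
As a rough illustration of the trailing-whitespace difference, here is a minimal Python sketch that runs both patterns on a shortened version of the first sample input. The constant names are made up for this example, and Python's `re` module is assumed to behave like the RegEx101 flavor for these patterns.

```python
import re

# Pattern from the question: separators are consumed AFTER each digit.
ORIGINAL = re.compile(r"[1-9](\d[ ]?[ ]*?[-]?[-]*?[:]*?[:]?){11,18}\b")
# Suggested pattern: separators are consumed BEFORE each digit,
# so the match always ends on a digit.
SEPARATORS_FIRST = re.compile(r"[1-9]([- :]*\d){11,18}\b")

text = "Please charge to credit card 4610 0000 0000 0000 exp 05/99...thanks"

print(repr(ORIGINAL.search(text).group(0)))          # '4610 0000 0000 0000 '
print(repr(SEPARATORS_FIRST.search(text).group(0)))  # '4610 0000 0000 0000'
```

Moving the separator class in front of `\d` is what keeps the match from ending on a space, hyphen, or colon.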

Closer to what you describe under "Problems..." would be:

[1-9]([- :]*\d){11,18}

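With the word boundary removed from the end, a string immediately following the digit sequence no longer causes a false negative, and the match is no longer biased toward the end of a potential match. However, this handles 001 111111111111 differently from your approach: RegEx101.com.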
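
A minimal Python sketch of these three cases, assuming Python's `re` module matches the RegEx101 flavor here; `NO_BOUNDARY`, `long_run`, and `sample` are names invented for the example.

```python
import re

# Same pattern as above, without the trailing word boundary.
NO_BOUNDARY = re.compile(r"[1-9]([- :]*\d){11,18}")

# A number directly followed by letters is now found in full.
print(NO_BOUNDARY.search("4554-4545-4545-4545Visa").group(0))
# 4554-4545-4545-4545

# In a long run of digits, the match now starts at the beginning of the run.
long_run = "999999999999994190000000000000"
print(NO_BOUNDARY.search(long_run).group(0) == long_run[:19])  # True

# The difference: the match starts at the '1' in "001" and then crosses the
# space, so it covers everything after the two leading zeros.
sample = "001 111111111111"
print(NO_BOUNDARY.search(sample).group(0) == sample[2:])  # True
```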

This can be accounted for with

[1-9][0-9]([- :]*\d){10,17}

at the cost of allowing more zeros from "5452 0000 0000 0000000": RegEx101.com.
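
To illustrate how requiring two adjacent leading digits changes the 001 111111111111 case, here is a small Python sketch. It assumes Python's `re` module behaves like the RegEx101 flavor, uses the hypothetical name `TWO_LEADING_DIGITS`, and does not attempt to reproduce the extra-zeros trade-off.

```python
import re

# The first two characters must be adjacent digits, so a match can no longer
# start at the '1' of "001" and continue across the space.
TWO_LEADING_DIGITS = re.compile(r"[1-9][0-9]([- :]*\d){10,17}")

sample = "001 111111111111"
m = TWO_LEADING_DIGITS.search(sample)
print(m.group(0) == sample[4:])  # True: only the run of ones after the space

# The cases from the "Problems..." section are still handled.
print(TWO_LEADING_DIGITS.search("4554-4545-4545-4545Visa").group(0))
# 4554-4545-4545-4545

long_run = "999999999999994190000000000000"
print(TWO_LEADING_DIGITS.search(long_run).group(0) == long_run[:19])  # True
```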

All suggestions were checked only against your sample input. Different input may require further adjustment.

Please comment if this needs adjustment or further detail.
