Python Regex for Securities

Question

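I have a text file containing security names, dollar amounts, and percentages of the portfolio. I'm trying to figure out how to separate the companies with a regex. My original solution was to .split('%') and then build the 3 variables I need, but some securities have a % in their name, so that approach doesn't work.

Example string: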

Pinterest, Inc. Series F, 8.00%,808,9320.022%ResMed,Inc.,495,3260.021%Eaton Corp. PLC,087,8430.047%

Current regex

[a-zA-Z0-9,$.\s]+[.0-9%]$

My current regex finds only the last company, i.e. Eaton Corp. PLC,087,8430.047%

Any help on how to find every instance of a company?

Desired result

["Pinterest, Inc. Series F, 8.00%,808,9320.022%","ResMed,Inc.,495,3260.021%","Eaton Corp. PLC,087,8430.047%"]

Answer 1

In Python 3:

import re
p = re.compile(r'[^$]+\$[^%]+%')
p.findall('Pinterest, Inc. Series F, 8.00%,808,9320.022%ResMed,Inc.,495,3260.021%Eaton Corp. PLC,087,8430.047%')

Result:

['Pinterest, Inc. Series F, 8.00%,808,9320.022%', 'ResMed,Inc.,495,3260.021%', 'Eaton Corp. PLC,087,8430.047%']

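Your original problem was that the $ anchor makes the regex match only at the end of the line. Removing the $, however, would still split Pinterest into two entries at the % after 8.00.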
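Both behaviours can be reproduced on the sample string exactly as it appears in the question. This is only an illustrative sketch; the variable names (`text`, `anchored`, `unanchored`) are made up for the example.

```python
import re

# Sample string exactly as shown in the question.
text = ("Pinterest, Inc. Series F, 8.00%,808,9320.022%"
        "ResMed,Inc.,495,3260.021%"
        "Eaton Corp. PLC,087,8430.047%")

# The question's pattern: the trailing $ pins the match to the end of the string,
# so only the last entry can match.
anchored = re.findall(r'[a-zA-Z0-9,$.\s]+[.0-9%]$', text)

# The same pattern without the anchor: every % now ends a match,
# so the % inside the Pinterest name cuts that entry in two.
unanchored = re.findall(r'[a-zA-Z0-9,$.\s]+[.0-9%]', text)

print(anchored)    # one entry (Eaton only)
print(unanchored)  # four entries (Pinterest split at "8.00%")
```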

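To fix that, the regex looks for the $ first, then for the next %, and takes everything through that % as one entry. The pattern works for the examples you gave but, of course, I don't know whether it will work for all of your data.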
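If some of the data has no $ to key on (the sample string as printed in the question has none), a different technique may be worth trying: end each entry at the portfolio weight instead. The sketch below is not the pattern from this answer. It assumes, purely from the three sample entries, that every weight has the form d.ddd% (one digit, a dot, exactly three decimals), which a name fragment like 8.00% does not satisfy.

```python
import re

# Sample string exactly as shown in the question.
text = ("Pinterest, Inc. Series F, 8.00%,808,9320.022%"
        "ResMed,Inc.,495,3260.021%"
        "Eaton Corp. PLC,087,8430.047%")

# Assumption (not from the answer): each entry ends with a weight such as 0.022%,
# i.e. a digit, a dot, exactly three digits, then %.
# .+? is lazy, so each match stops at the first such weight it reaches.
entry = re.compile(r'.+?\d\.\d{3}%')

records = entry.findall(text)
for record in records:
    print(record)
```

A weight with a different number of decimals, or a name containing something like 1.125%, would break this assumption, so it should be checked against the full file.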

Edit: the regex works like this:

r'               Use a raw string so you don't have to double the backslashes
  [^$]+          Look for anything up to the next $
       \$        Match the $ itself (\$ because $ alone means end-of-line)
         [^%]+   Now anything up to the next %
              %  And the % itself
               ' End of the string
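The breakdown above can be tried on a small input. The string below is hypothetical: the company names and figures are placeholders invented for this sketch, with a literal $ in front of each amount and the weight run directly onto the amount.

```python
import re

# Hypothetical data (placeholder names and numbers, not from the question):
# name, then $amount, then the weight, with entries run together.
text = "Acme Corp. Series A, 5.00%$1,234,5670.010%Beta, Inc.$7,654,3210.020%"

# [^$]+  everything up to the next $   (the name, which may contain %)
# \$     the literal dollar sign
# [^%]+  everything up to the next %   (amount and weight digits)
# %      the closing percent sign
entry = re.compile(r'[^$]+\$[^%]+%')

entries = entry.findall(text)
print(entries)
```

Because the name part is matched with [^$]+ rather than [^%]+, the % in "5.00%" stays inside the first entry.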

Answer 2

A working solution for Python, with named groups: https://regex101.com/r/sqkFaN/2

(?P<item>(?P<name>.*?)\$(?P<usd>[\d,\.]*?%))

At the link I provided you can see the effect of changes live, and the sidebar explains the syntax used.
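As a rough sketch of how the named groups could be used from Python: the pattern below is written with an escaped \$ on the assumption that the separator between name and amount is a literal dollar sign, and the input string is hypothetical (placeholder names and figures, with the weight run directly onto the amount).

```python
import re

# Hypothetical data (placeholder names and numbers, not from the question).
text = "Acme Corp. Series A, 5.00%$1,234,5670.010%Beta, Inc.$7,654,3210.020%"

# item: the whole entry; name: everything before the $;
# usd: digits, commas and dots up to and including the closing %.
pattern = re.compile(r'(?P<item>(?P<name>.*?)\$(?P<usd>[\d,\.]*?%))')

# finditer yields one match object per entry; groupdict() maps group names to text.
rows = [m.groupdict() for m in pattern.finditer(text)]
for row in rows:
    print(row['name'], '|', row['usd'])
```

Note that the usd group accepts only digits, commas, and dots, so an entry with a space between the amount and the weight would not match this pattern.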

python

regex

finance