Extract positive and negative float values from a raw text file using Python

Question

I am trying to extract certain values from a raw text file like this:

Number of zero columns: 4
Memory requirement - global matrix: 1571340 solver (totally): 1571340
P1127_VELOCITIES #001000  Step:    59  Iteration:     2  Time:    0.04055  0.0015347 
P2243_VELOCITIES #001000  Step:    59  Iteration:     2  Time:    0.04055  0.0017193 
P3387_VELOCITIES #001000  Step:    59  Iteration:     2  Time:    0.04055  0.0015347 
% of load in interval  Step:    59  Iteration:     2  Time:    0.04055  0.0400000  0.0400000 
summation % of load in interval  Step:    59  Iteration:     2  Time:    0.04055  0.0800000 

Number of zero columns: 4
Memory requirement - global matrix: 1571340 solver (totally): 1571340
P1127_VELOCITIES #001000  Step:    59  Iteration:     2  Time:    0.01638 -0.0016876 
P2243_VELOCITIES #001000  Step:    59  Iteration:     2  Time:    0.01638 -0.0018896 
P3387_VELOCITIES #001000  Step:    59  Iteration:     2  Time:    0.01638 -0.0016876 
% of load in interval  Step:    59  Iteration:     2  Time:    0.01638  0.0400000  0.0400000 
summation % of load in interval  Step:    59  Iteration:     2  Time:    0.01638  0.0800000

I want to extract the P1127_VELOCITIES values using the following code:

import re

P1127_positive = re.compile(r'P1127_VELOCITIES #001000  Step:    (\d+)  Iteration:     (\d+)  Time:    (\d+\.\d+)  (\d*\.\d+|-\d*\.\d+)')

P1127_negative = re.compile(r'P1127_VELOCITIES #001000  Step:    (\d+)  Iteration:     (\d+)  Time:    (\d+\.\d+) (\d*\.\d+|-\d*\.\d+)')


def Extract_Data(filepath, expression_positive, expression_negative, data):

    velocity_list = []
    time_list = []
    #negative_data = []

    with open(filepath) as file:
        for line in file:
            data.extend(expression_positive.findall(line))

    with open(filepath) as file:
        for line in file:
            data.extend(expression_negative.findall(line))
    print(data[0])
    print(data[1])
    for data_tuple in data:
        step, iteration, time, velocity = data_tuple
        velocity_list.append(float(velocity))
        time_list.append(float(time))



    return velocity_list, time_list

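However, I would like to extract all the float values at the right-hand end in one pass instead of extracting the positive and negative values separately. As you can see in the text file, positive values are preceded by 2 spaces (i.e. Time: 0.04055[space][space]0.0015347), while negative values are preceded by only 1 space (i.e. Time: 0.01638[space]-0.0016876).

Is there a way to extract both kinds of value with a single re.compile pattern (like the ones above, but matching both at once)? What expression would you recommend (e.g. ([-+]?\d\.\d+))?

Cheers!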

Answer 1

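What you are doing now:

You match positive values with:

(\d+\.\d+)  (\d*\.\d+|-\d*\.\d+) (2 spaces between the groups)

You match negative values with:

(\d+\.\d+) (\d*\.\d+|-\d*\.\d+) (1 space between the groups)

What you can do instead is use [space]{1,2} to match either 1 or 2 spaces.

Like this:

(\d+\.\d+) {1,2}(\d*\.\d+|-\d*\.\d+)

You can test it live here: https://regex101.com/r/Cz1YJ2/1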
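
As a rough sketch of how the {1,2} quantifier could slot into the pattern from the question, the snippet below runs one combined expression over two sample lines copied from the question. The names sample, p1127, times and velocities are made up for illustration. Two choices here are assumptions and not part of the answer above: the fixed-width padding in the prefix is matched with " +" so the sketch does not depend on exact column widths, and the sign is written as an optional "-?" instead of an alternation.

```python
import re

# Two sample lines from the question: the positive value is preceded by
# two spaces, the negative value by one.
sample = (
    "P1127_VELOCITIES #001000  Step:    59  Iteration:     2  Time:    0.04055  0.0015347 \n"
    "P1127_VELOCITIES #001000  Step:    59  Iteration:     2  Time:    0.01638 -0.0016876 \n"
)

# " +" absorbs the column padding; " {1,2}" accepts one or two spaces
# before the last value; "-?" makes the minus sign optional.
p1127 = re.compile(
    r"P1127_VELOCITIES #001000 +Step: +(\d+) +Iteration: +(\d+)"
    r" +Time: +(\d+\.\d+) {1,2}(-?\d*\.\d+)"
)

# findall returns one (step, iteration, time, velocity) tuple per match.
rows = p1127.findall(sample)
times = [float(t) for _, _, t, _ in rows]
velocities = [float(v) for _, _, _, v in rows]

print(times)
print(velocities)
```

With a single pattern, the file only needs to be read once, and positive and negative values come out in the order they appear in the file.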

Answer 2

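The regular expressions in the code you provided seem like overkill for the file you shared. I see no reason to make them so rigid that changing a single character requires a new pattern. The file does not appear to vary enough to justify being so specific about the number of spaces and the formatting within a line.

This snippet does the job cleanly on the file you shared (I used append rather than extend so that each line's time/value pair is preserved). It is simple to add further requirements to match lines more specifically as needed (for example, if you wish to specify the step or the iteration). If you want to put this into a function and use it to filter on different velocity values, you can also build the regex pattern dynamically.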

import re

pattern = r"P1127_VELOCITIES.+?Time:\s*(\S+)\s+(\S+)\s*$" 
data = []

with open("file.txt") as f:
    for line in f:
        m = re.match(pattern, line)

        if m: 
            data.append(tuple(map(float, m.groups())))

print(data)

Output:

[(0.04055, 0.0015347), (0.01638, -0.0016876)]
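
The remark about building the pattern dynamically could look something like the sketch below. The function name extract_time_value_pairs, its parameters, and the in-memory sample_lines list are hypothetical and chosen only for illustration; passing an open file object instead of the list should work the same way, since both yield lines.

```python
import re

def extract_time_value_pairs(lines, label):
    """Return (time, value) float pairs for lines that start with label."""
    # re.escape keeps any regex metacharacters in the label literal.
    pattern = re.compile(re.escape(label) + r".+?Time:\s*(\S+)\s+(\S+)\s*$")
    pairs = []
    for line in lines:
        m = pattern.match(line)
        if m:
            pairs.append(tuple(map(float, m.groups())))
    return pairs

# Sample lines copied from the question (illustrative subset).
sample_lines = [
    "P1127_VELOCITIES #001000  Step:    59  Iteration:     2  Time:    0.04055  0.0015347 \n",
    "P2243_VELOCITIES #001000  Step:    59  Iteration:     2  Time:    0.04055  0.0017193 \n",
    "P1127_VELOCITIES #001000  Step:    59  Iteration:     2  Time:    0.01638 -0.0016876 \n",
]

print(extract_time_value_pairs(sample_lines, "P1127_VELOCITIES"))
print(extract_time_value_pairs(sample_lines, "P2243_VELOCITIES"))
```

Because the value is captured with \S+, float() will raise ValueError if a matching line ends in something that is not a number, which may or may not be the behavior you want.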
