How can I remove corrupt events from a data file using Python?
I have a data file containing event entries of the following form:
<event>
4 0 0.5048900E-01 0.1915537E+03 0.7546771E-02 0.1157067E+00
21 -1 0 0 503 502 0.00000000000E+00 0.00000000000E+00 0.20916118194E+03 0.20916118194E+03 0.00000000000E+00 0. 1.
21 -1 0 0 501 503 0.00000000000E+00 0.00000000000E+00 -0.19069665391E+03 0.19069665391E+03 0.00000000000E+00 0. 1.
6 1 1 2 501 0 0.64272189331E+02 0.51311781060E+02 -0.47339360468E+02 0.19731656861E+03 0.17300000000E+03 0. -1.
-6 1 1 2 0 502 -0.64272189331E+02 -0.51311781060E+02 0.65803888495E+02 0.20254126725E+03 0.17300000000E+03 0. -1.
</event>
<event>
4 0 0.5048900E-01 0.1923878E+03 0.7546771E-02 0.1156325E+00
21 -1 0 0 503 502 0.00000000000E+00 0.00000000000E+00 0.24573125562E+02 0.24573125562E+02 0.00000000000E+00 0. 1.
21 -1 0 0 501 503 0.00000000000E+00 0.00000000000E+00 -0.15553273337E+04 0.15553273337E+04 0.00000000000E+00 0. -1.
6 1 1 2 501 0 0.98476452980E+01 0.83588711195E+02 -0.62504106700E+03 0.65397965120E+03 0.17300000000E+03 0. 1.
-6 1 1 2 0 502 -0.98476452980E+01 -0.83588711195E+02 -0.90571314110E+03 0.92592080802E+03 0.17300000000E+03 0. -1.
</event>
<event>
4 0 0.5048900E-01 0.1782060E+03 0.7546771E-02 0.1169551E+00
21 -1 0 0 501 502 0.00000000000E+00 0.00000000000E+00 0.17068413103E+02 0.17068413103E+02 0.00000000000E+00 0. 1.
21 -1 0 0 502 503 0.00000000000E+00 0.00000000000E+00 -0.19878188087E+04 0.19878188087E+04 0.00000000000E+00 0. 1.
6 1 1 2 501 0 0.40928013982E+02 -0.12380831554E+02 -0.73177042255E+03 0.75315691502E+03 0.17300000000E+03 0. 1.
-6 1 1 2 0 503 -0.40928013982E+02 0.12380831554E+02 -0.12389799731E+04 0.12517303068E+04 0.17300000000E+03 0. 1.
</event>
<event>
4 0 0.5048900E-01 0.1748201E+03 0.7546771E-02 0.1172912E+00
21 -1 0 0 501 502 0.00000000000E+00 0.00000000000E+00 0.50201908406E+02 0.50201908406E+02 0.00000000000E+00 0. -1.
21 -1 0 0 502 503 0.00000000000E+00 0.00000000000E+00 -0.81442244278E+03 0.81442244278E+03 0.00000000000E+00 0. -1.
6 1 1 2 501 0 -0.76531495601E+01 -0.23968586903E+02 -0.16487721432E+03 0.24030513864E+03 0.17300000000E+03 0. -1.
-6 1 1 2 0 503 0.76531495601E+01 0.23968586903E+02 -0.59934332005E+03 0.62431921254E+03 0.17300000000E+03 0. -1.
</event>
<event>
4 0 0.5048900E-01 0.2161793E+03 0.7546771E-02 0.1136764E+00
21 -1 0 0 501 502 0.00000000000E+00 0.00000000000E+00 0.44614769518E+03 0.44614769518E+03 0.00000000000E+00 0. -1.
21 -1 0 0 502 503 0.00000000000E+00 0.00000000000E+00 -0.11252245546E+03 0.11252245546E+03 0.00000000000E+00 0. 1.
6 1 1 2 501 0 0.12142710736E+03 -0.45386865351E+02 0.24023253309E+03 0.32317979501E+03 0.17300000000E+03 0. -1.
-6 1 1 2 0 503 -0.12142710736E+03 0.45386865351E+02 0.93392706626E+02 0.23549035564E+03 0.17300000000E+03 0. 1.
</event>
I am generating data like this, and the generation process is not error-free: some event entries will be malformed or corrupt. How can I use Python to detect and remove event entries that are not of the form shown above?
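What is the specification of your event format?
I guessed at some of the requirements for your input data and came up with a regular expression that is not very complicated, but quite messy: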
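One way to answer that question is to write the spec down as code. Below is a minimal sketch of a field-by-field check, assuming only what the sample shows: a header line of 2 integers and 4 floats, and particle lines of 6 integers and 7 floats. It additionally assumes the file follows the Les Houches Event (LHE) layout, which the sample appears to match, where the first header field is the number of particle lines that follow. The names `fields_ok` and `is_valid_event` are hypothetical.

```python
import math

# Assumed spec, inferred from the sample events only:
#   header line:   2 integers, then 4 floats
#   particle line: 6 integers, then 7 floats
# Assumption: the first header integer is the number of particle lines
# (the Les Houches Event convention).

def _is_int(tok: str) -> bool:
    try:
        int(tok)
        return True
    except ValueError:
        return False

def _is_finite_float(tok: str) -> bool:
    # Rejects non-numbers as well as "nan"/"inf", which float() would accept.
    try:
        return math.isfinite(float(tok))
    except ValueError:
        return False

def fields_ok(line: str, n_int: int, n_float: int) -> bool:
    """True if `line` has exactly n_int integers followed by n_float finite floats."""
    toks = line.split()
    return (len(toks) == n_int + n_float
            and all(_is_int(t) for t in toks[:n_int])
            and all(_is_finite_float(t) for t in toks[n_int:]))

def is_valid_event(body: str) -> bool:
    """Check the text between <event> and </event> against the assumed spec."""
    lines = [ln for ln in body.splitlines() if ln.strip()]
    if not lines or not fields_ok(lines[0], 2, 4):
        return False
    n_particles = int(lines[0].split()[0])  # safe: fields_ok verified it is an int
    particles = lines[1:]
    return (len(particles) == n_particles
            and all(fields_ok(p, 6, 7) for p in particles))
```

Unlike a single regex, this makes each rule of the spec explicit, so tightening it (for example, restricting the status codes in the second particle column) means changing one line rather than re-deriving a pattern.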
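import re

# Matches one complete, well-formed event; the named group `body` holds the
# lines between <event> and </event>.
rx = re.compile(r'^<event>$'
                r'(?P<body>\s*\d\s+\d'                                    # header: two one-digit ints
                r'(\s+(\+|-)?\d+\.\d+(e|E)(\+|-)\d+){4}$'                 # ...then four floats
                r'((\s+[-\d]+){6}(\s+(\+|-)?\d+\.\d+(e|E)(\+|-)\d+){5}'  # particle: six ints, five floats
                r'(\s+[-\d.]+){2}$)+)'                                     # ...two short numbers; repeated
                r'\s*^</event>$', re.M)                                    # require the closing tag

with open('events.dat') as f:  # hypothetical file name
    your_input_data = f.read()

for match in rx.finditer(your_input_data):
    print(match.group('body'))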
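See here for an interactive explanation of the regular expression. You will most likely need to do a fair amount of fine-tuning, but this should at least be a starting point.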