How to replace all instances of a string with another indexed string in Python

This question is a bit hard to phrase and my English isn't great, but I'll do my best.

I have a directory of XML files, each containing XML such as:

<root>
    <fields>
        <field>
            <description/>
            <region id="Number.T2S366_R_487" page="1"/>
        </field>
        <field>
            <description/>
            <region id="Number.T2S366_R_488.`0" page="1"/>
            <region id="String.T2S366_R_488.`1" page="1"/>
        </field>
    </fields>
</root>

On every line that contains the dot, tick, number notation (e.g. .`0), I want to use index notation instead (e.g. [0], [1], [2], and so on).

So the transformed XML payload should look like this:

<root>
    <fields>
        <field>
            <description/>
            <region id="Number.T2S366_R_487" page="1"/>
        </field>
        <field>
            <description/>
            <region id="Number.T2S366_R_488[0]" page="1"/>
            <region id="String.T2S366_R_488[1]" page="1"/>
        </field>
    </fields>
</root>

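How can I do this in Python? With a regular expression this seems fairly simple, but it is hard to do for a directory containing multiple files. I would like to see an implementation in Python 3.x, since that is what I am learning.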

You can do this with a simple regular expression:

import re
sample_str = """
<root>
    <fields>
        <field id="S366/487" type="xs:int" bind="T2S366/487">
            <description/>
            <region id="WholeNumberWithSeparator.T2S366_R_487" page="1"/>
        </field>
        <field id="S366/488" type="xs:int" bind="T2S366/488">
            <description/>
            <region id="Number.T2S366_R_488.`0" page="1"/>
            <region id="String.T2S366_R_488.`1" page="1"/>
        </field>
    </fields>
</root>
"""
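# Raw string so the backslashes reach the regex engine unchanged
pattern = r"\.`(\d+)"
# Wrap the captured number in square brackets
result = re.sub(pattern, lambda x: "[{}]".format(x.group(1)), sample_str)
print(result)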

which yields:

<root>
    <fields>
        <field id="S366/487" type="xs:int" bind="T2S366/487">
            <description/>
            <region id="WholeNumberWithSeparator.T2S366_R_487" page="1"/>
        </field>
        <field id="S366/488" type="xs:int" bind="T2S366/488">
            <description/>
            <region id="Number.T2S366_R_488[0]" page="1"/>
            <region id="String.T2S366_R_488[1]" page="1"/>
        </field>
    </fields>
</root>
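
As a minor variation on the code above, the lambda can be swapped for a backreference in the replacement string. This is only a sketch: the names `TICK_INDEX` and `to_bracket_index` are made up for illustration and do not come from the answer.

```python
import re

# Compile once so the same pattern can be reused across many lines or files.
# It matches a literal dot, a backtick, then one or more digits (captured).
TICK_INDEX = re.compile(r"\.`(\d+)")


def to_bracket_index(text):
    """Rewrite every ".`N" in text as "[N]" (hypothetical helper)."""
    # \1 in the replacement refers to the captured digits.
    return TICK_INDEX.sub(r"[\1]", text)


print(to_bracket_index('<region id="Number.T2S366_R_488.`0" page="1"/>'))
```

Lines without the dot-tick notation pass through unchanged, so the function can be applied to a whole file's contents at once.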

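In Python, you can loop over all the files in a directory with os.listdir and make the substitutions in-place with fileinput:

import fileinput
import os
import re

path = '/home/arabian_albert/'
for name in os.listdir(path):
    full_path = os.path.join(path, name)  # os.listdir returns bare file names
    if not os.path.isfile(full_path):
        continue  # skip subdirectories
    with fileinput.FileInput(full_path, inplace=True, backup='.bak') as file:
        for line in file:
            # With inplace=True, print() writes into the file being edited
            print(re.sub(r'\.`(\d+)', r'[\1]', line), end='')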

However, you should consider doing this from the command line with sed:

find . -type f -exec sed -i.bak -E 's/\.`([0-9]+)/[\1]/g' {} \;

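The above performs the replacement in every file under the current directory (including subdirectories) and backs up each old file with a .bak extension.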
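
For anyone who prefers to stay in Python, the find and sed combination can be approximated roughly as follows. This is a sketch under some assumptions that are not in the answer: the function name `convert_tree` is invented, only files matching `*.xml` are touched, and the files are read as bytes, which assumes an ASCII-compatible encoding such as UTF-8.

```python
import re
import shutil
from pathlib import Path

# Bytes pattern: a literal dot, a backtick, then one or more ASCII digits.
TICK_INDEX = re.compile(rb"\.`(\d+)")


def convert_tree(root, pattern="*.xml"):
    """Rewrite ".`N" as "[N]" in matching files under root; return changed paths."""
    changed = []
    for path in sorted(Path(root).rglob(pattern)):  # rglob recurses into subdirectories
        if not path.is_file():
            continue
        original = path.read_bytes()  # bytes keep line endings and encoding untouched
        updated = TICK_INDEX.sub(rb"[\1]", original)
        if updated != original:
            # Keep the old version next to the file, e.g. a.xml -> a.xml.bak
            shutil.copy2(path, path.with_name(path.name + ".bak"))
            path.write_bytes(updated)
            changed.append(path)
    return changed
```

Calling `convert_tree('/home/arabian_albert/')` would process that directory tree. Unlike `sed -i.bak`, this sketch only writes a backup for files that actually change.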

How about this:

wholefile = ''

with open('xml_input.xml', 'r') as f:  # the input file is only read, so 'r' is enough
    lines = f.readlines()
    for line in lines:
        split_line = line.split('.')  # split at periods
        end_point = split_line.pop(-1)  # get and remove existing endpoint
        if end_point.startswith('`'):  # if it matches tick notation (safe even when the segment is empty)
            idx_after_num = end_point.find('"')  # get the first index that matches a double quote
            the_int = end_point[1:idx_after_num]  # slice from after the tick to the end of the int
            end_point = list(end_point)  # convert to list
            del(end_point[:idx_after_num])  # delete up to the double quote
            end_point = ''.join(end_point)  # reconstruct string
            new_endpoint = '[{}]'.format(the_int) + end_point  # create new endpoint
            split_line += [new_endpoint]  # append new endpoint to end of list of split strs
            new_line = ''  # new empty string
            for n, segment in enumerate(split_line):
                if n >= len(split_line) - 2:  # if we're at or beyond the endpoint
                    new_line += segment  # concatenate the new endpoint
                else:
                    new_line += segment + '.'  # concatenate, replacing the needed '.'s
            wholefile += new_line  # replace, with changes
        else:
            wholefile += line  # replace, with no changes

with open('xml_out.xml', 'w+') as f:
    f.write(wholefile)

My output:

<root>
    <fields>
        <field id="S366/487" type="xs:int" bind="T2S366/487">
            <description/>
            <region id="WholeNumberWithSeparator.T2S366_R_487" page="1"/>
        </field>
        <field id="S366/488" type="xs:int" bind="T2S366/488">
            <description/>
            <region id="Number.T2S366_R_488[0]" page="1"/>
            <region id="String.T2S366_R_488[1]" page="1"/>
        </field>
    </fields>
</root>
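
The same regex-free idea can probably be written more compactly with str.rpartition. The sketch below is not the code above: `convert_line` is a hypothetical name, and it takes only the digits that follow the tick rather than everything up to the next double quote. Like the code above, it handles only the last dot-tick marker on a line.

```python
def convert_line(line):
    """Rewrite the last ".`N" in a line as "[N]" without regular expressions."""
    # Split around the last occurrence of the dot-tick marker.
    head, sep, tail = line.rpartition(".`")
    if not sep:
        return line  # no dot-tick marker on this line
    # Count the ASCII digits that immediately follow the tick.
    n_digits = 0
    while n_digits < len(tail) and tail[n_digits] in "0123456789":
        n_digits += 1
    if n_digits == 0:
        return line  # a tick that is not followed by a number is left alone
    return "{}[{}]{}".format(head, tail[:n_digits], tail[n_digits:])


print(convert_line('<region id="Number.T2S366_R_488.`0" page="1"/>'))
```

It could be applied per line, for example with `''.join(convert_line(line) for line in lines)`, before writing the result to the output file.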