I want to put each value of a list into a string

I read PNG file names from a file, use a regular expression to pick out the 4-digit PNG file names, remove the punctuation, and save the results to another file.

What I'm stuck on is putting each individual value from the list into a string, for example:

<div class="parent"><img class="img" title="" src="images/char/{HERE}.png" ></div>

and then saving the results to a file as:

<div class="parent"><img class="img" title="" src="images/char/1432.png" ></div>
<div class="parent"><img class="img" title="" src="images/char/1250.png" ></div>
<div class="parent"><img class="img" title="" src="images/char/1324.png" ></div>
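Since the template above already contains a {HERE} placeholder, it can be filled in directly with str.format by passing a keyword argument named HERE. A minimal sketch, assuming the numbers are already strings in a list (ids is a hypothetical name, and the values are just the ones from the example):

```python
# Template copied from the question; {HERE} acts as a named str.format field
template = '<div class="parent"><img class="img" title="" src="images/char/{HERE}.png" ></div>'

ids = ["1432", "1250", "1324"]  # hypothetical example values

# Fill the {HERE} field once per item in the list
lines = [template.format(HERE=i) for i in ids]

for line in lines:
    print(line)
```

This prints the three lines shown above; only the {HERE} field is replaced, because the template contains no other braces.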

Here is the code:

import re

def remove_punc(string):
    # Remove every character listed in punc (note the escaped backslash)
    punc = '''!()-[]{};:'"\\, <>./?@#$%^&*_~'''
    for ele in string:
        if ele in punc:
            string = string.replace(ele, "")
    return string

with open(r'C:\My Web Sites\image_data(1).txt', 'r') as text_file:
    s = text_file.read()

string_pattern = r"\d{4}\."
regex_pattern = re.compile(string_pattern)

# find all 4-digit numbers followed by a dot
result = regex_pattern.findall(s)

result = [remove_punc(i) for i in result]

with open(r'C:\My Web Sites.txt', 'w') as fp:
    for item in result:
        # write each item on a new line
        fp.write("%s\n" % item)
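As a side note, remove_punc is not strictly needed here: every match of \d{4}\. ends with exactly one dot, so str.rstrip(".") is enough to clean it. A sketch on an in-memory string (the sample text is made up for illustration and stands in for the file contents):

```python
import re

# Made-up sample standing in for the contents of image_data(1).txt
s = 'src="resources/images/thumb/1535.png" src="resources/images/bgs/5.png" src="resources/images/thumb/1510.png"'

# Same pattern as in the question: four digits followed by a literal dot
matches = re.findall(r"\d{4}\.", s)

# Each match looks like "1535.", so stripping the trailing dot is enough
result = [m.rstrip(".") for m in matches]
print(result)  # ['1535', '1510']
```

The single-digit bgs/5.png is skipped because \d{4} needs four digits in a row.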

EDIT

Here is a sample of the text file:

<div class="cell-imgs"><div class="character-thumbnail"><img src="resources/images/bgs/5.png" class="character-thumbnail-background"><img class="character-thumbnail-image" src="resources/images/thumb/1535.png" onerror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="0" src="resources/images/frames/5.png" class="character-thumbnail-frame" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 5★<br/>Level: 1/60<br/>Level: 0/4<br/>Level: 1/5<br/>: 0%" title="" data-original-title="<font color='red'><br/>(version)</font>"><img src="resources/images/elements/3.png" class="character-thumbnail-element"></div><div class="character-thumbnail"><img src="resources/images/bgs/5.png" class="character-thumbnail-background"><img class="character-thumbnail-image" src="resources/images/thumb/1510.png" onerror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="1" src="resources/images/frames/5.png" class="character-thumbnail-frame" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 5★<br/>Level: 1/80<br/>Level: 4/4<br/>Level: 1/5<br/>: 0%" title="" data-original-title="<font color='#F96700'><br/>(version)</font>"><img src="resources/images/elements/5.png" class="character-thumbnail-element"></div><div class="character-thumbnail"><img src="resources/images/bgs/5.png" class="character-thumbnail-background"><img class="character-thumbnail-image" src="resources/images/thumb/1403.png" onerror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="2" src="resources/images/frames/5.png" class="character-thumbnail-frame" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 5★<br/>Level: 1/80<br/>Level: 4/4<br/>Level: 1/5<br/>: 0%" title="" data-original-title="<font color='#071BA0'><br/>(version)</font>"><img src="resources/images/elements/4.png" 
class="character-thumbnail-element"></div><div class="character-thumbnail"><img src="resources/images/bgs/5.png" class="character-thumbnail-background"><img class="character-thumbnail-image" src="resources/images/thumb/1388.png" onerror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="3" src="resources/images/frames/5.png" class="character-thumbnail-frame" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 5★<br/>Level: 1/80<br/>Level: 4/4<br/>Level: 1/5<br/>: 0%" title="" data-original-title="<font color='#F96700'><br/>(version)</font>"><img src="resources/images/elements/5.png" class="character-thumbnail-element"></div><div class="character-thumbnail"><img src="resources/images/bgs/6.png" class="character-thumbnail-background"><img class="character-thumbnail-image" src="resources/images/thumb/1323.png" onerror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="4" src="resources/images/frames/6.png" class="character-thumbnail-frame" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 6★<br/>Level: 200/200<br/>Level: 4/4<br/>Level: 1/5<br/>: 150%<br/>1: 0/10<br/>2: 0/10<br/>3: 0/10<br/>" title="<font color='red'><br/>(version)</font>"><img src="resources/images/elements/3.png" class="character-thumbnail-element"></div><div class="character-thumbnail"><img src="resources/images/bgs/5.png" class="character-thumbnail-background"><img class="character-thumbnail-image" src="resources/images/thumb/1322.png"

Output:

1535
1510
1403
1388
1323
1322

To create your file, you can use str.format. For example:

s = """<div class="parent"><img class="img" title="" src="images/char/{}.png"></div>"""

result = [1432, 1250, 1324]  # <-- your result with the punctuation removed

with open("data.txt", "w") as fp:
    for item in result:
        print(s.format(item), file=fp)

This creates data.txt with the following content:

<div class="parent"><img class="img" title="" src="images/char/1432.png"></div>
<div class="parent"><img class="img" title="" src="images/char/1250.png"></div>
<div class="parent"><img class="img" title="" src="images/char/1324.png"></div>
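If you would rather build the whole file content as one string first (for example to inspect it before writing), the same template can be combined with str.join. A sketch, where render is a hypothetical helper name:

```python
s = """<div class="parent"><img class="img" title="" src="images/char/{}.png"></div>"""

def render(items):
    # Format each item into the template, join with newlines, end with a newline
    return "\n".join(s.format(item) for item in items) + "\n"

content = render([1432, 1250, 1324])

with open("data.txt", "w") as fp:
    fp.write(content)
```

The resulting data.txt has the same three lines as the print-based version.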

With the additional information from the author:

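This pattern should do the trick: (\d{4})\.(?=png), which

  • captures exactly 4 digits
  • followed by a dot and png (the lookahead (?=png) checks for png without including it in the match)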
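To see what each part of the pattern does, here it is run against a short excerpt of the sample file from the question. findall returns only the captured group (the digits), while the lookahead keeps png out of the match; dropping the group makes the dot reappear in the results:

```python
import re

# Short excerpt of the sample text from the question
sample = (
    '<img src="resources/images/bgs/5.png" class="character-thumbnail-background">'
    '<img class="character-thumbnail-image" src="resources/images/thumb/1535.png" '
    "onerror=\"this.src='resources/images/thumb/noimage.png';\">"
    '<img class="character-thumbnail-image" src="resources/images/thumb/1510.png">'
)

# With a capture group, findall returns only the group's text
with_group = re.findall(r"(\d{4})\.(?=png)", sample)
# Without a group, findall returns the whole match, including the dot
without_group = re.findall(r"\d{4}\.(?=png)", sample)

print(with_group)     # ['1535', '1510']
print(without_group)  # ['1535.', '1510.']
```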

If you want to support jpeg as well, you can change the pattern to (\d{4})\.(?=png|jpeg)
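A quick check of the extended pattern on a made-up string that mixes extensions:

```python
import re

# Made-up input: one png, one jpeg, and one gif that should be ignored
text = 'src="images/char/1432.png" src="images/char/1324.jpeg" src="images/char/9999.gif"'

png_only = re.findall(r"(\d{4})\.(?=png)", text)
png_or_jpeg = re.findall(r"(\d{4})\.(?=png|jpeg)", text)

print(png_only)     # ['1432']
print(png_or_jpeg)  # ['1432', '1324']
```

Files ending in .jpg would still be skipped by this pattern; (?=png|jpe?g) would cover both spellings.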

I wrote this code to test online, but it should work the same way if you load the file and then call findall on its contents. The rest is up to you.

import re

string = "<div class=\"parent\"><img class=\"img\" title=\"\" src=\"images/char/1432.png\" ></div>\n<div class=\"parent\"><img class=\"img\" title=\"\" src=\"images/char/1250.png\" ></div>\n<div class=\"parent\"><img class=\"img\" title=\"\" src=\"images/char/1324.png\" ></div>\n<div class=\"parent\"><img class=\"img\" title=\"\" src=\"images/char/1324.jpeg\" ></div>"

pattern = re.compile(r'(\d{4})\.(?=png)')

print(pattern.findall(string))

The output is:

['1432', '1250', '1324']
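Putting the two answers together, here is a sketch that reads the input file, extracts the numbers with the pattern above, and writes one template line per number. The function name convert is hypothetical, and reading with encoding="utf-8" is an assumption (the sample contains ★, which the Windows default encoding may not decode); substitute your own paths and encoding.

```python
import re

# Capture exactly four digits that are followed by ".png"
PATTERN = re.compile(r"(\d{4})\.(?=png)")
TEMPLATE = '<div class="parent"><img class="img" title="" src="images/char/{}.png" ></div>'

def convert(src_path, dst_path):
    # Read the whole source file into one string (assumes UTF-8)
    with open(src_path, "r", encoding="utf-8") as f:
        text = f.read()
    # findall returns only the captured digits, so there is no punctuation to strip
    ids = PATTERN.findall(text)
    # Write one filled-in template line per number
    with open(dst_path, "w", encoding="utf-8") as f:
        for i in ids:
            f.write(TEMPLATE.format(i) + "\n")
    return ids
```

Usage with hypothetical file names: convert("image_data.txt", "data.txt").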