I want to put each item of a list into a string
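I read PNG file names from a file, use a regular expression to pick out the 4-digit PNG file names, strip the punctuation, and save the result to another file.
What I'm stuck on is putting each individual value from the list into a string such as: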
<div class="parent"><img class="img" title="" src="images/char/{HERE}.png" ></div>
and then saving the result to a file as:
<div class="parent"><img class="img" title="" src="images/char/1432.png" ></div>
<div class="parent"><img class="img" title="" src="images/char/1250.png" ></div>
<div class="parent"><img class="img" title="" src="images/char/1324.png" ></div>
Here is the code:
import re

def remove_punc(string):
    # Strip every punctuation character (and spaces) from the string.
    punc = r'''!()-[]{};:'"\, <>./?@#$%^&*_~'''
    for ele in string:
        if ele in punc:
            string = string.replace(ele, "")
    return string

with open(r'C:\My Web Sites\image_data(1).txt', 'r') as text_file:
    s = text_file.read()

string_pattern = r"\d{4}\."
regex_pattern = re.compile(string_pattern)
# find all the matches in the file contents
result = regex_pattern.findall(s)
# drop the trailing "." from each match
result = [remove_punc(i) for i in result]

with open(r'C:\My Web Sites.txt', 'w') as fp:
    for item in result:
        # write each item on a new line
        fp.write("%s\n" % item)
Edit
Here is a sample of the text file:
<div class="cell-imgs"><div class="character-thumbnail"><img src="resources/images/bgs/5.png" class="character-thumbnail-background"><img class="character-thumbnail-image" src="resources/images/thumb/1535.png" onerror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="0" src="resources/images/frames/5.png" class="character-thumbnail-frame" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 5★<br/>Level: 1/60<br/>Level: 0/4<br/>Level: 1/5<br/>: 0%" title="" data-original-title="<font color='red'><br/>(version)</font>"><img src="resources/images/elements/3.png" class="character-thumbnail-element"></div><div class="character-thumbnail"><img src="resources/images/bgs/5.png" class="character-thumbnail-background"><img class="character-thumbnail-image" src="resources/images/thumb/1510.png" onerror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="1" src="resources/images/frames/5.png" class="character-thumbnail-frame" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 5★<br/>Level: 1/80<br/>Level: 4/4<br/>Level: 1/5<br/>: 0%" title="" data-original-title="<font color='#F96700'><br/>(version)</font>"><img src="resources/images/elements/5.png" class="character-thumbnail-element"></div><div class="character-thumbnail"><img src="resources/images/bgs/5.png" class="character-thumbnail-background"><img class="character-thumbnail-image" src="resources/images/thumb/1403.png" onerror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="2" src="resources/images/frames/5.png" class="character-thumbnail-frame" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 5★<br/>Level: 1/80<br/>Level: 4/4<br/>Level: 1/5<br/>: 0%" title="" data-original-title="<font color='#071BA0'><br/>(version)</font>"><img src="resources/images/elements/4.png" 
class="character-thumbnail-element"></div><div class="character-thumbnail"><img src="resources/images/bgs/5.png" class="character-thumbnail-background"><img class="character-thumbnail-image" src="resources/images/thumb/1388.png" onerror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="3" src="resources/images/frames/5.png" class="character-thumbnail-frame" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 5★<br/>Level: 1/80<br/>Level: 4/4<br/>Level: 1/5<br/>: 0%" title="" data-original-title="<font color='#F96700'><br/>(version)</font>"><img src="resources/images/elements/5.png" class="character-thumbnail-element"></div><div class="character-thumbnail"><img src="resources/images/bgs/6.png" class="character-thumbnail-background"><img class="character-thumbnail-image" src="resources/images/thumb/1323.png" onerror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="4" src="resources/images/frames/6.png" class="character-thumbnail-frame" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 6★<br/>Level: 200/200<br/>Level: 4/4<br/>Level: 1/5<br/>: 150%<br/>1: 0/10<br/>2: 0/10<br/>3: 0/10<br/>" title="<font color='red'><br/>(version)</font>"><img src="resources/images/elements/3.png" class="character-thumbnail-element"></div><div class="character-thumbnail"><img src="resources/images/bgs/5.png" class="character-thumbnail-background"><img class="character-thumbnail-image" src="resources/images/thumb/1322.png"
Output:
1535
1510
1403
1388
1323
1322
To create your file, you can use str.format. For example:
s = """<div class="parent"><img class="img" title="" src="images/char/{}.png"></div>"""
result = [1432, 1250, 1324]  # <-- your result with the punctuation removed
with open("data.txt", "w") as fp:
    for item in result:
        # print() appends the newline after each formatted line
        print(s.format(item), file=fp)
This creates data.txt with the content:
<div class="parent"><img class="img" title="" src="images/char/1432.png"></div>
<div class="parent"><img class="img" title="" src="images/char/1250.png"></div>
<div class="parent"><img class="img" title="" src="images/char/1324.png"></div>
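Since the values that come back from re.findall are strings rather than integers, it may help to see that str.format treats both the same way. The following is a minimal sketch, not part of the original answer; the names TEMPLATE and render_lines are made up for illustration.

```python
# Hypothetical names for illustration: TEMPLATE and render_lines.
TEMPLATE = '<div class="parent"><img class="img" title="" src="images/char/{}.png"></div>'

def render_lines(ids):
    """Return one formatted HTML line per id; ids may be str or int."""
    return [TEMPLATE.format(item) for item in ids]

# Strings, as re.findall would return them.
lines = render_lines(["1432", "1250", "1324"])
print("\n".join(lines))
```

Building the list first keeps the formatting step separate from the file writing, so the lines can be inspected before anything is saved.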
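This pattern should do the trick: (\d{4})\.(?=png)
where
- (\d{4}) captures exactly four digits
- \.(?=png) requires those digits to be followed by .png
If you want to add support for jpeg, you can change the pattern to (\d{4})\.(?=png|jpeg)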
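As a quick check of the png|jpeg variant, here is a small sketch. The sample string is invented for this illustration and only loosely follows the markup in the question.

```python
import re

# The lookahead accepts either extension; only the four digits are captured.
pattern = re.compile(r"(\d{4})\.(?=png|jpeg)")

# Invented sample input: two 4-digit names and one 1-digit name.
sample = 'src="images/char/1432.png" src="images/char/1324.jpeg" src="images/bgs/5.png"'
matches = pattern.findall(sample)
print(matches)  # ['1432', '1324']
```

Note that the pattern is not anchored on the left, so a longer name such as 12345.png would still match on its last four digits. If that matters for your data, a stricter pattern would be needed.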
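For online testing I hard-coded the input in the code below, but you should be able to load your file and then use findall on its contents. The rest is up to you.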
import re
string = "<div class=\"parent\"><img class=\"img\" title=\"\" src=\"images/char/1432.png\" ></div>\n<div class=\"parent\"><img class=\"img\" title=\"\" src=\"images/char/1250.png\" ></div>\n<div class=\"parent\"><img class=\"img\" title=\"\" src=\"images/char/1324.png\" ></div>\n<div class=\"parent\"><img class=\"img\" title=\"\" src=\"images/char/1324.jpeg\" ></div>"
pattern = re.compile(r'(\d{4})\.(?=png)')
print(pattern.findall(string))
which outputs:
['1432', '1250', '1324']
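For the remaining step, one way to combine the capture-group pattern with str.format might look like the sketch below. It is only an illustration under assumptions: the function name convert, the file names, and the sample input are all hypothetical, and a temporary directory stands in for the real paths.

```python
import re
import tempfile
from pathlib import Path

PATTERN = re.compile(r"(\d{4})\.(?=png)")
TEMPLATE = '<div class="parent"><img class="img" title="" src="images/char/{}.png"></div>'

def convert(src_path, dst_path):
    """Read src_path, extract the 4-digit png names, write one HTML line per name."""
    text = Path(src_path).read_text(encoding="utf-8")
    ids = PATTERN.findall(text)  # the capture group already excludes the "."
    with open(dst_path, "w", encoding="utf-8") as fp:
        for item in ids:
            print(TEMPLATE.format(item), file=fp)
    return ids

# Demo in a throwaway directory; the file names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "image_data.txt"
    dst = Path(tmp) / "data.txt"
    src.write_text(
        '<img src="resources/images/thumb/1535.png"><img src="resources/images/bgs/5.png">',
        encoding="utf-8",
    )
    found = convert(src, dst)
    written = dst.read_text(encoding="utf-8")
    print(found)
    print(written, end="")
```

Because the group captures only the digits, a separate punctuation-stripping helper is not needed in this version.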