Python: Converting a bytes object to a string, removing the \'s, then writing it to a list brings the \'s back
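I am parsing HTML that is currently in bytes form by converting it to a string and then writing it to a list.
I want to remove all the backslashes (or even just handle the escape characters properly).
Here is my code: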
picture_divs = [b'<img alt="Python\'s Confusing me." class="" src="https://link_goes_here" style="whatever;"/>']

def get_alt_text(picture_divs):
    alt_text = []
    for i, elem in enumerate(picture_divs):
        str_elem = str(elem).replace('\\', '')  # Convert bytes -> strings
        start_index = int(str_elem.find('alt='))
        end_index = int(str_elem.find('class='))
        alt_text.append(str_elem[start_index + 4:end_index])
    return alt_text

alt_text_return = get_alt_text(picture_divs)
print(alt_text_return)
Output:
['"Python\'s Confusing me."']
Desired output:
['"Python's Confusing me."']
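The output you are asking for would be a Python syntax error. Python displays a list in the form
list_example = ['a','b']
If you wanted the list displayed as ['"Python's Confusing me."'], the opening single quote would be closed by the apostrophe inside the text. So Python inserts the backslash to escape that single quote instead of producing invalid syntax. The backslash exists only in the displayed form (the repr) of the list; the string itself does not contain it, which you can confirm by printing the element directly.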
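As a rough illustration of that point, the sketch below reuses the sample bytes from the question and compares how the list is displayed with what the element actually contains. The variable names (`raw`, `text`, `alt`, `items`) are made up for this example, and UTF-8 is assumed as the encoding.

```python
# Sample bytes from the question (UTF-8 is an assumption here).
raw = b'<img alt="Python\'s Confusing me." class="" src="https://link_goes_here" style="whatever;"/>'

# decode() yields the text itself; str(raw) would instead yield the b'...' repr.
text = raw.decode('utf-8')

# Slice out the alt value, quotes included, as in the question.
start = text.find('alt=') + len('alt=')
end = text.find(' class=')
alt = text[start:end]

items = [alt]
print(items)     # the list display uses repr(), so the apostrophe is shown escaped
print(items[0])  # printing the element shows the string as it really is

# The string contains no backslash; only its repr does.
has_backslash = '\\' in alt
print(has_backslash)
```

If the goal is only to display the text, printing each element (or joining the elements into one string) avoids the escaped form without any replacing.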
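Here is one possible way to clean it up:
>>> from re import sub
>>> picture_divs = [b'<img alt="Python\'s Confusing me." class="" src="https://link_goes_here" style="whatever;"/>']
>>> for div in picture_divs:
...     # decode the bytes, then strip forward slashes (this also affects the URL and the closing "/>")
...     rev1 = sub(r'[\/]', '', div.decode('utf-8'))
...     # '\'' and "'" are the same string, so this replace leaves the text unchanged
...     rev2 = rev1.replace('\'', "'")
...     print(rev2)
...
<img alt="Python's Confusing me." class="" src="https:link_goes_here" style="whatever;">
>>>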
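An alternative that avoids string slicing altogether is to decode the bytes and let the standard library's `html.parser` pull out the `alt` attribute. This is only a sketch under assumptions: the input is taken to be UTF-8, and the class name `AltTextParser` is invented for this example rather than taken from the question or the answer above.

```python
from html.parser import HTMLParser


class AltTextParser(HTMLParser):
    """Collects the alt attribute of every <img> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img .../> are also routed here by default.
        if tag == 'img':
            for name, value in attrs:
                if name == 'alt' and value is not None:
                    self.alt_texts.append(value)


def get_alt_text(picture_divs):
    parser = AltTextParser()
    for elem in picture_divs:
        # decode() gives the real text; no backslashes are introduced.
        parser.feed(elem.decode('utf-8'))
    parser.close()
    return parser.alt_texts


picture_divs = [b'<img alt="Python\'s Confusing me." class="" src="https://link_goes_here" style="whatever;"/>']
for alt in get_alt_text(picture_divs):
    print(alt)
```

Unlike the slicing approach in the question, the values returned here do not include the surrounding double quotes, and the URL is left untouched.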