Python: Converting a bytes object to a string, removing backslashes, then writing to a list brings the backslashes back

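I am parsing HTML that is currently in bytes form by converting it to a string and then writing it to a list. I want to remove all the backslashes (or even just handle the escape characters properly).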

Here is my code:

picture_divs = [b'<img alt="Python\'s Confusing me." class="" src="https://link_goes_here" style="whatever;"/>']

def get_alt_text(picture_divs):
    alt_text = []
    for i, elem in enumerate(picture_divs):
        str_elem = str(elem).replace('\\', '')  # str() on bytes gives its repr; strip the backslashes
        start_index = int(str_elem.find('alt='))
        end_index = int(str_elem.find('class='))
        alt_text.append(str_elem[start_index + 4:end_index])

    return alt_text


alt_text_return = get_alt_text(picture_divs)
print(alt_text_return)

Output: ['"Python\'s Confusing me."']

Desired output: ['"Python's Confusing me."']

The output you are asking for is not valid Python syntax. Python displays a list in this form:

list_example = ['a','b']

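If the list were displayed with 'Python's Confusing me.' inside single quotes, the apostrophe would close the opening single quote early. So when Python prints a list, it shows each string's repr and puts a backslash in front of the apostrophe instead of producing an invalid literal. The backslash is part of the display only; it is not a character in the string itself.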
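A minimal sketch of the point above: the backslash appears only when the list is displayed, not in the string it contains. The variable names are illustrative and not taken from the question.

```python
# Illustrative names; not from the original question.
quoted = '"Python\'s Confusing me."'   # contains both ' and " characters

items = [quoted]

# Printing a list shows repr() of each element, so the apostrophe is escaped.
list_display = str(items)

# Printing the element itself shows the actual characters.
element_display = str(quoted)

print(list_display)
print(element_display)
print("\\" in quoted)   # checks for a real backslash character in the string
```

Iterating over the list and printing each element, rather than printing the list object, therefore shows the text without the backslash.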

Here is one possible way to clean it up:

>>> from re import sub
>>> picture_divs = [b'<img alt="Python\'s Confusing me." class="" src="https://link_goes_here" style="whatever;"/>']
>>> for div in picture_divs:
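...     rev1 = sub(r'[\/]', '', div.decode('utf-8'))  # decode, then drop forward slashes (also alters the URL and '/>')
...     rev2 = rev1.replace('\'', "'")  # no-op: both arguments are the same apostrophe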
...     print(rev2)
... 
<img alt="Python's Confusing me." class="" src="https:link_goes_here" style="whatever;">
>>>
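If only the alt text is needed, an alternative sketch is to decode the bytes and let the standard library's html.parser pull out the attribute, which leaves the URL untouched. The names AltCollector and get_alt_text_decoded are hypothetical, and UTF-8 input is an assumption. Unlike the slicing in the question, this returns the attribute value without its surrounding double quotes.

```python
from html.parser import HTMLParser


class AltCollector(HTMLParser):
    """Hypothetical helper: collects the alt attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img .../> also reach this method,
        # because the default handle_startendtag delegates to it.
        if tag == "img":
            for name, value in attrs:
                if name == "alt":
                    self.alts.append(value)


def get_alt_text_decoded(picture_divs, encoding="utf-8"):
    """Decode each bytes element (assumed UTF-8) and return the alt texts."""
    collector = AltCollector()
    for elem in picture_divs:
        collector.feed(elem.decode(encoding))
    collector.close()
    return collector.alts


picture_divs = [b'<img alt="Python\'s Confusing me." class="" src="https://link_goes_here" style="whatever;"/>']
alt_texts = get_alt_text_decoded(picture_divs)

# Print the elements one by one to see the text rather than the list's repr.
for alt in alt_texts:
    print(alt)
```

Using decode() instead of str() avoids the b'...' repr entirely, so there is no backslash to remove in the first place.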