Python: Converting a bytes object to a string, removing \'s, then writing to a list brings back the \'s

Question

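I'm parsing HTML that is currently in bytes form by converting it to a string and then writing it to a list. I want to remove all forward slashes (or even just handle the escape characters nicely).

Here is my code: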

picture_divs = [b'<img alt="Python\'s Confusing me." class="" src="https://link_goes_here" style="whatever;"/>']

def get_alt_text(picture_divs):
    alt_text = []
    for i, elem in enumerate(picture_divs):
        str_elem = str(elem).replace('\\', '')  # Convert bytes -> strings
        start_index = int(str_elem.find('alt='))
        end_index = int(str_elem.find('class='))
        alt_text.append(str_elem[start_index + 4:end_index])

    return alt_text


alt_text_return = get_alt_text(picture_divs)
print(alt_text_return)

Output: ['"Python\'s Confusing me."']

Expected output: ['"Python's Confusing me."']

Answer 1

The output you are asking for would be a Python syntax error. Python displays lists in this format:

list_example = ['a','b']

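If you wanted "Python's Confusing me." displayed in that list format, the apostrophe would close the opening single quote early. Python therefore prints a backslash in front of the apostrophe to escape it instead of producing invalid syntax. The backslash is only part of the displayed representation; it is not stored in the string itself.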
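
A minimal sketch of that point, assuming the alt text has already been extracted into a plain `str` (the variable names `list_view`, `item_view` and `has_backslash` are illustrative and not from the question). Printing the list shows the `repr` of each element, while printing an element on its own shows its actual characters:

```python
# Illustrative data: one string containing both double quotes and an apostrophe.
alt_text = ['"Python\'s Confusing me."']

# str() of a list uses repr() for each element, so the apostrophe is shown escaped.
list_view = str(alt_text)

# str() of the element itself is just its characters, with no escaping.
item_view = str(alt_text[0])

print(list_view)
print(item_view)

# The backslash appears only in the displayed representation, not in the data.
has_backslash = '\\' in alt_text[0]
print(has_backslash)
```

So if the goal is readable output, printing the elements one at a time (or joining them) avoids the escaped form without changing the data.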

Answer 2

Here is one possible way to clean it up:

>>> from re import sub
>>> picture_divs = [b'<img alt="Python\'s Confusing me." class="" src="https://link_goes_here" style="whatever;"/>']
>>> for div in picture_divs:
...     rev1 = sub(r'[\/]', '', div.decode('utf-8'))
...     rev2 = rev1.replace('\'', "'")
...     print(rev2)
... 
<img alt="Python's Confusing me." class="" src="https:link_goes_here" style="whatever;">
>>>
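
As an alternative sketch, not the approach used in either answer: decoding the bytes with `decode('utf-8')` instead of `str()` avoids the `b'...'` representation entirely, and the standard library's `html.parser` can read the `alt` attribute without removing any slashes from the URL. The class name `AltTextParser` is hypothetical, and the input is assumed to be UTF-8 encoded:

```python
from html.parser import HTMLParser


class AltTextParser(HTMLParser):
    """Hypothetical helper: collects the alt attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img .../> are also routed here by default.
        if tag == "img":
            for name, value in attrs:
                if name == "alt":
                    self.alt_texts.append(value)


def get_alt_text(picture_divs):
    parser = AltTextParser()
    for elem in picture_divs:
        # Assumption: the bytes are UTF-8 encoded HTML.
        parser.feed(elem.decode("utf-8"))
    parser.close()
    return parser.alt_texts


picture_divs = [b'<img alt="Python\'s Confusing me." class="" src="https://link_goes_here" style="whatever;"/>']
alt_texts = get_alt_text(picture_divs)

# Print each element on its own to see the characters rather than the list's repr.
for text in alt_texts:
    print(text)
```

Note that the parser returns the attribute value without its surrounding double quotes, which differs slightly from the expected output in the question.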


python

string

byte