BeautifulSoup: SyntaxError: invalid character in identifier

Question

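I am trying to scrape all the dates from this web page. The dates are inside a table.

How: using find, specifying the table's element and its attributes (highlighted in blue).

Problem: when I try to extract the whole table I get a syntax error, "invalid character in identifier".

Other relevant information: the site requires a username and password, so I am using a session to hold my credentials.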

import requests
from getpass import getpass
from requests import get
from requests.exceptions import RequestException
from contextlib import closing
from bs4 import BeautifulSoup
from requests.auth import HTTPBasicAuth

URL = "https://d2l.pima.edu/d2l/lms/dropbox/user/folders_list.d2l?ou=475011&isprv=0"
s = requests.Session()
s.auth = ("myusername", "mypass")
s.headers.update({"x-test": "true"}) 

# both "x-test" and "x-test2" are sent
s.get("https://d2l.pima.edu/d2l/lms/dropbox/user/folders_list.d2l?ou=475011&isprv=0", headers={"x-test2": "true"})
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")
results = soup.find("div", attrs= {"id":"id_content_r_c1"})

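The error points at the last line of code: invalid character in identifier. However, I have triple-checked it and compared it with other code that works, and I cannot find any difference.

Also, this is the DOC of my web page.

Traceback: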

runfile('/Users/rahelmizrahi/Python/scripts/d2lwebscrape1.py', wdir='/Users/rahelmizrahi/Python/scripts')
  File "/Users/rahelmizrahi/Python/scripts/d2lwebscrape1.py", line 26
    results = soup.find("div", attrs= {"id":"id_content_r_c1"})
                                                              ^
SyntaxError: invalid character in identifier

Answer 1

This is probably the result of copy/pasting code. Let's look at the failing line, character by character:

>>> import unicodedata as ud
>>> s = 'results = soup.find("div", attrs= {"id":"id_content_r_c1"})'
>>> for c in s:print(c, ud.name(c))
... 
r LATIN SMALL LETTER R
e LATIN SMALL LETTER E
s LATIN SMALL LETTER S
u LATIN SMALL LETTER U
l LATIN SMALL LETTER L
t LATIN SMALL LETTER T
s LATIN SMALL LETTER S
  SPACE
= EQUALS SIGN
  SPACE
s LATIN SMALL LETTER S
...
1 DIGIT ONE
" QUOTATION MARK
} RIGHT CURLY BRACKET
 ZERO WIDTH SPACE
) RIGHT PARENTHESIS

The second-to-last character, "ZERO WIDTH SPACE" (U+200B), is invisible, and it is the problem. Delete it or retype the line.
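
If retyping is not practical, for example when a whole file was pasted, the same check can be scripted. The following is only a minimal sketch and not part of the original answer: the helper names find_invisible_chars and strip_invisible_chars are hypothetical, and it assumes the unwanted characters are Unicode format characters (category "Cf", which includes ZERO WIDTH SPACE) or non-ASCII spaces (category "Zs"). Note that stripping is blunt: it would also alter such characters inside string literals, where they may be intentional.

```python
import unicodedata


def _is_suspicious(ch):
    """True for format characters (Cf) and for non-ASCII space separators (Zs)."""
    category = unicodedata.category(ch)
    return category == "Cf" or (category == "Zs" and ord(ch) > 0x7F)


def find_invisible_chars(source):
    """Return a list of (line_number, column, character_name) tuples, 1-based."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if _is_suspicious(ch):
                # Fall back to the code point if the character has no name.
                name = unicodedata.name(ch, "U+%04X" % ord(ch))
                hits.append((line_no, col, name))
    return hits


def strip_invisible_chars(source):
    """Drop format characters and turn non-ASCII spaces into plain spaces."""
    cleaned = []
    for ch in source:
        if unicodedata.category(ch) == "Cf":
            continue  # e.g. U+200B ZERO WIDTH SPACE
        cleaned.append(" " if _is_suspicious(ch) else ch)
    return "".join(cleaned)


if __name__ == "__main__":
    # The failing line, with the invisible character written as an escape.
    bad_line = 'results = soup.find("div", attrs={"id": "id_content_r_c1"}\u200b)'
    for line_no, col, name in find_invisible_chars(bad_line):
        print("line %d, column %d: %s" % (line_no, col, name))
    print(strip_invisible_chars(bad_line))
```

To check a script on disk, read it with open(path, encoding="utf-8").read() and pass the text to find_invisible_chars. Reporting first and stripping second is the safer order, since it shows what would be removed before anything is changed.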

python

unicode

syntax-error

special-characters