Is there a Python module that covers all HTML entities?

https://dev.w3.org/html5/html-author/charref

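I tried the following approaches. Neither of them translates all of the characters listed at the link above. Is there a Python module that contains the complete character mapping?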

>>> from HTMLParser import HTMLParser
>>> h = HTMLParser()
>>> h.unescape('&Tab;')
'&Tab;'

>>> from w3lib.html import replace_entities
>>> replace_entities('&Tab;')
u''

I tried the URL above with BeautifulSoup's html5lib parser. Judging from the output, it appears to decode every entity:

import requests
from bs4 import BeautifulSoup

url = 'https://dev.w3.org/html5/html-author/charref'

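# Parse the reference page itself with the html5lib parser.
soup = BeautifulSoup(requests.get(url).text, 'html5lib')

# Each "named" cell holds the entity names as literal text (e.g. "&Tab;");
# re-parsing that text with html5lib decodes it to the actual character.
for ch in soup.select('td.named code'):
    print('{: <40} {}'.format(ch.text, BeautifulSoup(ch.text, 'html5lib').text))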

Prints:

&Tab;                                    
&NewLine;                                
&excl;                                   !
&quot; &QUOT;                            " "
&num;                                    #
&dollar;                                 $
&percnt;                                 %
&amp; &AMP;                              & &
&apos;                                   '
&lpar;                                   (
&rpar;                                   )
&ast; &midast;                           * *
&plus;                                   +
&comma;                                  ,
&period;                                 .

... and so on.
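
As a possible alternative that needs no third-party parser, the standard library in Python 3.4 and later ships `html.unescape`, which is backed by the HTML5 named character reference table in `html.entities.html5`. The following is only a minimal sketch under the assumption that Python 3 is available (the attempts in the question use the Python 2 `HTMLParser` module); the `samples` list and the `decoded` dict are illustrative names, not part of the original post.

```python
import html
from html.entities import html5  # dict of HTML5 named character references

# A few of the entity names from the table above (illustrative selection).
samples = ['&Tab;', '&NewLine;', '&excl;', '&midast;', '&AMP;']

# html.unescape replaces named and numeric character references
# with the corresponding Unicode characters.
decoded = {name: html.unescape(name) for name in samples}

for name, char in decoded.items():
    # repr() keeps whitespace characters such as tab and newline visible.
    print('{: <12} {!r}'.format(name, char))

# The lookup table can also be queried directly; keys carry the trailing ';'.
print('Tab;' in html5, 'NewLine;' in html5)
```

Printing with `repr` avoids the problem visible in the output above, where the tab and newline results look like empty cells.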