Is there a Python module that covers all HTML entities?

Question

https://dev.w3.org/html5/html-author/charref

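I tried the following approaches. Neither of them can translate all of the characters listed at the link above. Is there a Python module that contains the complete character mapping?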

>>> from HTMLParser import HTMLParser
>>> h = HTMLParser()
>>> h.unescape('&Tab;')
'&Tab;'

>>> from w3lib.html import replace_entities
>>> replace_entities('&Tab;')
u''

Answer 1

I tried the URL above with BeautifulSoup and the html5lib parser. Judging from the output, it appears to decode all of the entities:

import requests
from bs4 import BeautifulSoup

url = 'https://dev.w3.org/html5/html-author/charref'

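# Parse the reference page with the html5lib parser.
soup = BeautifulSoup(requests.get(url).text, 'html5lib')

# ch.text is the entity name as displayed on the page (e.g. '&Tab;');
# parsing that string again with html5lib decodes the entity itself.
for ch in soup.select('td.named code'):
    print('{: <40} {}'.format(ch.text, BeautifulSoup(ch.text, 'html5lib').text))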

Prints:

&Tab;                                    
&NewLine;                                
&excl;                                   !
&quot; &QUOT;                            " "
&num;                                    #
&dollar;                                 $
&percnt;                                 %
&amp; &AMP;                              & &
&apos;                                   '
&lpar;                                   (
&rpar;                                   )
&ast; &midast;                           * *
&plus;                                   +
&comma;                                  ,
&period;                                 .

... and so on.
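
If Python 3 is an option, the standard library may already be enough. As far as I know, `html.unescape` (Python 3.4+) resolves references using the HTML5 named character reference table, which is exposed as `html.entities.html5`. The sample list below is my own pick for illustration, not something from the question; treat this as a minimal sketch rather than a verified check of every entry on that page:

```python
import html
from html.entities import html5  # dict of HTML5 named character references

# A few references taken from the table above; this sample list is only
# illustrative and does not cover the whole page.
samples = ['&Tab;', '&NewLine;', '&excl;', '&percnt;', '&midast;']

for ref in samples:
    # repr() makes whitespace characters such as tab and newline visible.
    print('{: <12} {!r}'.format(ref, html.unescape(ref)))

# The lookup table can also be used directly. Keys omit the leading '&'
# and normally keep the trailing ';'.
tab_char = html5['Tab;']
print(repr(tab_char))
```

This avoids the network request and the third-party parsers, at the cost of requiring Python 3; the `HTMLParser` module used in the question is the Python 2 one.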

python

html-entities