Regex findall in BeautifulSoup - Python 3

I need to get the name, the value, and the context reference of every field under the ix:nonfraction tag, which looks like this:

<ix:nonfraction name="uk-gaap:TangibleFixedAssets" contextref="FY1.END" unitref="GBP" xmlns:uk-gaap="http://www.xbrl.org/uk/gaap/core/2009-09-01" decimals="0" format="ixt:numcommadot">238,011</ix:nonfraction>

The desired output is:

TangibleFixedAssets, FY1.end, 238,011

The string that the regex has to search contains many such tags, so is there a way to keep all three outputs together (or at the same index of a list)?

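import bs4
html = '''<ix:nonfraction name="uk-gaap:TangibleFixedAssets" contextref="FY1.END" unitref="GBP" xmlns:uk-gaap="http://www.xbrl.org/uk/gaap/core/2009-09-01" decimals="0" format="ixt:numcommadot">238,011</ix:nonfraction>'''

# The 'lxml' parser requires the lxml package to be installed.
soup = bs4.BeautifulSoup(html, 'lxml')

# find_all returns every matching tag, so this also works when the
# document contains many ix:nonfraction elements.
ixs = soup.find_all('ix:nonfraction')
for ix in ixs:
    # Drop the namespace prefix: "uk-gaap:TangibleFixedAssets" -> "TangibleFixedAssets"
    name = ix['name'].split(':')[-1]
    contextref = ix['contextref']
    text = ix.text
    # Keep the three values together in one list per tag.
    output = [name, contextref, text]
    print(output)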

Output:

['TangibleFixedAssets', 'FY1.END', '238,011']
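
Since the question asks about regex specifically, here is a minimal regex-only sketch for comparison. When a pattern has several capture groups, re.findall returns one tuple per match, so the three values stay together at the same index. This assumes that every tag lists name before contextref, that attributes use double quotes, and that the tag body is plain text; the second tag in the sample (Debtors, FY2.END, 1,500) is made up for illustration. For real documents the BeautifulSoup approach above is likely more robust.

```python
import re

# First tag is from the question; the second is a hypothetical example.
html = '''<ix:nonfraction name="uk-gaap:TangibleFixedAssets" contextref="FY1.END" unitref="GBP" xmlns:uk-gaap="http://www.xbrl.org/uk/gaap/core/2009-09-01" decimals="0" format="ixt:numcommadot">238,011</ix:nonfraction>
<ix:nonfraction name="uk-gaap:Debtors" contextref="FY2.END" unitref="GBP" decimals="0" format="ixt:numcommadot">1,500</ix:nonfraction>'''

# Three capture groups: name (without namespace prefix), contextref, text.
# Assumes name appears before contextref inside the opening tag.
pattern = re.compile(
    r'<ix:nonfraction\b[^>]*?\bname="(?:[^":]*:)?([^"]+)"'
    r'[^>]*?\bcontextref="([^"]+)"'
    r'[^>]*>([^<]*)</ix:nonfraction>',
    re.IGNORECASE,
)

# With multiple groups, findall returns a list of tuples, one per tag.
rows = pattern.findall(html)
for row in rows:
    print(row)
```

Each element of rows is a (name, contextref, text) tuple, so rows[0] holds all three values for the first tag.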