regex findall in beautifulsoup - python 3

Question

I need to get the name, the value, and the context reference of every field under the ix:nonfraction tag, for example:

<ix:nonfraction name="uk-gaap:TangibleFixedAssets" contextref="FY1.END" unitref="GBP" xmlns:uk-gaap="http://www.xbrl.org/uk/gaap/core/2009-09-01" decimals="0" format="ixt:numcommadot">238,011</ix:nonfraction>

The desired output is:

TangibleFixedAssets, FY1.end, 238,011

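The string that the regular expression has to search contains many such tags, so is there a way to keep all three outputs together (for example, at the same index of a list)?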

Answer 1

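import bs4

# Sample input: the single inline XBRL tag from the question.
html = '''<ix:nonfraction name="uk-gaap:TangibleFixedAssets" contextref="FY1.END" unitref="GBP" xmlns:uk-gaap="http://www.xbrl.org/uk/gaap/core/2009-09-01" decimals="0" format="ixt:numcommadot">238,011</ix:nonfraction>'''

# Parse with the lxml parser (the lxml package must be installed).
soup = bs4.BeautifulSoup(html, 'lxml')

# Find every ix:nonfraction tag; each one yields one [name, contextref, text] row.
ixs = soup.find_all('ix:nonfraction')
for ix in ixs:
    name = ix['name'].split(':')[-1]  # drop the namespace prefix, e.g. "uk-gaap:"
    contextref = ix['contextref']
    text = ix.text
    output = [name, contextref, text]
    print(output)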

Output:

['TangibleFixedAssets', 'FY1.END', '238,011']
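
For the regex route the question asks about, a rough sketch is shown here. When a pattern has several capture groups, re.findall returns a list of tuples, so the three values for each tag stay together at the same index. The pattern assumes that name appears before contextref inside the opening tag and that the tag body contains no nested ix:nonfraction tags. The second tag in the sample (CurrentAssets, FY2.END, 1,500) is made up for illustration and is not from the question.

```python
import re

# Sample input: the first tag is from the question (shortened);
# the second tag is hypothetical, added only to show multiple matches.
html = '''<ix:nonfraction name="uk-gaap:TangibleFixedAssets" contextref="FY1.END" unitref="GBP" decimals="0" format="ixt:numcommadot">238,011</ix:nonfraction>
<ix:nonfraction name="uk-gaap:CurrentAssets" contextref="FY2.END" unitref="GBP" decimals="0">1,500</ix:nonfraction>'''

# Three capture groups: name without its namespace prefix, contextref, tag text.
# Assumption: name comes before contextref in the opening tag.
pattern = re.compile(
    r'<ix:nonfraction\b[^>]*?\bname="(?:[^":]*:)?([^"]+)"'
    r'[^>]*?\bcontextref="([^"]+)"'
    r'[^>]*>(.*?)</ix:nonfraction>',
    re.DOTALL,
)

# findall returns one (name, contextref, text) tuple per matching tag.
rows = pattern.findall(html)
for row in rows:
    print(row)
```

Attribute order can vary in real documents, so the parser-based answer above is likely the more robust choice. The regex version is mainly useful when no parser is available.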

