Regex findall in BeautifulSoup (Python 3)
I need to get the name, the value and the contextref of every ix:nonfraction tag, such as this one:
<ix:nonfraction name="uk-gaap:TangibleFixedAssets" contextref="FY1.END" unitref="GBP" xmlns:uk-gaap="http://www.xbrl.org/uk/gaap/core/2009-09-01" decimals="0" format="ixt:numcommadot">238,011</ix:nonfraction>
The desired output is:
TangibleFixedAssets, FY1.END, 238,011
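The string the regex has to search contains many of these tags, so is there a way to keep all three values for each tag together (for example, at the same index of a list)?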
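On the regex side of the question: when a pattern contains capture groups, `re.findall` returns one tuple per match, so the name, contextref and value of each tag stay together at the same list index. The sketch below is only an illustration under stated assumptions: `name` appears before `contextref` inside each opening tag, the element body is plain text with no nested tags, and the second sample tag (`uk-gaap:Debtors`) is made up. A real parser such as BeautifulSoup is more robust against reordered attributes or nested markup.

```python
import re

# Sample input: the tag from the question plus a second, made-up tag.
html = (
    '<ix:nonfraction name="uk-gaap:TangibleFixedAssets" contextref="FY1.END" '
    'unitref="GBP" decimals="0" format="ixt:numcommadot">238,011</ix:nonfraction>'
    '<ix:nonfraction name="uk-gaap:Debtors" contextref="FY1.END" '
    'unitref="GBP" decimals="0" format="ixt:numcommadot">12,500</ix:nonfraction>'
)

# Group 1: name without its "prefix:" part; group 2: contextref; group 3: text.
# Assumes name comes before contextref and the body contains no "<".
pattern = re.compile(
    r'<ix:nonfraction\b[^>]*?\bname="(?:[^":]*:)?([^"]*)"'
    r'[^>]*?\bcontextref="([^"]*)"'
    r'[^>]*>([^<]*)</ix:nonfraction>',
    re.IGNORECASE,
)

# findall returns a list of (name, contextref, value) tuples, one per tag.
rows = pattern.findall(html)
for row in rows:
    print(row)
```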
import bs4

html = '''<ix:nonfraction name="uk-gaap:TangibleFixedAssets" contextref="FY1.END" unitref="GBP" xmlns:uk-gaap="http://www.xbrl.org/uk/gaap/core/2009-09-01" decimals="0" format="ixt:numcommadot">238,011</ix:nonfraction>'''
# Parse with lxml (the lxml package must be installed)
soup = bs4.BeautifulSoup(html, 'lxml')
# Every <ix:nonfraction> element in the document
ixs = soup.find_all('ix:nonfraction')
for ix in ixs:
    name = ix['name'].split(':')[-1]  # drop the "uk-gaap:" prefix
    contextref = ix['contextref']
    text = ix.text
    output = [name, contextref, text]  # the three values for this tag stay together
    print(output)
Output:
['TangibleFixedAssets', 'FY1.END', '238,011']
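Because the real input contains many of these tags, the same attribute lookups can collect every tag into a single list, where each index holds one (name, contextref, value) tuple. This is a minimal sketch, not the answerer's code: it swaps in Python's built-in `'html.parser'` backend instead of lxml (assuming nothing lxml-specific is needed), and the second sample tag (`uk-gaap:Debtors`) is invented for illustration.

```python
import bs4

# Sample input: the tag from the question plus a second, made-up tag.
html = (
    '<ix:nonfraction name="uk-gaap:TangibleFixedAssets" contextref="FY1.END" '
    'unitref="GBP" decimals="0" format="ixt:numcommadot">238,011</ix:nonfraction>'
    '<ix:nonfraction name="uk-gaap:Debtors" contextref="FY1.END" '
    'unitref="GBP" decimals="0" format="ixt:numcommadot">12,500</ix:nonfraction>'
)

# 'html.parser' ships with Python, so this sketch does not need lxml.
soup = bs4.BeautifulSoup(html, 'html.parser')

# One (name, contextref, value) tuple per tag, in document order.
rows = [
    (ix['name'].split(':')[-1], ix['contextref'], ix.get_text(strip=True))
    for ix in soup.find_all('ix:nonfraction')
]

# Print in the "name, contextref, value" format asked for in the question.
for name, contextref, value in rows:
    print(name, contextref, value, sep=', ')
```

Keeping the rows as tuples means they can later be passed straight to `csv.writer` or turned into a dict keyed by name if lookups are needed.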