How to get text inside specific tag using tag name with Python
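I am trying to open an XML file, parse it, go through its tags and find the text inside each specific tag. If the text in a tag matches a string, I want to remove part of the string or replace it with something else.
My problem is that I'm not sure whether start = x.find('start_char').text actually gets the text inside the "start_char" tag and saves it to the "start" variable. (Does x.find('tag_name').text really get the text inside the tag?)
The XML file has the following data: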
<?xml version="1.0" encoding="utf-8"?>
<metadata>
    <filter>
        <regex>ATL|LAX|DFW</regex >
        <start_char>3</start_char>
        <end_char></end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>DFW.+\.$</regex >
        <start_char>3</start_char>
        <end_char>-1</end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>\-</regex >
        <replacement></replacement>
        <action>substitute</action>
    </filter>
    <filter>
        <regex>\s</regex >
        <replacement></replacement>
        <action>substitute</action>
    </filter>
    <filter>
        <regex> T&amp;R$</regex >
        <start_char></start_char>
        <end_char>-4</end_char>
        <action>remove</action>
    </filter>
</metadata>
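One thing to watch in this data: a bare & inside element text is not well-formed XML, so ElementTree refuses to parse a filter containing T&R written literally. It has to be escaped as &amp; in the file, and .text then gives back the plain &. A quick check, assuming Python 3's standard xml.etree.ElementTree module:

```python
import xml.etree.ElementTree as ET

# A bare "&" makes the document not well-formed, so parsing fails.
try:
    ET.fromstring("<regex> T&R$</regex>")
except ET.ParseError as exc:
    print("ParseError:", exc)

# Escaped as &amp; it parses, and .text holds the decoded "&".
elem = ET.fromstring("<regex> T&amp;R$</regex>")
print(repr(elem.text))  # ' T&R$'
```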
The Python code I'm using is:
import re
from xml.etree.ElementTree import ElementTree

def filter_mfn_pn(mfn_pn):  # illustrative name; the question shows only the function body
    # filters.xml is the file that holds the things to be filtered
    tree = ElementTree()
    tree.parse("filters.xml")
    # Get the data in the XML file
    root = tree.getroot()
    # Loop through filters
    for x in root.findall('filter'):
        # Find the text inside the regex tag
        regex = x.find('regex').text
        # Find the text inside the start_char tag
        start = x.find('start_char').text
        # Find the text inside the end_char tag
        end = x.find('end_char').text
        # Find the text inside the replacement tag
        #replace = x.find('replacement')
        # Find the text inside the action tag
        action = x.find('action').text
        if action == 'remove':
            # Match the pattern read from the <regex> tag
            if re.match(regex, mfn_pn, re.IGNORECASE):
                # start and end are strings (or None) here; slicing needs ints or None
                mfn_pn = mfn_pn[start:end]
        elif action == 'substitute':
            mfn_pn = re.sub(regex, '', mfn_pn)
    return mfn_pn
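Yes, .text returns the text directly inside the element. The code start = x.find('start_char').text will work as long as the filter element has a start_char child element; otherwise find() returns None and the line throws AttributeError: 'NoneType' object has no attribute 'text'.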
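To see the three cases side by side (text present, element present but empty, element missing), here is a small sketch using two filters trimmed from the sample document, assuming Python 3's standard xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

# Two filters trimmed from the sample document: the first has <start_char>
# and an empty <end_char>, the second has no <start_char> at all.
doc = """<metadata>
  <filter><regex>ATL|LAX|DFW</regex><start_char>3</start_char><end_char></end_char><action>remove</action></filter>
  <filter><regex>\\s</regex><replacement></replacement><action>substitute</action></filter>
</metadata>"""
root = ET.fromstring(doc)
first, second = root.findall("filter")

# .text is the character data directly inside the element, as a str.
print(repr(first.find("start_char").text))  # '3'

# An element that is present but empty has .text set to None.
print(repr(first.find("end_char").text))  # None

# find() returns None when no such child exists, so accessing .text raises.
print(second.find("start_char"))  # None
try:
    second.find("start_char").text
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'text'
```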
This can be avoided, for example, like this:
# find element
start_el = x.find('start_char')
# check if the element exists and assign its text to the variable, None (or another default value) otherwise
start = start_el.text if start_el is not None else None
The same applies to the end variable.
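Since the same check is needed for several tags, it can be folded into a small helper. The child_text name below is hypothetical, not part of ElementTree. The standard library's Element.findtext() does something similar, with one difference worth knowing: it returns '' (not None) for an element that exists but is empty.

```python
import xml.etree.ElementTree as ET


def child_text(parent, tag, default=None):
    """Hypothetical helper: text of parent's first <tag> child, or default
    when the child is missing or empty."""
    child = parent.find(tag)
    if child is None or child.text is None:
        return default
    return child.text


flt = ET.fromstring("<filter><start_char>3</start_char><end_char></end_char></filter>")

start = child_text(flt, "start_char")                 # '3'
end = child_text(flt, "end_char")                     # None (present but empty)
missing = child_text(flt, "replacement", default="")  # '' (no such child)

# Element.findtext(): '' for a present-but-empty element, None if missing.
print(repr(flt.findtext("end_char")))     # ''
print(repr(flt.findtext("replacement")))  # None
```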
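With this approach, the following start and end values are retrieved for your sample document (an element that is present but empty, such as <end_char></end_char>, also yields None):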
3 None
3 -1
None None
None None
None -4
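Note that these values are still strings, and the slice mfn_pn[start:end] needs ints or None. Below is a sketch of the whole loop under a few assumptions: the apply_filters and to_int names are hypothetical, the parsed root is passed in rather than read inside the function, and a <replacement> tag is used as the substitute text when present (every replacement in the sample is empty, so the result matches the original re.sub(regex, '', ...)). It keeps re.match from the question, which only matches at the start of the string; a pattern anchored at the end, such as " T&R$", would need re.search instead.

```python
import re
import xml.etree.ElementTree as ET


def to_int(text):
    """Turn a slice bound read from the XML into an int; '' or None -> None."""
    return int(text) if text else None


def apply_filters(mfn_pn, root):
    """Hypothetical helper: apply every <filter> under root to mfn_pn."""
    for flt in root.findall("filter"):
        regex = flt.findtext("regex", default="")
        action = flt.findtext("action")
        if action == "remove":
            # findtext() gives None for a missing tag and '' for an empty one;
            # to_int() maps both to None, which a slice treats as "no bound".
            start = to_int(flt.findtext("start_char"))
            end = to_int(flt.findtext("end_char"))
            # re.match only matches at the start of the string, as in the question.
            if re.match(regex, mfn_pn, re.IGNORECASE):
                mfn_pn = mfn_pn[start:end]
        elif action == "substitute":
            # An empty or missing <replacement> deletes the match.
            mfn_pn = re.sub(regex, flt.findtext("replacement") or "", mfn_pn)
    return mfn_pn


# Usage with the file from the question (the & in T&R must be written as &amp;):
# root = ET.parse("filters.xml").getroot()
# print(apply_filters("ATL123-45", root))
```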