Searching for text in all nodes with XPath
I'm trying to find a word in an HTML fragment so that I can replace it with a link (href). Somehow I can't find the right XPath for it. Example:
require 'nokogiri'
html = '
<p>A paragraph Apple<p>
<span>Apple</span>
<ul>
<li>Item 1</li>
<li>Apple <strong>Apple</strong></li>
<li>Apple</li>
<li>Orange</li>
</ul>
<p><i>Apple</i>Apple</p>'
doc = Nokogiri::HTML.fragment(html)
doc.xpath('.//*[text()="Apple"]').each do |node|
  puts "\n"
  puts node.name
  puts node.content
  puts node.replace('REPLACED')
end
puts doc.to_html
Result:
span
Apple
REPLACED
strong
Apple
REPLACED
li
Apple
REPLACED
i
Apple
REPLACED
<p>A paragraph Apple</p><p>
REPLACED
</p><ul>
<li>Item 1</li>
<li>Apple REPLACED</li>
REPLACED
<li>Orange</li>
</ul>
<p>REPLACEDApple</p>
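So the word in the root p element is not replaced, and only one of the occurrences in the li elements is. Which path should I use in this case to search the root and all of its children? I read on pages like this one that .//* should be the path for searching the root node and its child nodes. Any ideas on how to handle this properly with Nokogiri or XPath?
Thanks in advance!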
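You are looking for nodes whose entire text equals "Apple", not nodes that contain "Apple". Instead, walk over the text nodes and substitute inside each one: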
require 'nokogiri'
html = '
<p>A paragraph Apple<p>
<span>Apple</span>
<ul>
<li>Item 1</li>
<li>Apple <strong>Apple</strong></li>
<li>Apple</li>
<li>Orange</li>
</ul>
<p><i>Apple</i>Apple</p>
<Apple>Dont replace!</Apple>
'
doc = Nokogiri::HTML.fragment(html)
# Visit every node; only text nodes are changed, so the <Apple> tag name is left alone.
doc.traverse do |node|
  if node.text?
    node.content = node.content.gsub('Apple', 'REPLACED')
  end
end
puts doc.to_html
It outputs:
<p>A paragraph REPLACED</p><p>
<span>REPLACED</span>
</p><ul>
<li>Item 1</li>
<li>REPLACED <strong>REPLACED</strong>
</li>
<li>REPLACED</li>
<li>Orange</li>
</ul>
<p><i>REPLACED</i>REPLACED</p>
<apple>Dont replace!</apple>