Nokogiri and XML - Grab "tr" XML that includes specific text

Question

This is part of a Word document I'm trying to scrape:

        <w:tr>
            <w:tc>
                <w:tcPr>
                    <w:tcW w:type="dxa" w:w="9035"/>
                    <w:tcBorders>
                        <w:top w:color="0A57A4" w:space="0" w:sz="6" w:val="single"/>
                    </w:tcBorders>
                    <w:vAlign w:val="center"/>
                </w:tcPr>
                <w:p>
                    <w:pPr>
                        <w:jc w:val="left"/>
                    </w:pPr>
                    <w:r>
                        <w:t>#Finding#</w:t>
                    </w:r>
                    <w:bookmarkStart w:id="49" w:name="_GoBack"/>
                    <w:bookmarkEnd w:id="49"/>
                </w:p>
            </w:tc>
            <w:tc>
                <w:tcPr>
                    <w:tcW w:type="dxa" w:w="1705"/>
                    <w:tcBorders>
                        <w:top w:color="0A57A4" w:space="0" w:sz="6" w:val="single"/>
                    </w:tcBorders>
                    <w:vAlign w:val="center"/>
                </w:tcPr>
                <w:p>
                    <w:r>
                        <w:rPr>
                            <w:noProof/>
                        </w:rPr>
                        <w:drawing>
                            <wp:inline distB="0" distL="0" distR="0" distT="0">
                                <wp:extent cx="292608" cy="292608"/>
                                <wp:effectExtent b="0" l="0" r="0" t="0"/>
                                <wp:docPr id="924" name="Picture 924"/>
                                <wp:cNvGraphicFramePr>
                                    <a:graphicFrameLocks noChangeAspect="1" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"/>
                                </wp:cNvGraphicFramePr>
                                <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
                                    <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
                                        <pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
                                            <pic:nvPicPr>
                                                <pic:cNvPr id="0" name="S-sm.png"/>
                                                <pic:cNvPicPr/>
                                            </pic:nvPicPr>
                                            <pic:blipFill>
                                                <a:blip cstate="print" r:embed="rId20">
                                                    <a:extLst>
                                                        <a:ext uri="{28A0092B-C50C-407E-A947-70E740481C1C}">
                                                            <a14:useLocalDpi val="0" xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main"/>
                                                        </a:ext>
                                                    </a:extLst>
                                                </a:blip>
                                                <a:stretch>
                                                    <a:fillRect/>
                                                </a:stretch>
                                            </pic:blipFill>
                                            <pic:spPr>
                                                <a:xfrm>
                                                    <a:off x="0" y="0"/>
                                                    <a:ext cx="292608" cy="292608"/>
                                                </a:xfrm>
                                                <a:prstGeom prst="rect">
                                                    <a:avLst/>
                                                </a:prstGeom>
                                            </pic:spPr>
                                        </pic:pic>
                                    </a:graphicData>
                                </a:graphic>
                            </wp:inline>
                        </w:drawing>
                    </w:r>
                </w:p>
            </w:tc>
        </w:tr>

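Is there a way to have nokogiri grab the entire <w:tr> all the way to the </w:tr> (the end) where "#Finding#" exists? For example, can it search all the "trs" for text containing #Finding# and return the whole tr element? Or do I have to loop through every <w:tr> tag in the document and check whether it contains #Finding#?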

Answer 1

Is there a way to have nokogiri grab the entire <w:tr> all the way to the </w:tr> (the end) where "#Finding#" exists?

Yes, with XPath:

//w:tr[.//w:t[contains(., '#Finding#')]]

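In plain English: "any <w:tr> that has a descendant <w:t> containing #Finding#". The predicate does the filtering, so you do not need to loop over every row yourself.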

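Notes:

You have to register the w namespace prefix before you can use that XPath expression. The URI must match the xmlns:w declaration in your document: http://schemas.microsoft.com/office/word/2003/wordml for Word 2003 XML, or http://schemas.openxmlformats.org/wordprocessingml/2006/main for the .docx (Office Open XML) format, which the DrawingML namespaces in the sample suggest. See: http://www.nokogiri.org/tutorials/searching_a_xml_html_document.html
Make sure #Finding# does not contain a single quote, otherwise the expression will break.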
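If the search text comes from user input and might contain quotes, one way around the single-quote problem is to build the XPath string literal programmatically. The sketch below is only an illustration: `xpath_literal` and `rows_containing_xpath` are hypothetical helper names, not part of Nokogiri. It relies on XPath 1.0 accepting either quote style for literals and on `concat()` for text that contains both.

```ruby
# Hypothetical helper: turn arbitrary text into an XPath 1.0 string expression.
def xpath_literal(text)
  # No single quote: a single-quoted literal is safe.
  return "'#{text}'" unless text.include?("'")
  # Has single quotes but no double quotes: use a double-quoted literal.
  return "\"#{text}\"" unless text.include?('"')

  # Both quote kinds present: split on single quotes and glue the pieces
  # back together with concat(), inserting each quote as "'".
  parts = text.split("'", -1).map { |part| "'#{part}'" }
  "concat(#{parts.join(%q{, "'", })})"
end

# Hypothetical helper: build the row-matching expression for a search text.
def rows_containing_xpath(text)
  "//w:tr[.//w:t[contains(., #{xpath_literal(text)})]]"
end

puts rows_containing_xpath('#Finding#')
puts rows_containing_xpath("Bob's #Finding#")
```

The resulting string can be passed to `doc.xpath` together with the namespace hash, exactly like the hand-written expression.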

ruby

xml

nokogiri