Parsing HTML using XPath in Python

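I have the HTML below, which I am trying to parse using XPath, but all I get back are empty strings. Can anyone tell me where I am going wrong? I have tried everything I can think of without success.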

XPath code for the labels:

divLbl=ch.xpath("//div[@class='left-container']/article/ul[@class='list-unstyled row']/li[@class='col-sm-6 mrg-bottom']/span[@class='text-light']")

XPath code for the corresponding label values:

divVal=ch.xpath("//div[@class='left-container']/article/ul[@class='list-unstyled row']/li[@class='col-sm-6 mrg-bottom']/span[@class='text-light']/strong")

HTML:

<div>
                        <h2 class="rowbreak"><strong>Information of the Car</strong></h2>
                        <ul class=" list-unstyled row">
                            <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">Make Year:</span> <strong>Aug 2009</strong></li>
                            <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-road text-light"></span> <span class=" text-light">Kilometers:</span> <strong>127,553</strong></li>
                            <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-map-marker text-light"></span> <span class=" text-light">City:</span> 
                                <strong class="carCity_795606">  
                                                                        <a href="javascript:void(0);" onclick="javascript: $( &quot;#maplinkbtn&quot; ).trigger( &quot;click&quot; ); ">
                                    Sambalpur                                    </a>
                                                                    </strong>

                            </li>
                            <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">Listing Date:</span> <strong>27 Apr 2015</strong></li>
                            <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">No. of Owners:</span> <strong> First Owner</strong>
                            </li>
                            <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-tint text-light"></span> <span class=" text-light">Fuel Type:</span> <strong> Petrol</strong></li>
                              <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">Posted by:</span> <strong> 
                                  Dealer</strong>
                            </li>
                        </ul>
           </div>

Edited HTML:

 <div>
                    <h2 class="rowbreak"><strong>Information of the Car</strong></h2>
                    <ul class=" list-unstyled row">
                        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">Make Year:</span> <strong>Aug 2009</strong></li>
                        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-road text-light"></span> <span class=" text-light">Kilometers:</span> <strong>127,553</strong></li>
                        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-map-marker text-light"></span> <span class=" text-light">City:</span> 
                            <strong class="carCity_795606">  
                                                                    <a href="javascript:void(0);" onclick="javascript: $( &quot;#maplinkbtn&quot; ).trigger( &quot;click&quot; ); ">
                                Sambalpur                                    </a>
                                                                </strong>

                        </li>
                        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">Listing Date:</span> <strong>27 Apr 2015</strong></li>
                        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">No. of Owners:</span> <strong> First Owner</strong>
                        </li>
                        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-tint text-light"></span> <span class=" text-light">Fuel Type:</span> <strong> Petrol</strong></li>
                          <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">Posted by:</span> <strong> 
                              Dealer</strong>
                        </li>
                    </ul>
       </div>

 <h2 class="rowbreak"></h2>
    <ul class=" list-unstyled row">
                            <li class="col-sm-6 mrg-bottom"><span class=" text-light">One Time Tax :</span> <strong>Individual</strong></li>
                            <li class="col-sm-6 mrg-bottom"><span class=" text-light">Registration No. :</span> <strong>OR03F3141</strong></li>
                            <li class="col-sm-6 mrg-bottom"><span class=" text-light"> Insurance &amp; Expiry :</span> <strong>No Insurance&nbsp;</strong></li>
                            <li class="col-sm-6 mrg-bottom"><span class=" text-light">Registration Place: </span> <strong> Sambalpur</strong></li>
                            <li class="col-sm-6 mrg-bottom"><span class=" text-light">Transmission :</span> <strong>Manual</strong></li>
                            <li class="col-sm-6 mrg-bottom"><span class=" text-light">Color :</span> <strong>Silver</strong></li>
                        </ul>

The XPath you are currently using is very fragile: you are checking every element in the chain and relying on "layout-oriented" classes.
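To make the fragility concrete, here is a minimal sketch run against one list item copied from the posted markup. It assumes `lxml` is installed and that the document was parsed with `lxml.html.fromstring`; the variable names are only illustrative. It points at two likely reasons the original expressions come back empty: the `class` values in the HTML start with a space, so an exact `@class='text-light'` comparison does not match, and each `strong` is a sibling of the label `span`, not a child of it.

```python
from lxml.html import fromstring

# One list item trimmed from the posted markup (illustrative sample only).
snippet = """
<div>
  <h2 class="rowbreak"><strong>Information of the Car</strong></h2>
  <ul class=" list-unstyled row">
    <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">Make Year:</span> <strong>Aug 2009</strong></li>
  </ul>
</div>
"""
doc = fromstring(snippet)

# Exact string comparison: "text-light" is not equal to " text-light".
exact = doc.xpath("//span[@class='text-light']")

# Token-based class test that ignores extra whitespace and class order.
token = doc.xpath(
    "//span[contains(concat(' ', normalize-space(@class), ' '), ' text-light ')]"
)

# <strong> sits next to the label <span>, not inside it.
child = doc.xpath("//li/span/strong")
sibling = doc.xpath(
    "//li/span[normalize-space()]/following-sibling::strong/text()"
)

print(len(exact), len(token), len(child), sibling)
```

The token-based test matches both the icon `span` and the label `span`, which is one more reason to anchor on content (the heading text) rather than on styling classes.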

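I would start from the h2 element that contains a strong element with the text "Information of the Car", and then get the ul element that follows it. For example, to get all the labels: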

//h2[strong = 'Information of the Car']/following-sibling::ul/li/span/text()

Demo:

In [2]: from lxml.html import fromstring  # assumed import; `data` holds the HTML above

In [3]: ch = fromstring(data)

In [4]: ch.xpath("//h2[strong = 'Information of the Car']/following-sibling::ul/li/span/text()")
['Make Year:', 'Kilometers:', 'City:', 'Listing Date:', 'No. of Owners:', 'Fuel Type:', 'Posted by:']

Example (getting the names and values):

In [25]: for field in ch.xpath("//h2/following-sibling::ul/li"):
   ....:     name = ''.join(field.xpath(".//span/text()")).strip()
   ....:     value = ''.join(field.xpath(".//strong//text()")).strip()
   ....:     print(name, value)
   ....:     
Make Year: Aug 2009
Kilometers: 127,553
City: Sambalpur
Listing Date: 27 Apr 2015
No. of Owners: First Owner
Fuel Type: Petrol
Posted by: Dealer
One Time Tax : Individual
Registration No. : OR03F3141
Insurance & Expiry : No Insurance
Registration Place: Sambalpur
Transmission : Manual
Color : Silver
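If the pairs are needed as data rather than printed, the same loop can fill a dictionary. The following is only a sketch under a few assumptions: `lxml` is installed, the sample markup is a shortened copy of the posted HTML, and `extract_fields` is a hypothetical helper name. Stripping the trailing colon from each label is a design choice made here, not something the original answer does.

```python
from lxml.html import fromstring

# Shortened copy of the posted markup (illustrative sample only).
html = """
<div>
  <h2 class="rowbreak"><strong>Information of the Car</strong></h2>
  <ul class=" list-unstyled row">
    <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-road text-light"></span> <span class=" text-light">Kilometers:</span> <strong>127,553</strong></li>
    <li class="col-sm-4 mrg-bottom"><span class=" text-light">City:</span>
      <strong class="carCity_795606">
        <a href="javascript:void(0);">
          Sambalpur      </a>
      </strong>
    </li>
  </ul>
</div>
<h2 class="rowbreak"></h2>
<ul class=" list-unstyled row">
  <li class="col-sm-6 mrg-bottom"><span class=" text-light"> Insurance &amp; Expiry :</span> <strong>No Insurance&nbsp;</strong></li>
  <li class="col-sm-6 mrg-bottom"><span class=" text-light">Color :</span> <strong>Silver</strong></li>
</ul>
"""


def extract_fields(markup):
    """Return {label: value} for every <li> in a <ul> that follows an <h2>."""
    doc = fromstring(markup)
    fields = {}
    for item in doc.xpath("//h2/following-sibling::ul/li"):
        # The first <span> with visible text is the label; icon spans are empty.
        name = item.xpath("normalize-space(span[normalize-space()])")
        # normalize-space() collapses whitespace inside nested tags; Python's
        # strip() also removes the non-breaking space that XPath leaves alone.
        value = item.xpath("normalize-space(strong)").strip()
        fields[name.rstrip(" :")] = value
    return fields


fields = extract_fields(html)
print(fields)
```

Using `normalize-space()` inside the XPath avoids the `''.join(...)` step, because it returns the string value of the first matching node with internal whitespace already collapsed.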