XPath usage while scraping data with lxml
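I am trying to write a Python script to scrape data from a web page, but I can't figure out the correct XPath to retrieve the value. Please help me fix this.
I am trying to get the VWAP value, which is currently 27.16 (this value changes every working day). Inspecting the value in Chrome shows the following element for the value I need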
<span id="vwap">27.16</span>
Following online tutorials, I wrote the following Python script
from lxml import html
import requests
page = requests.get('https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=NIFTY&instrument=OPTIDX&strike=10400.00&type=CE&expiry=30NOV2017')
tree = html.fromstring(page.content)
vwap = tree.xpath('//span[@id="vwap"]/text()')
print(vwap)
But when I run this script, I get the following output
[]
instead of
27.16
Based on other answers on Stack Overflow, I also tried replacing the XPath line with the following, but I still don't get the correct output.
vwap = tree.xpath('//*[@id="vwap"]/text()')
Please tell me what to put in the XPath so that I get the correct value in the vwap variable.
Any other solutions (besides lxml) are also welcome.
If you check the page source as it is initially served, the required node looks like this
<li><a style="color: #000000;" title="VWAP">VWAP</a> <span id="vwap"></span></li>
whereas this is how it looks after the JavaScript has executed
<li><a style="color: #000000;" title="VWAP">VWAP</a> <span id="vwap">27.16</span></li>
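Note that the span has no text content in the first HTML sample. requests only fetches the served source and does not run JavaScript, which is why the XPath returns an empty list.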
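To see the difference without hitting the site, here is a minimal sketch that runs the question's XPath against the two HTML samples above. The helper name vwap_text and the variables served and rendered are made up for this illustration. The served sample should give an empty list, because an empty span has no text node to select.

```python
from lxml import html

# Page source as served, before any JavaScript has run: the span is empty.
served = '<li><a style="color: #000000;" title="VWAP">VWAP</a> <span id="vwap"></span></li>'
# DOM as shown in the browser inspector, after JavaScript has filled the span.
rendered = '<li><a style="color: #000000;" title="VWAP">VWAP</a> <span id="vwap">27.16</span></li>'


def vwap_text(source):
    """Return the text nodes of the span with id "vwap" (hypothetical helper)."""
    return html.fromstring(source).xpath('//span[@id="vwap"]/text()')


served_result = vwap_text(served)
rendered_result = vwap_text(rendered)
print(served_result)
print(rendered_result)
```

So the XPath itself is fine. It only fails because the text it is looking for is not in the document that requests downloads.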
It seems the value comes from the node below
<div id="responseDiv" style="display:none">
{"valid":"true","isinCode":null,"lastUpdateTime":"29-NOV-2017 15:30:30","ocLink":"\/marketinfo\/sym_map\/symbolMapping.jsp?symbol=NIFTY&instrument=-&date=-&segmentLink=17&symbolCount=2","tradedDate":"29NOV2017","data":[{"change":"-17.80","sellPrice1":"13.80","buyQuantity3":"450","sellPrice2":"13.85","buyQuantity4":"150","buyQuantity1":"13,725","ltp":"-243019.52","buyQuantity2":"6,225","sellPrice5":"14.00","sellPrice3":"13.90","buyQuantity5":"450","sellPrice4":"13.95","underlying":"NIFTY","bestSell":"-2,41,672.50","annualisedVolatility":"9.44","optionType":"CE","prevClose":"31.10","pChange":"-57.23","lastPrice":"13.30","lowPrice":"11.00","strikePrice":"10400.00","premiumTurnover":"11,707.33","numberOfContractsTraded":"5,74,734","underlyingValue":"10,361.30","openInterest":"58,96,350","impliedVolatility":"12.73","vwap":"27.16","totalBuyQuantity":"10,49,850","openPrice":"35.10","closePrice":"17.85","bestBuy":"-2,43,852.25","changeinOpenInterest":"1,60,800","clientWisePositionLimits":"30517526","totalSellQuantity":"11,07,825","dailyVolatility":"0.49","sellQuantity5":"19,800","marketLot":"75","expiryDate":"30NOV2017","marketWidePositionLimits":"-","sellQuantity2":"75","sellQuantity1":"3,825","buyPrice1":"13.00","sellQuantity4":"900","buyPrice2":"12.90","sellQuantity3":"2,025","buyPrice4":"12.75","buyPrice3":"12.80","buyPrice5":"12.65","turnoverinRsLakhs":"44,94,632.53","pchangeinOpenInterest":"2.80","settlementPrice":"-","instrumentType":"OPTIDX","highPrice":"40.85"}],"companyName":"Nifty 50","eqLink":""}
</div>
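So the code you probably need is
import json
# The hidden div's text is a JSON payload; parse it and read the first "data" record
payload = json.loads(tree.xpath('//div[@id="responseDiv"]/text()')[0].strip())
vwap = payload['data'][0]['vwap']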
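Every figure in that payload is a quoted string, some use Indian-style digit grouping (such as "5,74,734"), and a few fields hold "-" as a placeholder. If you need numbers rather than strings, something like the sketch below might help. The to_number helper is hypothetical, and the sample is a trimmed copy of the payload above with only three of its fields.

```python
import json


def to_number(value):
    """Convert a quoted figure such as "5,74,734" to float; None for "-" or empty."""
    cleaned = value.replace(",", "").strip()
    if cleaned in ("", "-"):
        return None
    return float(cleaned)


# Trimmed copy of the payload shown above, keeping only three of its fields.
sample = '{"data":[{"vwap":"27.16","numberOfContractsTraded":"5,74,734","settlementPrice":"-"}]}'
record = json.loads(sample)["data"][0]

vwap = to_number(record["vwap"])
contracts = to_number(record["numberOfContractsTraded"])
settlement = to_number(record["settlementPrice"])
print(vwap, contracts, settlement)
```

Returning None for the "-" placeholder is just one choice. Raising an error or keeping the raw string may suit your use better.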