How to get the content of a span when the selector returns an empty nodeset?
This is the div from the website that I want to extract information from:
<div class="_24er">
<table class="_4dmd _4eok uiGrid _51mz" cols="4" cellspacing="0" cellpadding="0"><tbody>
<tr class="_51mx">
<td class="_5px7 _51m-">
<span class="_5x8v _5a5j _5a5i">
<span class="_5a4-">FÉV</span>
<span class="_5a4z">11</span>
</span>
</td>
<td class="_4dmi _51m-"><div class="_4dmj">
<div class="_4dmk">
<a data-hovercard="/ajax/hovercard/event.php?id=769853670060959" href="/events/769853670060959/?acontext=%7B%22source%22%3A5%2C%22action_history%22%3A[%7B%22surface%22%3A%22page%22%2C%22mechanism%22%3A%22main_list%22%2C%22extra_data%22%3A%22%5C%22[]%5C%22%22%7D]%2C%22has_source%22%3Atrue%7D" id="js_9a" aria-describedby="u_2r_1" aria-owns="">
<span class=" _50f7"> HipHop Night With YOUSTAAZ (-60% Countdown Sur Toute La Carte)
</span>
</a>
</div>
<div class="_4dml fsm fwn fcg">
<span class="">11 févr. - 12 févr.</span>
<span aria-hidden="true"> · </span>
15 invités</div>
</div>
</td>
<td class="_5pxd _51m-">
<div class="_4dmn">
<div class="_30n-">
<a data-hovercard="/ajax/hovercard/hovercard.php?id=1276481845698447" href="https://xxxxxxx">JOBI - Gammarth</a>
</div>
<div class="_30n_">Tunis, Tunisie</div>
</div></td>
<td class="_4dmt _51mw _51m-">
<div class="_4dmu">
<div class="_2ib5">
<div class="_2ib4">
<div><button class="_4jy0 _4jy3 _517h _51sy _42ft" type="submit" value="1"><i alt="" class="_3-8_ img sp_7RV3BBvGAaI sx_1551de"></i>Ça m’intéresse</button></div>
</div>
</div>
</div>
</td>
</tr>
</tbody>
</table>
</div>
I am trying to extract the content of the span node shown below:
<span class=" _50f7"> HipHop Night With YOUSTAAZ (-60% Countdown Sur Toute La Carte)
</span>
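I have already extracted the date nodes (the month and day of the event), but when I try to extract the event name from the span shown above, I get an empty nodeset: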
cc<-remDr$findElement(using = "css", "[class = '_24er']")
cc<-remDr$getPageSource()
page_events<-read_html(cc[[1]][1])
events =html_nodes(page_events,'._24er')
mois_data=html_nodes(page_events,'._24er > table > tbody > tr > td > span > ._5a4-')
jours_data=html_nodes(page_events,'._24er > table > tbody > tr > td > span > ._5a4z')
links_events_data=html_nodes(page_events,'._24er > table > tbody > tr > td > div> div > a ')
# getting the names of the events: I get {xml_nodeset (0)} as a result
nom_events_data=html_nodes(page_events,'._24er > table > tbody > tr > td > div> div > a > span > ._50f7')
# I tried selecting by class to get the content, and I get this error:
Error in xml2::xml_text(x, trim = trim) :
object 'noms_events_data' not found
nom_events_data=html_nodes(page_events,"[class='._50f7']")
# I also tried XPath and got the same error:
nom_events_data=html_nodes(page_events,xpath = '//*[@id="js_9a"]/span')
# the result is always character(0)
noms_events = html_text(noms_events_data)
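After checking the documentation, the correct syntax is:
noms_events_data=html_nodes(page_events,"._50f7")
rather than:
noms_events_data=html_nodes(page_events,'[class="._50f7"]')
In a CSS selector the leading dot is what marks _50f7 as a class name. Inside an attribute selector the dot becomes part of the literal value being compared, so nothing matches.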
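To check the two selectors side by side, here is a minimal sketch that parses a trimmed copy of the markup from the question instead of the live page. The inline html string and the names bad, child and fixed are illustrative assumptions, not part of the original code. The sketch also suggests two likely causes of the earlier failures: the chain ending in "span > ._50f7" asks for a child of the span rather than the span itself, and the question assigns nom_events_data but later reads noms_events_data, which would explain the "object not found" error.

```r
library(rvest)

# Assumed stand-in for the page source: a trimmed copy of the div from the question
html <- '<div class="_24er"><table><tbody><tr>
  <td><div class="_4dmj"><div class="_4dmk">
    <a id="js_9a" href="/events/769853670060959/">
      <span class=" _50f7"> HipHop Night With YOUSTAAZ (-60% Countdown Sur Toute La Carte)
      </span>
    </a>
  </div></div></td>
</tr></tbody></table></div>'

page_events <- read_html(html)

# Class selector: matches any element whose class list contains _50f7
noms_events_data <- html_nodes(page_events, "._50f7")
noms_events <- html_text(noms_events_data, trim = TRUE)
print(noms_events)

# Attribute selector: compares the whole class value with the literal "._50f7"
bad <- html_nodes(page_events, '[class="._50f7"]')
print(length(bad))

# "span > ._50f7" looks for a child of the span; "span._50f7" is the span itself
child <- html_nodes(page_events, "a > span > ._50f7")
fixed <- html_nodes(page_events, "a > span._50f7")
print(c(child = length(child), fixed = length(fixed)))
```

Keeping one variable name throughout (noms_events_data here) avoids the lookup error regardless of which selector is used.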