HtmlUnit and decrypting a span element
I'm trying to scrape phone numbers from a website.
When I inspect the phone number in the second entry, the Chrome inspector shows the following:
<span class="nummer">(012) 34 56 78</span>
<span class="suffix encode_me telSelector129112728843_1306868" data-telselector="telSelector129112728843_1306868" data-telsuffix="IDEw"> 90</span>
However, HtmlUnit (and Chrome, if I click "show source") shows this instead:
<span class="nummer">(012) 34 56 78</span>
<span class="suffix encode_me telSelector129112728843_1306868" data-telselector="telSelector129112728843_1306868" data-telsuffix="IDEw"></span>
Is there any way to get the last part of the phone number with HtmlUnit?
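The missing digits appear to be stored in the data-telsuffix attribute as Base64: "IDEw" decodes to " 10", the same digits HtmlUnit renders into that span in the later listing. Assuming the page's JavaScript does nothing more than decode that attribute and write it into the span (an assumption, since the page does not document this format), the suffix could be recovered from the raw HTML without running any script. A minimal sketch using only java.util.Base64; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class TelSuffixDecoder {

    // Decodes a data-telsuffix value, ASSUMING it is plain Base64 of the
    // missing digits (observed on this page, not a documented format).
    static String decodeTelSuffix(String telSuffix) {
        byte[] bytes = Base64.getDecoder().decode(telSuffix);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String nummer = "(040) 78 80 89";
        String suffix = decodeTelSuffix("IDEw"); // " 10" (leading space included)
        System.out.println(nummer + suffix);     // (040) 78 80 89 10
    }
}
```

If the site changes its encoding, this breaks silently, so letting HtmlUnit execute the page's JavaScript is the more robust route when it works.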
Update: with the latest HtmlUnit version, I get this:
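import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// HtmlUnit 2.x package names (3.x moved them to org.htmlunit); getPage throws IOException
try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
    String url = "http://www.gelbeseiten.de/schneider/hamburg";
    HtmlPage htmlPage = webClient.getPage(url);
    for (Object o : htmlPage.getByXPath("//span[@class='teilnehmertelefon']")) {
        System.out.println(((HtmlElement) o).asXml());
    }
}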
It prints entries like this one:
<span class="teilnehmertelefon">
  <span class="text nummer_ganz">
    <span class="nummer">
      (040) 78 80 89
    </span>
    <span class="suffix encode_me telSelector129112728843_3662885" data-telselector="telSelector129112728843_3662885" data-telsuffix="IDEw">
      10
    </span>
  </span>
</span>
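The asXml() output spreads the number and its suffix over several lines with indentation. Once the text of the nummer and suffix spans has been read (for example with DomNode.getTextContent(); how you extract it is an assumption here), a small helper can collapse the whitespace and join the two parts. A minimal sketch; the class and method names are hypothetical, and it assumes the suffix should be appended after a single space:

```java
public class PhoneNumberJoiner {

    // Collapses runs of whitespace (including the newlines and indentation
    // seen in the asXml() output) into single spaces and trims the ends.
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Joins the visible number and its suffix; if the suffix span is still
    // empty (e.g. JavaScript did not run), returns just the visible number.
    static String join(String nummerText, String suffixText) {
        String nummer = normalize(nummerText);
        String suffix = normalize(suffixText);
        return suffix.isEmpty() ? nummer : nummer + " " + suffix;
    }

    // Keeps digits only, e.g. for comparing or storing numbers.
    static String digitsOnly(String phone) {
        return phone.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        String full = join("\n(040) 78 80 89\n", "\n10\n");
        System.out.println(full);             // (040) 78 80 89 10
        System.out.println(digitsOnly(full)); // 04078808910
    }
}
```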