JSoup not properly extracting elements by class

I have the following element in a web page:

<div id="pnNij" class="post" data-tag1="" data-tag2="">
    <a class="image-list-link" href="http://imgur.com/gallery/pnNij" data-page="0">
        <img alt="" src="./Imgur_ The most awesome images on the Internet_files/H7fZCNgb.jpg">


            <div class="point-info gradient-transparent-black transition">
                <div class="relative">
                    <div class="pa-bottom">
                        <div class="arrows">
                            <div title="like" class="pointer arrow-up icon-upvote-outline" data="pnNij" type="image" data-up="4212"></div>
                            <div title="dislike" class="pointer arrow-down icon-downvote-outline" data="pnNij" type="image" data-downs="502"></div>
                            <div class="clear"></div>
                        </div>

                        <div class="point-info-points" title="points">
                            <span class="points-pnNij">3,710</span>
                            <span class="points-text-pnNij">points</span>
                        </div>
                    </div>
                </div>
            </div>

    </a>
    <div class="hover">
                    <p>Seems like 2017 has it all...</p>


        <div class="post-info">
            album · 69,542 views
        </div>
    </div>

</div>

Notice that the href equals http://imgur.com/gallery/pnNij

However, when I extract the element from the page with JSoup like this:

docImgur = Jsoup.connect("http://imgur.com/").get();
Elements links = docImgur.getElementsByClass("post");

The element is extracted almost correctly, except that its href attribute equals

/gallery/pnNij/

Why does the href attribute not contain the full URL?

If you inspect the source of the page as the server delivers it, you will find:

<a class="image-list-link" href="/gallery/WRzti" data-page="0">
    ...
</a>

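So the href attribute in the served page is relative, not absolute, and JSoup returns it exactly as written: /gallery/WRzti. The markup in the question appears to come from a copy of the page saved by a browser (note the local "./Imgur_ ..._files/" image path), which would explain why the links there look absolute.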
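A relative href only becomes a full URL once it is resolved against the address the page was loaded from. As a minimal sketch of that resolution step, using only the standard library rather than JSoup (the class name RelativeHrefDemo and the resolve helper are illustrative, not part of any library):

```java
import java.net.URI;

public class RelativeHrefDemo {
    // Resolves an href value against the URL the page was fetched from.
    public static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Relative href, as it appears in the served page source
        System.out.println(resolve("http://imgur.com/", "/gallery/WRzti"));
        // An href that is already absolute comes back unchanged
        System.out.println(resolve("http://imgur.com/", "http://imgur.com/gallery/pnNij"));
    }
}
```

The first call prints http://imgur.com/gallery/WRzti and the second prints http://imgur.com/gallery/pnNij.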

Solution

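Use the abs: attribute prefix. It makes JSoup resolve the attribute value against the document's base URI, which Jsoup.connect(...).get() sets to the URL that was fetched.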

Example

Document docImgur = Jsoup.connect("http://imgur.com/").get();

Elements links = docImgur.select("a[href].image-list-link");

for (Element element : links) {
    System.out.println(element.attr("abs:href"));
}

Output

http://imgur.com/gallery/WRzti
http://imgur.com/gallery/tCnDJ
http://imgur.com/gallery/JIHYh
...
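The same lookup can also be written with Element.absUrl("href"), which returns the same value as attr("abs:href"). The sketch below parses a hard-coded snippet with an explicit base URI instead of fetching the live page, so it runs without network access. The class name AbsUrlDemo, the absoluteLinks helper, and the sample HTML string are assumptions made for illustration, and the JSoup jar is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class AbsUrlDemo {
    // Collects the absolute link targets of all a.image-list-link elements.
    public static List<String> absoluteLinks(String html, String baseUri) {
        // The base URI tells JSoup what relative URLs are resolved against.
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element a : doc.select("a[href].image-list-link")) {
            urls.add(a.absUrl("href")); // same result as a.attr("abs:href")
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup modelled on the served page source
        String html = "<a class=\"image-list-link\" href=\"/gallery/WRzti\" data-page=\"0\"></a>";
        System.out.println(absoluteLinks(html, "http://imgur.com/"));
    }
}
```

If absUrl returns an empty string, the attribute is either missing or could not be turned into an absolute URL, which typically means the document was parsed without a base URI.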