Jsoup selector: parse an element's href and title

HTML from Project Gutenberg:

<li class="booklink">
    <a class="table link" href="/ebooks/4300.mobile" accesskey="5">
        <span class="row">
            <span class="cell leftcell">
                <span class="icon icon_book"></span>
            </span>
            <span class="cell content">
                <span class="title">Ulysses</span>
                <span class="subtitle">James Joyce</span>
                <span class="extra">7824 downloads</span>
            </span>
            <span class="cell rightcell">
                <span class="icon icon_next"></span>
            </span>
        </span>
    </a>
</li>

I want to parse HTML like this and use Jsoup to get the href link and the title.

I have tried many approaches without success.

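import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BookLinks { // class name chosen for this example

    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("https://www.gutenberg.org/ebooks/search/?sort_order=downloads").get();

        // Select elements with class "link" whose parent element has class "booklink"
        for (Element e : doc.select(".booklink > .link")) {
            // Within each selected element, select the element with class "title"
            System.out.println("title: " + e.select(".title").text());
            // Read the href attribute and resolve it to an absolute URL
            System.out.println("url: " + e.attr("abs:href"));
        }
    }
}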
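The loop above relies on `attr("abs:href")`, which asks Jsoup to resolve the relative `href` (for example `/ebooks/4300.mobile`) against the document's base URI, normally the URL the page was fetched from. As a rough sketch of what to expect, the same resolution can be reproduced with the JDK's `java.net.URI`; this is not Jsoup's own code, and the names `ResolveHref` and `resolve` are hypothetical.

```java
import java.net.URI;

class ResolveHref {
    // Resolve a possibly relative href against the page URL. This mirrors the
    // idea behind Jsoup's attr("abs:href"), but uses java.net.URI directly
    // (hypothetical helper, not part of Jsoup).
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "https://m.gutenberg.org/ebooks/search.mobile/?query=ulysses";
        // An href starting with "/" keeps only the scheme and host of the page URL
        System.out.println(resolve(page, "/ebooks/4300.mobile"));
        // prints https://m.gutenberg.org/ebooks/4300.mobile
    }
}
```

One practical difference: Jsoup returns an empty string from `abs:href` when the attribute is missing or cannot be turned into a URL, whereas `URI.create` throws `IllegalArgumentException` on malformed input.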

Try this:

import java.io.IOException;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BookScraper {

    public static void main(String[] args) throws IOException {

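        // Fetch the mobile search results page for "ulysses"
        Document document = Jsoup.connect("https://m.gutenberg.org/ebooks/search.mobile/?query=ulysses").get();
        // Each result is an <li class="booklink"> in the ordered list inside div.content
        List<Element> bookLinks = document.select("body > div.content > ol > li[class=booklink]");

        for (Element bookLink : bookLinks) {

            // The <a class="table link"> carries the href; absUrl() resolves it against the page URL
            String href = bookLink.select(".table.link").get(0).absUrl("href");
            // Title, subtitle (author) and extra (download count) are spans inside span.cell.content
            String title = bookLink.select(".cell.content .title").text();
            String subTitle = bookLink.select(".cell.content .subtitle").text();
            String extra = bookLink.select(".cell.content .extra").text();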

            System.out.println("Link : " + href);
            System.out.println("    Title    : " + title);
            System.out.println("    Subtitle : " + subTitle);
            System.out.println("    Info     : " + extra);
        }
    }

}
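The selectors in this answer depend on how CSS matches the `class` attribute. A class selector such as `.link`, or the compound `.table.link`, matches when each named class is one of the whitespace-separated tokens in `class="table link"`. An attribute selector such as `li[class=booklink]` instead compares the whole attribute value, so it would stop matching if the `<li>` ever gained a second class; `li.booklink` is the more robust spelling. The JDK-only sketch below illustrates the difference. `ClassMatch`, `hasClasses` and `attributeEquals` are hypothetical names, and the sketch compares tokens case-sensitively, ignoring details of Jsoup's own implementation such as case handling.

```java
import java.util.Arrays;
import java.util.List;

class ClassMatch {
    // True when every class in `required` appears as a whitespace-separated
    // token of the element's class attribute, as a compound selector like
    // .table.link requires.
    static boolean hasClasses(String classAttr, String... required) {
        List<String> tokens = Arrays.asList(classAttr.trim().split("\\s+"));
        return tokens.containsAll(Arrays.asList(required));
    }

    // What an attribute selector such as [class=booklink] checks: the whole
    // attribute value must equal the given string.
    static boolean attributeEquals(String classAttr, String value) {
        return classAttr.equals(value);
    }

    public static void main(String[] args) {
        String anchorClass = "table link"; // class attribute of the <a> in the question
        System.out.println(hasClasses(anchorClass, "link"));          // true
        System.out.println(hasClasses(anchorClass, "table", "link")); // true
        System.out.println(attributeEquals(anchorClass, "link"));      // false
    }
}
```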

Sample output:

Link : https://m.gutenberg.org/ebooks/4300.mobile
    Title    : Ulysses
    Subtitle : James Joyce
    Info     : 7824 downloads
Link : https://m.gutenberg.org/ebooks/4367.mobile
    Title    : Personal Memoirs of U. S. Grant, Complete
    Subtitle : Ulysses S. Grant
    Info     : 1459 downloads
Link : https://m.gutenberg.org/ebooks/20151.mobile
    Title    : Hidden Treasures; Or, Why Some Succeed While Others Fail
    Subtitle : Harry A. Lewis
    Info     : 199 downloads
Link : https://m.gutenberg.org/ebooks/32884.mobile
    Title    : Ideas of Good and Evil
    Subtitle : W. B. Yeats
    Info     : 143 downloads
Link : https://m.gutenberg.org/ebooks/35742.mobile
    Title    : American Leaders and Heroes: A preliminary text-book in United States History
    Subtitle : Wilbur F. Gordy
    Info     : 143 downloads
Link : https://m.gutenberg.org/ebooks/32326.mobile
    Title    : Tales of Troy and Greece
    Subtitle : Andrew Lang
    Info     : 118 downloads
Link : https://m.gutenberg.org/ebooks/7768.mobile
    Title    : The Adventures of Ulysses
    Subtitle : Charles Lamb
    Info     : 108 downloads
Link : https://m.gutenberg.org/ebooks/11490.mobile
    Title    : American Negro Slavery
    Subtitle : Ulrich Bonnell Phillips
    Info     : 102 downloads
Link : https://m.gutenberg.org/ebooks/17667.mobile
    Title    : Dialogues of the Dead
    Subtitle : Baron George Lyttelton Lyttelton and Mrs. Montagu
    Info     : 98 downloads
Link : https://m.gutenberg.org/ebooks/2851.mobile
    Title    : Sixes and Sevens
    Subtitle : O. Henry
    Info     : 97 downloads
Link : https://m.gutenberg.org/ebooks/32728.mobile
    Title    : The English in the West Indies; Or, The Bow of Ulysses
    Subtitle : James Anthony Froude
    Info     : 69 downloads
Link : https://m.gutenberg.org/ebooks/41935.mobile
    Title    : The Adventures of Ulysses the Wanderer
    Subtitle : Homer and Guy Thorne
    Info     : 67 downloads
Link : https://m.gutenberg.org/ebooks/32628.mobile
    Title    : The Child's Book of American Biography
    Subtitle : Mary Stoyell Stimpson
    Info     : 63 downloads
Link : https://m.gutenberg.org/ebooks/29659.mobile
    Title    : Manual of American Grape-Growing
    Subtitle : U. P. Hedrick
    Info     : 54 downloads
Link : https://m.gutenberg.org/ebooks/46327.mobile
    Title    : The Cherries of New York
    Subtitle : U. P. Hedrick
    Info     : 47 downloads
Link : https://m.gutenberg.org/ebooks/5860.mobile
    Title    : Personal Memoirs of U. S. Grant, Part 1.
    Subtitle : Ulysses S. Grant
    Info     : 46 downloads
Link : https://m.gutenberg.org/ebooks/51076.mobile
    Title    : Aaron Rodd, Diviner
    Subtitle : E. Phillips Oppenheim
    Info     : 34 downloads
Link : https://m.gutenberg.org/ebooks/45978.mobile
    Title    : The Grapes of New York
    Subtitle : U. P. Hedrick
    Info     : 33 downloads
Link : https://m.gutenberg.org/ebooks/46347.mobile
    Title    : Men of Our Times; Or, Leading Patriots of the Day
    Subtitle : Harriet Beecher Stowe
    Info     : 31 downloads
Link : https://m.gutenberg.org/ebooks/4546.mobile
    Title    : Memoirs of the Union's Three Great Civil War Generals
    Subtitle : Ulysses S. Grant, William T. Sherman, and Philip Henry Sheridan
    Info     : 30 downloads
Link : https://m.gutenberg.org/ebooks/47263.mobile
    Title    : The Peaches of New York
    Subtitle : U. P. Hedrick
    Info     : 30 downloads
Link : https://m.gutenberg.org/ebooks/39626.mobile
    Title    : An Alphabet of History
    Subtitle : Wilbur D. Nesbit
    Info     : 28 downloads
Link : https://m.gutenberg.org/ebooks/46994.mobile
    Title    : The Pears of New York
    Subtitle : U. P. Hedrick
    Info     : 27 downloads
Link : https://m.gutenberg.org/ebooks/43982.mobile
    Title    : Stories of the Old World
    Subtitle : Alfred John Church
    Info     : 26 downloads
Link : https://m.gutenberg.org/ebooks/28386.mobile
    Title    : Ulysses S. Grant
    Subtitle : Walter Allen
    Info     : 25 downloads