How to redirect to another page using jsoup and continue to print content in ListView?

In general, the main content of my site is a list of posts with text. So I parse each post from this HTML block.

<div class="col-xs-12" style="margin:0.5em 0;line-height:1.785em">Some text</div>

For this I created the following AsyncTask.

class NewPostsAsyncTask extends AsyncTask<String, Void, String> {

    @Override
    protected void onPreExecute() {
        super.onPreExecute();

        progressDialog = new ProgressDialog(MainActivity.this);
        progressDialog.setTitle("Новые");
        progressDialog.setMessage("Загрузка...");
        progressDialog.setIndeterminate(false);
        progressDialog.show();
    }

    @Override
    protected String doInBackground(String... params) {
        Document doc;

        try {
            doc = Jsoup.connect(URL).get(); 

            content = doc.select("[style=margin:0.5em 0;line-height:1.785em]");
            titleList.clear();

            for (Element contents : content) {
                if (!contents.text().contains("18+")) {
                    titleList.add(contents.text());
                }
            }

        } catch (IOException e) {
            e.printStackTrace(); 
        }

        return null;
    }

    @Override
    protected void onPostExecute(String s) {
        super.onPostExecute(s);
        listView.setAdapter(adapter);
        progressDialog.dismiss();
    }
}

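But I have a problem. The posts are not all stored on one web page. You have to click a link at the end of the posts to go to another page with more posts.

That block has the following HTML code.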

    <div class="row"><div class="col-xs-12">
        <div class="paginator">

                <span class="pagina">1683</span> " | " 

                <span class="pagina"><a href="/page/1682">1682</a></span> " | " 

                <span class="pagina"><a href="/page/1681">1681</a></span> " | " 

                <span class="pagina"><a href="/page/1680">1680</a></span> " | " 

                <span class="pagina"><a href="/page/1679">1679</a></span> " | " 

                <span class="pagina"><a href="/page/3">3</a></span> " | "

                <span class="pagina"><a href="/page/2">2</a></span> " | " 

                <span class="pagina"><a href="/page/1">1</a></span>

        </div>
    </div>
</div>

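How can I go to another page, parse its posts, and print them in the ListView after the previous posts? In other words, I want a single ListView that contains all the posts from the site. Can you tell me how to do this?

I would do it like this:

Sample code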

@Override
protected String doInBackground(String... params) {
    // I assume the URL variable holds a full address including the scheme,
    // e.g. URL = "http://killpls.me";
    String pageUrl = URL;

    // Clear the list once, before the loop, so posts from earlier pages are kept.
    titleList.clear();

    try {
        do {
            Document doc = Jsoup.connect(pageUrl).get();

            content = doc.select("[style=margin:0.5em 0;line-height:1.785em]");

            for (Element contents : content) {
                if (!contents.text().contains("18+")) {
                    titleList.add(contents.text());
                }
            }

            // The span without a link is the current page;
            // the link in the span right after it points to the next page.
            Element anchor = doc.select( //
                "#stories > div:nth-child(3) > div:nth-child(1) > div:nth-child(1) > span.pagina:not(:has(a)) + span > a" //
            ).first();
            if (anchor == null) {
                break; // No more pages to fetch.
            }
            pageUrl = anchor.absUrl("href");
        } while (canContinue());
    } catch (IOException e) {
        e.printStackTrace();
    }

    return null;
}

private boolean canContinue() {
    // Implement custom logic here ...
    // Return true if additional posts should be downloaded, false otherwise.
    return true;
}
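
One possible stopping rule for canContinue() is a cap on the number of pages, since the paginator above lists more than a thousand pages. The sketch below shows that loop shape in plain Java, without jsoup or Android. PageWalker, Page, PageFetcher and maxPages are hypothetical names made up for this illustration; in the real task the fetcher would wrap the Jsoup.connect(...).get() call and the selector shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PageWalker {

    /** Hypothetical holder for one parsed page: its posts and the next page URL (null on the last page). */
    static final class Page {
        final List<String> posts;
        final String nextUrl;

        Page(List<String> posts, String nextUrl) {
            this.posts = posts;
            this.nextUrl = nextUrl;
        }
    }

    /** Hypothetical abstraction over "download and parse one page". */
    interface PageFetcher {
        Page fetch(String url);
    }

    /** Follows next-page links from startUrl, appending posts, until there is no next page or maxPages is reached. */
    static List<String> collect(String startUrl, PageFetcher fetcher, int maxPages) {
        List<String> all = new ArrayList<>();
        String url = startUrl;
        int fetched = 0;
        while (url != null && fetched < maxPages) {
            Page page = fetcher.fetch(url);
            all.addAll(page.posts); // append, never clear, so earlier pages are kept
            url = page.nextUrl;     // null means the last page was reached
            fetched++;
        }
        return all;
    }

    public static void main(String[] args) {
        // Fake three-page site standing in for real network access.
        final Map<String, Page> site = new HashMap<>();
        site.put("/page/3", new Page(Arrays.asList("a", "b"), "/page/2"));
        site.put("/page/2", new Page(Arrays.asList("c"), "/page/1"));
        site.put("/page/1", new Page(Arrays.asList("d"), null));

        List<String> posts = collect("/page/3", url -> site.get(url), 2);
        System.out.println(posts); // prints [a, b, c]
    }
}
```

With a larger limit the loop ends on its own once a page has no next link, which corresponds to the anchor == null branch in doInBackground.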

Some details

The core of this approach is the following statement:

Element anchor = doc.select( //
   "#stories > div:nth-child(3) > div:nth-child(1) > div:nth-child(1) > span.pagina:not(:has(a)) + span > a" //
).first();

As long as a next page exists, first() returns a non-null reference. Once page 1 is reached, first() returns null and there are no more pages to fetch.

#stories           /* Find the element with id `stories` */
> div:nth-child(3) /* Select its third child, if it is a div */
> div:nth-child(1) /* Select the first div child of the previous div */
> div:nth-child(1) /* Select the first div child (DIV-a) of the previous div */
> span.pagina:not(:has(a)) /* Select the span with class `pagina` that contains no anchor (the current page) */
+ span /* Select the span immediately after the previous span, also a child of `DIV-a` */
> a    /* Select its anchor: the link to the next page to fetch */