How to redirect to another page using jsoup and continue to print content in ListView?
In general, the main content of my site is a list of posts, each made up of text. So I parse each post from this HTML block.
<div class="col-xs-12" style="margin:0.5em 0;line-height:1.785em">Some text</div>
To do this, I created this AsyncTask.
class NewPostsAsyncTask extends AsyncTask<String, Void, String> {
    @Override
    protected void onPreExecute() {
        super.onPreExecute();
        progressDialog = new ProgressDialog(MainActivity.this);
        progressDialog.setTitle("Новые");          // "New"
        progressDialog.setMessage("Загрузка...");  // "Loading..."
        progressDialog.setIndeterminate(false);
        progressDialog.show();
    }

    @Override
    protected String doInBackground(String... params) {
        Document doc;
        try {
            doc = Jsoup.connect(URL).get();
            content = doc.select("[style=margin:0.5em 0;line-height:1.785em]");
            titleList.clear();
            for (Element contents : content) {
                if (!contents.text().contains("18+")) {
                    titleList.add(contents.text());
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    @Override
    protected void onPostExecute(String s) {
        super.onPostExecute(s);
        listView.setAdapter(adapter);
        progressDialog.dismiss();
    }
}
But I have a problem. The posts are not all on a single web page. You have to click a link below the posts to be redirected to another page with more posts.
That block has this HTML code.
<div class="row"><div class="col-xs-12">
<div class="paginator">
<span class="pagina">1683</span> " | "
<span class="pagina"><a href="/page/1682">1682</a></span> " | "
<span class="pagina"><a href="/page/1681">1681</a></span> " | "
<span class="pagina"><a href="/page/1680">1680</a></span> " | "
<span class="pagina"><a href="/page/1679">1679</a></span> " | "
<span class="pagina"><a href="/page/3">3</a></span> " | "
<span class="pagina"><a href="/page/2">2</a></span> " | "
<span class="pagina"><a href="/page/1">1</a></span>
</div>
</div>
</div>
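How can I go to another page, parse its posts, and print them in the ListView after the previous posts? I want a single ListView to contain all the posts from the site. Could you tell me how I should do this?
I would do it this way:
Sample code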
@Override
protected String doInBackground(String... params) {
    Document doc;
    // I assume the URL variable holds an absolute URL including the scheme,
    // e.g. URL = "http://killpls.me"; Jsoup.connect() rejects a URL without one.
    try {
        // Clear once, before the loop, so posts from earlier pages are kept.
        titleList.clear();
        do {
            doc = Jsoup.connect(URL).get();
            content = doc.select("[style=margin:0.5em 0;line-height:1.785em]");
            for (Element contents : content) {
                if (!contents.text().contains("18+")) {
                    titleList.add(contents.text());
                }
            }
            // Link to the next page: the anchor in the span right after the current page's span.
            Element anchor = doc.select( //
                    "#stories > div:nth-child(3) > div:nth-child(1) > div:nth-child(1) > span.pagina:not(:has(a)) + span > a" //
            ).first();
            if (anchor == null) {
                break; // no next page left
            } else {
                URL = anchor.absUrl("href");
            }
        } while (canContinue());
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}

private boolean canContinue() {
    // Implement custom logic here ...
    // Return true if additional posts should be downloaded, false otherwise.
    return true;
}
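One detail worth checking in a loop like this is where the list is cleared: calling titleList.clear() inside the loop would discard the posts of every page except the last one fetched. The sketch below isolates the loop logic from jsoup and Android so it can be run on its own. The `Page` class, `demoSite()` and the `maxPages` cap are hypothetical stand-ins (for the parsed document, the real site, and `canContinue()` respectively), not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PaginationSketch {

    /** Hypothetical stand-in for one parsed page: its posts and the next page's URL (null on the last page). */
    static final class Page {
        final List<String> posts;
        final String nextUrl;

        Page(List<String> posts, String nextUrl) {
            this.posts = posts;
            this.nextUrl = nextUrl;
        }
    }

    /** Follows next-page links from startUrl and collects posts from at most maxPages pages. */
    static List<String> collectPosts(Map<String, Page> site, String startUrl, int maxPages) {
        List<String> titles = new ArrayList<>(); // created once, before the loop
        String url = startUrl;
        int fetched = 0;
        while (url != null && fetched < maxPages) {
            Page page = site.get(url); // stands in for Jsoup.connect(url).get()
            if (page == null) {
                break; // unknown URL: nothing more to fetch
            }
            for (String post : page.posts) {
                if (!post.contains("18+")) {
                    titles.add(post); // appended after the posts of earlier pages
                }
            }
            url = page.nextUrl; // null when there is no next page
            fetched++;
        }
        return titles;
    }

    /** Hypothetical three-page site, numbered in descending order like the paginator in the question. */
    static Map<String, Page> demoSite() {
        Map<String, Page> site = new LinkedHashMap<>();
        site.put("/page/3", new Page(Arrays.asList("post A", "post B 18+"), "/page/2"));
        site.put("/page/2", new Page(Arrays.asList("post C"), "/page/1"));
        site.put("/page/1", new Page(Arrays.asList("post D"), null));
        return site;
    }

    public static void main(String[] args) {
        System.out.println(collectPosts(demoSite(), "/page/3", 10)); // prints [post A, post C, post D]
    }
}
```

With around 1683 pages on the real site, some cap like `maxPages` (or loading further pages only when the user scrolls to the end of the list) is probably preferable to fetching everything in one go.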
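Some details
The heart of this approach is the following line:
Element anchor = doc.select( //
        "#stories > div:nth-child(3) > div:nth-child(1) > div:nth-child(1) > span.pagina:not(:has(a)) + span > a" //
).first();
As long as a next page exists, the first() method returns a non-null reference. When page 1 is reached (the last entry in the paginator), first() returns null and there are no more pages to fetch.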
#stories                        /* Find the element with id `stories` */
> div:nth-child(3)              /* Select the div that is its third child */
> div:nth-child(1)              /* Select the div that is the first child of the previous div */
> div:nth-child(1)              /* Select the div (DIV-a) that is the first child of the previous div */
> span.pagina:not(:has(a))      /* Select the span with class `pagina` that contains no anchor (the current page) */
+ span                          /* Select the span immediately following that span, also a child of `DIV-a` */
> a                             /* Here is the link to the next page to fetch */
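The anchor selected this way carries a relative href such as /page/1682, which is why the sample code reads it with absUrl("href"): jsoup resolves the value against the URL the document was loaded from. The stdlib-only sketch below illustrates the same resolution rule with java.net.URI rather than jsoup itself; the `resolve` helper and the example.com base URL are assumptions made for the illustration.

```java
import java.net.URI;

class NextPageUrlSketch {

    /** Resolves an href against the URL of the page it was found on. */
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // A root-relative href replaces the whole path of the current page's URL.
        System.out.println(resolve("http://example.com/page/1683", "/page/1682"));
        // prints http://example.com/page/1682
    }
}
```

An href that is already absolute comes back unchanged, so the same call works whichever form the site uses in its paginator.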