Getting the content of a table hidden behind a JavaScript onclick button using JSoup
I'm building a web scraper for personal use in a game.
This is the site I want to scrape: http://forum.toribash.com/clan_war.php?clanid=139
I want to count how often each name appears under "shows detail".
I have read "Get content from javascript onClick hyperlink", but I'm not sure it is what I'm looking for. I suspect it isn't, and I haven't tried that question's answer because I don't know how to adapt it to my needs.
BufferedReader month = new BufferedReader(new InputStreamReader(System.in));
String mth = month.readLine();
// Access the website
Document docs = Jsoup.connect("http://forum.toribash.com/clan_war.php?clanid=139").get();
// Take every entry of the war history
Elements collection = docs.getElementsByClass("war_history_entry");
// Iterate over every entry
for (Element e : collection) {
    // Only use the entry if its info matches the month being searched for
    if (e.getElementsByClass("war_info").text().split(" ")[1].equalsIgnoreCase(mth)) {
        // This is supposed to hold every element with class "player" behind the onclick button,
        // but it doesn't work
        Elements cek = e.getElementsByClass("player");
        for (Element c : cek) {
            System.out.println(c.text());
        }
    }
}
For now I'd like to at least get the names in the show details table:
Kaito
Chax
Draku
and so on.
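The page does not contain the information you want to scrape. The results are loaded by AJAX (JavaScript) after the button is clicked.
You can use your web browser's debugger and look at the Network tab to see what happens when you click the button.
Clicking the button
<button id="buttonwarid19557" ... >
loads the table from the URL:
http://forum.toribash.com/clan_war_ajax.php?warid=19557&clanid=139
Note that the same ID number appears in both.
What you have to do is take the id from each button, then fetch another document for each of them and parse them one by one. That is what your web browser does anyway.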
BufferedReader month = new BufferedReader(new InputStreamReader(System.in));
String mth = month.readLine();
// Access the website
Document docs = Jsoup.connect("http://forum.toribash.com/clan_war.php?clanid=139").get();
// Take every entry of the war history
Elements collection = docs.getElementsByClass("war_history_entry");
// Iterate over every entry
for (Element e : collection) {
    // Only use the entry if its info matches the month being searched for
    if (e.getElementsByClass("war_info").text().split(" ")[1].equalsIgnoreCase(mth)) {
        // Select the button; skip the entry if it has none
        Element button = e.selectFirst("button");
        if (button == null) {
            continue;
        }
        // Get the war id from the button id
        String buttonId = button.attr("id");
        // Remove the text prefix because we only need the number
        String warId = buttonId.replace("buttonwarid", "");
        System.out.println("downloading results for " + e.getElementsByClass("war_info").text());
        // Download and parse the subpage containing the table with info about a single war.
        // The referrer makes the request look more like it comes from a real web browser,
        // to avoid possible hotlinking protection.
        Document table = Jsoup.connect("http://forum.toribash.com/clan_war_ajax.php?warid=" + warId + "&clanid=139").referrer("http://forum.toribash.com/clan_war.php?clanid=139").get();
        // Get every <td class="player"> ... </td>
        Elements players = table.select(".player");
        for (Element player : players) {
            System.out.println(player.text());
        }
    }
}
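The question also asks how often each name appears, which the code above does not cover. A minimal sketch of that step follows, assuming the values of player.text() are collected into a list instead of being printed. The class NameFrequency, the method countNames, and the sample names are hypothetical and only for illustration; the sketch uses the standard library only and requires Java 9 or later for List.of.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original answer.
class NameFrequency {

    // Counts how many times each name occurs, keeping first-seen order.
    static Map<String, Integer> countNames(List<String> names) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : names) {
            String key = name.trim();
            if (key.isEmpty()) {
                continue; // skip blank table cells
            }
            // Insert 1 for a new name, otherwise add 1 to the existing count
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample data standing in for the player.text() values gathered in the loop
        List<String> names = List.of("Kaito", "Chax", "Kaito", "Draku", "Chax", "Kaito");
        countNames(names).forEach((name, count) -> System.out.println(name + ": " + count));
    }
}
```

In the scraping loop, this would mean adding player.text() to a list declared before the loop and calling countNames once after it. Matching here is case-sensitive, so "Kaito" and "kaito" would be counted separately.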