Can't reach new page after submit button click() in HTMLUnit
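The problem is as follows: when I run this code, it runs up to submitButton.fireEvent("onclick").getNewPage() and then appears to finish, even though the final System.out.println(pageAfterLogin.getUrl().toString()) is never executed. No errors occur during program execution.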
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlInput;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import java.util.List;

public class WebScraperHTMLUnit2 {
    public static void main(String[] args) {
        try {
            WebClient wc = new WebClient();
            HtmlPage page = wc.getPage("https://www.google.com/");
            HtmlInput searchForm = (HtmlInput) page.getFirstByXPath("//input[@name='q']");
            searchForm.setValueAttribute("q");
            HtmlElement submitButton = page.getFirstByXPath("//button[@id='searchButton']");
            HtmlPage pageAfterLogin = (HtmlPage) submitButton.fireEvent("onclick").getNewPage();
            System.out.println(pageAfterLogin.getUrl().toString());
        } catch (Exception ex) {}
    }
}
This is the NetBeans output log:
run:
дек 16, 2016 2:38:16 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'https://www.google.ru/' [1:14018] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <URI>, <FUNCTION>, "progid:".)
дек 16, 2016 2:38:16 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'https://www.google.ru/' [1:14042] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <UNICODE_RANGE>, <URI>, <FUNCTION>, "progid:".)
дек 16, 2016 2:38:16 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
СБОРКА УСПЕШНО ЗАВЕРШЕНА (общее время: 3 секунды)
The XPath for the button is incorrect. The button is:
<input value="Google Search" aria-label="Google Search" name="btnK" type="submit" jsaction="sf.chk">
Your code should look like this:
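// Requires: import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
try {
    final WebClient wc = new WebClient();
    // Keep going when a script on the page throws an error.
    wc.getOptions().setThrowExceptionOnScriptError(false);
    HtmlPage page = wc.getPage("https://www.google.com/");
    HtmlInput searchForm = page.getFirstByXPath("//input[@name='q']");
    searchForm.setValueAttribute("q");
    // The submit button is an <input name="btnK">, not a <button id="searchButton">.
    HtmlSubmitInput submitButton = page.getFirstByXPath("//input[@name='btnK']");
    HtmlPage pageAfterLogin = submitButton.click();
    System.out.println(pageAfterLogin.getUrl().toString());
} catch (Exception e) {
    // Print the failure instead of silently swallowing it.
    e.printStackTrace();
}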
You need to set setThrowExceptionOnScriptError to false because an error is thrown (for unknown reasons) and you don't want it to stop your code from executing.
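On the error-handling side, the question reports that no error occurred. A likely explanation (an assumption, not something stated above) is that getFirstByXPath returned null for the non-matching XPath, and the resulting NullPointerException was swallowed by the empty catch block. The sketch below shows one way to surface such a failure. It does not use HtmlUnit: findNothing is a hypothetical stand-in for a lookup that matches nothing, and requireFound is a helper name invented for illustration.

```java
public class SilentFailureDemo {

    /** Hypothetical stand-in for a lookup whose XPath matches no element. */
    public static Object findNothing(String xpath) {
        return null;
    }

    /** Fails fast with a readable message instead of a later NullPointerException. */
    public static <T> T requireFound(T element, String xpath) {
        if (element == null) {
            throw new IllegalStateException("No element matches XPath: " + xpath);
        }
        return element;
    }

    public static void main(String[] args) {
        String xpath = "//button[@id='searchButton']";
        try {
            Object button = requireFound(findNothing(xpath), xpath);
            System.out.println(button);
        } catch (Exception ex) {
            // Report the failure rather than leaving the catch block empty.
            System.err.println("Lookup failed: " + ex.getMessage());
        }
    }
}
```

With a check like this, the program would report which XPath failed instead of ending without output.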
According to this post, the HTML generated on www.google.com keeps changing, so my //input[@name='btnK'] XPath may stop working in the future.
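Because the markup can change, it may help to check candidate XPath expressions against a saved sample of the page before relying on them. The following is a minimal sketch using the JDK's own javax.xml.xpath package rather than HtmlUnit. The MARKUP string is an assumption: a well-formed stand-in that wraps the button shown above in a hypothetical form element. The class and method names are also illustrative.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathCheck {

    // Well-formed stand-in for the relevant part of the page (the <form> wrapper is assumed).
    static final String MARKUP =
            "<form>"
            + "<input name=\"q\" type=\"text\"/>"
            + "<input value=\"Google Search\" aria-label=\"Google Search\""
            + " name=\"btnK\" type=\"submit\" jsaction=\"sf.chk\"/>"
            + "</form>";

    /** Returns how many nodes in MARKUP match the given XPath expression. */
    public static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(MARKUP)),
                    XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//button[@id='searchButton']")); // prints 0
        System.out.println(count("//input[@name='btnK']"));        // prints 1
        System.out.println(count("//input[@type='submit']"));      // prints 1
    }
}
```

The expression from the question matches nothing in this sample, while the name-based and type-based expressions each match the button. Selecting on type='submit' is one possible fallback if the name attribute changes, though it assumes the page has only one submit input.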