Cannot click on Google's new reCaptcha tick box using HtmlUnit
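I am trying to download all of the reCaptcha images, but for some reason I cannot click the checkbox inside the reCaptcha iframe.
When I click it, HtmlUnit throws a WrappedException. I am not sure why this happens. How should I click the link and download the images? My guess is that this is a GWT issue, because I can click any other ordinary button.
Any help would be appreciated.
The main site is: https://www.google.com/recaptcha/api2/demo
This is what I have so far: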
private static final Logger LOG = LoggerFactory.getLogger(Main.class);

public static void main(String[] args) throws IOException {
    try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38)) {
        webClient.getCache().clear();
        final WebClientOptions webClientOptions = webClient.getOptions();
        webClientOptions.setTimeout(40000);
        webClientOptions.setRedirectEnabled(false);
        // webClientOptions.setUseInsecureSSL(true);
        webClient.setAlertHandler(new AlertHandler() {
            public void handleAlert(Page page, String string) {
                System.out.printf("alert: %s%n", string);
                LOG.info("javascript alert: {}", string);
            }
        });
        webClientOptions.setJavaScriptEnabled(true);
        webClient.setCssErrorHandler(new SilentCssErrorHandler());
        // webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        webClientOptions.setThrowExceptionOnScriptError(false);
        webClientOptions.setThrowExceptionOnFailingStatusCode(false);
        HtmlPage reCaptchaFrame;
        final HtmlPage page = webClient.getPage("https://www.google.com/recaptcha/api2/demo");
        webClient.getJavaScriptEngine().pumpEventLoop(1000);
        webClient.waitForBackgroundJavaScript(200);
        int waitForBackgroundJavaScript = webClient.waitForBackgroundJavaScript(200);
        int loopCount = 0;
        while (waitForBackgroundJavaScript > 0 && loopCount < 2) {
            ++loopCount;
            waitForBackgroundJavaScript = webClient.waitForBackgroundJavaScript(200);
            if (waitForBackgroundJavaScript == 0) {
                if (LOG.isTraceEnabled())
                    LOG.trace("HtmlUnit exits background javascript at loop counter " + loopCount);
                break;
            }
        }
        JavaScriptEngine engine = webClient.getJavaScriptEngine();
        engine.holdPosponedActions();
        final List<FrameWindow> frames = page.getFrames();
        reCaptchaFrame = (HtmlPage) frames.get(0).getEnclosedPage();
        // initiating to enter the reCaptcha
        final HtmlSpan reCaptchaAnchor = reCaptchaFrame.getFirstByXPath(".//span[@id='recaptcha-anchor']");
        if (reCaptchaAnchor == null) {
            throw new NullPointerException("Captcha not found");
        }
        try {
            HtmlPage page1 = reCaptchaAnchor.click(); // here I get the exception
        } catch (WrappedException e) {
            LOG.info("Found some stupid exception {}", e.details());
        }
    } catch (Exception e) {
        LOG.info("Found exception {}", e.getMessage());
    }
}
Stack trace:
net.sourceforge.htmlunit.corejs.javascript.WrappedException: Wrapped java.lang.NullPointerException
at net.sourceforge.htmlunit.corejs.javascript.Context.throwAsScriptRuntimeEx(Context.java:2053)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doProcessPostponedActions(JavaScriptEngine.java:1007)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.processPostponedActions(JavaScriptEngine.java:1072)
at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:789)
at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:732)
at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:679)
at recaptchatest.Main.main(Main.java:77)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.lang.NullPointerException
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.hasTopCall(ScriptRuntime.java:3263)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:102)
at com.gargoylesoftware.htmlunit.javascript.host.Promise.execute(Promise.java:136)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doProcessPostponedActions(JavaScriptEngine.java:1002)
... 10 more
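I suggest checking what the element's actual XPath is.
The box has class='recaptcha-checkbox-checkmark'.
You can select it with XPath:
reCaptchaFrame.getFirstByXPath("//div[@class='recaptcha-checkbox-checkmark']");
If that fails, try a selection method other than XPath, for example a CSS selector through the querySelector method.
That is (note the leading dot, which makes it a class selector):
reCaptchaFrame.querySelector(".recaptcha-checkbox-checkmark");
To avoid cast errors, assign every element you select to an HtmlElement.
HtmlElement a = reCaptchaFrame.querySelector(".recaptcha-checkbox-checkmark");
a.click();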
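One reason an XPath like the one above can come back empty is that @class='...' compares the whole attribute value, so it stops matching as soon as the element carries a second class. The sketch below illustrates this with the JDK's own XPath engine rather than HtmlUnit. The markup is a hypothetical, heavily simplified stand-in for the reCaptcha frame (the class list on the span is an assumption), and ClassXPathDemo and count are names made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ClassXPathDemo {

    // Hypothetical, simplified markup; the real reCaptcha frame is more complex.
    static final String MARKUP =
        "<div id='root'>"
      + "<span id='recaptcha-anchor' class='recaptcha-checkbox goog-inline-block'>"
      + "<div class='recaptcha-checkbox-checkmark'/>"
      + "</span>"
      + "</div>";

    // Counts the nodes in the given XML that the XPath expression selects.
    static int count(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Exact comparison works: this attribute holds exactly one class.
        System.out.println(count(MARKUP, "//div[@class='recaptcha-checkbox-checkmark']"));
        // Exact comparison fails: the span's attribute lists two classes.
        System.out.println(count(MARKUP, "//span[@class='recaptcha-checkbox']"));
        // Token-aware comparison matches whatever other classes are present.
        System.out.println(count(MARKUP,
            "//span[contains(concat(' ', normalize-space(@class), ' '), ' recaptcha-checkbox ')]"));
    }
}
```

If the exact-match XPath returns null in HtmlUnit, dumping the frame with asXml() and checking the element's full class attribute is a quick way to see whether this is the cause. A CSS class selector such as .recaptcha-checkbox-checkmark matches individual class tokens, so it avoids the problem.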
After upgrading HtmlUnit to 2.22 and the htmlunit-core-js library to 2.22, everything worked.