Inspecting AJAX loaded JS objects/class with CasperJS?
I am using the same example as in Checking JavaScript AJAX loaded resources with Mink/Zombie in PHP?:
test_JSload.php
<?php
if (array_key_exists("QUERY_STRING", $_SERVER)) {
if ($_SERVER["QUERY_STRING"] == "getone") {
echo "<!doctype html>
<html>
<head>
<script src='test_JSload.php?gettwo'></script>
</head>
</html>
";
exit;
}
if ($_SERVER["QUERY_STRING"] == "gettwo") {
header('Content-Type: application/javascript');
echo "
function person(firstName) {
this.firstName = firstName;
this.changeName = function (name) {
this.firstName = name;
};
}
";
exit;
}
}
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<style type="text/css">
.my_btn { background-color:yellow; }
</style>
<script src="http://code.jquery.com/jquery-1.12.4.min.js"></script>
<script type="text/javascript">
var thishref = window.location.href.slice(0, window.location.href.indexOf('?')+1);
var qstr = window.location.href.slice(window.location.href.indexOf('?')+1);
function OnGetdata(inbtn) {
console.log("OnGetdata; loading ?getone via AJAX call");
//~ $.ajax(thishref + "?getone", { // works
var ptest = {}; // init as empty object
console.log(" ptest pre ajax is ", ptest);
$.ajax({url: thishref + "?getone",
async: true, // still "Synchronous XMLHttpRequest on the main thread is deprecated", because we load a script;
success: function(data) {
console.log("got getone data "); //, data);
$("#dataholder").html(data);
ptest = new person("AHA");
console.log(" ptest post getone is ", ptest);
},
error: function(xhr, ajaxOptions, thrownError) {
console.log("getone error " + thishref + " : " + xhr.status + " / " + thrownError);
}
});
ptest.changeName("Somename");
console.log(" ptest post ajax is ", ptest);
}
ondocready = function() {
$("#getdatabtn").click(function(){
OnGetdata(this);
});
}
$(document).ready(ondocready);
</script>
</head>
<body>
<h1>Hello World!</h1>
<button type="button" id="getdatabtn" class="my_btn">Get Data!</button>
<div id="dataholder"></div>
</body>
</html>
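Then, you can run a temporary server with the PHP > 5.4 CLI (command line) in the same directory as the .php file:
php -S localhost:8080
...and finally, you can visit the page at http://127.0.0.1:8080/test_JSload.php.
Simply put, when the button on this page is clicked, a JavaScript class is loaded in two passes: first comes an HTML fragment with a <script> tag, and the script it points to is then loaded in a second pass. For this action, Firefox prints the following in the console: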
OnGetdata; loading ?getone via AJAX call test_JSload.php:13:3
ptest pre ajax is Object { } test_JSload.php:16:3
TypeError: ptest.changeName is not a function test_JSload.php:31:3
got getone data test_JSload.php:21:7
Synchronous XMLHttpRequest on the main thread is deprecated because of its detrimental effects to the end user's experience. For more help http://xhr.spec.whatwg.org/ jquery-1.12.4.min.js:4:26272
ptest post getone is Object { firstName: "AHA", changeName: person/this.changeName(name) } test_JSload.php:24:7
Ultimately, I would like to inspect either the ptest variable or the person class in CasperJS. So far, I have made this script:
test_JSload_casper.js
// run with:
// ~/.nvm/versions/node/v4.0.0/lib/node_modules/casperjs/bin/casperjs test_JSload_casper.js
// based on http://code-epicenter.com/how-to-login-to-amazon-using-casperjs-working-example/
var casper = require('casper').create({
pageSettings: {
loadImages: false,//The script is much faster when this field is set to false
loadPlugins: false,
userAgent: 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'
}
});
//First step is to open page
casper.start().thenOpen("http://127.0.0.1:8080/test_JSload.php", function() {
console.log("website opened");
});
//Second step is to click to the button
casper.then(function(){
this.evaluate(function(){
document.getElementById("getdatabtn").click();
});
});
//Wait for JS to execute?!, then inspect
casper.then(function(){
console.log("After login...");
console.log("AA " + JSON.stringify(person));
});
casper.run();
...however, when I run this CasperJS script, all I get is:
$ ~/.nvm/versions/node/v4.0.0/lib/node_modules/casperjs/bin/casperjs test_JSload_casper.js
website opened
After login...
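... and that's it. Note that the last line, console.log("AA " + JSON.stringify(person));, does not execute even partially (that is, no "AA " is printed, and neither is any kind of error message).
So, is it possible to use CasperJS to inspect resources like this (AJAX-loaded JS objects/classes, possibly loaded over multiple runs/steps), and if so, how?
The Ajax request triggered by the click probably doesn't have enough time to have an effect on the page that you're scraping. Make sure to wait for its completion using one of the many wait* functions. If the DOM changes as a result of the Ajax request, then I suggest waitForSelector.
A related problem is that the JavaScript of the page is broken. Since the Ajax request that populates ptest is asynchronous, ptest.changeName("Somename") is executed before the response arrives, which leads to a TypeError. You can move ptest.changeName(...) into the success callback of the Ajax request.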
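The ordering problem can be reproduced without a browser. The following is only a minimal sketch: fakeAjax, flush, brokenOrder and fixedOrder are hypothetical names made up for illustration (they are not part of jQuery or CasperJS). fakeAjax stands in for $.ajax by storing the success callback and running it only later, when flush() simulates the arrival of the response.

```javascript
// Hypothetical stand-in for $.ajax: it only stores the success callback;
// the callback runs later, when flush() simulates the response arriving.
var pending = [];
function fakeAjax(options) {
  pending.push(options.success);
}
function flush() {
  while (pending.length) {
    pending.shift()();
  }
}

// Same constructor as the one served by ?gettwo in the question.
function person(firstName) {
  this.firstName = firstName;
  this.changeName = function (name) {
    this.firstName = name;
  };
}

// Ordering as in the question: changeName is called right after the
// request is started, while ptest is still the empty object.
function brokenOrder() {
  var ptest = {};
  fakeAjax({ success: function () { ptest = new person("AHA"); } });
  try {
    ptest.changeName("Somename");
    return "ok";
  } catch (e) {
    return e instanceof TypeError ? "TypeError" : "other error";
  }
}

// Suggested ordering: everything that needs person happens inside the
// success callback, and the result is handed on through `done`.
function fixedOrder(done) {
  fakeAjax({
    success: function () {
      var ptest = new person("AHA");
      ptest.changeName("Somename");
      done(ptest);
    }
  });
}

var brokenResult = brokenOrder();
var fixedResult = null;
fixedOrder(function (p) { fixedResult = p.firstName; });
flush(); // the simulated responses arrive only now

console.log(brokenResult); // TypeError
console.log(fixedResult);  // Somename
```

In the real page the same restructuring would apply to the success function passed to $.ajax; the sketch only shows why the call order matters.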
In order to see console messages from the page, you have to listen to the 'remote.message' event:
casper.on("remote.message", function(msg){
this.echo("remote> " + msg);
});
casper.start(...)...
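I'm posting this as a partial answer, because at least I managed to print the person class. The trick is to use casper.evaluate to run the script (i.e. console.log(person)) as if it were on the remote page (see below). However, there are still some issues that are unclear to me (and I'd gladly accept an answer that clarifies them):
- The person class should exist only after the ?gettwo request has completed and the corresponding JS has been retrieved; however, casperjs only reports that ?getone was called, not ?gettwo. Why?
- If I try to use JSON.stringify(person) or __utils__.echo('plop'); in the final .then(..., the script execution breaks, as if a fatal error had occurred; however, no such error is reported, even though I listen for several kinds of messages. Why?
Otherwise, here is the modified test_JSload_casper.js file: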
// run with:
// ~/.nvm/versions/node/v4.0.0/lib/node_modules/casperjs/bin/casperjs test_JSload_casper.js
var casper = require('casper').create({
verbose: true,
logLevel: 'debug',
userAgent: 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36',
pageSettings: {
loadImages: false,//The script is much faster when this field is set to false
loadPlugins: false
}
});
casper.on('remote.message', function(message) {
this.echo('remote message caught: ' + message);
});
casper.on('resource.received', function(resource) {
var status = resource.status;
casper.log('Resource received ' + resource.url + ' (' + status + ')');
});
casper.on("resource.error", function(resourceError) {
this.echo("Resource error: " + "Error code: "+resourceError.errorCode+" ErrorString: "+resourceError.errorString+" url: "+resourceError.url+" id: "+resourceError.id, "ERROR");
});
casper.on("page.error", function(msg, trace) {
this.echo("Page Error: " + msg, "ERROR");
});
// http://docs.casperjs.org/en/latest/events-filters.html#page-initialized
casper.on("page.initialized", function(page) {
// CasperJS doesn't provide `onResourceTimeout`, so it must be set through
// the PhantomJS means. This is only possible when the page is initialized
page.onResourceTimeout = function(request) {
console.log('Response Timeout (#' + request.id + '): ' + JSON.stringify(request));
};
});
//Second step is to click to the button
casper.then(function(){
this.evaluate(function(){
document.getElementById("getdatabtn").click();
});
//~ this.wait(2000, function() { // fires, but ?gettwo never gets listed
//~ console.log("Done waiting");
//~ });
//~ this.waitForResource(/\?gettwo$/, function() { // does not ever fire: "Wait timeout of 5000ms expired, exiting."
//~ this.echo('a gettwo has been loaded.');
//~ });
});
//Wait for JS to execute?!, then inspect
casper.then(function(){
console.log("After login...");
// Code inside of this function will run
// as if it was placed inside the target page.
casper.evaluate(function(term) {
//~ console.log("EEE", ptest); // Page Error: ReferenceError: Can't find variable: ptest
console.log("EEE", person); // does dump the class function
});
__utils__.echo('plop'); // script BREAKS here....
console.log("BB ");
console.log("AA " + JSON.stringify(person));
});
casper.run();
The output of this is:
$ ~/.nvm/versions/node/v4.0.0/lib/node_modules/casperjs/bin/casperjs test_php_mink/test_JSload_casper.js
[info] [phantom] Starting...
[info] [phantom] Running suite: 4 steps
[debug] [phantom] opening url: http://127.0.0.1:8080/test_JSload.php, HTTP GET
[debug] [phantom] Navigation requested: url=http://127.0.0.1:8080/test_JSload.php, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] Resource received http://127.0.0.1:8080/test_JSload.php (200)
[debug] [phantom] url changed to "http://127.0.0.1:8080/test_JSload.php"
[debug] [phantom] Resource received http://127.0.0.1:8080/test_JSload.php (200)
[debug] [phantom] Resource received http://code.jquery.com/jquery-1.12.4.min.js (200)
[debug] [phantom] Resource received http://code.jquery.com/jquery-1.12.4.min.js (200)
[debug] [phantom] Successfully injected Casper client-side utilities
[info] [phantom] Step anonymous 2/4 http://127.0.0.1:8080/test_JSload.php (HTTP 200)
website opened
[info] [phantom] Step anonymous 2/4: done in 312ms.
[info] [phantom] Step anonymous 3/4 http://127.0.0.1:8080/test_JSload.php (HTTP 200)
remote message caught: OnGetdata; loading ?getone via AJAX call
remote message caught: ptest pre ajax is [object Object]
Page Error: TypeError: undefined is not a function (evaluating 'ptest.changeName("Somename")')
[info] [phantom] Step anonymous 3/4: done in 337ms.
[debug] [phantom] Resource received http://127.0.0.1:8080/test_JSload.php?getone (200)
[debug] [phantom] Resource received http://127.0.0.1:8080/test_JSload.php?getone (200)
remote message caught: got getone data
remote message caught: ptest post getone is [object Object]
[info] [phantom] Step anonymous 4/4 http://127.0.0.1:8080/test_JSload.php (HTTP 200)
After login...
remote message caught: EEE function person(firstName) {
this.firstName = firstName;
this.changeName = function (name) {
this.firstName = name;
};
}
[debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] url changed to "about:blank"
As can be seen from the "EEE" message, the person class (function) is reported correctly, even though http://127.0.0.1:8080/test_JSload.php?gettwo (which defines it) is never listed as a loaded resource.
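One thing that may be worth keeping in mind when trying to get person or a ptest instance out of the page: as far as I understand, values that cross the casper.evaluate boundary have to be JSON-serializable, and a function is not. The following plain JavaScript sketch (no CasperJS involved; the variable names asJson, asSource and instanceJson are only illustrative) shows what JSON.stringify and String do with the person function from this example.

```javascript
// Same constructor as the one served by ?gettwo.
function person(firstName) {
  this.firstName = firstName;
  this.changeName = function (name) {
    this.firstName = name;
  };
}

// A function is not a JSON value, so stringifying it yields undefined.
var asJson = JSON.stringify(person);

// Converting it to a string yields its source text instead, which is
// what the "EEE" console message above shows.
var asSource = String(person);

// For an instance, only the data properties survive; the changeName
// method is silently dropped.
var instanceJson = JSON.stringify(new person("AHA"));

console.log("AA " + asJson);  // AA undefined
console.log(instanceJson);    // {"firstName":"AHA"}
```

So, assuming the variable is reachable in the page at all, returning String(person) or a plain-data copy of an instance from inside evaluate is probably a safer thing to try than returning the function itself.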