Inspecting AJAX-loaded JS objects/classes with CasperJS?

I'm using the same example as in "Checking JavaScript AJAX loaded resources with Mink/Zombie in PHP?":

test_JSload.php

<?php
if (array_key_exists("QUERY_STRING", $_SERVER)) {
  if ($_SERVER["QUERY_STRING"] == "getone") {
    echo "<!doctype html>
  <html>
  <head>
  <script src='test_JSload.php?gettwo'></script>
  </head>
  </html>
  ";
    exit;
  }

  if ($_SERVER["QUERY_STRING"] == "gettwo") {
    header('Content-Type: application/javascript');
    echo "
  function person(firstName) {
    this.firstName = firstName;
    this.changeName = function (name) {
        this.firstName = name;
    };
  }
  ";
    exit;
  }
}
?>
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
  <style type="text/css">
.my_btn { background-color:yellow; }
  </style>
  <script src="http://code.jquery.com/jquery-1.12.4.min.js"></script>
  <script type="text/javascript">
var thishref = window.location.href.slice(0, window.location.href.indexOf('?')+1);
var qstr = window.location.href.slice(window.location.href.indexOf('?')+1);

function OnGetdata(inbtn) {
  console.log("OnGetdata; loading ?getone via AJAX call");
  //~ $.ajax(thishref + "?getone", { // works
  var ptest = {}; // init as empty object
  console.log(" ptest pre ajax is ", ptest);

  $.ajax({url: thishref + "?getone",
    async: true, // still "Synchronous XMLHttpRequest on the main thread is deprecated", because we load a script; 
    success: function(data) {
      console.log("got getone data "); //, data);
      $("#dataholder").html(data);
      ptest = new person("AHA");
      console.log(" ptest post getone is ", ptest);
    },
    error: function(xhr, ajaxOptions, thrownError) {
      console.log("getone error " + thishref + " : " + xhr.status + " / " + thrownError);
    }
  });

  ptest.changeName("Somename");
  console.log(" ptest post ajax is ", ptest);
}

ondocready = function() {
  $("#getdatabtn").click(function(){
    OnGetdata(this);
  });
}
$(document).ready(ondocready);
  </script>
</head>


<body>
  <h1>Hello World!</h1>

  <button type="button" id="getdatabtn" class="my_btn">Get Data!</button>
  <div id="dataholder"></div>
</body>
</html>

You can then run a temporary server with the PHP CLI (PHP 5.4 or later) from the directory that contains the .php file:

php -S localhost:8080

...and finally, you can visit the page at http://127.0.0.1:8080/test_JSload.php.

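Simply put, when the button on this page is clicked, a JavaScript class is loaded in two passes: first the HTML containing a <script> tag, and then the script that tag points to. For this action, Firefox prints the following in the console: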

OnGetdata; loading ?getone via AJAX call      test_JSload.php:13:3
 ptest pre ajax is  Object {  }               test_JSload.php:16:3
TypeError: ptest.changeName is not a function test_JSload.php:31:3
got getone data                               test_JSload.php:21:7
Synchronous XMLHttpRequest on the main thread is deprecated because of its detrimental effects to the end user's experience. For more help http://xhr.spec.whatwg.org/ jquery-1.12.4.min.js:4:26272
 ptest post getone is  Object { firstName: "AHA", changeName: person/this.changeName(name) } test_JSload.php:24:7

Ultimately, I want to inspect the ptest variable or the person class from CasperJS. So far, I have written this script:

test_JSload_casper.js

// run with:
// ~/.nvm/versions/node/v4.0.0/lib/node_modules/casperjs/bin/casperjs test_JSload_casper.js
// based on http://code-epicenter.com/how-to-login-to-amazon-using-casperjs-working-example/

var casper = require('casper').create({
  pageSettings: {
    loadImages: false,//The script is much faster when this field is set to false
    loadPlugins: false,
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'
  }
});

//First step is to open page
casper.start().thenOpen("http://127.0.0.1:8080/test_JSload.php", function() {
  console.log("website opened");
});

//Second step is to click to the button
casper.then(function(){
   this.evaluate(function(){
    document.getElementById("getdatabtn").click();
   });
});

//Wait for JS to execute?!, then inspect
casper.then(function(){
  console.log("After login...");
  console.log("AA " + JSON.stringify(person));
});

casper.run();

...however, when I run this CasperJS script, all I get is:

$ ~/.nvm/versions/node/v4.0.0/lib/node_modules/casperjs/bin/casperjs test_JSload_casper.js
website opened
After login...

...and nothing more. Note that the last line, console.log("AA " + JSON.stringify(person));, does not execute even partially (that is, no "AA " is printed, and no error message of any kind appears).

So, is it possible to use CasperJS to inspect resources like this (AJAX-loaded JS objects/classes, possibly loaded over multiple runs/steps), and if so, how?

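The Ajax request triggered by the click may not have had enough time to affect the page you are scraping. Make sure you wait for it to finish using one of the many wait* functions. If the DOM changes as a result of the Ajax request, I suggest waitForSelector.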

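A related problem is that the page's JavaScript is broken. Since the Ajax request that populates ptest is asynchronous, ptest.changeName("Somename") executes before the response arrives, which causes the TypeError. You could move ptest.changeName(...) into the success callback of the Ajax request.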
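The ordering problem can be reproduced without a browser. The following is only a sketch in plain JavaScript (it runs under Node): fakeAjax, deliverResponses and the log array are hypothetical stand-ins invented for this illustration, not jQuery or CasperJS APIs, and person is copied from the ?gettwo response in the question.

```javascript
// Hypothetical stand-in for $.ajax: it never calls `success` right away,
// it only queues it, the way an asynchronous request defers its callback.
var pendingCallbacks = [];
function fakeAjax(options) {
  pendingCallbacks.push(function () {
    options.success("<script src='test_JSload.php?gettwo'></script>");
  });
}

// Simulates "the response has arrived": runs every queued callback.
function deliverResponses() {
  while (pendingCallbacks.length > 0) {
    pendingCallbacks.shift()();
  }
}

// Same constructor that ?gettwo serves in the question.
function person(firstName) {
  this.firstName = firstName;
  this.changeName = function (name) {
    this.firstName = name;
  };
}

var log = [];

// Ordering used in the question's OnGetdata(): changeName is called
// right after the request is started, while ptest is still {}.
function onGetdataBroken() {
  var ptest = {};
  fakeAjax({
    success: function () {
      ptest = new person("AHA");
    }
  });
  try {
    ptest.changeName("Somename");
  } catch (e) {
    log.push("broken: " + e.name);
  }
}

// Suggested ordering: everything that needs ptest lives in `success`.
function onGetdataFixed() {
  fakeAjax({
    success: function () {
      var ptest = new person("AHA");
      ptest.changeName("Somename");
      log.push("fixed: " + ptest.firstName);
    }
  });
}

onGetdataBroken();
onGetdataFixed();
deliverResponses();
console.log(log.join("\n"));
// → broken: TypeError
// → fixed: Somename
```

In the real page the same change would mean moving the ptest.changeName("Somename") call and the " ptest post ajax is " log line into the success function, after ptest = new person("AHA").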

To see console messages from the page, you have to listen to the 'remote.message' event:

casper.on("remote.message", function(msg){
    this.echo("remote> " + msg);
});

casper.start(...)...

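I'm posting this as a partial answer, since I at least managed to print the person class. The trick is to use casper.evaluate to run the script (that is, console.log(person)) as if it were on the remote page (see below). However, some things are still unclear to me (and I would gladly accept an answer that clarifies them):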

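  • The person class should exist only after the ?gettwo request has completed and the corresponding JS has been retrieved; however, casperjs only reports that ?getone was requested, not ?gettwo. Why?
  • If I try to use JSON.stringify(person) or __utils__.echo('plop'); in the final .then(..., script execution breaks as if there had been a fatal error; yet no related error is reported, even though I listen for several kinds of messages. Why?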

Otherwise, here is the modified test_JSload_casper.js file:

// run with:
// ~/.nvm/versions/node/v4.0.0/lib/node_modules/casperjs/bin/casperjs test_JSload_casper.js

var casper = require('casper').create({
  verbose: true,
  logLevel: 'debug',
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36',
  pageSettings: {
    loadImages: false,//The script is much faster when this field is set to false
    loadPlugins: false
  }
});


casper.on('remote.message', function(message) {
  this.echo('remote message caught: ' + message);
});

casper.on('resource.received', function(resource) {
  var status = resource.status;
  casper.log('Resource received ' + resource.url + ' (' + status + ')');
});

casper.on("resource.error", function(resourceError) {
  this.echo("Resource error: " + "Error code: "+resourceError.errorCode+" ErrorString: "+resourceError.errorString+" url: "+resourceError.url+" id: "+resourceError.id, "ERROR");
});

casper.on("page.error", function(msg, trace) {
  this.echo("Page Error: " + msg, "ERROR");
});

// http://docs.casperjs.org/en/latest/events-filters.html#page-initialized
casper.on("page.initialized", function(page) {
  // CasperJS doesn't provide `onResourceTimeout`, so it must be set through
  // the PhantomJS means. This is only possible when the page is initialized
  page.onResourceTimeout = function(request) {
    console.log('Response Timeout (#' + request.id + '): ' + JSON.stringify(request));
  };
});

//First step is to open page
casper.start().thenOpen("http://127.0.0.1:8080/test_JSload.php", function() {
  console.log("website opened");
});


//Second step is to click to the button
casper.then(function(){
   this.evaluate(function(){
    document.getElementById("getdatabtn").click();
   });
   //~ this.wait(2000, function() { // fires, but ?gettwo never gets listed
    //~ console.log("Done waiting");
   //~ });

  //~ this.waitForResource(/\?gettwo$/, function() { // does not ever fire: "Wait timeout of 5000ms expired, exiting."
    //~ this.echo('a gettwo has been loaded.');
  //~ });
});

//Wait for JS to execute?!, then inspect
casper.then(function(){
  console.log("After login...");

  // Code inside of this function will run
  // as if it was placed inside the target page.
  casper.evaluate(function(term) {
    //~ console.log("EEE", ptest); // Page Error: ReferenceError: Can't find variable: ptest
    console.log("EEE", person); // does dump the class function
  });

  __utils__.echo('plop'); // script BREAKS here....
  console.log("BB ");
  console.log("AA " + JSON.stringify(person));
});

casper.run();

The output of this is:

$ ~/.nvm/versions/node/v4.0.0/lib/node_modules/casperjs/bin/casperjs test_php_mink/test_JSload_casper.js 
[info] [phantom] Starting...
[info] [phantom] Running suite: 4 steps
[debug] [phantom] opening url: http://127.0.0.1:8080/test_JSload.php, HTTP GET
[debug] [phantom] Navigation requested: url=http://127.0.0.1:8080/test_JSload.php, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] Resource received http://127.0.0.1:8080/test_JSload.php (200)
[debug] [phantom] url changed to "http://127.0.0.1:8080/test_JSload.php"
[debug] [phantom] Resource received http://127.0.0.1:8080/test_JSload.php (200)
[debug] [phantom] Resource received http://code.jquery.com/jquery-1.12.4.min.js (200)
[debug] [phantom] Resource received http://code.jquery.com/jquery-1.12.4.min.js (200)
[debug] [phantom] Successfully injected Casper client-side utilities
[info] [phantom] Step anonymous 2/4 http://127.0.0.1:8080/test_JSload.php (HTTP 200)
website opened
[info] [phantom] Step anonymous 2/4: done in 312ms.
[info] [phantom] Step anonymous 3/4 http://127.0.0.1:8080/test_JSload.php (HTTP 200)
remote message caught: OnGetdata; loading ?getone via AJAX call
remote message caught:  ptest pre ajax is  [object Object]
Page Error: TypeError: undefined is not a function (evaluating 'ptest.changeName("Somename")')
[info] [phantom] Step anonymous 3/4: done in 337ms.
[debug] [phantom] Resource received http://127.0.0.1:8080/test_JSload.php?getone (200)
[debug] [phantom] Resource received http://127.0.0.1:8080/test_JSload.php?getone (200)
remote message caught: got getone data 
remote message caught:  ptest post getone is  [object Object]
[info] [phantom] Step anonymous 4/4 http://127.0.0.1:8080/test_JSload.php (HTTP 200)
After login...
remote message caught: EEE function person(firstName) {
    this.firstName = firstName;
    this.changeName = function (name) {
        this.firstName = name;
    };
  }
[debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] url changed to "about:blank"

As the "EEE" message shows, the person class (function) is reported correctly, even though http://127.0.0.1:8080/test_JSload.php?gettwo (which defines it) is never listed as a loaded resource.
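As a side note on the JSON.stringify(person) attempt, here is a sketch in plain JavaScript (runnable under Node, no CasperJS involved) of what JSON serialization does with a constructor function and with an instance of it. It assumes the same person function that ?gettwo serves; the variable names are made up for illustration. It does not explain the silent break, only what the string would contain if person were reachable at that point.

```javascript
// Same constructor that ?gettwo serves in the question.
function person(firstName) {
  this.firstName = firstName;
  this.changeName = function (name) {
    this.firstName = name;
  };
}

// A function has no JSON representation, so stringify returns undefined.
var classAsJson = JSON.stringify(person);

// An instance is serialized, but its function-valued property is dropped.
var instanceAsJson = JSON.stringify(new person("AHA"));

// Converting the function to a string keeps its source text,
// which is what the "EEE" remote message shows.
var classAsSource = String(person);

console.log("AA " + classAsJson);
// → AA undefined
console.log("AA " + instanceAsJson);
// → AA {"firstName":"AHA"}
console.log(classAsSource.indexOf("function person") === 0);
// → true
```

So to get information about person out of the page, returning String(person) or a plain-data instance from the function passed to casper.evaluate looks more promising than stringifying the function itself. My understanding is that evaluate hands its return value back in JSON-serializable form, but treat that as an assumption to check against the CasperJS documentation.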