Goutte Scrape Login to https Secure Website
I'm trying to use Goutte to log in to an https website, but I get the following error:
cURL error 60: SSL certificate problem: unable to get local issuer certificate
500 Internal Server Error - RequestException
1 linked Exception: RingException
This is the code that Goutte's creator says to use:
use Goutte\Client;
$client = new Client();
$crawler = $client->request('GET', 'http://github.com/');
$crawler = $client->click($crawler->selectLink('Sign in')->link());
$form = $crawler->selectButton('Sign in')->form();
$crawler = $client->submit($form, array('login' => 'fabpot', 'password' => 'xxxxxx'));
$crawler->filter('.flash-error')->each(function ($node) {
    print $node->text()."\n";
});
Or here is the code that Symfony recommends:
use Goutte\Client;
// make a real request to an external site
$client = new Client();
$crawler = $client->request('GET', 'https://github.com/login');
// select the form and fill in some values
$form = $crawler->selectButton('Log in')->form();
$form['login'] = 'symfonyfan';
$form['password'] = 'anypass';
// submit that form
$crawler = $client->submit($form);
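The problem is that neither of them works; I get the error posted above. I CAN, however, log in using the code from this question I asked earlier: cURL Scrape then Parse/Find Specific Content
I just want to log in with Symfony/Goutte, because that would make scraping the data I need easier. Any help or suggestions, please? Thanks!
Adding the following to the code fixed the error (cURL configuration):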
// make a real request to an external site
$client = new Client();
$client->getClient()->setDefaultOption('config/curl/'.CURLOPT_SSL_VERIFYHOST, FALSE);
$client->getClient()->setDefaultOption('config/curl/'.CURLOPT_SSL_VERIFYPEER, FALSE);
$crawler = $client->request('GET', 'https://github.com/login');
But then I got another error:
The current node list is empty.
500 Internal Server Error - InvalidArgumentException
Again, I'm using Goutte with Symfony and the default code to run a test task, namely logging in to GitHub over https.
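"The current node list is empty" means the selector matched nothing on the page. One way to narrow that down is to print the labels of the submit controls the page actually contains before calling selectButton(). The following is only a minimal sketch using PHP's built-in DOM extension rather than Goutte; the helper name listButtonLabels and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper (not part of Goutte): collect the visible labels of
// all submit controls found in an HTML string.
function listButtonLabels(string $html): array
{
    // Collect parser warnings internally instead of printing them;
    // real-world pages are rarely valid markup.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $labels = [];

    // <input type="submit" value="..."> keeps its label in the value attribute.
    foreach ($xpath->query('//input[@type="submit"]') as $input) {
        $labels[] = $input->getAttribute('value');
    }
    // <button>...</button> keeps its label as text content.
    foreach ($xpath->query('//button') as $button) {
        $labels[] = trim($button->textContent);
    }

    return $labels;
}

// Made-up markup standing in for a login page.
$sample = '<form action="/session" method="post">'
        . '<input type="text" name="login">'
        . '<input type="password" name="password">'
        . '<input type="submit" name="commit" value="Sign in">'
        . '</form>';

print_r(listButtonLabels($sample));
```

With Goutte, the string returned by $crawler->html() could be passed to the helper in place of $sample.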
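The fix for the earlier "node list is empty" error: the button on the GitHub login page actually says "Sign in", not "Submit" or "Log in". Unfortunately, the Goutte API does not make it clear whether the argument in $form = $crawler->selectButton('Sign in')->form(); refers to the HTML name attribute or to the button's visible text. Apparently it is the visible text, which is a bit confusing. So, after some more digging through the poorly documented API, I ended up with the following working code: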
use Goutte\Client;

// make a real request to an external site
$client = new Client();
// turn off cURL certificate checks (insecure; for testing only)
$client->getClient()->setDefaultOption('config/curl/'.CURLOPT_SSL_VERIFYHOST, FALSE);
$client->getClient()->setDefaultOption('config/curl/'.CURLOPT_SSL_VERIFYPEER, FALSE);
$crawler = $client->request('GET', 'https://github.com/login');
// select the form and fill in some values
$form = $crawler->selectButton('Sign in')->form();
$form['login'] = 'symfonyfan';
$form['password'] = 'anypass';
// submit that form
$crawler = $client->submit($form);
echo $crawler->html();
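The two CURLOPT_SSL_* lines get past cURL error 60 by switching certificate verification off, so the client no longer checks which server it is talking to. A possible alternative, sketched here under the assumption that the same Guzzle version (the one providing setDefaultOption()) honours Guzzle's verify request option, is to leave verification on and point the client at a CA bundle. The path /path/to/cacert.pem is a placeholder, and the Composer autoload line assumes Goutte was installed with Composer.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install

use Goutte\Client;

// Placeholder path: a CA bundle in PEM format on the local machine.
$caBundle = '/path/to/cacert.pem';

$client = new Client();
// Keep certificate verification enabled, but tell Guzzle which CAs to trust.
$client->getClient()->setDefaultOption('verify', $caBundle);

$crawler = $client->request('GET', 'https://github.com/login');

// Same login steps as in the working code above.
$form = $crawler->selectButton('Sign in')->form();
$form['login'] = 'symfonyfan';
$form['password'] = 'anypass';
$crawler = $client->submit($form);

echo $crawler->html();
```

Setting curl.cainfo in php.ini to the same file is another route that may work without touching the code.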