Laravel: fetching an RSS feed with Guzzle returns JavaScript
I am trying to fetch an RSS feed using the following code.
<?php
$client = new \GuzzleHttp\Client(['User-Agent' => 'idap']);
$content = $client->request('GET', 'alarabiya.net/.mrss/ar.xml');
dd($content->getBody()->getContents());
It returns the following:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">\n
<html>\n
<head>\n
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">\n
<meta http-equiv="Content-Script-Type" content="text/javascript">\n
<script type="text/javascript">\n
function getCookie(c_name) { // Local function for getting a cookie value\n
if (document.cookie.length > 0) {\n
c_start = document.cookie.indexOf(c_name + "=");\n
if (c_start!=-1) {\n
c_start=c_start + c_name.length + 1;\n
c_end=document.cookie.indexOf(";", c_start);\n
\n
if (c_end==-1) \n
c_end = document.cookie.length;\n
\n
return unescape(document.cookie.substring(c_start,c_end));\n
}\n
}\n
return "";\n
}\n
function setCookie(c_name, value, expiredays) { // Local function for setting a value of a cookie\n
var exdate = new Date();\n
exdate.setDate(exdate.getDate()+expiredays);\n
document.cookie = c_name + "=" + escape(value) + ((expiredays==null) ? "" : ";expires=" + exdate.toGMTString()) + ";path=/";\n
}\n
function getHostUri() {\n
var loc = document.location;\n
return loc.toString();\n
}\n
setCookie('YPF8827340282Jdskjhfiw_928937459182JAX666', '46.252.205.139', 10);\n
try { \n
location.reload(true); \n
} catch (err1) { \n
try { \n
location.reload(); \n
} catch (err2) { \n
\tlocation.href = getHostUri(); \n
} \n
}\n
</script>\n
</head>\n
<body>\n
<noscript>This site requires JavaScript and Cookies to be enabled. Please change your browser settings or upgrade your browser.</noscript>\n
</body>\n
</html>\n
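How can I get the RSS from the link https://www.alarabiya.net/.mrss/ar.xml? Also, many sites do not provide the full description in their RSS. How can I get the full description through code, the way fivefilters.org does? And what should I do if the RSS file is large and takes a long time to load?
Thanks,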
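I have updated my answer to use GuzzleHttp\Client. I tested this code myself with GuzzleHttp version ^6.2. To be safe, install that specific version with Composer. I assume you know how to take the code provided below and run Composer.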
Description
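When we try to access the RSS feed at http://www.alarabiya.net/.mrss/ar.xml, the server first looks for a cookie tied to the IP address the request came from. If it finds no cookie set for that IP, it returns a page whose script sets one in the form Cookie_Hash:IP, with the hash as the cookie name and the IP as its value. The part of the code that sets the cookie is: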
setCookie('YPF8827340282Jdskjhfiw_928937459182JAX666', '49.49.242.64', 10);
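Once the cookie is set, the JavaScript code reloads the page. On the reloaded request the cookie for the IP is already present, so the request completes successfully and the full RSS feed is sent to the browser.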
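The two-step flow described above can be sketched without any HTTP library. This is only an illustration under assumptions: `fetchPastCookieChallenge`, the `$fetch` callable, the `$fakeServer` stand-in, and the `token` / `1.2.3.4` values are all hypothetical names made up for the example, not part of Guzzle or of the site.

```php
<?php
// Hypothetical helper: $fetch is any callable that takes an array of
// request headers and returns the response body as a string.
function fetchPastCookieChallenge(callable $fetch): string
{
    $body = $fetch([]); // first request, no cookie yet

    // Look for setCookie('NAME', 'VALUE', ...). The quote right after "("
    // skips the "function setCookie(c_name, ...)" definition.
    $pattern = "/setCookie\\(\\s*'([^']+)'\\s*,\\s*'([^']+)'/";
    if (preg_match($pattern, $body, $m) !== 1) {
        return $body; // no challenge script found; return the body as-is
    }

    // Second request: send the cookie the script would have set.
    return $fetch(['Cookie' => $m[1] . '=' . $m[2]]);
}

// Stand-in for a real HTTP call, used only to illustrate the flow.
$fakeServer = function (array $headers): string {
    if (($headers['Cookie'] ?? '') === 'token=1.2.3.4') {
        return '<rss version="2.0"></rss>';
    }
    return "<script>setCookie('token', '1.2.3.4', 10); location.reload(true);</script>";
};

echo fetchPastCookieChallenge($fakeServer), PHP_EOL;
```

With a real Guzzle client, `$fetch` could wrap `$client->request('GET', '/.mrss/ar.xml', ['headers' => $headers])` and cast `getBody()` to a string.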
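You can read the full JavaScript source to see where all of this happens. The headers that need to be sent with our Guzzle request are easy to obtain from the request headers the browser sends, using the Chrome or Firefox developer tools.
Let me know if anything is unclear.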
<?php
require_once 'vendor/autoload.php';
$client = new \GuzzleHttp\Client([
'base_uri' => 'http://www.alarabiya.net/',
'cookies' => true,
]);
$res = $client->request('GET', '/.mrss/ar.xml');
$firstResponse = $res->getBody();
// Extract the cookie name and value from the challenge script, e.g.
// setCookie('YPF8827340282Jdskjhfiw_928937459182JAX666', '49.49.242.64', 10);
// The quote after "(" skips the "function setCookie(c_name, ...)" definition.
$pattern = "/setCookie\\('([^']+)',\\s*'([^']+)'/";
if (!preg_match($pattern, (string) $firstResponse, $matches)) {
    throw new RuntimeException('setCookie(...) call not found in the response.');
}
$cookieName = $matches[1];  // YPF8827340282Jdskjhfiw_928937459182JAX666
$cookieValue = $matches[2]; // 49.49.242.64
// Set cookie value, Cookie: $cookieName=$cookieValue
$res = $client->request('GET', '/.mrss/ar.xml', [
'headers' => [
'User-Agent' => 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 ' .
'(KHTML, like Gecko) Chrome/53.0.2785.89 Safari/537.36',
'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,' .
'image/webp,*/*;q=0.8',
'Accept-Encoding' => 'gzip, deflate, sdch',
'Cookie' => ["$cookieName=$cookieValue"],
'Referer' => 'http://www.alarabiya.net/.mrss/ar.xml',
'Upgrade-Insecure-Requests' => 1,
'Connection' => 'keep-alive',
],
// 'debug' => false, // Set to true for debugging
]);
echo $res->getBody();
Note: I tested this code with "guzzlehttp/guzzle": "^6.2".
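Since the site can answer with either the JavaScript challenge page or the feed, it may help to check what came back before using it. The following is a minimal sketch, assuming PHP's SimpleXML extension is available; `looksLikeFeed` is a hypothetical helper name, and the list of accepted root elements (`rss`, `feed`) is an assumption rather than anything the site documents.

```php
<?php
// Hypothetical helper: returns true when $body parses as XML whose root
// element is "rss" or "feed".
function looksLikeFeed(string $body): bool
{
    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    $xml = simplexml_load_string($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        return false; // not well-formed XML, e.g. the HTML challenge page
    }
    return in_array($xml->getName(), ['rss', 'feed'], true);
}

$sample = '<?xml version="1.0"?><rss version="2.0"><channel><title>t</title></channel></rss>';
var_dump(looksLikeFeed($sample));
```

In the code above, this could be called as `looksLikeFeed((string) $res->getBody())` after the second request.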