Is there any way to urlencode/escape while cURL is following Location?

I am integrating the SEB Open Banking API, and when cURL follows the Location redirect it does not encode the URL the way a normal browser does.

        $url = 'https://api-sandbox.sebgroup.com/mga/sps/oauth/oauth20/authorize?' . 'client_id=XXXXXXXXXXXX&response_type=code&scope=psd2_accounts%20psd2_payments&redirect_uri=https://testcallback.com/test';
        curl_setopt_array($curl, array(
            CURLOPT_HEADER => true,
            CURLOPT_URL => $url,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_MAXREDIRS => 10,
            CURLOPT_TIMEOUT => 30,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_CUSTOMREQUEST => "GET",
            CURLOPT_VERBOSE => true,
            CURLOPT_HTTPHEADER => array(
                "accept: text/html",
            ),
        ));

        $response = curl_exec($curl);
        $err = curl_error($curl);

Here is the cURL verbose output from the log:

< HTTP/1.1 302 Found
< content-language: en-US
< date: Thu, 25 Jul 2019 21:15:49 GMT
< location: https://api-sandbox.sebgroup.com/mga/sps/authsvc?PolicyId=urn:ibm:security:authentication:asf:username_login&client_id=XXXXXXXXXXXX&response_type=code&scope=psd2_accounts psd2_payments&redirect_uri=https://testcallback.com/test&state=undefined
< p3p: CP="NON CUR OTPi OUR NOR UNI"
< x-frame-options: SAMEORIGIN
< Strict-Transport-Security: max-age=15552000; includeSubDomains
< Transfer-Encoding: chunked
< 
* Ignoring the response-body
* Connection #0 to host api-sandbox.sebgroup.com left intact
* Issue another request to this URL: 'https://api-sandbox.sebgroup.com/mga/sps/authsvc?PolicyId=urn:ibm:security:authentication:asf:username_login&client_id=XXXXXXXXXXXX&response_type=code&scope=psd2_accounts psd2_payments&redirect_uri=https://testcallback.com/test&state=undefined'
* Expire in 30000 ms for 8 (transfer 0x5572d97d6ad0)
* Found bundle for host api-sandbox.sebgroup.com: 0x5572d97744f0 [can pipeline]
* Could pipeline, but not asked to!
* Re-using existing connection! (#0) with host api-sandbox.sebgroup.com
* Connected to api-sandbox.sebgroup.com (129.178.54.70) port 443 (#0)
* Expire in 0 ms for 6 (transfer 0x5572d97d6ad0)
> GET /mga/sps/authsvc?PolicyId=urn:ibm:security:authentication:asf:username_login&client_id=XXXXXXXXXXXX&response_type=code&scope=**psd2_accounts psd2_payments**&redirect_uri=https://testcallback.com/test&state=undefined HTTP/1.1
Host: api-sandbox.sebgroup.com
Cookie: AMWEBJCT!%2Fmga!JSESSIONID=00009xuAYPuCp9GW43jcmC-CafK:f218d509-b31a-4e85-82f3-4026c87d2a41; TS01edf909=0107224bed281ed0132bcd33d1abd742777866cf59ada955adfb4e11b262eec4177bcfece6d5008e34b56a7ab37f409ab22798b97dd781fcdbe67b1d85c3acb10a1c21f2ca; TS01ef558a=0107224bed32bbf99c1c620e086bb40f0577a7d1fcada955adfb4e11b262eec4177bcfece69dd83308b2725dc487ace1c823d15bd6e2e5d0d2968f3683570ed32b96ea5da2; C0WNET=03758b02-5d3a-4321-a19f-1c022988e2f4
accept: text/html

< HTTP/1.1 400 Bad Request
< Cache-Control: no-cache
< Connection: close
< Content-Type: text/html; charset=utf-8
< Pragma: no-cache
< Content-Length: 246
< 
* Closing connection 0

The followed Location contains a space between the two scopes (psd2_accounts psd2_payments), which is not converted to %20:

/mga/sps/authsvc?PolicyId=urn:ibm:security:authentication:asf:username_login&client_id=XXXXXXXXXXXX&response_type=code&scope=**psd2_accounts psd2_payments**&redirect_uri=https://testcallback.com/test&state=undefined

How can I encode the parameters of the followed Location so that the URL above automatically becomes

/mga/sps/authsvc?PolicyId=urn:ibm:security:authentication:asf:username_login&client_id=XXXXXXXXXXXX&response_type=code&scope=**psd2_accounts%20psd2_payments**&redirect_uri=https://testcallback.com/test&state=undefined

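A URL is, by definition, already URL-encoded; otherwise it is not a URL. HTTP redirects are supposed to point to URLs, so the Location value must already be URL-encoded. Not doing so violates the HTTP specification (Source). The api-sandbox.sebgroup.com site is not returning a real URL in its redirect. You should consider contacting them to report the problem, since cURL is a very common way of accessing an API.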

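If they cannot fix this in a timely manner, I would not recommend unconditionally URL-encoding the Location header: they may fix it in the future, and then you would be double-encoding the URL, which is just as wrong. The header only needs to be encoded when it is invalid.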
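To illustrate the double-encoding concern, here is a minimal sketch. The helper name `encodeIfInvalid` and the example.com URL are hypothetical and used only for illustration; the sketch assumes the only defect is an unescaped space, as in the log above.

```php
<?php
// Hypothetical helper (not part of cURL or PHP): escape spaces only when
// the value fails PHP's URL validation.
function encodeIfInvalid(string $url): string
{
    if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
        return $url; // already valid, leave it untouched
    }
    return str_replace(' ', '%20', $url);
}

$broken = 'https://example.com/auth?scope=psd2_accounts psd2_payments';
$fixed  = encodeIfInvalid($broken);
$again  = encodeIfInvalid($fixed); // a second pass changes nothing

// Unconditional encoding of an already-encoded value escapes the '%' itself.
$double = rawurlencode('psd2_accounts%20psd2_payments');

echo $fixed, PHP_EOL, $again, PHP_EOL, $double, PHP_EOL;
```

Because the check runs first, the same code keeps working if the server later starts sending a correctly encoded Location header.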

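So my suggestion is to remove the CURLOPT_FOLLOWLOCATION option, so that cURL does not follow redirects on its own, and to add a CURLOPT_HEADERFUNCTION callback, which cURL calls for every header it receives. In that callback, percent-encode the Location header only if it is present and invalid, then run cURL in a loop until no Location header is returned. Because a space in a URL violates the specification, PHP's filter_var() function correctly treats such a value as invalid.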

$url = 'https://example.com';

$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// this function is called by curl for each header received
curl_setopt($ch, CURLOPT_HEADERFUNCTION,
    function($curl, $header) use (&$headers) {
        $len = strlen($header);
        $header = explode(':', $header, 2);
        if (count($header) < 2) {
            // ignore invalid headers
            return $len;
        }

        $name = strtolower(trim($header[0]));

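        $value = trim($header[1]);

        if ($name === 'location' && !filter_var($value, FILTER_VALIDATE_URL)) {
            // Escape only the offending spaces; urlencode() on the whole
            // URL would also escape ':', '/', '?' and '&' and break it.
            $value = str_replace(' ', '%20', $value);
        }

        $headers[$name][] = $value;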

        return $len;
    }
);

// Maximum number of redirects
$max_iterations = 10;
$iterations = 0;

do {
    $url = $headers['location'][0] ?? $url;
    $headers = [];
    curl_setopt($ch, CURLOPT_URL, $url);
    $data = curl_exec($ch);
    print_r($headers);

} while (isset($headers['location']) && ++$iterations < $max_iterations);
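If the Location value may contain illegal characters other than spaces, one possible variation is to percent-encode every byte that falls outside the characters RFC 3986 allows in a URI, while leaving existing `%XX` sequences alone. This is only a sketch: the function name `escapeIllegalUrlChars` and the a.test / c.test hosts are hypothetical, and it assumes any `%` already in the value starts a valid escape.

```php
<?php
// Hypothetical helper: percent-encode bytes that are neither unreserved nor
// reserved URI characters. '%' is kept so existing escapes are not doubled.
function escapeIllegalUrlChars(string $url): string
{
    return preg_replace_callback(
        '/[^A-Za-z0-9\-._~:\/?#\[\]@!$&\'()*+,;=%]/',
        function (array $m) {
            return rawurlencode($m[0]); // one byte in, one %XX out
        },
        $url
    );
}

$raw   = 'https://a.test/x?scope=a b&redirect_uri=https://c.test/t';
$clean = escapeIllegalUrlChars($raw);
echo $clean, PHP_EOL;
```

The result could be assigned to the Location value inside the header callback in place of a plain space replacement.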