boost::asio::streambuf: retrieving XML data through HTTPS

I'm struggling with streambuf management in Asio. I'm using Boost 1.58 on Ubuntu. First, here is the code:

#include <iostream>
#include <sstream>
#include <string>

#include <boost/bind.hpp>
#include <boost/asio.hpp>
#include <boost/asio/ssl.hpp>
#include <boost/asio/buffer.hpp>
#include <boost/asio/completion_condition.hpp>

class example
{
private:
    // asio components
    boost::asio::io_service service;
    boost::asio::ssl::context context;
    boost::asio::ip::tcp::resolver::query query;
    boost::asio::ip::tcp::resolver resolver;
    boost::asio::ssl::stream<boost::asio::ip::tcp::socket> socket;
    boost::asio::streambuf requestBuf, responseBuf;

    // callbacks
    void handle_resolve(const boost::system::error_code& err,
                            boost::asio::ip::tcp::resolver::iterator endpoint_iterator)
    {
        if (!err)
        {
            boost::asio::async_connect(socket.lowest_layer(), endpoint_iterator,
                boost::bind(&example::handle_connect, this,
                    boost::asio::placeholders::error));
        }
    }
    void handle_connect(const boost::system::error_code& err)
    {
        if (!err)
        {
          socket.async_handshake(boost::asio::ssl::stream_base::client,
              boost::bind(&example::handle_handshake, this,
                boost::asio::placeholders::error));
        }
    }
    void handle_handshake(const boost::system::error_code& err)
    {
        if (!err)
        {
            boost::asio::async_write(socket, requestBuf,
                boost::bind(&example::handle_write_request, this,
                    boost::asio::placeholders::error,
                    boost::asio::placeholders::bytes_transferred));
        }
    }

    void handle_write_request(const boost::system::error_code& err, size_t bytes_transferred)
    {
        if (!err)
        {
            boost::asio::async_read(socket, responseBuf,
                boost::asio::transfer_at_least(1),
                boost::bind(&example::handle_read, this,
                    boost::asio::placeholders::error,
                    boost::asio::placeholders::bytes_transferred));
        }
    }

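    void handle_read(const boost::system::error_code& err, size_t bytes_transferred)
    {
        // Keep reading until an error (including end of stream) occurs.
        // responseBuf accumulates the raw response: status line, headers
        // and, for a chunked response, the hex chunk-size lines as well.
        if (!err)
        {
            boost::asio::async_read(socket, responseBuf,
                boost::asio::transfer_at_least(1),
                boost::bind(&example::handle_read, this,
                    boost::asio::placeholders::error,
                    boost::asio::placeholders::bytes_transferred));
        }
    }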
public:
    // initializers listed in member declaration order
    example() : context(boost::asio::ssl::context::sslv23),
                query("www.quandl.com", "443"),
                resolver(service),
                socket(service, context) {}

    void work()
    {
        // set security
        context.set_default_verify_paths();
        socket.set_verify_mode(boost::asio::ssl::verify_peer);

        // in case this no longer works, generate a new key from https://www.quandl.com/
        std::string api_key = "4jufXHL8S4XxyM6gzbA_";

        // build the query
        std::stringstream ss;

        ss << "api/v3/datasets/";
        ss << "RBA" << "/" << "FXRUKPS" << ".";
        ss << "xml" << "?sort_order=asc";
        ss << "&api_key=" << api_key;  // parameters after the first are joined with '&'
        ss << "&start_date=" << "2000-01-01";
        ss << "&end_date=" << "2003-01-01";

        std::ostream request_stream(&requestBuf);
        request_stream << "GET /";
        request_stream << ss.str();
        request_stream << " HTTP/1.1\r\n";
        request_stream << "Host: " << "www.quandl.com" << "\r\n";
        request_stream << "Accept: */*\r\n";
        request_stream << "Connection: close\r\n\r\n";

        resolver.async_resolve(query,
            boost::bind(&example::handle_resolve, this,
                boost::asio::placeholders::error,
                boost::asio::placeholders::iterator));

        service.run();

        std::cout << &responseBuf;
    }
};

int main(int argc, char * argv[])
{
    // this is a test
    int retVal = 0;
    try
    {
        example f;
        f.work();
    }
    catch (std::exception& ex)
    {
        std::cout << "an error occurred: " << ex.what() << std::endl;
        retVal = 1;
    }

    return retVal;
}

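Here is my problem: if the resulting data is not too long (a few thousand characters), the example works perfectly. However, as soon as async_read returns an uneven number of characters (by default, bytes_transferred is 512 characters), the streambuf gets corrupted and the next async_read call contains some extra characters.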

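I have tried many variations of the code above without success: using transfer_exactly(), calling streambuf.consume() to clear the buffer, passing in a different buffer as soon as I detect that an uneven number of characters was returned, and so on. None of these solutions worked.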

What am I missing here? Thanks.

As established in the comments, the server is using chunked transfer encoding:

Chunked transfer encoding is a data transfer mechanism in version 1.1 of the Hypertext Transfer Protocol (HTTP) in which data is sent in a series of "chunks". It uses the Transfer-Encoding HTTP header in place of the Content-Length header, ...
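Whether a response is chunked can be read from its headers. A minimal sketch in standard C++ (no Asio), assuming the whole raw response, as accumulated in responseBuf by the question's code, has been copied into a std::string; split_response and is_chunked are hypothetical helper names:

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Split a raw HTTP/1.1 response at the blank line ("\r\n\r\n") that ends the
// header section. Returns {status line + headers, body}; the headers keep the
// CRLF of their last line.
std::pair<std::string, std::string> split_response(const std::string& raw)
{
    const std::string separator = "\r\n\r\n";
    const std::size_t pos = raw.find(separator);
    if (pos == std::string::npos)
        return {raw, std::string()};  // header section not complete yet
    return {raw.substr(0, pos + 2), raw.substr(pos + separator.size())};
}

// True if the header section contains "Transfer-Encoding: ... chunked".
// Header names are case-insensitive, so the search runs on a lower-cased copy.
bool is_chunked(const std::string& headers)
{
    std::string lower(headers);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // Every header line follows the CRLF of the previous line
    // (the status line always comes first).
    const std::size_t start = lower.find("\r\ntransfer-encoding:");
    if (start == std::string::npos)
        return false;
    const std::size_t end = lower.find("\r\n", start + 2);
    const std::string line = lower.substr(
        start, end == std::string::npos ? std::string::npos : end - start);
    return line.find("chunked") != std::string::npos;
}
```

If is_chunked returns false, the body length normally comes from Content-Length instead, and no chunk decoding is needed.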

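Each chunk starts with the chunk length in hexadecimal followed by a CRLF. If you are not familiar with chunked transfer, these length lines do indeed look like strange characters corrupting your data stream.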
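For example, a 500-byte chunk travels as the text `1f4` (500 in hexadecimal), a CRLF, the 500 payload bytes, and another CRLF; strings like `1f4\r\n` are the "extra characters" that show up in the middle of the data. Parsing such a size line is short in standard C++. parse_chunk_size is a hypothetical helper, and chunk extensions (`;name=value` after the size) are simply ignored in this sketch:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Parse a chunk-size line such as "1f4\r\n" (500 bytes).
// std::stoul with base 16 stops at the first non-hex character, so a trailing
// CRLF or ";name=value" chunk extension is ignored. It throws
// std::invalid_argument if the line does not start with a hex number.
std::size_t parse_chunk_size(const std::string& line)
{
    return static_cast<std::size_t>(std::stoul(line, nullptr, 16));
}
```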

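Chunked transfer encoding is typically used when it is inconvenient to determine the exact length of the response body before sending it. As a result, the receiver does not know the body length until it has processed the final zero-length chunk (note that trailing "headers", a.k.a. "trailers", may follow the final chunk).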
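When the whole response has already been read, as the question's code does by reading until end of stream, the body can be decoded in a single pass: read a size line, copy that many bytes, skip the CRLF after them, and stop at the zero-length chunk. A minimal standard-C++ sketch; decode_chunked is a hypothetical name, and any trailers after the last chunk are ignored rather than parsed:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode a complete chunked body (everything after the "\r\n\r\n" that ends
// the headers). Stops at the zero-length chunk; trailers are not parsed.
std::string decode_chunked(const std::string& body)
{
    std::string out;
    std::size_t pos = 0;
    for (;;)
    {
        const std::size_t eol = body.find("\r\n", pos);
        if (eol == std::string::npos)
            throw std::runtime_error("truncated chunk size line");
        // Hex length; std::stoul stops at ';' (extensions) or at the end.
        const std::size_t size = std::stoul(body.substr(pos, eol - pos), nullptr, 16);
        pos = eol + 2;                       // skip the size line's CRLF
        if (size == 0)
            break;                           // last chunk
        if (body.size() < pos + size + 2)
            throw std::runtime_error("truncated chunk data");
        out.append(body, pos, size);         // the payload itself
        pos += size + 2;                     // skip payload and its trailing CRLF
    }
    return out;
}
```

With the question's code, which sends `Connection: close` and reads until the server closes the connection, one way to get the raw text after service.run() is `std::string raw(boost::asio::buffers_begin(responseBuf.data()), boost::asio::buffers_end(responseBuf.data()));`; the part after the first `"\r\n\r\n"` is what decode_chunked expects.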

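With boost::asio, you can use async_read_until() to read the chunk header through the CRLF delimiter, parse the length, and then use async_read() with transfer_exactly to get the chunk data. Note that once you begin using a streambuf for your reads, you should keep using the same streambuf instance, because it may buffer additional data (extracting a particular amount of data from a streambuf is discussed elsewhere). In particular, async_read_until() may read past the CRLF, so the streambuf can already hold part or all of the chunk data, and transfer_exactly should only ask for the bytes that are still missing. Also note that the chunk data is terminated by a CRLF (not included in the length), which you should discard.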
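The buffering detail is the part that is easy to get wrong, so here is a synchronous sketch of the same read loop. To keep it testable without a network, it replaces the socket with a hypothetical FakeSocket and the streambuf with a std::string: read_until_crlf and ensure_buffered are hypothetical stand-ins for read_until(socket, buf, "\r\n") and read(socket, buf, transfer_exactly(n - buf.size())), and std::string::erase plays the role of streambuf::consume():

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the SSL socket: each read returns at most
// `max_read` bytes, like a single read_some() on a real connection.
struct FakeSocket
{
    std::string data;            // bytes the "server" will send
    std::size_t pos = 0;         // how much has been delivered so far
    std::size_t max_read = 512;

    // Append up to `limit` bytes to `into`; returns 0 at end of stream.
    std::size_t read_some(std::string& into, std::size_t limit)
    {
        const std::size_t n = std::min({limit, max_read, data.size() - pos});
        into.append(data, pos, n);
        pos += n;
        return n;
    }
};

// Like read_until(sock, buf, "\r\n"): reads until `buf` contains a CRLF and
// returns the line length including the CRLF. Bytes received after the CRLF
// stay in `buf`, just as they stay in an asio::streambuf.
std::size_t read_until_crlf(FakeSocket& sock, std::string& buf)
{
    std::size_t eol;
    while ((eol = buf.find("\r\n")) == std::string::npos)
        if (sock.read_some(buf, sock.max_read) == 0)
            throw std::runtime_error("EOF before CRLF");
    return eol + 2;
}

// Like read(sock, buf, transfer_exactly(n - buf.size())): requests only the
// bytes that are not buffered yet, and nothing if enough is already there.
void ensure_buffered(FakeSocket& sock, std::string& buf, std::size_t n)
{
    while (buf.size() < n)
        if (sock.read_some(buf, n - buf.size()) == 0)
            throw std::runtime_error("EOF inside chunk");
}

// Decode a chunked body, always reusing the same buffer between reads.
std::string read_chunked_body(FakeSocket& sock, std::string& buf)
{
    std::string out;
    for (;;)
    {
        const std::size_t line_len = read_until_crlf(sock, buf);
        const std::size_t size = std::stoul(buf.substr(0, line_len), nullptr, 16);
        buf.erase(0, line_len);          // like buf.consume(line_len)
        if (size == 0)
            return out;                  // last chunk; trailers not handled here
        ensure_buffered(sock, buf, size + 2);
        out.append(buf, 0, size);        // payload
        buf.erase(0, size + 2);          // payload plus its trailing CRLF
    }
}
```

In the asynchronous client, each step becomes a completion handler: async_read_until(socket, responseBuf, "\r\n", ...), then, only when responseBuf.size() < size + 2, async_read(socket, responseBuf, transfer_exactly(size + 2 - responseBuf.size()), ...), with responseBuf.consume() in place of erase. Requesting size + 2 bytes without subtracting what is already buffered would over-read into the next chunk, or wait for bytes that never arrive.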

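Writing your own HTTP client with boost::asio can be instructive (and even fun, if you have the time and curiosity), but covering all of the options in the HTTP standard (e.g. compression, trailers, redirection) is a lot of work. You may want to consider whether a mature client library such as libcurl suits your needs.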