Winsock 2: Reading text from a URL

For example, this is what I want to do:

    if (http->Connect("http://pastebin.com/raw/9uL16CyN"))
    {
        YString data = "";
        if (http->ReceiveData(data))
        {
            std::cout << "Networked data: " << std::endl;
            std::cout << data << std::endl;
        }
        else
            std::cout << "Failed to connect to internet.\n";
    }

The page I'm trying to read is raw ASCII text (http://pastebin.com/raw/9uL16CyN).

I expected this to work easily, but apparently it doesn't. I keep getting the WSA error WSAHOST_NOT_FOUND (11001).

My Connect function:

bool Http::Connect(YString addr)
{
    _socket = Network::CreateConnectSocket(addr, 53); // 53 is the port
    return _socket != INVALID_SOCKET;
}

The CreateConnectSocket function:

int iResult;
SOCKET ConnectSocket = INVALID_SOCKET;

// holds address info for socket to connect to
struct addrinfo *result = NULL,
    *ptr = NULL,
    hints;

ZeroMemory(&hints, sizeof(hints));
hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_STREAM;
hints.ai_protocol = IPPROTO_TCP;  //TCP connection!!!

                                  //resolve server address and port
iResult = getaddrinfo(addr.c_str(), std::to_string(port).c_str(), &hints, &result);
if (iResult != 0)
{
    printf("Network::CreateSocket failed with %s as addr, and %i as port.\nError code: %i.\n", (char*)addr.c_str(), port, iResult);
    return INVALID_SOCKET;
}

for (ptr = result; ptr != NULL; ptr = ptr->ai_next) {

    // Create a SOCKET for connecting to server
    ConnectSocket = socket(ptr->ai_family, ptr->ai_socktype, ptr->ai_protocol);

    if (ConnectSocket == INVALID_SOCKET) {
        printf("Network::CreateSocket failed with error: %ld\n", WSAGetLastError());
        return INVALID_SOCKET;
    }

    // Connect to server.
    iResult = connect(ConnectSocket, ptr->ai_addr, (int)ptr->ai_addrlen);

    if (iResult == SOCKET_ERROR)
    {
        closesocket(ConnectSocket);
        ConnectSocket = INVALID_SOCKET;
        printf("Network::CreateSocket failed the server is down... did not connect.\n");
    }
}

freeaddrinfo(result);

if (ConnectSocket == INVALID_SOCKET)
{
    printf("Network::CreateSocket failed.\n");
    return INVALID_SOCKET;
}

u_long iMode = 1;
iResult = ioctlsocket(ConnectSocket, FIONBIO, &iMode);
if (iResult == SOCKET_ERROR)
{
    printf("Network::CreateSocket ioctlsocket failed with error: %d\n", WSAGetLastError());
    closesocket(ConnectSocket);
    return INVALID_SOCKET;
}
char value = 1;
setsockopt(ConnectSocket, IPPROTO_TCP, TCP_NODELAY, &value, sizeof(value));
return ConnectSocket;

Most of this comes from existing resources.

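Your call to Connect() is wrong. You can't pass a full URL to getaddrinfo(). It tries to resolve the entire string as a host name, which is why you get WSAHOST_NOT_FOUND. Pass only the host name and the port number, as separate values. By the way, the HTTP port is 80, not 53 (53 is the DNS port).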
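If you want to keep accepting a full URL at the call site, one option is to split it before calling Connect(). The following is only a minimal sketch: `UrlParts` and `SplitHttpUrl` are made-up names, not part of Winsock or of your classes, and it only handles plain `http://host[:port]/path` URLs (no https, no IPv6 literals, no userinfo).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: the pieces of an http:// URL that
// getaddrinfo() and the GET request line each need.
struct UrlParts
{
    std::string host;   // e.g. "pastebin.com", the node name for getaddrinfo()
    int port = 80;      // default HTTP port when the URL does not name one
    std::string path;   // e.g. "/raw/9uL16CyN", used in the GET request line
};

// Splits "http://host[:port][/path]". Returns false for anything else.
// Assumption: no https, no userinfo, no IPv6 literals, no percent-decoding.
bool SplitHttpUrl(const std::string& url, UrlParts& out)
{
    const std::string scheme = "http://";
    if (url.compare(0, scheme.size(), scheme) != 0)
        return false;

    const std::size_t hostStart = scheme.size();
    const std::size_t pathStart = url.find('/', hostStart);
    const std::string authority = (pathStart == std::string::npos)
        ? url.substr(hostStart)
        : url.substr(hostStart, pathStart - hostStart);

    UrlParts parts;
    const std::size_t colon = authority.find(':');
    parts.host = authority.substr(0, colon);
    if (parts.host.empty())
        return false;

    if (colon != std::string::npos)
    {
        const std::string digits = authority.substr(colon + 1);
        if (digits.empty() || digits.size() > 5)
            return false;
        parts.port = 0;
        for (char c : digits)
        {
            if (c < '0' || c > '9')
                return false;
            parts.port = parts.port * 10 + (c - '0');
        }
        if (parts.port < 1 || parts.port > 65535)
            return false;
    }

    // An empty path means the root resource.
    parts.path = (pathStart == std::string::npos) ? std::string("/")
                                                  : url.substr(pathStart);
    out = parts;
    return true;
}
```

The host and port would then go to Connect() (and from there to getaddrinfo()), while the path belongs in the GET request line.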

Also, you are not sending an HTTP GET request asking the server for the text document. An HTTP server does not send a response until you have sent a request.

You need something more like this:

bool Http::Connect(YString addr, int port)
{
    _socket = Network::CreateConnectSocket(addr, port);
    return _socket != INVALID_SOCKET;
}

if (http->Connect("pastebin.com", 80)) // host name only; 80 is the HTTP port
{
    // Request line, headers, then an empty line that ends the header block
    YString request = "GET /raw/9uL16CyN HTTP/1.1\r\n"
                      "Host: pastebin.com\r\n"
                      "Connection: close\r\n"
                      "\r\n";

    if (http->SendData(request))
    {
        YString response = "";
        if (http->ReceiveData(response))
        {
            std::cout << "Networked data: " << std::endl;
            std::cout << response << std::endl;
        }
        else
            std::cout << "Failed to receive data from internet.\n";
    }
    else
        std::cout << "Failed to send request to Pastebin.\n";
}
else
    std::cout << "Failed to connect to Pastebin.\n";
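Since the request asks for `Connection: close`, one simple way to know the response is complete is to keep reading until the server closes the connection. I can't see your ReceiveData(), so the following is just a sketch of that loop under stated assumptions: `ReceiveAll` and `RecvFn` are hypothetical names, and the callback stands in for a call to recv() on a blocking socket (a positive byte count, 0 when the peer closes gracefully, a negative value on error).

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for recv(): fills buf with up to len bytes and
// returns the byte count, 0 on graceful close, or a negative value on error.
using RecvFn = std::function<int(char* buf, int len)>;

// Appends everything the peer sends to 'out' until the connection closes.
// Returns true on graceful close, false if the callback reports an error.
bool ReceiveAll(const RecvFn& recvFn, std::string& out)
{
    char buf[4096];
    for (;;)
    {
        const int n = recvFn(buf, static_cast<int>(sizeof(buf)));
        if (n > 0)
            out.append(buf, static_cast<std::size_t>(n)); // partial data, keep going
        else if (n == 0)
            return true;   // peer closed the connection: response is complete
        else
            return false;  // error (assumption: every failure is treated as fatal)
    }
}
```

Note that your CreateConnectSocket() puts the socket into non-blocking mode via FIONBIO, so a real recv() can fail with WSAEWOULDBLOCK when no data has arrived yet. This sketch treats any negative return as fatal, so in that case you would need to wait (with select(), for instance) and retry.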

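That said, you need to take into account that the server will frame the response data with a status line and headers. For example, a request followed by the server's response: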

GET /raw/9uL16CyN HTTP/1.1
Host: pastebin.com

HTTP/1.1 200 OK
Date: Wed, 23 Dec 2015 00:00:01 GMT
Content-Type: text/plain; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=db6ba4b037d673b67757500aca4e2227b1450828801; expires=Thu, 22-Dec-16 00:00:01 GMT; path=/; domain=.pastebin.com; HttpOnly
X-Powered-By: PHP/5.5.5
Cache-Control: public, max-age=1801
Vary: Accept-Encoding
CF-Cache-Status: HIT
Expires: Wed, 23 Dec 2015 00:30:02 GMT
Server: cloudflare-nginx
CF-RAY: 258fc8a8168a2276-LAX

2a
Text, text, text, text! Some more text! :D
0

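So, assuming ReceiveData() just returns whatever it receives, you will have to strip off those headers and undo the chunked encoding before you can use the content of the text file itself. Please read RFC 2616 (or its successors, RFC 7230-7235), which describe the HTTP protocol in detail.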
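To give a rough idea of what that stripping and decoding involves, here is a minimal sketch. `ExtractBody`, `DecodeChunked` and `ToLower` are names I made up for illustration. It assumes the complete response is already in memory, detects chunked encoding with a naive substring search, and ignores chunk trailers, Content-Length and compression.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Lower-cases ASCII so header text can be compared case-insensitively.
static std::string ToLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Undoes "Transfer-Encoding: chunked". Each chunk is
// "<hex size>[;extensions]\r\n<data>\r\n"; a chunk of size 0 ends the body.
bool DecodeChunked(const std::string& in, std::string& out)
{
    std::size_t pos = 0;
    for (;;)
    {
        const std::size_t eol = in.find("\r\n", pos);
        if (eol == std::string::npos)
            return false;                       // size line is incomplete

        std::size_t size = 0, digits = 0;
        for (std::size_t i = pos; i < eol; ++i, ++digits)
        {
            const unsigned char c = static_cast<unsigned char>(in[i]);
            if (!std::isxdigit(c))
                break;                          // ';' starts chunk extensions
            const int v = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
            size = size * 16 + static_cast<std::size_t>(v);
        }
        if (digits == 0 || digits > 8)
            return false;                       // missing or implausible size

        pos = eol + 2;
        if (size == 0)
            return true;                        // last chunk; trailers ignored
        if (in.size() - pos < size)
            return false;                       // chunk data is truncated

        out.append(in, pos, size);
        pos += size;
        if (in.compare(pos, 2, "\r\n") != 0)
            return false;                       // chunk must end with CRLF
        pos += 2;
    }
}

// Splits a complete raw response at the blank line after the headers and
// returns the body, decoding it first if the headers announce chunking.
bool ExtractBody(const std::string& response, std::string& body)
{
    const std::size_t headerEnd = response.find("\r\n\r\n");
    if (headerEnd == std::string::npos)
        return false;

    const std::string headers = ToLower(response.substr(0, headerEnd));
    const std::string rest = response.substr(headerEnd + 4);

    body.clear();
    // Simplification: a real parser would split header lines and tokens.
    if (headers.find("transfer-encoding: chunked") != std::string::npos)
        return DecodeChunked(rest, body);

    body = rest;
    return true;
}
```

Even this small piece leaves out a lot (status code checks, redirects, gzip, responses that arrive in several reads), which is part of why doing HTTP by hand gets complicated quickly.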

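That being said, you should stop trying to implement HTTP manually (it is more complex than you realize) and use a pre-existing library instead, such as libcurl, or even Microsoft's own WinInet or WinHTTP API. Let them do the heavy lifting for you.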