Server hangs for a while after 16000 requests

I am new to boost::asio. I am trying to run

ab -n 20000 -c 5  -r http://127.0.0.1:9999/

Every time, the test gets stuck after 16000 requests, although it does eventually complete. I also get a lot of failed requests.

What is the code doing?

The code:

#include <iostream>
#include <functional>
#include <string>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>
#include <boost/lexical_cast.hpp>
#include <memory>

// global variable for service and acceptor
boost::asio::io_service ioService;
boost::asio::ip::tcp::acceptor accp(ioService);

// callback for accept
void onAccept(const boost::system::error_code &ec, std::shared_ptr<boost::asio::ip::tcp::socket> soc) {
    using boost::asio::ip::tcp;
    soc->send(boost::asio::buffer("In Accept"));
    soc->shutdown(boost::asio::ip::tcp::socket::shutdown_send);
    soc.reset(new tcp::socket(ioService));
    accp.async_accept(*soc, [=](const boost::system::error_code &ec) {
            onAccept(ec, soc);
        });
}

int main(int argc, char *argv[]) {
    using boost::asio::ip::tcp;
    boost::asio::ip::tcp::resolver resolver(ioService);
    try {
        boost::asio::ip::tcp::resolver::query query("127.0.0.1", boost::lexical_cast<std::string>(9999));
        boost::asio::ip::tcp::endpoint endpoint = *resolver.resolve(query);
        accp.open(endpoint.protocol());
        accp.set_option(boost::asio::ip::tcp::acceptor::reuse_address(true));
        accp.bind(endpoint);
        cout << "Ready to accept @ 9999" << endl;

        auto t1 = boost::thread([&]() { ioService.run(); });

        accp.listen(boost::asio::socket_base::max_connections);
        std::shared_ptr<tcp::socket> soc = std::make_shared<tcp::socket>(ioService);

        accp.async_accept(*soc, [=](const boost::system::error_code &ec) { onAccept(ec, soc); });

        t1.join();
    } catch (std::exception &ex) {
        std::cout << "[" << boost::this_thread::get_id() << "] Exception: " << ex.what() << std::endl;
    }
}

For completeness:

  1. I changed my code as per @Arunmu's answer.
  2. I used docker with linux, because of the local socket issue @david-schwartz suggested.
  3. The server never hangs now.
    • Single threaded - 6045 requests per second
    • Threaded - 5849 requests per second
  4. Using async_write (see the sketch below).
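
For illustration only, here is a rough sketch of what item 4 (moving the response to async_write) might look like, based on the original boost::asio code above. This is an assumption, not the poster's actual final handler; the response string and the sendResponse helper name are made up for the example.

#include <boost/asio.hpp>
#include <memory>
#include <string>

// Sketch: keep the buffer alive in the completion handler so it outlives the
// asynchronous write, then shut down the sending side and close the socket.
void sendResponse(std::shared_ptr<boost::asio::ip::tcp::socket> soc) {
    auto response = std::make_shared<std::string>("HTTP/1.1 200 OK\r\n\r\n\r\n");
    boost::asio::async_write(*soc, boost::asio::buffer(*response),
        [soc, response](const boost::system::error_code &ec, std::size_t /*bytes*/) {
            boost::system::error_code ignored;
            soc->shutdown(boost::asio::ip::tcp::socket::shutdown_send, ignored);
            soc->close(ignored);
        });
}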

First, let's get things a bit more correct. I changed the code to use standalone asio instead of boost, and to use C++14 features. With your original code, these changes reduced the number of failed requests considerably.

Code:

#include <iostream>
#include <functional>
#include <string>
#include <asio.hpp>
#include <thread>
#include <memory>
#include <system_error>
#include <chrono>
#include <cstring>

//global variable for service and acceptor
asio::io_service ioService;
asio::ip::tcp::acceptor accp(ioService); 

const char* response = "HTTP/1.1 200 OK\r\n\r\n\r\n";

//callback for accept 
void onAccept(const std::error_code& ec, std::shared_ptr<asio::ip::tcp::socket> soc)
{
    using asio::ip::tcp;
    soc->set_option(asio::ip::tcp::no_delay(true));
    auto buf = new asio::streambuf;
    asio::async_read_until(*soc, *buf, "\r\n\r\n",
        [=](auto ec, auto siz) {
          asio::write(*soc, asio::buffer(response, std::strlen(response)));
          soc->shutdown(asio::ip::tcp::socket::shutdown_send);
          delete buf;
          soc->close();
        });
    auto nsoc = std::make_shared<tcp::socket>(ioService);
    //soc.reset(new tcp::socket(ioService));
    accp.async_accept(*nsoc, [=](const std::error_code& ec){
      onAccept(ec, nsoc);
    });

}

int main( int argc, char * argv[] )
{
    using asio::ip::tcp;
    asio::ip::tcp::resolver resolver(ioService);

    try{
        asio::ip::tcp::resolver::query query( 
            "127.0.0.1", 
            std::to_string(9999)
        );

        asio::ip::tcp::endpoint endpoint = *resolver.resolve( query );
        accp.open( endpoint.protocol() );
        accp.set_option( asio::ip::tcp::acceptor::reuse_address( true ) );
        accp.bind( endpoint );

        std::cout << "Ready to accept @ 9999" << std::endl;

        auto t1 = std::thread([&]() { ioService.run(); });
        auto t2 = std::thread([&]() { ioService.run(); });

        accp.listen( 1000 );

        std::shared_ptr<tcp::socket> soc = std::make_shared<tcp::socket>(ioService);

        accp.async_accept(*soc, [=](const std::error_code& ec) {
                                    onAccept(ec, soc);
                                });

        t1.join();
        t2.join();
    } catch(const std::exception & ex){
      std::cout << "[" << std::this_thread::get_id()
        << "] Exception: " << ex.what() << std::endl;
    } catch (...) {
      std::cerr << "Caught unknown exception" << std::endl;
    }
}

The major changes are:

  1. Send a proper HTTP response.
  2. Read the request. Otherwise you are just filling up your socket receive buffer.
  3. Close the socket correctly.
  4. Use multiple threads. This was required mostly for Mac OS; it was not needed on Linux.

Test command used: ab -n 20000 -c 1 -r http://127.0.0.1:9999/

On Linux, the test passes quickly without any errors, and without using an additional thread for the io_service.
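
In other words, on Linux the two std::thread objects in main() above can be dropped and the event loop run directly on the main thread. A fragment of that variant, under the same setup as the code above:

    // Single-threaded variant of the accept setup in main() above:
    // run the event loop on the calling thread; run() returns only
    // once the io_service runs out of work.
    accp.async_accept(*soc, [=](const std::error_code& ec) { onAccept(ec, soc); });
    ioService.run();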

On Mac, however, I was able to reproduce the issue: it got stuck after serving 16000 requests. A process sample taken at that moment:

Call graph:
    906 Thread_1887605   DispatchQueue_1: com.apple.main-thread  (serial)
    + 906 start  (in libdyld.dylib) + 1  [0x7fff868bc5c9]
    +   906 main  (in server_hangs_so) + 2695  [0x10d3622b7]
    +     906 std::__1::thread::join()  (in libc++.1.dylib) + 20  [0x7fff86ad6ba0]
    +       906 __semwait_signal  (in libsystem_kernel.dylib) + 10  [0x7fff8f44c48a]
    906 Thread_1887609
    + 906 thread_start  (in libsystem_pthread.dylib) + 13  [0x7fff8d0983ed]
    +   906 _pthread_start  (in libsystem_pthread.dylib) + 176  [0x7fff8d09afd7]
    +     906 _pthread_body  (in libsystem_pthread.dylib) + 131  [0x7fff8d09b05a]
    +       906 void* std::__1::__thread_proxy<std::__1::tuple<main::$_2> >(void*)  (in server_hangs_so) + 124  [0x10d36317c]
    +         906 asio::detail::scheduler::run(std::__1::error_code&)  (in server_hangs_so) + 181  [0x10d36bc25]
    +           906 asio::detail::scheduler::do_run_one(asio::detail::scoped_lock<asio::detail::posix_mutex>&, asio::detail::scheduler_thread_info&, std::__1::error_code const&)  (in server_hangs_so) + 393  [0x10d36bfe9]
    +             906 kevent  (in libsystem_kernel.dylib) + 10  [0x7fff8f44d21a]
    906 Thread_1887610
      906 thread_start  (in libsystem_pthread.dylib) + 13  [0x7fff8d0983ed]
        906 _pthread_start  (in libsystem_pthread.dylib) + 176  [0x7fff8d09afd7]
          906 _pthread_body  (in libsystem_pthread.dylib) + 131  [0x7fff8d09b05a]
            906 void* std::__1::__thread_proxy<std::__1::tuple<main::$_3> >(void*)  (in server_hangs_so) + 124  [0x10d36324c]
              906 asio::detail::scheduler::run(std::__1::error_code&)  (in server_hangs_so) + 181  [0x10d36bc25]
                906 asio::detail::scheduler::do_run_one(asio::detail::scoped_lock<asio::detail::posix_mutex>&, asio::detail::scheduler_thread_info&, std::__1::error_code const&)  (in server_hangs_so) + 263  [0x10d36bf67]
                  906 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fff8f44c136]

Total number in stack (recursive counted multiple, when >=5):

Sort by top of stack, same collapsed (when >= 5):
        __psynch_cvwait  (in libsystem_kernel.dylib)        906
        __semwait_signal  (in libsystem_kernel.dylib)        906
        kevent  (in libsystem_kernel.dylib)        906

Only after providing an additional thread was I able to complete the test, with the following result:

Benchmarking 127.0.0.1 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
Completed 18000 requests
Completed 20000 requests
Finished 20000 requests


Server Software:
Server Hostname:        127.0.0.1
Server Port:            9999

Document Path:          /
Document Length:        2 bytes

Concurrency Level:      1
Time taken for tests:   33.328 seconds
Complete requests:      20000
Failed requests:        3
   (Connect: 1, Receive: 1, Length: 1, Exceptions: 0)
Total transferred:      419979 bytes
HTML transferred:       39998 bytes
Requests per second:    600.09 [#/sec] (mean)
Time per request:       1.666 [ms] (mean)
Time per request:       1.666 [ms] (mean, across all concurrent requests)
Transfer rate:          12.31 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0  30.7      0    4346
Processing:     0    1 184.4      0   26075
Waiting:        0    0   0.0      0       1
Total:          0    2 186.9      0   26075

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%      0
  95%      0
  98%      0
  99%      0
 100%  26075 (longest request)

Stack trace of the thread that is most likely stuck:

* thread #3: tid = 0x0002, 0x00007fff8f44d21a libsystem_kernel.dylib`kevent + 10, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff8f44d21a libsystem_kernel.dylib`kevent + 10
    frame #1: 0x0000000109c482ec server_hangs_so`asio::detail::kqueue_reactor::run(bool, asio::detail::op_queue<asio::detail::scheduler_operation>&) + 268
    frame #2: 0x0000000109c48039 server_hangs_so`asio::detail::scheduler::do_run_one(asio::detail::scoped_lock<asio::detail::posix_mutex>&, asio::detail::scheduler_thread_info&, std::__1::error_code const&) + 393
    frame #3: 0x0000000109c47c75 server_hangs_so`asio::detail::scheduler::run(std::__1::error_code&) + 181
    frame #4: 0x0000000109c3f2fc server_hangs_so`void* std::__1::__thread_proxy<std::__1::tuple<main::$_3> >(void*) + 124
    frame #5: 0x00007fff8d09b05a libsystem_pthread.dylib`_pthread_body + 131
    frame #6: 0x00007fff8d09afd7 libsystem_pthread.dylib`_pthread_start + 176
    frame #7: 0x00007fff8d0983ed libsystem_pthread.dylib`thread_start + 13

This could be an issue with the kqueue_reactor implementation in asio, or with the Mac system itself (less likely).

UPDATE: The same behaviour was observed with libevent as well, so the asio implementation is not the problem. It must be some bug in the kqueue kernel implementation. The issue is not seen with epoll on Linux.
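
For reference, a minimal libevent server along these lines might look like the sketch below. This is only an illustration of the kind of reproducer involved, not the actual test code; it simply accepts a connection, writes the canned response, and closes.

#include <event2/listener.h>
#include <event2/bufferevent.h>
#include <event2/event.h>
#include <cstring>
#include <netinet/in.h>

static const char *response = "HTTP/1.1 200 OK\r\n\r\n\r\n";

// Called once the output buffer has been flushed: close the connection.
static void write_cb(struct bufferevent *bev, void *ctx) {
    bufferevent_free(bev);
}

// Called for every accepted connection: wrap the fd and queue the response.
static void accept_cb(struct evconnlistener *listener, evutil_socket_t fd,
                      struct sockaddr *addr, int socklen, void *ctx) {
    struct event_base *base = evconnlistener_get_base(listener);
    struct bufferevent *bev = bufferevent_socket_new(base, fd, BEV_OPT_CLOSE_ON_FREE);
    bufferevent_setcb(bev, nullptr, write_cb, nullptr, nullptr);
    bufferevent_write(bev, response, std::strlen(response));
    bufferevent_enable(bev, EV_WRITE);
}

int main() {
    struct event_base *base = event_base_new();
    struct sockaddr_in sin;
    std::memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   // 127.0.0.1
    sin.sin_port = htons(9999);
    struct evconnlistener *listener = evconnlistener_new_bind(
        base, accept_cb, nullptr,
        LEV_OPT_CLOSE_ON_FREE | LEV_OPT_REUSEABLE, -1,
        (struct sockaddr *)&sin, sizeof(sin));
    event_base_dispatch(base);                      // run the event loop
    evconnlistener_free(listener);
    event_base_free(base);
}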

You are running out of local sockets. You should not be testing by generating all the load from a single IP address. (Also, your load generator should be smart enough to detect and work around this situation, but sadly many are not.)
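
On Linux, for example, the ephemeral port range and the number of connections stuck in TIME_WAIT can be checked with commands like the following (shown only for illustration):

cat /proc/sys/net/ipv4/ip_local_port_range
ss -s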