Concurrent requests with MRI Ruby

Question

I put together a simple example to try to demonstrate concurrent requests in Rails. Note that I am using MRI Ruby 2 and Rails 4.2.

  def api_call
    sleep(10)
    render :json => "done"
  end

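Then I open 4 different tabs in Chrome on my Mac (i7 / 4 cores) and go to http://localhost:3000/api_call in each, to see whether the requests run in series or in parallel (or rather concurrently, which is close but not the same thing).

I cannot get this to work with Puma, Thin, or Unicorn. The requests are served one after another: the first tab finishes after 10 seconds, the second after 20 seconds (because it had to wait for the first to complete), the third after that, and so on.

From what I have read, I believe the following to be true (please correct me), and these are my results:

Unicorn is multi-process, so my example should have worked (after defining the number of workers in a unicorn.rb config file), but it did not. I can see 4 workers starting, but everything still runs in series. I am using the unicorn-rails gem and start Rails with unicorn -c config/unicorn.rb. In my unicorn.rb I have: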

-- unicorn.rb

worker_processes 4
preload_app true
timeout 30
listen 3000
after_fork do |server, worker|
  ActiveRecord::Base.establish_connection
end

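Thin and Puma are multithreaded (although Puma at least has a "clustered" mode where you can start workers with the -w parameter) and should not work anyway (in multithreaded mode) with MRI Ruby 2.0 because "there is a Global Interpreter Lock (GIL) that ensures only one thread can be run at a time".

So,

Do I have a valid example (or is using sleep just wrong)?
Are my statements above about multi-process and multithreaded servers (with respect to MRI Ruby 2) correct?
Any ideas on why I cannot get it to work with Unicorn (or any server, for that matter)?

There is a very similar question to mine, but I could not get it working as answered, and it does not answer all of my questions about concurrent requests using MRI Ruby.

Github project: https://github.com/afrankel/limitedBandwidth (note: the project looks at more than this question of multi-process/threading on the server)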

Answer 1

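I invite you to read Jesse Storimer's series Nobody understands the GIL. It might help you understand some MRI internals better.

I also found Pragmatic Concurrency with Ruby, which is an interesting read. It has some examples of testing concurrency.

EDIT: In addition, I can recommend the article Removing config.threadsafe! It might not be relevant for Rails 4, but it explains the configuration options, one of which you can use to allow concurrency.

Let us discuss the answer to your question.

You can have several threads (using MRI), even with Puma. The GIL ensures that only one thread is active at a time; that is the constraint developers call restrictive (because there is no real parallel execution). Bear in mind that the GIL does not guarantee thread safety. This does not mean that the other threads are not running; they are waiting for their turn. They can interleave (the articles can help you understand this better).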
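As a minimal sketch of that interleaving (not taken from the question's project): sleep releases the GIL while it waits, so several sleeping threads can overlap even on MRI. The helper name concurrent_sleep_time and the shortened 0.2-second sleep are assumptions made for illustration.

```ruby
require 'benchmark'

# Runs `count` threads that each sleep for `nap` seconds and returns the
# wall-clock time until all of them have finished.
def concurrent_sleep_time(count, nap)
  Benchmark.realtime do
    count.times.map { Thread.new { sleep(nap) } }.each(&:join)
  end
end

elapsed = concurrent_sleep_time(4, 0.2)
puts format('4 sleeping threads finished in %.2fs', elapsed)
```

If the threads had to take strict turns, the total would be close to 0.8 seconds; since the sleeps overlap, it should stay near 0.2 seconds.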

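Let me clear up some terms: worker process and thread. A process runs in its own memory space and can serve several threads. Threads of the same process run in a shared memory space, which is that of their process. By threads we mean Ruby threads in this context, not CPU threads.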
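The difference can be sketched in a few lines, assuming a platform where Kernel#fork is available (Linux or macOS, not Windows). The two helper names are hypothetical.

```ruby
# A thread shares memory with the code that started it, so its change to
# `value` is visible after the thread has finished.
def increment_in_thread(value)
  Thread.new { value += 1 }.join
  value
end

# A forked child process works on its own copy of memory, so its change to
# `value` never reaches the parent.
def increment_in_fork(value)
  pid = fork { value += 1 }
  Process.wait(pid)
  value
end

puts increment_in_thread(0) # => 1
puts increment_in_fork(0)   # => 0
```

This is why a multi-process server such as Unicorn, or Puma in clustered mode, needs no locking between workers, while threads inside one worker share state.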

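Regarding the configuration in your question and the GitHub repo you shared, I think an appropriate configuration (I used Puma) is to set up 4 workers and 1 to 40 threads. The idea is that one worker serves one tab. Each tab sends up to 10 requests.

So let's get started:

I work on Ubuntu in a virtual machine. So first I enabled 4 cores in my virtual machine's settings (and some other settings that I thought might help). I could verify this on my machine, so I went with that.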

Linux command --> lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 69
Stepping:              1
CPU MHz:               2306.141
BogoMIPS:              4612.28
L1d cache:             32K
L1d cache:             32K
L2d cache:             6144K
NUMA node0 CPU(s):     0-3

I used your shared GitHub project and modified it slightly. I created a Puma configuration file named puma.rb (placed in the config directory) with the following content:

workers Integer(ENV['WEB_CONCURRENCY'] || 1)
threads_count = Integer(ENV['MAX_THREADS'] || 1)
threads 1, threads_count

preload_app!

rackup      DefaultRackup
port        ENV['PORT']     || 3000
environment ENV['RACK_ENV'] || 'development'

on_worker_boot do
  # Worker specific setup for Rails 4.1+
  # See: https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#on-worker-boot
  #ActiveRecord::Base.establish_connection
end

By default, Puma is started with 1 worker and 1 thread. You can use environment variables to modify those parameters. I did so:

export MAX_THREADS=40
export WEB_CONCURRENCY=4

and started Puma with this configuration by typing

bundle exec puma -C config/puma.rb

in the Rails app directory.
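The way puma.rb resolves those two environment variables can be sketched in plain Ruby. The helper name setting is hypothetical; the expression inside it is the same Integer(ENV[...] || 1) pattern used in the config file, applied to an ordinary hash instead of ENV so it can be tried in isolation.

```ruby
# Returns the integer value stored under `key`, or `default` if it is unset.
# Integer() raises ArgumentError for non-numeric strings, so a typo in the
# variable fails loudly instead of silently becoming 0.
def setting(env, key, default = 1)
  Integer(env[key] || default)
end

puts setting({}, 'WEB_CONCURRENCY')                    # unset, falls back to 1
puts setting({ 'MAX_THREADS' => '40' }, 'MAX_THREADS') # set, parsed as 40
```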

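I opened the browser with four tabs to call the app's URL.

The first request started around 15:45:05 and the last request finished around 15:49:44. That is an elapsed time of 4 minutes and 39 seconds. You can also see the request IDs in unsorted order in the log file (see below).

Each API call in the GitHub project sleeps for 15 seconds. We have 4 tabs, each with 10 API calls. That makes a maximum elapsed time of 600 seconds, i.e. 10 minutes (in strictly serial mode).

The ideal result in theory would be everything in parallel and an elapsed time not far from 15 seconds, but I did not expect that at all. I was not sure what exactly to expect, but I was still positively surprised (considering that I ran on a virtual machine and that MRI is restrained by the GIL and some other factors). The elapsed time of this test was less than half of the maximum elapsed time in strictly serial mode.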
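The arithmetic behind these numbers, as a small sketch. The two timestamps are the first and last ones from the log in the appendix; the helper names are made up for illustration.

```ruby
require 'time'

# Whole seconds between two timestamps given as strings.
def elapsed_seconds(from, to)
  (Time.parse(to) - Time.parse(from)).round
end

# Worst case: every call waits for the previous one to finish.
def serial_worst_case(tabs, calls_per_tab, sleep_seconds)
  tabs * calls_per_tab * sleep_seconds
end

worst    = serial_worst_case(4, 10, 15)
measured = elapsed_seconds('2015-05-11 15:45:05 +0800',
                           '2015-05-11 15:49:44 +0800')

puts "worst case: #{worst}s"                                 # 600s
puts "measured:   #{measured}s"                              # 279s
puts format('share:      %.1f%%', 100.0 * measured / worst)  # 46.5%
```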

EDIT: I read further about Rack::Lock, which wraps a mutex around each request (third article above). I found the option config.allow_concurrency = true to be a time saver. One caveat: I had to increase the connection pool accordingly (even though the requests do not query the database); the maximum number of threads is a good default, 40 in this case.
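To illustrate why a per-request mutex matters, here is a plain-Ruby simulation of the idea. It is not the actual Rack::Lock code; serve_requests and its lock: parameter are hypothetical names, and the sleep is shortened to 0.1 seconds.

```ruby
require 'benchmark'

# Simulates `count` simultaneous requests that each sleep for `nap` seconds
# and returns the total wall-clock time. When a lock is given, every request
# holds that one shared Mutex for its whole duration.
def serve_requests(count, nap, lock: nil)
  Benchmark.realtime do
    count.times.map do
      Thread.new do
        lock ? lock.synchronize { sleep(nap) } : sleep(nap)
      end
    end.each(&:join)
  end
end

locked   = serve_requests(4, 0.1, lock: Mutex.new)
unlocked = serve_requests(4, 0.1)
puts format('with lock: %.2fs, without lock: %.2fs', locked, unlocked)
```

With the lock, the four requests are served one after another (roughly 4 x 0.1 s); without it, the sleeps overlap. That is the effect allow_concurrency has on requests handled by threads within one worker.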

I tested the app with JRuby and the actual elapsed time was 2 minutes, with allow_concurrency=true.

I tested the app with MRI and the actual elapsed time was 1 minute 47 seconds, with allow_concurrency=true. This really surprised me, because I expected MRI to be slower than JRuby. It was not. This makes me question the widespread discussion about the speed differences between MRI and JRuby.

The responses on the different tabs look "more random" now. It happens that tab 3 or 4 completes before tab 1, which I requested first.

I think that because you do not have race conditions, the test seems to be OK. However, I am not sure about the application-wide consequences of setting config.allow_concurrency=true in a real-world application.

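Feel free to check it out and let me know any feedback you readers might have. I still have the clone on my machine. Let me know if you are interested.

To answer your questions in order:

I think your example is valid, judging by the result. For concurrency though, it is better to test with shared resources (as, for example, in the second article).
Regarding your statements, as mentioned at the beginning of this answer, MRI is multithreaded but restricted by the GIL to one active thread at a time. This raises the question: with MRI, isn't it better to test with more processes and fewer threads? I do not really know; my first guess would be no, or not much of a difference. Maybe someone can shed light on this.
Your example is just fine, I think. It only needed some slight modifications.

Appendix

Log files of the Rails app: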

**config.allow_concurrency = false (by default)**
-> Ideally 1 worker per core, each worker serves up to 10 threads.

[3045] Puma starting in cluster mode...
[3045] * Version 2.11.2 (ruby 2.1.5-p273), codename: Intrepid Squirrel
[3045] * Min threads: 1, max threads: 40
[3045] * Environment: development
[3045] * Process workers: 4
[3045] * Preloading application
[3045] * Listening on tcp://0.0.0.0:3000
[3045] Use Ctrl-C to stop
[3045] - Worker 0 (pid: 3075) booted, phase: 0
[3045] - Worker 1 (pid: 3080) booted, phase: 0
[3045] - Worker 2 (pid: 3087) booted, phase: 0
[3045] - Worker 3 (pid: 3098) booted, phase: 0
Started GET "/assets/angular-ui-router/release/angular-ui-router.js?body=1" for 127.0.0.1 at 2015-05-11 15:45:05 +0800
...
...
...
Processing by ApplicationController#api_call as JSON
  Parameters: {"t"=>"15?id=9"}
Completed 200 OK in 15002ms (Views: 0.2ms | ActiveRecord: 0.0ms)
[3075] 127.0.0.1 - - [11/May/2015:15:49:44 +0800] "GET /api_call.json?t=15?id=9 HTTP/1.1" 304 - 60.0230

**config.allow_concurrency = true**
-> Ideally 1 worker per core, each worker serves up to 10 threads.

[22802] Puma starting in cluster mode...
[22802] * Version 2.11.2 (ruby 2.2.0-p0), codename: Intrepid Squirrel
[22802] * Min threads: 1, max threads: 40
[22802] * Environment: development
[22802] * Process workers: 4
[22802] * Preloading application
[22802] * Listening on tcp://0.0.0.0:3000
[22802] Use Ctrl-C to stop
[22802] - Worker 0 (pid: 22832) booted, phase: 0
[22802] - Worker 1 (pid: 22835) booted, phase: 0
[22802] - Worker 3 (pid: 22852) booted, phase: 0
[22802] - Worker 2 (pid: 22843) booted, phase: 0
Started GET "/" for 127.0.0.1 at 2015-05-13 17:58:20 +0800
Processing by ApplicationController#index as HTML
  Rendered application/index.html.erb within layouts/application (3.6ms)
Completed 200 OK in 216ms (Views: 200.0ms | ActiveRecord: 0.0ms)
[22832] 127.0.0.1 - - [13/May/2015:17:58:20 +0800] "GET / HTTP/1.1" 200 - 0.8190
...
...
...
Completed 200 OK in 15003ms (Views: 0.1ms | ActiveRecord: 0.0ms)
[22852] 127.0.0.1 - - [13/May/2015:18:00:07 +0800] "GET /api_call.json?t=15?id=10 HTTP/1.1" 304 - 15.0103

**config.allow_concurrency = true (by default)**
-> Ideally each thread serves a request.

Puma starting in single mode...
* Version 2.11.2 (jruby 2.2.2), codename: Intrepid Squirrel
* Min threads: 1, max threads: 40
* Environment: development
NOTE: ActiveRecord 4.2 is not (yet) fully supported by AR-JDBC, please help us finish 4.2 support - check http://bit.ly/jruby-42 for starters
* Listening on tcp://0.0.0.0:3000
Use Ctrl-C to stop
Started GET "/" for 127.0.0.1 at 2015-05-13 18:23:04 +0800
Processing by ApplicationController#index as HTML
  Rendered application/index.html.erb within layouts/application (35.0ms)
...
...
...
Completed 200 OK in 15020ms (Views: 0.7ms | ActiveRecord: 0.0ms)
127.0.0.1 - - [13/May/2015:18:25:19 +0800] "GET /api_call.json?t=15?id=9 HTTP/1.1" 304 - 15.0640

Answer 2

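@Elyasin and @Arthur Frankel, I created this repo for testing Puma running on MRI and JRuby. In this small project, I did not use sleep to emulate a long-running request, because I found that in MRI the GIL seems to treat sleep differently from regular processing, more like an external I/O request.

I put a Fibonacci sequence calculation in the controller. On my machine, fib(39) took 6.x seconds in JRuby and 11 seconds in MRI, which is enough to show the difference.

I opened 2 browser windows. I did this, instead of opening tabs in the same browser, to avoid certain restrictions on concurrent requests sent by one browser to the same domain. I am not sure of the details, but 2 different browsers are enough to prevent that from happening.

I tested thin + MRI, then Puma + MRI, then Puma + JRuby. The results are: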

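thin + MRI: not surprisingly, when I quickly reloaded the 2 browsers, the first one finished after 11 seconds. Then the second request started and took another 11 seconds to finish.
Let me talk about Puma + JRuby first. When I quickly reloaded the 2 browsers, they seemed to start at almost the same time and also finished at the same time. Both took about 6.9 seconds to finish. Puma is a multithreaded server and JRuby supports multithreading.
Finally, Puma + MRI. After I quickly reloaded the 2 browsers, both took 22 seconds to finish. They started at almost the same time and finished at almost the same time, but each took twice as long. That is exactly what the GIL does: it switches between threads to provide concurrency, but the lock itself prevents parallelism.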
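A standalone sketch of the same experiment, without Rails or Puma: two CPU-bound "requests" are run first one after the other and then in two threads. The function and helper names are assumptions, not code from the repo, and n is kept small so the script finishes quickly.

```ruby
require 'benchmark'

# Naive recursive Fibonacci: pure CPU work, no I/O and no sleep.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Time to compute fib(n) `requests` times, one after the other.
def time_serial(n, requests)
  Benchmark.realtime { requests.times { fib(n) } }
end

# Time to compute fib(n) `requests` times, each in its own thread.
def time_threaded(n, requests)
  Benchmark.realtime do
    requests.times.map { Thread.new { fib(n) } }.each(&:join)
  end
end

n = 25
puts format('serial:   %.2fs', time_serial(n, 2))
puts format('threaded: %.2fs', time_threaded(n, 2))
```

On MRI the two timings should come out roughly equal, because the GIL lets only one thread execute Ruby code at a time. On an implementation without a GIL, such as JRuby, the threaded run would be expected to be faster on a multi-core machine.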

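About my setup:

The servers were all started in Rails production mode. In production mode, config.cache_classes is set to true, which implies config.allow_concurrency = true.
Puma was started with a minimum of 8 threads and a maximum of 8 threads.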
