Docker daemon/container real-time scheduling with Ubuntu (Linux) host
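Before I start, I was in two minds about whether to ask this on SuperUser or Stack Overflow, so apologies in advance if it is in the wrong place.
I have a docker container (containing C/C++ executable code) that performs audio/video processing. As such, I would like to test the benefits of running the container with RT scheduling constraints. Searching the web, I have come across all sorts of information, but I am struggling to put it all together.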
System environment:
- Host: Ubuntu (stock) Zesty 17.04 (no RT kernel patch, kernel: 4.10.0-35-generic)
- Docker version: 17.05.0-ce
- Docker image OS: Ubuntu Zesty 17.04
Within the executable nested inside the docker image/container, the following code is executed to change the scheduler from 'SCHED_OTHER' to 'SCHED_FIFO' (see docs):
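#include <sched.h>
#include <cerrno>
#include <cstring>
#include <iostream>
#include <boost/algorithm/clamp.hpp>
int main() {
    struct sched_param sched = {};
    const int nMin = sched_get_priority_min(SCHED_FIFO);
    const int nMax = sched_get_priority_max(SCHED_FIFO);
    const int nHlf = (nMax - nMin) / 2;
    const int nPriority = nMin + nHlf + 1;  // just above mid-range: 51 on Linux (range 1..99)
    sched.sched_priority = boost::algorithm::clamp(nPriority, nMin, nMax);
    // pid 0 = calling process; EPERM without CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO
    if (sched_setscheduler(0, SCHED_FIFO, &sched) < 0) {
        std::cerr << "SETSCHEDULER failed - err = " << strerror(errno) << std::endl;
        return 1;
    }
    std::cout << "Priority set to \"" << sched.sched_priority << "\"" << std::endl;
    return 0;
}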
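To check that the call actually took effect, the process can read its policy back. A minimal sketch using sched_getscheduler(2) and sched_getparam(2); the helper names policyName and currentSchedule are hypothetical, not part of any library:

```cpp
#include <sched.h>
#include <string>

// Hypothetical helper: readable name for the policies used in this question.
std::string policyName(int policy) {
    switch (policy) {
        case SCHED_OTHER: return "SCHED_OTHER";
        case SCHED_FIFO:  return "SCHED_FIFO";
        case SCHED_RR:    return "SCHED_RR";
        default:          return "other policy (" + std::to_string(policy) + ")";
    }
}

// Hypothetical helper: reads back the calling process's policy and static
// priority. Returns false if either call fails (errno is set by that call).
bool currentSchedule(int& policy, int& priority) {
    policy = sched_getscheduler(0);  // pid 0 = calling process
    if (policy < 0) return false;
    struct sched_param param = {};
    if (sched_getparam(0, &param) < 0) return false;
    priority = param.sched_priority;
    return true;
}
```

Called right after a successful sched_setscheduler, currentSchedule would be expected to report SCHED_FIFO with priority 51, the value the code above computes on Linux.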
I have been reading various Docker docs on using the realtime scheduler. One interesting page states:
Verify that CONFIG_RT_GROUP_SCHED is enabled in the Linux kernel by running zcat /proc/config.gz | grep CONFIG_RT_GROUP_SCHED or by checking for the existence of the file /sys/fs/cgroup/cpu.rt_runtime_us. For guidance on configuring the kernel realtime scheduler, consult the documentation for your operating system.
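Ubuntu does not ship /proc/config.gz (the diagnostic output later in this post confirms it), but the same information is in /boot/config-<release>. A small sketch that looks up an option in kernel config text; kernelConfigValue and readBootConfig are hypothetical names, and a disabled option appears in the file as "# CONFIG_X is not set", which the lookup treats as absent:

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <sys/utsname.h>

// Returns the value ("y", "m", ...) of `option` in kernel config text,
// or "" if the option is absent or marked "is not set".
std::string kernelConfigValue(const std::string& configText, const std::string& option) {
    std::istringstream in(configText);
    const std::string prefix = option + "=";
    for (std::string line; std::getline(in, line);) {
        if (line.compare(0, prefix.size(), prefix) == 0)
            return line.substr(prefix.size());
    }
    return "";
}

// Reads /boot/config-<running kernel release>; returns "" if it cannot be read.
std::string readBootConfig() {
    struct utsname u = {};
    if (uname(&u) != 0) return "";
    std::ifstream f(std::string("/boot/config-") + u.release);
    if (!f) return "";
    std::ostringstream ss;
    ss << f.rdbuf();
    return ss.str();
}
```

On the stock Zesty kernel, kernelConfigValue(readBootConfig(), "CONFIG_RT_GROUP_SCHED") would be expected to return an empty string.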
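Going by the above advice, the stock Ubuntu Zesty 17.04 OS appears to fail these checks.
First question: does this mean I cannot use the RT scheduler? What is 'CONFIG_RT_GROUP_SCHED'? One thing that confuses me is that there are old posts on the web, from 2010-2012, about patching the kernel with RT patches. It looks as though some soft-RT work has gone into the Linux kernel since then.
A quote from here prompted my question: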
From kernel version 2.6.18 onward, however, Linux is gradually becoming equipped with real-time capabilities, most of which are derived from the former realtime-preempt patches developed by Ingo Molnar, Thomas Gleixner, Steven Rostedt, and others. Until the patches have been completely merged into the mainline kernel (this is expected to be around kernel version 2.6.30), they must be installed to achieve the best real-time performance. These patches are named:
Moving on...
Having read further, I noticed that setting ulimits is important. I changed /etc/security/limits.conf:
#* soft core 0
#root hard core 100000
#* hard rss 10000
# NEW ADDITION
gavin hard rtprio 99
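Second question: presumably the above is required to enable the docker daemon to run RT? It looks as though the daemon is controlled via systemd.
Investigating further, I came across the following snippet on the same Docker documentation page: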
To run containers using the realtime scheduler, run the Docker daemon with the --cpu-rt-runtime flag set to the maximum number of microseconds reserved for realtime tasks per runtime period. For instance, with the default period of 10000 microseconds (1 second), setting --cpu-rt-runtime=95000 ensures that containers using the realtime scheduler can run for 95000 microseconds for every 10000-microsecond period, leaving at least 5000 microseconds available for non-realtime tasks. To make this configuration permanent on systems which use systemd, see Control and configure Docker with systemd.
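The figures in this quote look inconsistent: 10000 microseconds is 10 ms rather than 1 second, and a runtime of 95000 µs cannot fit inside a 10000 µs period. The kernel's own defaults (/proc/sys/kernel/sched_rt_period_us = 1000000, /proc/sys/kernel/sched_rt_runtime_us = 950000) suggest the intended example is 950000 µs of every 1000000 µs, leaving 50000 µs for non-realtime tasks. The arithmetic as a small sketch; RtBudget, rtShare and nonRtUs are hypothetical names:

```cpp
#include <stdexcept>

// Hypothetical type: an RT bandwidth budget, all values in microseconds.
struct RtBudget {
    long periodUs;
    long runtimeUs;
};

// Fraction of each period that real-time tasks may consume.
// Throws if the runtime does not fit inside the period.
double rtShare(const RtBudget& b) {
    if (b.periodUs <= 0 || b.runtimeUs < 0 || b.runtimeUs > b.periodUs)
        throw std::invalid_argument("runtime must be in [0, period] and period > 0");
    return static_cast<double>(b.runtimeUs) / static_cast<double>(b.periodUs);
}

// Time per period left over for non-real-time (SCHED_OTHER) tasks.
long nonRtUs(const RtBudget& b) {
    rtShare(b);  // validates the budget
    return b.periodUs - b.runtimeUs;
}
```

The same validation rejects a runtime larger than its period, such as the period 92500 / runtime 100000 pair in the daemon.json attempt in this post.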
Following this page, I found two daemon parameters of interest:
--cpu-rt-period int Limit the CPU real-time period in microseconds
--cpu-rt-runtime int Limit the CPU real-time runtime in microseconds
The same page states that docker daemon parameters can be specified via '/etc/docker/daemon.json', so I tried:
{
"cpu-rt-period": 92500,
"cpu-rt-runtime": 100000
}
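Note: the documentation does not list the above options among the 'allowed configuration options on Linux'. I thought I would give it a try anyway.
Daemon output after restarting Docker: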
-- Logs begin at Wed 2017-10-04 09:58:38 BST, end at Wed 2017-10-04 10:01:32 BST. --
Oct 04 09:58:47 gavin systemd[1]: Starting Docker Application Container Engine...
Oct 04 09:58:47 gavin dockerd[1501]: time="2017-10-04T09:58:47.885882588+01:00" level=info msg="libcontainerd: new containerd process, pid: 1531"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.053986072+01:00" level=warning msg="failed to rename /var/lib/docker/tmp for background deletion: %!s(<nil>).
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.161303803+01:00" level=info msg="[graphdriver] using prior storage driver: aufs"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.303409053+01:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.304002725+01:00" level=warning msg="Your kernel does not support swap memory limit"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.304078792+01:00" level=warning msg="Your kernel does not support cgroup rt period"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.304201239+01:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.305534113+01:00" level=info msg="Loading containers: start."
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.730193030+01:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemo
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.784938130+01:00" level=info msg="Loading containers: done."
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.888035017+01:00" level=info msg="Daemon has completed initialization"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.888104120+01:00" level=info msg="Docker daemon" commit=89658be graphdriver=aufs version=17.05.0-ce
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.903280645+01:00" level=info msg="API listen on /var/run/docker.sock"
Oct 04 09:58:48 gavin systemd[1]: Started Docker Application Container Engine.
Lines of particular interest:
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.304078792+01:00" level=warning msg="Your kernel does not support cgroup rt period"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.304201239+01:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Given my earlier findings, this is not surprising.
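These warnings line up with the missing CONFIG_RT_GROUP_SCHED: the cgroup v1 cpu controller only exposes cpu.rt_period_us and cpu.rt_runtime_us when the kernel is built with that option. A sketch of the file-existence check the Docker docs suggest; the candidate paths are assumptions about where the cpu controller is mounted, and findFirstExisting is a hypothetical name:

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Returns the first candidate path that exists, or "" if none does.
// The error_code overload of exists() does not throw (e.g. on permission errors).
std::string findFirstExisting(const std::vector<std::string>& candidates) {
    for (const auto& path : candidates) {
        std::error_code ec;
        if (std::filesystem::exists(path, ec)) return path;
    }
    return "";
}

// Assumed locations of the RT runtime file; adjust for your host's mounts.
const std::vector<std::string> kRtRuntimeCandidates = {
    "/sys/fs/cgroup/cpu,cpuacct/cpu.rt_runtime_us",
    "/sys/fs/cgroup/cpu/cpu.rt_runtime_us",
    "/sys/fs/cgroup/cpu.rt_runtime_us",  // the path quoted in the Docker docs
};
```

An empty result from findFirstExisting(kRtRuntimeCandidates) is consistent with the "Your kernel does not support cgroup rt runtime" warning.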
Final question: once this eventually works, how can I be sure that my container really is running with RT scheduling? Is 'top' or similar enough?
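One way to check from the host, independently of how top displays priorities, is to read the scheduling fields the kernel exposes in /proc/<pid>/stat for the container's processes (their host PIDs are listed by docker top <container>). A sketch; StatSched, parseStatSched and readStatSched are hypothetical names, and the field numbers (40 = rt_priority, 41 = policy) follow proc(5):

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Scheduling fields from /proc/<pid>/stat.
// policy: 0 = SCHED_OTHER, 1 = SCHED_FIFO, 2 = SCHED_RR.
struct StatSched {
    unsigned rtPriority = 0;
    unsigned policy = 0;
};

// Parses one /proc/<pid>/stat line. The command name (field 2) is in
// parentheses and may contain spaces, so tokenising starts after the last ')'.
bool parseStatSched(const std::string& line, StatSched& out) {
    const auto close = line.rfind(')');
    if (close == std::string::npos) return false;
    std::istringstream in(line.substr(close + 1));
    std::vector<std::string> fields;  // fields[0] is field 3 (state)
    for (std::string tok; in >> tok;) fields.push_back(tok);
    if (fields.size() < 39) return false;  // need fields up to 41
    out.rtPriority = static_cast<unsigned>(std::stoul(fields[37]));  // field 40
    out.policy = static_cast<unsigned>(std::stoul(fields[38]));      // field 41
    return true;
}

// Reads the stat line of a PID as seen from the current PID namespace.
bool readStatSched(int pid, StatSched& out) {
    std::ifstream f("/proc/" + std::to_string(pid) + "/stat");
    std::string line;
    return std::getline(f, line) && parseStatSched(line, out);
}
```

From the command line, chrt -p <pid> and ps -o pid,cls,rtprio,comm -p <pid> report the same two values (a CLS of FF means SCHED_FIFO).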
Edit: I ran a kernel diagnostic script which I found via moby on GitHub. Here is the output:
warning: /proc/config.gz does not exist, searching other paths for kernel config ...
info: reading kernel config from /boot/config-4.10.0-35-generic ...
Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- apparmor: enabled and tools installed
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_NF_NAT_IPV4: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_NF_NAT_NEEDED: enabled
- CONFIG_POSIX_MQUEUE: enabled
Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_MEMCG_SWAP: enabled
- CONFIG_MEMCG_SWAP_ENABLED: missing
(cgroup swap accounting is currently not enabled, you can enable it by setting boot option "swapaccount=1")
- CONFIG_LEGACY_VSYSCALL_EMULATE: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_IOSCHED_CFQ: enabled
- CONFIG_CFQ_GROUP_IOSCHED: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: missing
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
- "overlay":
- CONFIG_VXLAN: enabled (as module)
Optional (for encrypted networks):
- CONFIG_CRYPTO: enabled
- CONFIG_CRYPTO_AEAD: enabled
- CONFIG_CRYPTO_GCM: enabled (as module)
- CONFIG_CRYPTO_SEQIV: enabled
- CONFIG_CRYPTO_GHASH: enabled (as module)
- CONFIG_XFRM: enabled
- CONFIG_XFRM_USER: enabled (as module)
- CONFIG_XFRM_ALGO: enabled (as module)
- CONFIG_INET_ESP: enabled (as module)
- CONFIG_INET_XFRM_MODE_TRANSPORT: enabled (as module)
- "ipvlan":
- CONFIG_IPVLAN: enabled (as module)
- "macvlan":
- CONFIG_MACVLAN: enabled (as module)
- CONFIG_DUMMY: enabled (as module)
- "ftp,tftp client in container":
- CONFIG_NF_NAT_FTP: enabled (as module)
- CONFIG_NF_CONNTRACK_FTP: enabled (as module)
- CONFIG_NF_NAT_TFTP: enabled (as module)
- CONFIG_NF_CONNTRACK_TFTP: enabled (as module)
- Storage Drivers:
- "aufs":
- CONFIG_AUFS_FS: enabled (as module)
- "btrfs":
- CONFIG_BTRFS_FS: enabled (as module)
- CONFIG_BTRFS_FS_POSIX_ACL: enabled
- "devicemapper":
- CONFIG_BLK_DEV_DM: enabled
- CONFIG_DM_THIN_PROVISIONING: enabled (as module)
- "overlay":
- CONFIG_OVERLAY_FS: enabled (as module)
- "zfs":
- /dev/zfs: missing
- zfs command: missing
- zpool command: missing
Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000
The significant line:
- CONFIG_RT_GROUP_SCHED: missing
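Container level
There are two options for RT scheduling inside a container:
- Add the SYS_NICE capability:
  docker run --cap-add SYS_NICE ...
- Use privileged mode with the --privileged flag:
  docker run --privileged ...
Note: the --privileged flag grants far more permissions than necessary! The more limited --cap-add SYS_NICE option is safer.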
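To confirm from inside the container that --cap-add SYS_NICE took effect, the effective capability mask on the CapEff: line of /proc/self/status can be inspected. A sketch assuming that line format; the helper names are hypothetical, while the bit number (CAP_SYS_NICE = 23) comes from <linux/capability.h>:

```cpp
#include <cstdint>
#include <fstream>
#include <string>

constexpr unsigned kCapSysNice = 23;  // CAP_SYS_NICE in <linux/capability.h>

// Tests one capability bit in a hex mask as printed in /proc/<pid>/status.
// std::stoull skips the leading tab that follows "CapEff:".
bool maskHasCap(const std::string& hexMask, unsigned cap) {
    if (cap >= 64) return false;
    const std::uint64_t mask = std::stoull(hexMask, nullptr, 16);
    return ((mask >> cap) & 1u) != 0;
}

// Checks the effective capability set (CapEff) of the calling process.
bool selfHasCap(unsigned cap) {
    std::ifstream f("/proc/self/status");
    for (std::string line; std::getline(f, line);) {
        if (line.rfind("CapEff:", 0) == 0)
            return maskHasCap(line.substr(7), cap);
    }
    return false;
}
```

With --cap-add SYS_NICE you would expect selfHasCap(kCapSysNice) to return true; Docker's default capability set does not include it.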
OS-level configuration
You may also have to enable real-time scheduling in your sysctl settings. If you are running as root (the default for Docker containers):
sysctl -w kernel.sched_rt_runtime_us=-1
To make it permanent (update your image):
echo 'kernel.sched_rt_runtime_us=-1' >> /etc/sysctl.conf
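The current global setting can be read back from /proc/sys. A sketch (helper names hypothetical) that reads a single integer and interprets -1, which the kernel treats as no limit on real-time runtime:

```cpp
#include <fstream>
#include <optional>
#include <string>

// Reads a single integer from a /proc/sys file; empty optional if unreadable.
std::optional<long> readProcSysLong(const std::string& path) {
    std::ifstream f(path);
    long value = 0;
    if (!(f >> value)) return std::nullopt;
    return value;
}

// kernel.sched_rt_runtime_us == -1 disables RT throttling; otherwise RT tasks
// may run for at most runtimeUs out of every periodUs.
std::string describeRtThrottling(long runtimeUs, long periodUs) {
    if (runtimeUs < 0) return "unlimited (RT throttling disabled)";
    return std::to_string(runtimeUs) + " us of every " + std::to_string(periodUs) + " us";
}
```

Typical use is to pass the values of /proc/sys/kernel/sched_rt_runtime_us and /proc/sys/kernel/sched_rt_period_us to describeRtThrottling.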
https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities
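The approach above did not work for me. What I had to do (obviously after having compiled my kernel with support for real-time control groups, i.e. CONFIG_RT_GROUP_SCHED) was to first stop the Docker service with the following commands: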
$ sudo systemctl stop docker
$ sudo systemctl stop docker.socket
Then I could restart the daemon, allocating a large real-time time slice (e.g. 950000 µs) to its control group:
$ sudo dockerd --cpu-rt-runtime=950000
These changes to the Docker daemon can be made permanent by configuring it as described here and here.
Then I could finally start my container with the real-time scheduler as follows:
$ sudo docker run -it --cpu-rt-runtime=950000 --ulimit rtprio=99 ubuntu:20.04
In a Docker Compose file you can achieve the same with the following settings (as pointed out in the documentation: 1, 2, 3):
cpu_rt_runtime: 950000
ulimits:
  rtprio: 99
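Both --ulimit rtprio=99 and the Compose ulimits entry set the RLIMIT_RTPRIO resource limit, the same limit the rtprio line in /etc/security/limits.conf controls for login sessions. A sketch for checking it from inside the container; per sched_setscheduler(2), a thread without CAP_SYS_NICE may only pick a real-time priority up to its soft RLIMIT_RTPRIO (the helper below simplifies the man page rule by ignoring the thread's current priority). The helper names are hypothetical:

```cpp
#include <limits>
#include <sys/resource.h>

// Soft RLIMIT_RTPRIO of the calling process, or -1 on error.
// RLIM_INFINITY is reported as the largest long value.
long rtprioSoftLimit() {
    struct rlimit rl = {};
    if (getrlimit(RLIMIT_RTPRIO, &rl) != 0) return -1;
    if (rl.rlim_cur == RLIM_INFINITY) return std::numeric_limits<long>::max();
    return static_cast<long>(rl.rlim_cur);
}

// Simplified rule for a process WITHOUT CAP_SYS_NICE: the soft limit must be
// nonzero and the requested SCHED_FIFO/SCHED_RR priority must not exceed it.
bool unprivilegedMayUse(long softLimit, int priority) {
    return softLimit > 0 && priority <= softLimit;
}
```

With --ulimit rtprio=99, rtprioSoftLimit() should return 99, so the priority 51 chosen by the code in the question is allowed even without CAP_SYS_NICE; the container's cgroup still needs a nonzero real-time budget, which is what --cpu-rt-runtime provides.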
Additionally, starting the container as privileged and with net=host helps reduce overhead, as discussed here and in this post.
The real-time runtime allocated to each control group (cpu.rt_runtime_us) can be inspected in the /sys/fs/cgroup/cpu,cpuacct folder. If you have already allocated most of the real-time runtime to another cgroup, this can lead to an error message such as failed to write 95000 to cpu.rt_runtime_us: write /sys/fs/cgroup/cpu,cpuacct/system.slice/.../cpu.rt_runtime_us: invalid argument or similar (see here and here). For more details on control groups in general, see the corresponding official documentation (4, 5).
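The "invalid argument" arises because the kernel refuses an rt_runtime_us value that would let a cgroup's children claim more real-time bandwidth (runtime/period) than their parent has. A simplified model of that check, using fixed-point ratios with a 20-bit shift in the spirit of the kernel's to_ratio(); RtGroup and childFits are hypothetical names, and the real kernel check handles more cases (for example unlimited runtime):

```cpp
#include <cstdint>
#include <vector>

// RT bandwidth of one cgroup, as in cpu.rt_runtime_us / cpu.rt_period_us.
struct RtGroup {
    std::int64_t runtimeUs;
    std::int64_t periodUs;
};

// Fixed-point runtime/period ratio (20 fractional bits), avoiding float rounding.
std::int64_t ratio(const RtGroup& g) {
    return (g.runtimeUs << 20) / g.periodUs;
}

// True if adding `requested` next to `siblings` keeps the children's total
// bandwidth within the parent's; otherwise the write would fail with EINVAL.
bool childFits(const RtGroup& parent, const std::vector<RtGroup>& siblings,
               const RtGroup& requested) {
    std::int64_t total = ratio(requested);
    for (const RtGroup& s : siblings) total += ratio(s);
    return total <= ratio(parent);
}
```

For example, with the parent at 950000/1000000 and a sibling already holding 900000/1000000, a further request of 95000/1000000 does not fit, which is the kind of failure quoted above.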
For real-time processes inside Docker, I found an alternative to control groups, the PREEMPT_RT patch, far more useful: you can install it easily from a Debian package, and it is then sufficient to run the Docker container with the privileged option in order to set real-time priorities for processes inside it. The main advantage is a significantly lower maximum latency compared to control groups. I have discussed this in more detail in this post and created a GitHub repository with guides and scripts that help with installing PREEMPT_RT.
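To confirm that the host is actually running a PREEMPT_RT kernel after installing it, a heuristic check can help. RT-patched kernels expose /sys/kernel/realtime containing 1, and their uname -v string normally contains PREEMPT_RT (PREEMPT RT on older releases); distributions may format the version string differently, so treat a negative result with some caution. The helper names are hypothetical:

```cpp
#include <fstream>
#include <string>
#include <sys/utsname.h>

// Heuristic on the `uname -v` version string.
bool versionLooksPreemptRt(const std::string& version) {
    return version.find("PREEMPT_RT") != std::string::npos ||
           version.find("PREEMPT RT") != std::string::npos;
}

// Prefers /sys/kernel/realtime (value 1 on RT-patched kernels) and falls
// back to the version string if that file is missing or unreadable.
bool runningPreemptRt() {
    std::ifstream f("/sys/kernel/realtime");
    int flag = 0;
    if (f >> flag) return flag == 1;
    struct utsname u = {};
    return uname(&u) == 0 && versionLooksPreemptRt(u.version);
}
```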