Why is docker pull not extracting layers in parallel?
Does the extraction (untarring) of Docker image layers by docker pull have to be performed sequentially, or could it be parallelized?
Example
docker pull mirekphd/ml-cpu-r40-base
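- an image that had to be split into more than 50 layers for build performance reasons. It contains around 4k R packages precompiled as DEBs (the entire CRAN Task Views contents), which would be impossible to build in Docker without splitting the packages into multiple layers of roughly equal size; the split cuts the build time from a whole day to minutes. The extraction stage, if parallelized, could become up to 50 times faster...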
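The "up to 50 times faster" figure is the ideal case. A rough way to bound it is sketched below, assuming extraction work spreads perfectly across workers and ignoring disk contention and layer dependencies. The function name `ideal_speedup` and the per-layer timings are illustrative, not measurements of this image.

```python
def ideal_speedup(layer_seconds, workers):
    """Upper bound on the speedup from extracting layers in parallel.

    Assumes layers are independent and disk I/O is not a bottleneck,
    which real storage drivers do not guarantee.
    """
    sequential = sum(layer_seconds)
    # Parallel time can never beat the slowest single layer,
    # nor the total work divided evenly among the workers.
    parallel_floor = max(max(layer_seconds), sequential / workers)
    return sequential / parallel_floor


print(ideal_speedup([1.0] * 50, workers=50))           # 50 equal layers, 50 workers
print(ideal_speedup([1.0] * 50, workers=8))            # same layers, 8 workers
print(ideal_speedup([10.0] + [1.0] * 40, workers=50))  # one dominant layer
```

The bound reaches 50 only when the layers are equally sized and there are at least as many workers as layers; a single large layer or a small core count caps it much lower.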
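Context
When you watch docker pull fetch a large multi-layer image (gigabytes in size), you will notice that the download of each layer can be performed separately, in parallel. This is not the case for the subsequent extraction (untarring) of each of these layers, which is performed sequentially. Do we know why?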
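The observed behavior can be pictured with a small model: downloads fan out over a thread pool, while extraction walks the layers one at a time. This is only a sketch of the pattern, not Docker's implementation; `download`, `extract`, `pull`, and the `max_downloads` parameter are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def download(layer_id):
    """Hypothetical stand-in for fetching one compressed layer blob."""
    return f"{layer_id}.tar.gz"


def extract(blob, parent_snapshot):
    """Hypothetical stand-in for untarring a blob on top of its parent."""
    return parent_snapshot + [blob]


def pull(layer_ids, max_downloads=3):
    # Downloads do not depend on each other, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=max_downloads) as pool:
        blobs = list(pool.map(download, layer_ids))  # results keep input order

    # Extraction goes base-first: each step builds on the previous result.
    snapshot = []
    for blob in blobs:
        snapshot = extract(blob, snapshot)
    return snapshot


print(pull(["A", "B", "C", "D"]))
```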
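From my anecdotal observations with such large images, parallel extraction would speed up the docker pull operation considerably.
Moreover, if splitting an image into more layers let you spin up containers faster, people would start writing Dockerfiles that are more readable and faster to pull/run, rather than trying to pile all instructions into a single slow-building, impossibly convoluted and cache-busting string of instructions only to save a few megabytes of extra layer overhead (which would easily be recouped by parallel extraction).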
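According to the discussion at https://github.com/moby/moby/issues/21814, there are two main reasons why layers are not extracted in parallel:
- It would not work on all storage drivers.
- It could potentially use a lot of CPU.
See the relevant comments below: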
Note that not all storage drivers would be able to support parallel extraction. Some are snapshotting filesystems where the first layer must be extracted and snapshotted before the next one can be applied.
@aaronlehmann
We also don't really want a pull operation consuming tons of CPU on a host with running containers.
@cpuguy83
The user who closed the linked issue wrote:
This isn't going to happen for technical reasons. There's no room for debate here. AUFS would support this, but most of the other storage drivers wouldn't support this. This also requires having specific code to implement at least two different code paths: one with this parallel extraction and one without it.
An image is basically something like this graph A->B->C->D and most Docker storage drivers can't handle extracting any layers which depend on layers which haven't been extracted already.
Should you want to speed up docker pull, you most certainly want faster storage and faster network. Go itself will contribute to performance gains once Go 1.7 is out and we start using it.
I'm going to close this right now because any gains from parallel extraction for specific drivers aren't worth the complexity for the code, the effort needed to implement it and the effort needed to maintain this in the future.
@unclejack
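To make the A->B->C->D dependency concrete, here is a toy model of a snapshotting storage driver in which a layer can only be applied on top of a parent that already exists. The `SnapshotStore` class and its `apply` method are hypothetical and do not correspond to any real Docker storage driver API; they only illustrate why out-of-order extraction fails under that constraint.

```python
class SnapshotStore:
    """Toy model of a snapshotting storage driver (not a real Docker API)."""

    def __init__(self):
        self.snapshots = {}  # layer id -> tuple of layer ids it contains

    def apply(self, layer_id, parent_id=None):
        if parent_id is None:
            base = ()
        elif parent_id in self.snapshots:
            base = self.snapshots[parent_id]
        else:
            # The parent snapshot does not exist yet, so there is nothing to build on.
            raise RuntimeError(
                f"cannot apply {layer_id}: parent {parent_id} not extracted yet"
            )
        self.snapshots[layer_id] = base + (layer_id,)
        return self.snapshots[layer_id]


store = SnapshotStore()
store.apply("A")
store.apply("B", parent_id="A")

try:
    store.apply("D", parent_id="C")  # C has not been extracted yet
except RuntimeError as err:
    print(err)

store.apply("C", parent_id="B")
print(store.apply("D", parent_id="C"))
```

A driver that stores each layer as an independent directory would not need this check, which matches the remark in the quote that AUFS would support parallel extraction while most other drivers would not.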