How does mount propagation behave when calling clone with `CLONE_NEWUSER|CLONE_NEWNS`?
My program calls `clone` and runs `/bin/sh` in the child process.
In the shell, I run `cat /proc/$$/mountinfo` to inspect the propagation type.
With only the `CLONE_NEWNS` flag, I get this:
# cat /proc/$$/mountinfo
194 193 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw,discard,errors=remount-ro
...
If I combine `CLONE_NEWNS` with `CLONE_NEWUSER` (by uncommenting `flags |= CLONE_NEWUSER;` in the source below), I get this:
199 198 8:1 / / rw,relatime master:1 - ext4 /dev/sda1 rw,discard,errors=remount-ro
...
Why does `CLONE_NEWUSER` make a difference? On my machine (Debian 9), I expected the mount to always be `MS_SHARED`, because it is copied from an `MS_SHARED` mount point.
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

static char container_stack[STACK_SIZE];
char *const container_args[] = {"/bin/sh", NULL};

/* Runs in the child: replace the process image with /bin/sh. */
int container_main(void *arg) {
    (void)arg;
    printf("Container - inside the container!\n");
    printf("container pid is %d\n", getpid());
    /* execv only returns if it failed. */
    int status = execv(container_args[0], container_args);
    if (status < 0) perror("execv");
    printf("Something's wrong!\n");
    return 0;
}

int main() {
    printf("Parent [ %d ] - start a container!\n", getpid());
    int flags = CLONE_NEWNS;
    //flags |= CLONE_NEWUSER;
    /* The stack grows downward, so pass the top of the buffer. */
    int container_pid = clone(container_main, container_stack + STACK_SIZE,
                              SIGCHLD | flags, NULL);
    if (container_pid < 0) {
        perror("clone");
        return -1;
    }
    printf("Container pid is %d\n", container_pid);
    waitpid(container_pid, NULL, 0);
    printf("Parent - container stopped!\n");
    return 0;
}
`man 7 mount_namespaces` explains this. Relevant excerpts:
* Each mount namespace has an owner user namespace. As
explained above, when a new mount namespace is created, its
mount point list is initialized as a copy of the mount point
list of another mount namespace. If the new namespace and the
namespace from which the mount point list was copied are owned
by different user namespaces, then the new mount namespace is
considered less privileged.
* When creating a less privileged mount namespace, shared mounts
are reduced to slave mounts. (Shared and slave mounts are
discussed below.) This ensures that mappings performed in
less privileged mount namespaces will not propagate to more
privileged mount namespaces.
shared:X
This mount point is shared in peer group X. Each peer
group has a unique ID that is automatically generated by
the kernel, and all mount points in the same peer group
will show the same ID. (These IDs are assigned starting
from the value 1, and may be recycled when a peer group
ceases to have any members.)
master:X
This mount is a slave to shared peer group X.
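To make the difference between the two outputs easier to check programmatically, here is a minimal sketch that classifies one `/proc/[pid]/mountinfo` line by its optional fields. The helper name `propagation_of` and its return strings are hypothetical, chosen only for illustration. It assumes the documented field layout: six fixed fields, then zero or more optional tags, then a lone `-` separator. It ignores the `propagate_from:X` tag.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of any library): classify the propagation
 * type of a single mountinfo line. Returns a static string: "shared",
 * "slave", "shared+slave", "unbindable", or "private". */
const char *propagation_of(const char *line) {
    assert(line != NULL);

    int has_shared = 0, has_master = 0, unbindable = 0;
    int field = 0;
    const char *p = line;

    while (*p != '\0') {
        p += strspn(p, " \n");            /* skip separators */
        size_t len = strcspn(p, " \n");   /* length of the next token */
        if (len == 0) break;
        field++;

        /* Fields 1-6 are fixed: mount ID, parent ID, major:minor,
         * root, mount point, mount options. Optional tags follow. */
        if (field > 6) {
            if (len == 1 && p[0] == '-') break;  /* end of optional fields */
            if (len > 7 && strncmp(p, "shared:", 7) == 0)
                has_shared = 1;
            else if (len > 7 && strncmp(p, "master:", 7) == 0)
                has_master = 1;
            else if (len == 10 && strncmp(p, "unbindable", 10) == 0)
                unbindable = 1;
        }
        p += len;
    }

    if (has_shared && has_master) return "shared+slave";
    if (has_shared) return "shared";
    if (has_master) return "slave";
    if (unbindable) return "unbindable";
    return "private";  /* no propagation tag present */
}
```

Applied to the two lines in the question, the `shared:1` line is classified as "shared" and the `master:1` line as "slave", which matches the man page: in the mount namespace owned by the new user namespace, the shared mount was reduced to a slave of peer group 1.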