Logical memory address for variables in C

Question

Consider the following code:

#include <stdio.h>
int main(void) {
  int a = 10;
  /* %p expects a void *, so cast the pointer explicitly */
  printf("%d %p\n", a, (void *)&a);
  return 0;
}

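If I compile and run the code above repeatedly, it prints a different value for the address part of the printf statement on each run.

If the logical memory space is 16 bits wide, the address returned by the & operator should lie between 0x0000 and 0xFFFF. We know that the address returned by & differs from one execution to the next. My question is: what makes the memory address allocation non-deterministic? Since logical addresses are mapped to physical addresses, shouldn't the logical address stay the same even when the physical address changes?

Also, if I fork the process, the child and the parent print exactly the same output for the printf statement. Why doesn't the behavior above occur when we fork a child, even though fork creates a new process?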

Answer 1

As quoted from the answer linked by @tpr in the comments, the difference in addresses you observed is due to address space layout randomization:

Local variables are allocated on the stack. Traditionally, stack allocation would be repeatable, but this has changed in recent years. Address space layout randomization (ASLR) is a relatively recent innovation in OS memory management, which deliberately makes memory addresses in stack allocations (such as those you have observed) as non-deterministic as possible at runtime. It’s a security feature: this keeps bad actors from exploiting heap buffer overflows, because if the ASLR implementation is entropic enough, who knows what’s going to be there at the end of the overflowing buffer?

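Importantly, ASLR applies to the placement of the stack itself (as well as the other data areas attached to the executable). As Wikipedia puts it succinctly: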

In order to prevent an attacker from reliably jumping to, for example, a particular exploited function in memory, ASLR randomly arranges the address space positions of key data areas of a process, including the base of the executable and the positions of the stack, heap and libraries.

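The identical address in the forked process is not a result of copy-on-write, which is what I originally answered. Even if you modify the variable in the forked process, its address stays the same (although a copy of the variable is made). Try running the following code: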

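#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void) {
  int a = 10;
  int status;
  printf("%d %p\n", a, (void *)&a);
  fflush(stdout); /* keep buffered output from being duplicated in the child */
  pid_t pid = fork();
  if (pid == 0) {
    /* child: modify its own copy of a */
    printf("FORKED: %d %p\n", a, (void *)&a);
    a = 11;
    printf("FORKED: %d %p\n", a, (void *)&a);
    return 0;
  } else {
    /* parent: wait for the child, then print its own a again */
    wait(&status);
    printf("%d %p\n", a, (void *)&a);
    return 0;
  }
}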

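You will see that a is modified only in the forked process, while the parent prints it unchanged. The address, however, is the same in every printed line. That surprised me a little while writing this answer, so I searched and found this question. The answer is simple: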

Every single process gets its own 4G virtual address space and it's the job of the operating systems and hardware memory managers to map your virtual addresses to physical ones.

So, while it may seem that two processes have the same address for a variable, that's only the virtual address.

The memory manager will map that to a totally different physical address

The following two passages are quoted from the fork(2) manual page:

The child process is created with a single thread—the one that called fork(). The entire virtual address space of the parent is replicated in the child, including the states of mutexes, condition variables, and other pthreads objects

[...]

Under Linux, fork() is implemented using copy-on-write pages, so the only penalty that it incurs is the time and memory required to duplicate the parent's page tables, and to create a unique task structure for the child.

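Because of the copy-on-write mentioned in the second quotation, the same virtual address may be backed by the same physical address in both the forked process and its parent. From Wikipedia: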

Copy-on-write (CoW or COW), sometimes referred to as implicit sharing or shadowing, is a resource-management technique used in computer programming to efficiently implement a "duplicate" or "copy" operation on modifiable resources. If a resource is duplicated but not modified, it is not necessary to create a new resource; the resource can be shared between the copy and the original. Modifications must still create a copy, hence the technique: the copy operation is deferred to the first write.

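So, until the variable is modified (or a member of the exec* family is called), the same virtual address most likely corresponds to the same physical address (see the manual page for exceptions).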
