GCC Inline Assembly side effects

Can someone explain to me (in other words) the following part of the GCC documentation:

Here is a fictitious sum of squares instruction, that takes two pointers to floating point values in memory and produces a floating point register output. Notice that x, and y both appear twice in the asm parameters, once to specify memory accessed, and once to specify a base register used by the asm. You won’t normally be wasting a register by doing this as GCC can use the same register for both purposes. However, it would be foolish to use both %1 and %3 for x in this asm and expect them to be the same. In fact, %3 may well not be a register. It might be a symbolic memory reference to the object pointed to by x.

asm ("sumsq %0, %1, %2"
 : "+f" (result)
 : "r" (x), "r" (y), "m" (*x), "m" (*y));

Here is a fictitious *z++ = *x++ * *y++ instruction. Notice that the x, y and z pointer registers must be specified as input/output because the asm modifies them.

asm ("vecmul %0, %1, %2"
 : "+r" (z), "+r" (x), "+r" (y), "=m" (*z)
 : "m" (*x), "m" (*y));

In the first example, what is the point of listing *x and *y among the input operands? The same document states:

In particular, there is no way to specify that input operands get modified without also specifying them as output operands.

In the second example, why is the input operands section used at all? None of those operands are used in the assembly template anyway.

As a bonus, how can the following example from an SO post be changed so that it no longer needs the volatile keyword?

void swap_2 (int *a, int *b)
{
    int tmp0, tmp1;

    __asm__ volatile (
        "movl (%0), %k2\n\t" /* %2 (tmp0) = (*a) */
        "movl (%1), %k3\n\t" /* %3 (tmp1) = (*b) */
        "cmpl %k3, %k2\n\t"
        "jle  %=f\n\t"       /* if (%2 <= %3) (at&t!) */
        "movl %k3, (%0)\n\t"
        "movl %k2, (%1)\n\t"
        "%=:\n\t"

        : "+r" (a), "+r" (b), "=r" (tmp0), "=r" (tmp1) :
        : "memory" /* "cc" */ );
}

Thanks in advance. I have been struggling with this for two days now.

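In the first example, *x and *y must be listed as input operands so that GCC knows the result of the instruction depends on them. Otherwise, GCC is free to move the stores to *x and *y past the inline assembly fragment, which would then read uninitialized memory. You can see this by compiling the following example: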

double
f (void)
{
  double result;
  double a = 5;
  double b = 7;
  double *x = &a;
  double *y = &b;
  asm ("sumsq %0, %1, %2"
       : "+X" (result)
       : "r" (x), "r" (y) /*, "m" (*x), "m" (*y)*/);
  return result;
}

This results in:

f:
    leaq    -16(%rsp), %rax
    leaq    -8(%rsp), %rdx
    pxor    %xmm0, %xmm0
#APP
# 8 "t.c" 1
    sumsq %xmm0, %rax, %rdx
# 0 "" 2
#NO_APP
    ret

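The two leaq instructions merely set up registers pointing into the uninitialized red zone on the stack. The assignments are gone entirely.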
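To make the point concrete with an instruction that actually exists, here is a minimal sketch that replaces the fictitious sumsq with real x86-64 SSE2 instructions. The function name sum_of_squares is hypothetical. The template only ever names the pointer registers %2 and %3; the "m" (*x) and "m" (*y) operands (%4 and %5) are never referenced and exist solely to tell GCC that the asm reads the pointed-to doubles, so the stores to a and b in the caller cannot be dropped or moved past it. On other targets the sketch falls back to plain C.

```c
/* Hypothetical helper: returns (*x)^2 + (*y)^2.
 * x86-64 path uses real SSE2 instructions in place of the fictitious "sumsq". */
static double sum_of_squares(const double *x, const double *y)
{
#if defined(__x86_64__)
    double result, tmp;
    __asm__ (
        "movsd (%2), %0\n\t"  /* result = *x, loaded through the pointer register */
        "mulsd %0, %0\n\t"    /* result *= result */
        "movsd (%3), %1\n\t"  /* tmp = *y */
        "mulsd %1, %1\n\t"    /* tmp *= tmp */
        "addsd %1, %0"        /* result += tmp */
        : "=x" (result), "=x" (tmp)
        /* %4 and %5 are never used in the template: they only declare
           that the asm reads the memory that x and y point to. */
        : "r" (x), "r" (y), "m" (*x), "m" (*y));
    return result;
#else
    return *x * *x + *y * *y;
#endif
}
```

Deleting the two "m" operands from this sketch recreates the situation in the listing above: GCC then sees only the pointer values as inputs and no longer has to keep the stores to the pointed-to objects.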

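The same applies to the second example: "=m" (*z), "m" (*x) and "m" (*y) never appear in the template, but they tell GCC which memory the asm writes and reads.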
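A minimal x86-64 sketch of the *z++ = *x++ * *y++ pattern, using real instructions instead of the fictitious vecmul; the helper name vec_mul and the loop around the asm are assumptions for illustration. As in the GCC doc example, the pointers are "+r" because the asm advances them, while the memory operands %4, %5 and %6 are unused in the template and only describe which elements are written and read. The hard-coded step of 4 assumes sizeof(int) == 4, which holds on x86-64.

```c
#include <stddef.h>

/* Hypothetical helper: z[i] = x[i] * y[i] for i in [0, n).
 * Each loop iteration performs one "*z++ = *x++ * *y++" step. */
static void vec_mul(int *z, const int *x, const int *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
#if defined(__x86_64__)
        int tmp;
        __asm__ (
            "movl (%1), %3\n\t"   /* tmp = *x */
            "imull (%2), %3\n\t"  /* tmp *= *y */
            "movl %3, (%0)\n\t"   /* *z = tmp */
            "addq $4, %0\n\t"     /* z++ (assumes sizeof(int) == 4) */
            "addq $4, %1\n\t"     /* x++ */
            "addq $4, %2"         /* y++ */
            /* z, x, y are read and modified by the asm, hence "+r".
               "=m" (*z) declares the store; "m" (*x) and "m" (*y)
               declare the loads. None of %4..%6 appear above. */
            : "+r" (z), "+r" (x), "+r" (y), "=&r" (tmp), "=m" (*z)
            : "m" (*x), "m" (*y)
            : "cc");
#else
        *z++ = *x++ * *y++;
#endif
    }
}
```

The "=&r" early-clobber on tmp keeps GCC from giving tmp a register that is still needed as an input after the first instruction.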

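I think you could use the same trick to get rid of volatile. But I don't think it is actually necessary here, because there is already a "memory" clobber, which tells GCC that the inline assembly reads or writes memory.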
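A sketch of what "the same trick" might look like for swap_2, assuming x86-64. The name swap_2_no_volatile is hypothetical. Instead of passing the pointers in registers and clobbering all of memory, *a and *b become "+m" operands, so GCC knows exactly which memory the asm reads and writes. Because those operands are outputs whose effect (the stores to *a and *b) is visible to the caller, GCC has no reason to discard the statement, and volatile is not needed. A numeric local label (1f) replaces %=; like the original, the function leaves the smaller value in *a.

```c
/* Hypothetical variant of swap_2: if *a > *b, swap them.
 * No volatile and no "memory" clobber: "+m" declares the exact memory touched. */
static void swap_2_no_volatile(int *a, int *b)
{
#if defined(__x86_64__)
    int tmp0, tmp1;
    __asm__ (
        "movl %0, %2\n\t"   /* tmp0 = *a */
        "movl %1, %3\n\t"   /* tmp1 = *b */
        "cmpl %3, %2\n\t"   /* compare tmp0 with tmp1 (AT&T operand order) */
        "jle 1f\n\t"        /* if (tmp0 <= tmp1) skip the swap */
        "movl %3, %0\n\t"   /* *a = tmp1 */
        "movl %2, %1\n"     /* *b = tmp0 */
        "1:"
        /* "=&r": the temporaries are written before the last memory
           operand is used, so they must not share its address register. */
        : "+m" (*a), "+m" (*b), "=&r" (tmp0), "=&r" (tmp1)
        :
        : "cc");
#else
    if (*a > *b) {
        int t = *a;
        *a = *b;
        *b = t;
    }
#endif
}
```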