llvm IR 中的冗余基本块

Question

我只是在玩一个程序并在 llvm 中观察它的 IR，我注意到某些对我来说没有意义的基本块。我的代码：

void proc()
{
    int i, j, k, m, n, l;
    k = 119;
    for (i = 20; i < 200; i++)
    {
        for (j = 13; j < 130; j++)
        {
                l = 80;
        }
    }
}

对应的IR：

 store i32 119, i32* %3, align 4
  store i32 20, i32* %1, align 4
  br label %7

; <label>:7:                                      ; preds = %19, %0
  %8 = load i32, i32* %1, align 4
  %9 = icmp slt i32 %8, 200
  br i1 %9, label %10, label %22

; <label>:10:                                     ; preds = %7
  store i32 13, i32* %2, align 4
  br label %11

; <label>:11:                                     ; preds = %15, %10
  %12 = load i32, i32* %2, align 4
  %13 = icmp slt i32 %12, 130
  br i1 %13, label %14, label %18

; <label>:14:                                     ; preds = %11
  store i32 80, i32* %6, align 4
  br label %15

; <label>:15:                                     ; preds = %14
  %16 = load i32, i32* %2, align 4
  %17 = add nsw i32 %16, 1
  store i32 %17, i32* %2, align 4
  br label %11

; <label>:18:                                     ; preds = %11
  br label %19

; <label>:19:                                     ; preds = %18
  %20 = load i32, i32* %1, align 4
  %21 = add nsw i32 %20, 1
  store i32 %21, i32* %1, align 4
  br label %7

; <label>:22:                                     ; preds = %7
  ret void

我的问题是标签 14 和 15。看起来标签 15 只有一个前身，即 14。鉴于此，我们为什么不能将它与标签 14 合并？我假设基本块构造是由提到的算法完成的 here。

Answer 1

（这个答案主要是对我在评论中已经推测的内容的总结，除了它更详细并且不再是推测，因为我现在实际上已经深入研究了 clang 的源代码以验证正在发生的事情）

LLVM 代码总是构造成基本块，API 中用于表示 LLVM 代码的类型已经形成了控制流图。没有基本块的 LLVM 的非结构化形式是不存在的，因此没有将非结构化 LLVM 转换为 CFG 的过程。 Clang 直接将 C AST 翻译成 LLVM。所以它不会在非结构化三地址代码中找到基本块，而是在一步从 C 转换为 LLVM 时找到基本块。因此维基百科的算法不适用。

下面大致基于 CodeGenFunction::EmitForStmt in CodeGen/CGStmt.cpp 的高度简化的伪代码总结了正在发生的事情。这是翻译形式为 for(init; cond; incr) body 的语句的逻辑。为简单起见，我们假设 cond 和 incr 都不为空，并且 cond 是一个表达式，而不是声明。

创建新的基本块：conditionBlock、bodyBlock、incrBlock和exitBlock
将 init 的代码附加到当前基本块，然后跳转到 conditionBlock
将 cond 的代码附加到 conditionBlock，然后是 br i1 %cond, label %bodyBlock, label %exitBlock
将 {break: exitBlock, continue: incrBlock} 压入 break/continue 堆栈
将 body 的代码附加到 bodyBlock，然后跳转到 conditionBlock
弹出 break/continue 堆栈
设置exitBlock为当前基本块

要获得您期望的输出，它必须将 incr 的代码发送到 bodyBlock 中，而不是为其创建一个单独的块。但随后它无法将 incrBlock 压入 break/continue 堆栈。当然这对你的情况并不重要，因为你的代码不包含任何 continue 语句，但在一般情况下，编译器需要 break/continue 堆栈来知道在 [= 的情况下跳转到哪里31=]s 或 continues.

所以编译器总是简单地生成这些块，然后在优化阶段合并不必要的块。

llvm IR 中的冗余基本块

Redundant basic blocks in llvm IR

compiler-construction

llvm