Can #endif in an included file be used to close a #if in the including file?

Question

Suppose I have two files, a.h:

#if 1
#include "b.h"

and b.h:

#endif

The preprocessors of both gcc and clang reject a.h:

$ cpp -ansi -pedantic a.h >/dev/null
In file included from a.h:2:0:
b.h:1:2: error: #endif without #if
 #endif
  ^
a.h:1:0: error: unterminated #if
 #if 1
 ^

However, the C standard (N1570 6.10.2, paragraph 3) says:

A preprocessing directive of the form

# include "q-char-sequence" new-line

causes the replacement of that directive by the entire contents of the source file identified by the specified sequence between the " delimiters.

This seems to permit the construct above.

Are gcc and clang non-conforming in rejecting my code?

Answer 1

#if / #ifdef / #ifndef
#elif
#else
#endif

must be matched within a single file.
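As a minimal sketch of what "matched within a single file" looks like in practice, the fragment below keeps the whole #if / #elif / #else / #endif chain in one place. The macro names USE_LONG_NAME and USE_SHORT_NAME and the function mode_name are hypothetical, chosen only for illustration.

```c
/* Hypothetical configuration switch; not part of the original question. */
#define USE_LONG_NAME 1

/* The whole conditional chain opens and closes in this one file. */
#if USE_LONG_NAME
static const char *mode_name(void) { return "long"; }
#elif defined(USE_SHORT_NAME)
static const char *mode_name(void) { return "short"; }
#else
static const char *mode_name(void) { return "default"; }
#endif
```

With USE_LONG_NAME defined as 1, only the first definition of mode_name survives preprocessing.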

Answer 2

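Think of the C preprocessor as a very simple compiler. To translate a file, the C preprocessor conceptually performs several phases.

Lexical analysis – groups the sequence of characters making up the preprocessing translation unit into strings (tokens) that have a definite meaning in the preprocessor language.
Syntactic analysis – groups the tokens of the preprocessing translation unit into syntactic structures built according to the grammar of the preprocessing language.
Code generation – translates all the files making up the preprocessing translation unit into a single file containing only 'pure' C instructions.

Strictly speaking, the translation phases related to preprocessing that are mentioned in §5.1.1.2 of the C Standard (ISO/IEC 9899:201x) are phases 3 and 4. Phase 3 corresponds almost exactly to lexical analysis, while phase 4 is about code generation.

Syntactic analysis (parsing) seems to be missing from this picture. In fact, the C preprocessor grammar is so simple that real preprocessors/compilers perform it together with lexical analysis.

If the syntactic analysis phase ends successfully (that is, all statements in the preprocessing translation unit are legal according to the preprocessor grammar), code generation can take place and all preprocessing directives are executed.
Executing a preprocessing directive means transforming the source file according to the directive's semantics and then removing the directive from the source file.
The semantics of each preprocessing directive are specified in §6.10.1-6.10.9 of the C Standard.

Coming back to your sample program, the 2 files you provided, a.h and b.h, are conceptually processed as follows.

Lexical analysis - each individual preprocessing token is delimited by a '{' on the left and a '}' on the right.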

a.h

{#}{if} {1}
{#}{include} {"b.h"}

b.h

{#}{endif}

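This phase completes without errors, and its result (the sequence of preprocessing tokens) is passed on to the next phase: syntactic analysis.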
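The brace notation above can be reproduced with a toy tokenizer. This is only a rough sketch, not the standard's phase 3: it assumes a single directive line and splits on whitespace, the '#' punctuator, and double-quoted names, which is enough for the two headers in the question. The function name brace_tokens is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copies one line into out, wrapping each token in
   braces. Simplified on purpose: tokens are '#', a "quoted" name, or a run
   of characters up to the next blank, '#' or '"'. */
static void brace_tokens(const char *line, char *out, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return;
    const char *p = line;
    while (*p != '\0' && *p != '\n') {
        if (isspace((unsigned char)*p)) {
            if (n + 1 < cap)
                out[n++] = *p;          /* whitespace passes through unchanged */
            p++;
            continue;
        }
        const char *start = p;
        if (*p == '#') {
            p++;                        /* '#' is a token by itself */
        } else if (*p == '"') {
            p++;                        /* quoted name: scan to closing quote */
            while (*p != '\0' && *p != '"' && *p != '\n')
                p++;
            if (*p == '"')
                p++;
        } else {
            while (*p != '\0' && !isspace((unsigned char)*p) &&
                   *p != '#' && *p != '"')
                p++;
        }
        if (n + 1 < cap)
            out[n++] = '{';
        for (; start < p; start++)
            if (n + 1 < cap)
                out[n++] = *start;
        if (n + 1 < cap)
            out[n++] = '}';
    }
    out[n] = '\0';
}
```

Calling brace_tokens on the line #if 1 yields {#}{if} {1}, matching the listing for a.h.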

Syntactic analysis

The initial derivation for a.h is as follows

preprocessing-file →
group →
group-part →
if-section →
if-group endif-line → 
if-group #endif new-line →
…

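and it is clear that the content of a.h cannot be derived from the preprocessing grammar (the terminating #endif is in fact missing), so a.h is syntactically incorrect. This is exactly what your compiler is telling you when it writes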

a.h:1:0: error: unterminated #if

Something similar happens with b.h; reasoning backwards, #endif can only be derived from the rule

if-section → 
if-group elif-groups[opt] else-group[opt] endif-line

which means that the file content would have to derive from one of the following 3 if-groups

# if constant-expression new-line group[opt]
# ifdef identifier new-line group[opt]
# ifndef identifier new-line group[opt]

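This is not the case, because b.h contains no # if/# ifdef/# ifndef but only a single #endif line, so the content of b.h is again syntactically incorrect, and your compiler tells you so with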

In file included from a.h:2:0:
b.h:1:2: error: #endif without #if
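Both diagnostics can be reproduced with a toy per-file checker. This is a sketch under stated assumptions, not how gcc or clang are implemented: it only counts nesting depth line by line, ignores comments, line splicing, and skipped groups, and the names cond_status, directive_is, and check_file are hypothetical.

```c
#include <string.h>

enum cond_status { COND_OK = 0, COND_UNTERMINATED_IF, COND_ENDIF_WITHOUT_IF };

/* True if the text at p starts with the directive name `name` and the name
   ends there (so "if" does not match "ifdef"). */
static int directive_is(const char *p, const char *name)
{
    size_t len = strlen(name);
    if (strncmp(p, name, len) != 0)
        return 0;
    char c = p[len];
    return !(c == '_' || (c >= 'a' && c <= 'z') ||
             (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'));
}

/* Checks that every #if/#ifdef/#ifndef in `text` has its #endif in `text`. */
static enum cond_status check_file(const char *text)
{
    int depth = 0;
    const char *line = text;
    while (*line != '\0') {
        const char *p = line;
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '#') {
            p++;
            while (*p == ' ' || *p == '\t')
                p++;
            if (directive_is(p, "if") || directive_is(p, "ifdef") ||
                directive_is(p, "ifndef")) {
                depth++;                       /* opens an if-section */
            } else if (directive_is(p, "endif")) {
                if (depth == 0)
                    return COND_ENDIF_WITHOUT_IF;
                depth--;                       /* closes the innermost one */
            }
        }
        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return depth > 0 ? COND_UNTERMINATED_IF : COND_OK;
}
```

Applied to the text of a.h the checker reports an unterminated #if, and applied to the text of b.h it reports #endif without #if; a file that contains both the #if and the #endif passes.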

Code generation

Of course, since your program is lexically correct but syntactically incorrect, this phase is never performed.

Answer 3

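The C standard defines 8 translation phases. A source file is processed by each of the 8 phases in turn (or in an equivalent manner).

Phase 4, as defined in section 5.1.1.2 of N1570, is: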

Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a universal character name is produced by token concatenation (6.10.3.3), the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.

The relevant sentence here is:

A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively.

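This means that each included source file is preprocessed on its own. That rules out having a #if in one file and the corresponding #endif in another.

(As "A wild elephant" mentioned in the comments, and as has been said elsewhere, the grammar in section 6.10 also says that an if-section, which begins with a #if (or #ifdef or #ifndef) line and ends with a #endif line, can appear only as part of a preprocessing-file.)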
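The recursive reading of phase 4 can be modelled with a small sketch. Everything here is an assumption made for illustration: the in-memory table files stands in for the file system, the names mem_file, find_text, report, and process are hypothetical, every #if is treated as true, and only the "..." form of #include is handled. The point is that each call to process has its own nesting counter, so a #endif in an included file can never see the includer's #if.

```c
#include <stdio.h>
#include <string.h>

struct mem_file { const char *name; const char *text; };

/* Hypothetical in-memory copies of the two headers from the question. */
static const struct mem_file files[] = {
    { "a.h", "#if 1\n#include \"b.h\"\n" },
    { "b.h", "#endif\n" },
};

static const char *find_text(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof files / sizeof files[0]; i++)
        if (strlen(files[i].name) == len &&
            memcmp(files[i].name, name, len) == 0)
            return files[i].text;
    return NULL;
}

/* Appends "name: msg\n" to the NUL-terminated buffer diag. */
static void report(char *diag, size_t cap, const char *name, size_t len,
                   const char *msg)
{
    size_t used = strlen(diag);
    if (used < cap)
        snprintf(diag + used, cap - used, "%.*s: %s\n", (int)len, name, msg);
}

/* Processes one file on its own and recurses into its includes.
   Returns the number of errors found in this file and below it. */
static int process(const char *name, size_t len, char *diag, size_t cap)
{
    const char *line = find_text(name, len);
    int depth = 0;      /* private to this file, never shared with includers */
    int errors = 0;
    if (line == NULL) {
        report(diag, cap, name, len, "file not found");
        return 1;
    }
    while (*line != '\0') {
        size_t line_len = strcspn(line, "\n");
        const char *end = line + line_len;
        const char *p = line + strspn(line, " \t");
        if (*p == '#') {
            p++;
            p += strspn(p, " \t");
            if (strncmp(p, "include", 7) == 0) {
                const char *open = memchr(p, '"', (size_t)(end - p));
                const char *close = open
                    ? memchr(open + 1, '"', (size_t)(end - open - 1)) : NULL;
                if (close != NULL)      /* included file starts from scratch */
                    errors += process(open + 1, (size_t)(close - open - 1),
                                      diag, cap);
            } else if (strncmp(p, "if", 2) == 0) {   /* #if, #ifdef, #ifndef */
                depth++;
            } else if (strncmp(p, "endif", 5) == 0) {
                if (depth == 0) {
                    report(diag, cap, name, len, "#endif without #if");
                    errors++;
                } else {
                    depth--;
                }
            }
        }
        line = (*end == '\n') ? end + 1 : end;
    }
    if (depth > 0) {
        report(diag, cap, name, len, "unterminated #if");
        errors++;
    }
    return errors;
}
```

Starting from a.h, this model reports the error in b.h first and the unterminated #if in a.h second, the same order as in the cpp output quoted in the question. The sketch has no protection against a file that includes itself.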

Answer 4

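I think the compilers are right, or at most the standard is ambiguous.

The trick is not in how #include is implemented, but in the order in which the preprocessing is done.

Look at the grammar rules in section 6.10 of the C99 standard: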

preprocessing-file:
    group[opt]

group:
    group-part
    group group-part

group-part:
    if-section
    control-line
    text-line
    # non-directive

if-section:
    if-group elif-groups[opt] else-group[opt] endif-line

if-group:
    # if constant-expression new-line group[opt]
...
control-line:
    # include pp-tokens new-line
    ...

As you can see, #include is nested inside group, and group is the thing that sits between #if and #endif.

For example, in a well-formed file such as:

#if 1
#include <a.h>
#endif

This parses as #if 1, plus a group, plus #endif. And inside the group there is a #include.

But in your example:

#if 1
#include <a.h>

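the if-section rule does not apply to this source, so the group production is never even examined.

You could argue that the standard is ambiguous, because it does not specify when the replacement of the #include directive takes place, and that a conforming implementation could shift a lot of grammar rules and replace the #include before failing for lack of a #endif. But such ambiguities are impossible to avoid when side effects of the grammar modify the very text you are parsing. Isn't C wonderful?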
