Is accessing the "value" of a linker script variable undefined behavior in C?
The GNU ld (linker script) manual, Section 3.5.5 Source Code Reference, has some really important information on how to access linker script "variables" (which are actually just integer addresses) in C source code. I used this info to make extensive use of linker script variables, and I wrote this answer here: How to get value of variable defined in ld linker script from C.
However, it is easy to get this wrong and make the mistake of trying to access a linker script variable's value instead of its address, since this is a bit esoteric. The manual (link above) says:
This means that you cannot access the value of a linker script defined symbol - it has no value - all you can do is access the address of a linker script defined symbol.
Hence when you are using a linker script defined symbol in source code you should always take the address of the symbol, and never attempt to use its value.
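Question: So, if you do attempt to access a linker script variable's value, is this "undefined behavior"?
Quick refresher:
Imagine that in your linker script (ex: STM32F103RBTx_FLASH.ld) you have: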
/* Specify the memory areas */
MEMORY
{
FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 128K
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 20K
}
/* Some custom variables (addresses) I intend to access from my C source code */
__flash_start__ = ORIGIN(FLASH);
__flash_end__ = ORIGIN(FLASH) + LENGTH(FLASH);
__ram_start__ = ORIGIN(RAM);
__ram_end__ = ORIGIN(RAM) + LENGTH(RAM);
And in your C source code you do:
// 1. correct way A:
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)&__flash_start__);
// OR 2. correct way B (my preferred approach):
extern uint32_t __flash_start__[]; // not a true array; [] is required to access linker script variables (addresses) as though they were normal variables
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)__flash_start__);
// OR 3. COMPLETELY WRONG WAY TO DO IT!
// - IS THIS UNDEFINED BEHAVIOR?
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", __flash_start__);
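Sample printed output
(this is real output: it was actually compiled, run, and printed by an STM32 mcu):
__flash_start__ addr = 0x8000000
__flash_start__ addr = 0x8000000
__flash_start__ addr = 0x20080000
<== NOTE: LIKE I SAID ABOVE, THIS ONE IS COMPLETELY WRONG (even though it compiles and runs)! <== Update Mar. 2020: actually, see my answer; this is fine too, it just does something different is all.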
Update:
Response to @Eric Postpischil's 1st comment:
The C standard does not define anything at all about linker script symbols. Any specification of behavior is up to the GNU tools. That said, if a linker script symbol identifies a place in memory where some valid object is stored, I would expect accessing the value of that object to work, if it were accessed with its proper type. Supposing __flash_start__ is normally accessible memory, and except for any requirements of your system about what is at __flash_start__, you could, in theory, put a uint32_t (using appropriate input to the linker) and then access it via __flash_start__.
Yes, but that's not my question. I'm not sure you're picking up on the subtlety of my question. Take a look at the examples I provide. It is true you can access this location just fine, but make sure you understand how you do so, and then my question will become apparent. Look especially at example 3 above, which is wrong even though to a C programmer it looks right. To read a uint32_t, for example, at __flash_start__, you'd do this:
extern uint32_t __flash_start__;
uint32_t u32 = *((uint32_t *)&__flash_start__); // correct, even though it *looks like* you're taking the address (&) of an address (__flash_start__)
Or this:
extern uint32_t __flash_start__[];
uint32_t u32 = *((uint32_t *)__flash_start__); // also correct, and my preferred way of doing it because it looks more correct to the trained "C-programmer" eye
But definitely NOT this:
extern uint32_t __flash_start__;
uint32_t u32 = __flash_start__; // incorrect; <==UPDATE: THIS IS ALSO CORRECT! (and more straight-forward too, actually; see comment discussion under this question)
And NOT this:
extern uint32_t __flash_start__;
uint32_t u32 = *((uint32_t *)__flash_start__); // incorrect, but *looks* right
Related:
- Why do STM32 gcc linker scripts automatically discard all input sections from these standard libraries: libc.a, libm.a, libgcc.a?
- [my answer] How to get value of variable defined in ld linker script from C
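Shorter answer:
Accessing the "value" of a linker script variable is NOT undefined behavior, and it is fine to do, so long as what you want is the actual data stored at that location in memory. It is the wrong thing to do if what you want is the address of that memory, that is, the "value" you assigned to the variable in the linker script, which C code sees only as an address in memory and not as a value.
Yeah, that's kind of confusing, so re-read it 3 times carefully. Essentially, if you want to access the value of a linker script variable, just ensure your linker script is set up to prevent anything you don't want from ending up at that memory address, so that whatever you DO want there is in fact there. This way, reading the value at that memory address will give you something useful that you expect to be there.
But, if you're using linker script variables to store some sort of "values" in and of themselves, the way to grab the "values" of these linker script variables in C is to read their addresses, because the "value" you assign to a variable in a linker script IS SEEN BY THE C COMPILER AS THE "ADDRESS" of that linker script variable, since linker scripts are designed to manipulate memory and memory addresses, NOT traditional C variables.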
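As a small illustration of "read the address, not the contents", here is a minimal sketch that computes the size of a memory region from two linker symbols. The helper name `region_size_bytes` is my own invention for this example, and the usage shown in the comment assumes the `__flash_start__` and `__flash_end__` symbols from the linker script in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from any library). On the real target the two
 * bounds would come from the linker script, for example:
 *
 *     extern uint32_t __flash_start__[];
 *     extern uint32_t __flash_end__[];
 *     size_t flash_size = region_size_bytes(__flash_start__, __flash_end__);
 *
 * Only the ADDRESSES of the symbols are used; nothing is read from flash.
 */
size_t region_size_bytes(const void *start, const void *end)
{
    uintptr_t lo = (uintptr_t)start;
    uintptr_t hi = (uintptr_t)end;

    assert(hi >= lo); /* the end symbol must not precede the start symbol */
    return (size_t)(hi - lo);
}
```

With the linker script from the question, I would expect this to yield 128K for FLASH, but I have only reasoned that through rather than run it on the target.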
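Here are some really valuable and correct comments from under my question which I think are worth posting in this answer so they never get lost. Please go upvote his comments under my question above.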
The C standard does not define anything at all about linker script symbols. Any specification of behavior is up to the GNU tools. That said, if a linker script symbol identifies a place in memory where some valid object is stored, I would expect accessing the value of that object to work, if it were accessed with its proper type. Supposing __flash_start__ is normally accessible memory, and except for any requirements of your system about what is at __flash_start__, you could, in theory, put a uint32_t (using appropriate input to the linker) and then access it via __flash_start__.
– Eric Postpischil
That documentation is not written very well, and you are taking the first sentence too literally. What is really happening here is that the linker’s notion of the “value” of a symbol and a programming language’s notion of the “value” of an identifier are different things. To the linker, the value of a symbol is simply a number associated with it. In a programming language, the value is a number (or other element in the set of values of some type) stored in the (sometimes notional) storage associated with the identifier. The documentation is advising you that the linker’s value of a symbol appears inside a language like C as the address associated with the identifier, rather than the contents of its storage...
This next part is really important, and we should get the GNU linker script manual updated:
It goes too far when it tells you to “never attempt to use its value.”
It is correct that merely defining a linker symbol does not reserve the necessary storage for a programming language object, and therefore merely having a linker symbol does not provide you storage you can access. However if you ensure storage is allocated by some other means, then, sure, it can work as a programming language object. There is no general prohibition on using a linker symbol as an identifier in C, including accessing its C value, if you have properly allocated storage and otherwise satisfied the requirements for this. If the linker value of __flash_start__ is a valid memory address, and you have ensured there is storage for a uint32_t at that address, and it is a properly aligned address for a uint32_t, then it is okay to access __flash_start__ in C as if it were a uint32_t. That would not be defined by the C standard, but by the GNU tools.
– Eric Postpischil
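The "properly aligned address" condition in that comment can be checked in code. The following is only a sketch, assuming a C11 compiler (for `_Alignof`); both function names are hypothetical helpers I made up for illustration, not part of any toolchain API. The second helper shows one common way to read a 4-byte value without relying on alignment at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True when `addr` satisfies the alignment requirement of uint32_t. */
bool is_aligned_for_u32(const void *addr)
{
    return ((uintptr_t)addr % _Alignof(uint32_t)) == 0;
}

/*
 * Read a uint32_t from `addr` whether or not it is aligned. memcpy copies
 * byte by byte as far as the C abstract machine is concerned, so it avoids
 * the misaligned access a plain dereference could perform.
 */
uint32_t read_u32_any_alignment(const void *addr)
{
    uint32_t value;

    assert(addr != NULL);
    memcpy(&value, addr, sizeof value);
    return value;
}
```

On the target, one might call `is_aligned_for_u32(&__flash_start__)` before treating the symbol as a `uint32_t` object; for the addresses in this question (0x8000000 and 0x20000000) the check is trivially satisfied.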
Long answer:
I said in the question:
// 1. correct way A:
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)&__flash_start__);
// OR 2. correct way B (my preferred approach):
extern uint32_t __flash_start__[]; // not a true array; [] is required to access linker script variables (addresses) as though they were normal variables
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)__flash_start__);
// OR 3. COMPLETELY WRONG WAY TO DO IT!
// - IS THIS UNDEFINED BEHAVIOR?
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", __flash_start__);
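(See the discussion under the question for how I came to this.)
Looking specifically at #3 above:
Well, actually, if your goal is to read the address of __flash_start__, which is 0x8000000 in this case, then yes, this is completely wrong. But it is NOT undefined behavior! What it is actually doing instead is reading the contents (value) at that address (0x8000000) as a uint32_t type. In other words, it is simply reading the first 4 bytes of the FLASH section and interpreting them as a uint32_t. The contents (the uint32_t value at this address) just so happen to be 0x20080000 in this case.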
To further prove this point, the following are exactly identical:
// Read the actual *contents* of the `__flash_start__` address as a 4-byte value!
// forward declaration to make a variable defined in the linker script
// accessible in the C code
extern uint32_t __flash_start__;
// These 2 read techniques do the exact same thing.
uint32_t u32_1 = __flash_start__; // technique 1
uint32_t u32_2 = *((uint32_t *)&__flash_start__); // technique 2
printf("u32_1 = 0x%lX\n", u32_1);
printf("u32_2 = 0x%lX\n", u32_2);
The output is:
u32_1 = 0x20080000
u32_2 = 0x20080000
Notice they produce the same result. They each produce a valid uint32_t-type value, which is stored at address 0x8000000.
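If you don't have the hardware handy, the same address-versus-contents distinction can be tried on a host machine. This is only a stand-in sketch: `fake_flash_start` is an ordinary C object I made up to play the role of the word stored at `__flash_start__`, and its initializer is simply the value from the output above. It is not a real linker script symbol.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 4 bytes stored at __flash_start__. */
static const uint32_t fake_flash_start = 0x20080000u;

/* "Technique 1": use the identifier directly, i.e. read its contents. */
uint32_t read_direct(void)
{
    return fake_flash_start;
}

/* "Technique 2": take the address, then dereference it. Same read. */
uint32_t read_via_address(void)
{
    return *((const uint32_t *)&fake_flash_start);
}

/* What "correct ways" A and B obtain: the address itself, not the contents. */
uintptr_t address_of_symbol(void)
{
    return (uintptr_t)&fake_flash_start;
}
```

The first two functions return the stored value, while the third returns wherever the host toolchain happened to place the object, which is the part a linker script would pin to 0x8000000 on the target.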
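It turns out, however, that the u32_1 technique shown above is a more straightforward and direct way to read the value, and again, it is not undefined behavior. Rather, it is correctly reading the value (contents) at that address.
I seem to be talking in circles. Anyway, mind blown, but I get it now. I was convinced before that I was supposed to use only the u32_2 technique shown above, but it turns out they are both just fine, and the u32_1 technique is clearly more straightforward (there I go talking in circles again). :)
Cheers.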
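Going deeper: where did the 0x20080000 value stored right at the start of flash come from?
One more little tidbit. I actually ran this test code on an STM32F777 mcu, which has 512KiB of RAM. Since RAM starts at address 0x20000000, this means that 0x20000000 + 512K = 0x20080000. This just so happens to also be the first word of the vector table, because Programming Manual PM0253 Rev 4, pg. 42, "Figure 10. Vector table" shows that the first 4 bytes of the Vector Table contain the "Initial SP [Stack Pointer] value".
I know that the Vector Table sits right at the start of the program memory, which is located in flash, so that means that 0x20080000 is my initial stack pointer value. This makes sense, because the Reset_Handler is the start of the program (and its vector just so happens to be the 2nd 4-byte value at the start of the Vector Table, by the way), and the first thing it does, as shown in my "startup_stm32f777xx.s" startup assembly file, is set the stack pointer (sp) to _estack: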
Reset_Handler:
ldr sp, =_estack /* set stack pointer */
Furthermore, _estack is defined in my linker script as follows:
/* Highest address of the user mode stack */
_estack = ORIGIN(RAM) + LENGTH(RAM); /* end of RAM */
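So there you have it! The first 4-byte value in my Vector Table, right at the start of flash, is set to be the initial stack pointer value, which is defined as _estack in my linker script, and _estack is the address at the end of my RAM, which is 0x20000000 + 512K = 0x20080000. So, it all makes sense! I've just proven I read the right value!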
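The arithmetic above can be written out as a short sketch. Both function names are hypothetical, and the table passed to `initial_sp` in the usage line is a stand-in array rather than the real vector table; on the target you could pass `&__flash_start__` instead, assuming the vector table really is placed at the start of flash as it is in my setup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors `_estack = ORIGIN(RAM) + LENGTH(RAM);` from the linker script. */
uint32_t ram_end_address(uint32_t ram_origin, uint32_t ram_length_bytes)
{
    assert(ram_length_bytes <= UINT32_MAX - ram_origin); /* no wrap-around */
    return ram_origin + ram_length_bytes;
}

/* First word of a Cortex-M vector table: the initial stack pointer value. */
uint32_t initial_sp(const uint32_t *vector_table)
{
    assert(vector_table != NULL);
    return vector_table[0];
}
```

For example, `ram_end_address(0x20000000u, 512u * 1024u)` gives 0x20080000, the same number I read back from the first word of flash.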
See also:
- [my answer] How to get value of variable defined in ld linker script from C