Behaviour of characters with relational operators

Question

Can anyone explain why R does this? What is the reasoning behind it?

"-1" < 0
#[1] TRUE
# expected [1] FALSE # OR better NA

"-abc" < 0
#[1] TRUE
# expected [1] FALSE # OR better NA

From ?Comparison:

If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw

FWIW, none of these help either:

toString(-1) < 0
as.character(-1) < 0
toString("-abc") < 0
as.character("-abc") < 0

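Am I wrong to expect a different result? I ask because it seems to me that, if you are not aware of this, it could produce unexpected results inside a function.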

Answer 1

Quoting the precedence rule you already cited:

the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw

So in the expression:

"-abc" < 0

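what happens is that the 0 on the RHS is coerced to character, because character ranks higher than numeric in that order. This leaves us with: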

"-abc" < "0"

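This is true lexicographically (you can check it yourself; string comparison follows the collating sequence of the locale in use), so the expression evaluates to TRUE. Note that if the coercion went the other way, i.e. if R tried to coerce "-abc" to a numeric type, the result would be NA, and the whole expression would evaluate to NA rather than TRUE: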

"-abc" < 0
NA < 0
NA

So this is how we know that R coerces the RHS to character.
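The coercion can also be checked directly in an R session. This is a minimal sketch using only base R; the variable names are illustrative, and the results of the string comparisons themselves are not annotated because collation depends on the locale in use:

```r
# The number 0 is converted to the string "0" before the comparison.
zero_as_string <- as.character(0)
zero_as_string                  # "0"

# So the implicit and the explicit form are the same operation.
implicit <- "-abc" < 0
explicit <- "-abc" < zero_as_string
identical(implicit, explicit)   # TRUE

# Coercing the other way gives NA (as.numeric() warns about the failed
# conversion, which is silenced here).
abc_as_number <- suppressWarnings(as.numeric("-abc"))
abc_as_number                   # NA
abc_as_number < 0               # NA
```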

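A good rule of thumb in R (or SQL, Java, JavaScript, really any language) is not to mix types. If you know your data is numeric, use a numeric type and treat it as numeric, and likewise treat character data as character.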
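If a function might receive character input but the comparison is meant to be numeric, one option is to convert explicitly before comparing. The sketch below assumes that behaviour is what you want; `numeric_less_than` is a hypothetical helper, not part of base R, and it returns NA wherever the input cannot be read as a number:

```r
# Hypothetical helper: compare as numbers, giving NA where x is not numeric.
numeric_less_than <- function(x, threshold) {
  stopifnot(is.numeric(threshold))
  # as.numeric() yields NA (with a warning) for strings that are not numbers;
  # the warning is silenced because NA is the intended result here.
  x_num <- suppressWarnings(as.numeric(x))
  x_num < threshold
}

numeric_less_than("-1", 0)                # TRUE
numeric_less_than("-abc", 0)              # NA
numeric_less_than(c("2", "-3", "x"), 0)   # FALSE TRUE NA
```

Silencing the warning hides genuine data problems as well, so in real code it may be better to let it surface or to check for NA explicitly.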
