subset() in R with $
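I am trying to understand a quirk of the subset() function in R and how it interacts with the $ operator. I will use R's built-in CO2 dataset as an example:
I can run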
sub <- subset(CO2, CO2$Type=="Quebec")
without error, and it produces the same dataset as running
sub <- subset(CO2, Type=="Quebec")
However, I have observed that this is not always the case.
Sometimes including $ inside the subset() call produces the following error:
$ operator is invalid for atomic vectors
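What triggers the "$ operator is invalid for atomic vectors" error?
Why is $ allowed in some cases (as in the CO2 example above) but not in others? (This comes up with data I read in via read.csv(): sometimes subsetting with $ gives the error and sometimes it does not, with no pattern I can discern.)
Thanks!
Following the comments below, I am posting a reproducible example.
Here is a case that triggers the error: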
Moose<-structure(list(Moose = 1:25, Tagging_Loc = structure(c(1L, 1L,
2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
2L, 1L, 1L, 2L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
Gender = structure(c(2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
2L), .Label = c("F", "M"), class = "factor"), Age = c(20L,
23L, 14L, 15L, 10L, 9L, 5L, 10L, 19L, 22L, 21L, 21L, 7L,
16L, 19L, 9L, 23L, 5L, 9L, 10L, 16L, 8L, 13L, 14L, 6L), Weight = c(1366L,
1006L, 888L, 1359L, 899L, 635L, 400L, 1000L, 1012L, 1480L,
1001L, 1100L, 482L, 1414L, 971L, 725L, 1400L, 416L, 790L,
970L, 921L, 560L, 1103L, 904L, 669L), Distance = c(250.5,
410.239, 457.6402591, 245.8523, 430.9975, 308.8673107, 212.5212497,
414.2093545, 439.6581, 215.6491489, 464.2384, 425.4256828,
233.5635555, 207.98, 453.7098751, 390.0506365, 235.5212497,
207.368, 427.5084899, 443.0452824, 459.8999274, 274.6856592,
350.5661674, 456.9600032, 330.146)), .Names = c("Moose",
"Tagging_Loc", "Gender", "Age", "Weight", "Distance"), class = "data.frame", row.names = c(NA,
-25L))
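# Fails: $ operator is invalid for atomic vectors
sub_Moose<-subset(Moose, Moose$Tagging_Loc=="A")
# Works: the column is referenced by its bare name
sub_Moose<-subset(Moose, Tagging_Loc=="A")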
But if I change only the name of the dataset, both versions of the subset() call run fine, with no errors:
mOose<-structure(list(Moose = 1:25, Tagging_Loc = structure(c(1L, 1L,
2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
2L, 1L, 1L, 2L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
Gender = structure(c(2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
2L), .Label = c("F", "M"), class = "factor"), Age = c(20L,
23L, 14L, 15L, 10L, 9L, 5L, 10L, 19L, 22L, 21L, 21L, 7L,
16L, 19L, 9L, 23L, 5L, 9L, 10L, 16L, 8L, 13L, 14L, 6L), Weight = c(1366L,
1006L, 888L, 1359L, 899L, 635L, 400L, 1000L, 1012L, 1480L,
1001L, 1100L, 482L, 1414L, 971L, 725L, 1400L, 416L, 790L,
970L, 921L, 560L, 1103L, 904L, 669L), Distance = c(250.5,
410.239, 457.6402591, 245.8523, 430.9975, 308.8673107, 212.5212497,
414.2093545, 439.6581, 215.6491489, 464.2384, 425.4256828,
233.5635555, 207.98, 453.7098751, 390.0506365, 235.5212497,
207.368, 427.5084899, 443.0452824, 459.8999274, 274.6856592,
350.5661674, 456.9600032, 330.146)), .Names = c("Moose",
"Tagging_Loc", "Gender", "Age", "Weight", "Distance"), class = "data.frame", row.names = c(NA,
-25L))
sub_Moose<-subset(mOose, mOose$Tagging_Loc=="A")
sub_Moose<-subset(mOose, Tagging_Loc=="A")
Don't use $ with subset()! Either use
sub <- subset(CO2, Type=="Quebec")
or use
sub <- CO2[CO2$Type=="Quebec", ]
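The subset() function works by evaluating all the symbols in the condition in the environment of the data.frame. In your Moose example, your data.frame Moose has a column named Moose. So when you run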
sub_Moose <- subset(Moose, Moose$Tagging_Loc=="A")
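the expression Moose$Tagging_Loc=="A" is evaluated in the environment of the data.frame. That data.frame has a column named Moose, so the symbol Moose resolves to the column vector before the data.frame of the same name is found. Note that with() is very similar to subset() in that it evaluates an expression in the context of an environment or data.frame. Observe: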
class(Moose)
# [1] "data.frame"
with(Moose, class(Moose))
# [1] "integer"
class(Moose$Moose)
# [1] "integer"
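So Moose$Tagging_Loc=="A" only works when Moose is a data.frame, but inside subset(), Moose is an integer vector because the columns are searched first. In the CO2 and mOose examples no column shares the data.frame's name (R is case-sensitive, so mOose does not match the column Moose), so the lookup falls through to the calling environment, finds the data.frame, and $ works.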
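The name clash can be reproduced with a much smaller example. The following is a minimal sketch, not taken from the question: the data.frame df and its columns df and grp are hypothetical names chosen only so that the data.frame shares its name with one of its columns.

```r
# Hypothetical data.frame whose name matches one of its own columns.
df <- data.frame(df = 1:4, grp = c("A", "B", "A", "B"))

# Inside subset(), the symbol `df` is looked up among the columns first,
# so `df$grp` applies $ to an integer vector and raises an error.
# tryCatch() captures the error message instead of stopping the script.
res <- tryCatch(
  subset(df, df$grp == "A"),
  error = function(e) conditionMessage(e)
)
print(res)

# Referring to the column by its bare name avoids the clash.
ok1 <- subset(df, grp == "A")

# Plain indexing is evaluated in the calling environment,
# where `df` is still the data.frame, so $ is fine here.
ok2 <- df[df$grp == "A", ]
```

Renaming either the data.frame or the clashing column should also make the $ form run, but the bare column name or bracket indexing avoids depending on the names at all.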