Renaming a column entry, when it is the maximum value by group, gives inconsistent results
I have data as follows:
library(data.table)
DT <- structure(list(State_Ab = c("VA", "VA", "VA", "VA", "VA", "VA",
"VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA",
"VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA",
"VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA",
"VA", "VA", "VA", "VA", "VA", "VA", "VA"), year = c(1995, 1995,
1995, 1995, 1999, 1999, 1999, 1999, 2001, 2001, 2001, 2001, 2005,
2005, 2005, 2005, 2007, 2007, 2007, 2007, 2011, 2011, 2011, 2011,
2017, 2017, 2017, 2005, 2005, 2005, 2005, 2017, 2017, 2017, 1995,
1995, 1995, 1995, 2001, 2001, 2001, 2001, 2007, 2007, 2007, 2007
), County = c("Bedford", "Fairfax", "Bedford", "Fairfax", "Bedford",
"Fairfax", "Bedford", "Fairfax", "Bedford", "Bedford", "Fairfax",
"Fairfax", "Bedford", "Bedford", "Fairfax", "Fairfax", "Bedford",
"Bedford", "Fairfax", "Fairfax", "Bedford", "Bedford", "Fairfax",
"Fairfax", "Bedford", "Fairfax", "Fairfax", "Bedford", "Bedford",
"Fairfax", "Fairfax", "Bedford", "Fairfax", "Fairfax", "Fairfax",
"Fairfax", "Bedford", "Bedford", "Bedford", "Fairfax", "Bedford",
"Fairfax", "Bedford", "Fairfax", "Bedford", "Fairfax"), Type = c("B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"A", "A", "A", "A", "A", "A", "A", "C", "C", "C", "C", "C", "C",
"C", "C", "C", "C", "C", "C"), Population = c(15528, 297214,
2053, 7505, 8963, 199282, 829, 4299, 20040, 2018, 9095, 392987,
26930, 2319, 10225, 448078, 24499, 1935, 8048, 340397, 24012,
1926, 7112, 303379, 41681, 479086, 9552, 31404, 2542, 10546,
461379, 42525, 551183, 12028, 303203, 7600, 2160, 17988, 25284,
410475, 2379, 9462, 25122, 342998, 1940, 8096)), row.names = c(NA,
-46L), class = c("data.table", "data.frame"))
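Some of these values are for Bedford City and some are for Bedford County. From the information I have, the smaller value in each group should be Bedford City and the larger one Bedford County. I thought the following would work, but it somehow fails.
I wanted to do the following: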
DT[County=="Bedford" & order(Population), County := c("Bedford County", "Bedford City"), .(State_Ab, year, County, Type)]
But I get the error message:
Error in `[.data.table`(DT, County == "Bedford" & order(Population), `:=`(County, :
Supplied 2 items to be assigned to group 1 of size 4 in column 'County'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
The output then becomes:
State_Ab year County Type Population
1: VA 2017 Bedford B 41681
2: VA 2005 Bedford A 31404
3: VA 2005 Bedford A 2542
4: VA 2017 Bedford A 42525
5: VA 1995 Bedford C 2160
6: VA 1995 Bedford C 17988
7: VA 2001 Bedford C 25284
8: VA 2001 Bedford C 2379
9: VA 2007 Bedford C 25122
10: VA 2007 Bedford C 1940
11: VA 1995 Bedford City B 2053
12: VA 1999 Bedford City B 829
13: VA 2001 Bedford City B 2018
14: VA 2005 Bedford City B 2319
15: VA 2007 Bedford City B 1935
16: VA 2011 Bedford City B 1926
17: VA 1995 Bedford County B 15528
18: VA 1999 Bedford County B 8963
19: VA 2001 Bedford County B 20040
20: VA 2005 Bedford County B 26930
21: VA 2007 Bedford County B 24499
22: VA 2011 Bedford County B 24012
I really do not understand where this problem comes from.
When trying the other county in the data set, I do this:
DT[County=="Fairfax" & order(Population), County := c("Fairfax County", "Fairfax City"), .(State_Ab, year, County, Type)]
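I get no error, but the output is wrong (Fairfax County is much larger than Fairfax City, yet the labels in the output do not always reflect that):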
23: VA 1995 Fairfax City B 7505
24: VA 1999 Fairfax City B 4299
25: VA 2001 Fairfax City B 392987
26: VA 2005 Fairfax City B 448078
27: VA 2007 Fairfax City B 340397
28: VA 2011 Fairfax City B 303379
29: VA 2017 Fairfax City B 9552
30: VA 2005 Fairfax City A 461379
31: VA 2017 Fairfax City A 12028
32: VA 1995 Fairfax City C 7600
33: VA 2001 Fairfax City C 9462
34: VA 2007 Fairfax City C 8096
35: VA 1995 Fairfax County B 297214
36: VA 1999 Fairfax County B 199282
37: VA 2001 Fairfax County B 9095
38: VA 2005 Fairfax County B 10225
39: VA 2007 Fairfax County B 8048
40: VA 2011 Fairfax County B 7112
41: VA 2017 Fairfax County B 479086
42: VA 2005 Fairfax County A 10546
43: VA 2017 Fairfax County A 551183
44: VA 1995 Fairfax County C 303203
45: VA 2001 Fairfax County C 410475
46: VA 2007 Fairfax County C 342998
This is driving me crazy. What is going on here?
Desired result:
23: VA 1995 Fairfax City B 7505
24: VA 1999 Fairfax City B 4299
25: VA 2001 Fairfax County B 392987
26: VA 2005 Fairfax County B 448078
27: VA 2007 Fairfax County B 340397
28: VA 2011 Fairfax County B 303379
29: VA 2017 Fairfax City B 9552
30: VA 2005 Fairfax County A 461379
31: VA 2017 Fairfax City A 12028
32: VA 1995 Fairfax City C 7600
33: VA 2001 Fairfax City C 9462
34: VA 2007 Fairfax City C 8096
35: VA 1995 Fairfax County B 297214
36: VA 1999 Fairfax County B 199282
37: VA 2001 Fairfax City B 9095
38: VA 2005 Fairfax City B 10225
39: VA 2007 Fairfax City B 8048
40: VA 2011 Fairfax City B 7112
41: VA 2017 Fairfax County B 479086
42: VA 2005 Fairfax City A 10546
43: VA 2017 Fairfax County A 551183
44: VA 1995 Fairfax County C 303203
45: VA 2001 Fairfax County C 410475
46: VA 2007 Fairfax County C 342998
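A likely explanation for the wrong Fairfax labels, sketched on two rows copied from the data above: `order(Population)` returns integer row positions, and inside `&` every nonzero integer is coerced to `TRUE`, so the filter reduces to `County == "Fairfax"` and no sorting takes place. The two labels are then assigned in row order. The names `toy` and `mask` are only for illustration.

```r
library(data.table)

# Two rows from the question's data (VA, 2001, Type B), in their original order
toy <- data.table(
  State_Ab   = "VA",
  year       = 2001,
  County     = c("Fairfax", "Fairfax"),
  Type       = "B",
  Population = c(9095, 392987)
)

# order() yields positions (1, 2); `&` coerces them to TRUE, so the ordering is lost
mask <- toy[, County == "Fairfax" & order(Population)]
print(mask)

# Same shape as the question's call: labels are assigned in row order,
# whatever the populations are
toy[County == "Fairfax" & order(Population),
    County := c("Fairfax County", "Fairfax City"),
    by = .(State_Ab, year, Type)]
print(toy)
```

Here the row with 9095 ends up as "Fairfax County", which matches the wrong output shown in the question. The same rule quoted in the error message (the right-hand side must have length 1 or the group's length) also means that a group holding a single Bedford row cannot take two labels.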
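Here I use a function that sorts the populations within each group and then relabels the county accordingly.
I noticed that VA 2017 Bedford A has only one entry for that year.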
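The single-entry case can be confirmed by counting rows per group. This is a small sketch on three Type A rows copied by hand from the question's data; the names `toy` and `sizes` are only for illustration.

```r
library(data.table)

# Bedford rows of Type A from the question's data
toy <- data.table(
  State_Ab   = "VA",
  year       = c(2005, 2005, 2017),
  County     = "Bedford",
  Type       = "A",
  Population = c(31404, 2542, 42525)
)

# .N is the number of rows in each group
sizes <- toy[County == "Bedford", .N, by = .(State_Ab, year, Type)]

# Groups with a single row cannot be split into a city and a county
print(sizes[N == 1])
```

This is why the function below returns its inputs unchanged when a group has length 1.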
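fn2 <- function(County, Population) {
  # A group with a single row cannot be split into city and county: return it unchanged
  if (length(County) == 1) {
    return(list(County, Population))
  } else {
    # Sort populations ascending so the smaller value pairs with "City"
    # and the larger one with "County"
    list(County = paste(County, c("City", "County")),
         Population = sort(Population))
  }
}
# Apply per state, year and type, to the Bedford rows only
DT[County == "Bedford", c("County", "Population") := fn2(County, Population),
   .(State_Ab, year, Type)]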
Part of the resulting DT:
State_Ab year County Type Population
1: VA 1995 Bedford City B 2053
2: VA 1995 Fairfax B 297214
3: VA 1995 Bedford County B 15528
4: VA 1995 Fairfax B 7505
5: VA 1999 Bedford City B 829
6: VA 1999 Fairfax B 199282
7: VA 1999 Bedford County B 8963
8: VA 1999 Fairfax B 4299
9: VA 2001 Bedford City B 2018
10: VA 2001 Bedford County B 20040
11: VA 2001 Fairfax B 9095
12: VA 2001 Fairfax B 392987
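As an alternative sketch, not a replacement for the function above: the label can be derived by comparing each population with its group maximum, which leaves the `Population` column in its original row order and handles both counties in one pass. The helper columns `n` and `label` and the table `toy` are hypothetical names; the rows are copied from the question's data.

```r
library(data.table)

toy <- data.table(
  State_Ab   = "VA",
  year       = c(2001, 2001, 2001, 2001, 2017),
  County     = c("Bedford", "Bedford", "Fairfax", "Fairfax", "Bedford"),
  Type       = "B",
  Population = c(20040, 2018, 9095, 392987, 41681)
)

# Count rows per group so that single-row groups can be left untouched
toy[, n := .N, by = .(State_Ab, year, County, Type)]

# In two-row groups, the larger population is the county, the smaller the city
toy[n == 2,
    label := fifelse(Population == max(Population), "County", "City"),
    by = .(State_Ab, year, County, Type)]

# Rename only the rows that received a label, then drop the helper columns
toy[!is.na(label), County := paste(County, label)]
toy[, c("n", "label") := NULL]
print(toy)
```

The renaming is done in a separate step so that the grouping column is not modified while it is being grouped on. If both rows of a group had the same population, both would be labelled "County", so ties would need separate handling.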