Simple expansion of rows in data.table by group IDs
Given the following dataset:
> df <- data.table(ID=LETTERS[1:4],y_min=c(1970,1973,1976,1971),y_max=c(1974,1975,1980,1974))
> df
ID y_min y_max
1: A 1970 1974
2: B 1973 1975
3: C 1976 1980
4: D 1971 1974
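ID is the company ID, and y_min and y_max are the first and last years in which the company's data appear in the dataset.
How can I expand the rows (per company) by creating a new column "year" that contains every year between the minimum and maximum year, producing the following: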
> df
ID y_min y_max year
1: A 1970 1974 1970
2: A 1970 1974 1971
3: A 1970 1974 1972
4: A 1970 1974 1973
5: A 1970 1974 1974
6: B 1973 1975 1973
7: B 1973 1975 1974
8: B 1973 1975 1975
9: C 1976 1980 1976
10: C 1976 1980 1977
11: C 1976 1980 1978
12: C 1976 1980 1979
13: C 1976 1980 1980
14: D 1971 1974 1971
15: D 1971 1974 1972
16: D 1971 1974 1973
17: D 1971 1974 1974
Thanks in advance!
One option is to create a list column by grouping on the 'ID' column and taking the sequence (:) from 'y_min' to 'y_max', and then unnest (from tidyr) that list column.
library(data.table)
library(tidyr)
df[, year := .(list(y_min:y_max)), ID]
df %>%
unnest(c(year))
-output
# A tibble: 17 x 4
# ID y_min y_max year
# <chr> <dbl> <dbl> <int>
# 1 A 1970 1974 1970
# 2 A 1970 1974 1971
# 3 A 1970 1974 1972
# 4 A 1970 1974 1973
# 5 A 1970 1974 1974
# 6 B 1973 1975 1973
# 7 B 1973 1975 1974
# 8 B 1973 1975 1975
# 9 C 1976 1980 1976
#10 C 1976 1980 1977
#11 C 1976 1980 1978
#12 C 1976 1980 1979
#13 C 1976 1980 1980
#14 D 1971 1974 1971
#15 D 1971 1974 1972
#16 D 1971 1974 1973
#17 D 1971 1974 1974
Or using only data.table
df[, year := Map(`:`, y_min, y_max)]
df[rep(seq_len(.N), lengths(year))][, year := unlist(df$year)][]
-output
# ID y_min y_max year
# 1: A 1970 1974 1970
# 2: A 1970 1974 1971
# 3: A 1970 1974 1972
# 4: A 1970 1974 1973
# 5: A 1970 1974 1974
# 6: B 1973 1975 1973
# 7: B 1973 1975 1974
# 8: B 1973 1975 1975
# 9: C 1976 1980 1976
#10: C 1976 1980 1977
#11: C 1976 1980 1978
#12: C 1976 1980 1979
#13: C 1976 1980 1980
#14: D 1971 1974 1971
#15: D 1971 1974 1972
#16: D 1971 1974 1973
#17: D 1971 1974 1974
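To see why the data.table-only version works, it can help to look at the intermediate values. The following is a minimal sketch on a fresh copy of the example data; the names dt, yrs, idx and res are introduced here for illustration only and are not part of the original answer.

```r
library(data.table)

# Fresh copy of the example data (dt is a hypothetical name for this sketch)
dt <- data.table(ID    = LETTERS[1:4],
                 y_min = c(1970, 1973, 1976, 1971),
                 y_max = c(1974, 1975, 1980, 1974))

# One vector of years per row, kept in a plain list
yrs <- Map(`:`, dt$y_min, dt$y_max)

# Number of years per row; this is how often each row must be repeated
n_years <- lengths(yrs)

# Row index repeated once per year of that row
idx <- rep(seq_len(nrow(dt)), n_years)

# Replicate the rows, then attach the flattened years as a regular column
res <- dt[idx][, year := unlist(yrs)][]
res
```

Keeping the years in a separate list here, instead of a list column on the table, avoids modifying the original table by reference.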
Or with a self-join
df[df, .(y_min, y_max, year = y_min:y_max), on = .(ID), by = .EACHI]
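As a further alternative that is not part of the original answer, the same expansion can probably be written by grouping on all three columns, so that y_min and y_max are single values inside each group. This is a sketch that assumes each (ID, y_min, y_max) combination occurs only once; dt and res are hypothetical names.

```r
library(data.table)

# Fresh copy of the example data (dt is a hypothetical name for this sketch)
dt <- data.table(ID    = LETTERS[1:4],
                 y_min = c(1970, 1973, 1976, 1971),
                 y_max = c(1974, 1975, 1980, 1974))

# Grouping columns are available in j as length-1 values,
# so y_min:y_max builds the year sequence for each company
res <- dt[, .(year = y_min:y_max), by = .(ID, y_min, y_max)]
res
```

Unlike the list-column approaches, this does not add a column to the original table, which may be preferable if dt is reused later.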