R - filter observations that remain the same over multiple years
I prepared an example data.table:
testTable <- data.table(years = rep(c(rep((2014),3),rep((2015),3), rep((2016),3)), 2),
policy = c(rep("A", 9), rep("B",9)),
destination = rep(c("Paris", "London", "Berlin"), 6))
testTable[c(1,5,8), destination := c("Moskaw", "Milano", "Valencia")]
> testTable
years policy destination
1: 2014 A Moskaw
2: 2014 A London
3: 2014 A Berlin
4: 2015 A Paris
5: 2015 A Milano
6: 2015 A Berlin
7: 2016 A Paris
8: 2016 A Valencia
9: 2016 A Berlin
10: 2014 B Paris
11: 2014 B London
12: 2014 B Berlin
13: 2015 B Paris
14: 2015 B London
15: 2015 B Berlin
16: 2016 B Paris
17: 2016 B London
18: 2016 B Berlin
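Here I want to keep only the observations whose destination appears in every year of the data. In this example I chose only policies with 3 years, but the real data may mix histories of 2, 3, and 4 years in one data.table.
The desired result is: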
> testTable
years policy destination
3: 2014 A Berlin
6: 2015 A Berlin
9: 2016 A Berlin
10: 2014 B Paris
11: 2014 B London
12: 2014 B Berlin
13: 2015 B Paris
14: 2015 B London
15: 2015 B Berlin
16: 2016 B Paris
17: 2016 B London
18: 2016 B Berlin
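Any ideas?
I tried using dcast() and then wanted to filter the rows that have the same entry in all columns after policy, but I realized that dcast() automatically converts my character variable destination to numeric and aggregates my data using length: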
Aggregate function missing, defaulting to 'length'
Note: my data will contain hundreds of observations.
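For reference, the message above appears when a cell of the reshaped table receives more than one value, which happens if destination is not part of the left-hand side of the formula. A minimal sketch, assuming every policy covers the same set of years (as in the example, not the mixed-history case): put destination on the left-hand side, keep the rows with no NA, and join back. The names wide, stable and result are illustrative only.

```r
library(data.table)

# Same example data as in the question
testTable <- data.table(
  years = rep(rep(c(2014, 2015, 2016), each = 3), 2),
  policy = rep(c("A", "B"), each = 9),
  destination = rep(c("Paris", "London", "Berlin"), 6)
)
testTable[c(1, 5, 8), destination := c("Moskaw", "Milano", "Valencia")]

# One row per (policy, destination), one column per year.
# Each cell holds at most one value, so no aggregation is triggered.
wide <- dcast(testTable, policy + destination ~ years, value.var = "destination")

# Keep the pairs that have an entry in every year column
stable <- wide[complete.cases(wide), .(policy, destination)]

# Join back to recover the matching long-format rows
result <- testTable[stable, on = .(policy, destination)]
```

With mixed histories, a policy that lacks a year would have NA in that column for all of its rows, so this sketch would drop it entirely; the grouped approaches in the answers avoid that.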
For each policy, we can keep the destination values that are present in every years group:
library(data.table)
testTable[testTable[, destination %in%
Reduce(intersect, split(destination, years)), policy]$V1]
# years policy destination
# 1: 2014 A Berlin
# 2: 2015 A Berlin
# 3: 2016 A Berlin
# 4: 2014 B Paris
# 5: 2014 B London
# 6: 2014 B Berlin
# 7: 2015 B Paris
# 8: 2015 B London
# 9: 2015 B Berlin
#10: 2016 B Paris
#11: 2016 B London
#12: 2016 B Berlin
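One caveat: the grouped call returns its logical values in group order, so using them to subset works here because the rows are already ordered by policy. A hedged variant, assuming the rows might be interleaved, collects row numbers with .I instead. The shuffled table and the names idx and result are only for illustration.

```r
library(data.table)

testTable <- data.table(
  years = rep(rep(c(2014, 2015, 2016), each = 3), 2),
  policy = rep(c("A", "B"), each = 9),
  destination = rep(c("Paris", "London", "Berlin"), 6)
)
testTable[c(1, 5, 8), destination := c("Moskaw", "Milano", "Valencia")]

# Shuffle the rows so that the policies are no longer contiguous
set.seed(1)
shuffled <- testTable[sample(.N)]

# .I holds the row numbers of the full table, so the subset
# does not depend on how the rows are ordered
idx <- shuffled[
  , .I[destination %in% Reduce(intersect, split(destination, years))],
  by = policy
]$V1

# sort() keeps the rows in their original order in `shuffled`
result <- shuffled[sort(idx)]
```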
And in dplyr:
library(dplyr)
testTable %>%
group_by(policy) %>%
filter(destination %in% Reduce(intersect, split(destination, years)))
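To see what Reduce(intersect, split(destination, years)) computes inside one group, here is a small base R sketch using only the policy A rows from the example. The variable names by_year, common and keep are illustrative.

```r
# Policy A from the example, as plain vectors
years <- rep(c(2014, 2015, 2016), each = 3)
destination <- c("Moskaw", "London", "Berlin",
                 "Paris", "Milano", "Berlin",
                 "Paris", "Valencia", "Berlin")

# One character vector of destinations per year
by_year <- split(destination, years)

# intersect() is applied pairwise across the list:
# (2014 and 2015) first, then that result with 2016
common <- Reduce(intersect, by_year)
# common is "Berlin"

# Logical mask that the filter uses within the group
keep <- destination %in% common
```

Because the intersection is taken over the years that actually occur in the group, this also works when policies have histories of different lengths.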
This should solve it:
library(tidyverse)
library(data.table)
#>
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:dplyr':
#>
#> between, first, last
#> The following object is masked from 'package:purrr':
#>
#> transpose
testTable <- data.table(years = rep(c(rep((2014),3),rep((2015),3), rep((2016),3)), 2),
policy = c(rep("A", 9), rep("B",9)),
destination = rep(c("Paris", "London", "Berlin"), 6))
testTable[c(1,5,8), destination := c("Moskaw", "Milano", "Valencia")]
testTable %>%
mutate(distinct_years = n_distinct(years)) %>%
group_by(policy,destination) %>%
filter(n_distinct(years) ==distinct_years)
#> # A tibble: 12 x 4
#> # Groups: policy, destination [4]
#> years policy destination distinct_years
#> <dbl> <chr> <chr> <int>
#> 1 2014 A Berlin 3
#> 2 2015 A Berlin 3
#> 3 2016 A Berlin 3
#> 4 2014 B Paris 3
#> 5 2014 B London 3
#> 6 2014 B Berlin 3
#> 7 2015 B Paris 3
#> 8 2015 B London 3
#> 9 2015 B Berlin 3
#> 10 2016 B Paris 3
#> 11 2016 B London 3
#> 12 2016 B Berlin 3
Created on 2020-06-08 by the reprex package (v0.3.0)
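Note that distinct_years above is counted over the whole table. Since the question mentions histories of 2, 3 and 4 years mixed together, a possible adaptation is to count the years per policy instead. This is only a sketch: the mixed data frame and the policy_years column are made up for illustration.

```r
library(dplyr)

# Hypothetical data: policy A spans three years, policy C only two
mixed <- data.frame(
  years = c(2014, 2015, 2016, 2014, 2015, 2016, 2015, 2016, 2015, 2016),
  policy = c(rep("A", 6), rep("C", 4)),
  destination = c("Berlin", "Berlin", "Berlin", "Paris", "Milano", "Paris",
                  "Rome", "Rome", "Oslo", "Lima"),
  stringsAsFactors = FALSE
)

result <- mixed %>%
  group_by(policy) %>%
  mutate(policy_years = n_distinct(years)) %>%   # years observed for this policy
  group_by(policy, destination) %>%
  # keep destinations seen in as many years as the policy has
  filter(n_distinct(years) == policy_years[1]) %>%
  ungroup()
```

Here A/Berlin (3 of 3 years) and C/Rome (2 of 2 years) are kept, whereas a table-wide count of 3 would drop C/Rome.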
Here is another data.table approach:
testTable[, if (all(unique(testTable$years) %in% years)) .SD, by = .(policy, destination)]
# policy destination years
# 1: A Berlin 2014
# 2: A Berlin 2015
# 3: A Berlin 2016
# 4: B Paris 2014
# 5: B Paris 2015
# 6: B Paris 2016
# 7: B London 2014
# 8: B London 2015
# 9: B London 2016
# 10: B Berlin 2014
# 11: B Berlin 2015
# 12: B Berlin 2016