R data frame: filter rows by value
Suppose I have a dataset like the following:
Person Year From To
Peter 2001 Apple Microsoft
Peter 2006 Microsoft IBM
Peter 2010 IBM Facebook
Peter 2016 Facebook Apple
Kate 2003 Microsoft Google
Jimmy 2001 Samsung IBM
Jimmy 2004 IBM Google
Jimmy 2009 Google Facebook
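I want to filter by person and keep only the people who have worked at IBM (IBM appears in either the From or the To column). In addition, I only want to keep each person's records from before they left IBM, that is, the rows before "IBM" first appears in the From column. So I want something like the following: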
Person Year From To
Peter 2001 Apple Microsoft
Peter 2006 Microsoft IBM
Jimmy 2001 Samsung IBM
A possible solution with dplyr:
library(dplyr)

df %>%
  group_by(Person) %>%
  # Within each person, keep the row where they move to IBM
  # and the row immediately before it.
  filter(To == "IBM" | lead(To) == "IBM") %>%
  ungroup()
# A tibble: 3 x 4
Person Year From To
<chr> <int> <chr> <chr>
1 Peter 2001 Apple Microsoft
2 Peter 2006 Microsoft IBM
3 Jimmy 2001 Samsung IBM
Data
df <- structure(list(Person = c("Peter", "Peter", "Peter", "Peter",
"Kate", "Jimmy", "Jimmy", "Jimmy"), Year = c(2001L, 2006L, 2010L,
2016L, 2003L, 2001L, 2004L, 2009L), From = c("Apple", "Microsoft",
"IBM", "Facebook", "Microsoft", "Samsung", "IBM", "Google"),
To = c("Microsoft", "IBM", "Facebook", "Apple", "Google",
"IBM", "Google", "Facebook")), class = "data.frame", row.names = c(NA, -8L))
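
The filter above keeps the row where a person moves to IBM plus the single row before it, which is enough for this sample. If a person could have more than one record before joining IBM, a cumulative count of "IBM" in the From column may be closer to the stated goal. The following is only a sketch under two assumptions: rows are already in chronological order within each person, and a person whose first record already has From equal to "IBM" should contribute no rows. The name `result` is just illustrative.

```r
library(dplyr)

# Same sample data as above, rebuilt here so the example runs on its own.
df <- data.frame(
  Person = c("Peter", "Peter", "Peter", "Peter", "Kate", "Jimmy", "Jimmy", "Jimmy"),
  Year   = c(2001L, 2006L, 2010L, 2016L, 2003L, 2001L, 2004L, 2009L),
  From   = c("Apple", "Microsoft", "IBM", "Facebook", "Microsoft", "Samsung", "IBM", "Google"),
  To     = c("Microsoft", "IBM", "Facebook", "Apple", "Google", "IBM", "Google", "Facebook"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Person) %>%
  filter(
    # Keep only people with IBM anywhere in From or To.
    any(From == "IBM" | To == "IBM"),
    # Keep rows strictly before the first row whose From is IBM
    # (assumes rows are ordered by Year within each person).
    cumsum(From == "IBM") == 0
  ) %>%
  ungroup()

print(result)
```

On the sample data this selects the same three rows as the answer above; the two approaches differ only when someone has several records before their first move to IBM.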