Creating a distinct values column up to a certain time

Question

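I have a question about how to count unique values up to a specific point in time. For example, I want to know how many distinct locations a person has lived in up to that point.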

 created<- c(2009,2010,2010,2011, 2012, 2011)
 person <- c('A', 'A', 'A', 'A', 'B', 'B')
 location<- c('London','Geneva', 'London', 'New York', 'London', 'London')
 df <- data.frame (created, person, location)

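I want to create a variable named unique that records how many different places the person has lived in up to that point in time. I tried the following, which does not work; the result I expect is the vector shown after it. Any suggestions?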

  library(dplyr) 
   df %>% group_by(person, location) %>% arrange(Created,.by_group = TRUE) %>% mutate (unique=distinct (location))

  unique <- c(1, 2, 2, 3,1,1)

Answer 1

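One approach is to combine cumsum with duplicated: !duplicated(location) is TRUE the first time a location appears within a person's group, and cumsum turns those flags into a running count.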

library(dplyr)
df %>% group_by(person) %>% mutate(unique = cumsum(!duplicated(location)))

#  created person location unique
#    <dbl> <fct>  <fct>     <int>
#1    2009 A      London        1
#2    2010 A      Geneva        2
#3    2010 A      London        2
#4    2011 A      New York      3
#5    2012 B      London        1
#6    2011 B      London        1
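
To see why this works, it can help to run the two steps on a single person's locations. The following is a minimal base R sketch; the vector x is made up for illustration and mirrors person A's rows from the question.

```r
# Locations for one person, in row order (illustrative vector)
x <- c("London", "Geneva", "London", "New York")

# TRUE where a value has already appeared earlier in the vector
dup <- duplicated(x)
# FALSE FALSE  TRUE FALSE

# Negate to flag first occurrences, then take the running total
first_seen <- !dup
running_count <- cumsum(first_seen)
# 1 2 2 3
```

Inside mutate() after group_by(person), the same computation is applied to each person's rows separately.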

Answer 2

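We can also use cummax: match(location, unique(location)) gives each location the index of its first appearance, and cummax carries the largest index seen so far forward.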

library(dplyr)
df %>% 
   group_by(person) %>% 
   mutate(unique = cummax(match(location, unique(location))))
# A tibble: 6 x 4
# Groups:   person [2]
#  created person location unique
#    <dbl> <fct>  <fct>     <int>
#1    2009 A      London        1
#2    2010 A      Geneva        2
#3    2010 A      London        2
#4    2011 A      New York      3
#5    2012 B      London        1
#6    2011 B      London        1
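
The intermediate values make the cummax idea easier to follow. This is a small base R sketch on one person's locations; the vector x is illustrative and not part of the original answer.

```r
# Locations for one person, in row order (illustrative vector)
x <- c("London", "Geneva", "London", "New York")

# Distinct values in order of first appearance
u <- unique(x)
# "London" "Geneva" "New York"

# Position of each value in that list of distinct values
idx <- match(x, u)
# 1 2 1 3

# Largest position seen so far = number of distinct values so far
running_count <- cummax(idx)
# 1 2 2 3
```

Because unique() keeps values in order of first appearance, a new location always receives the next higher index, which is why the running maximum equals the distinct count.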

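Or with base R. Note that ave returns a vector of the same type as its first argument, so location is converted to integer codes first (passing the factor directly would produce NAs with a warning):

df$unique <- with(df, ave(as.integer(factor(location)), person, FUN =
          function(x) cummax(match(x, unique(x)))))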
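
The approaches above count in row order, and the example data lists person B's 2012 row before the 2011 row. That makes no difference here because both rows are London, but in general the running count depends on the order of the rows. A hedged base R sketch, assuming the count should follow created within each person (df_sorted is a name introduced here for illustration):

```r
df <- data.frame(
  created  = c(2009, 2010, 2010, 2011, 2012, 2011),
  person   = c("A", "A", "A", "A", "B", "B"),
  location = c("London", "Geneva", "London", "New York", "London", "London")
)

# Sort by person, then by year; rows tied on both keep their original order
df_sorted <- df[order(df$person, df$created), ]

# Running count of distinct locations per person, on the sorted rows
df_sorted$unique <- ave(
  as.integer(factor(df_sorted$location)),
  df_sorted$person,
  FUN = function(x) cumsum(!duplicated(x))
)
```

Rows that share the same person and year (such as the two 2010 rows for A) are still counted in their original order, since the data carries no finer time information to separate them.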

Data

df <- structure(list(created = c(2009, 2010, 2010, 2011, 2012, 2011
), person = structure(c(1L, 1L, 1L, 1L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), location = structure(c(2L, 1L, 2L, 3L, 
2L, 2L), .Label = c("Geneva", "London", "New York"), class = "factor")),
class = "data.frame", row.names = c(NA, 
-6L))
