Creating a column that counts distinct values up to a certain time
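I have a question about how to count the unique values up to a certain point in time. For example, I want to know how many unique locations a person has lived in up to that point.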
created <- c(2009, 2010, 2010, 2011, 2012, 2011)
person <- c('A', 'A', 'A', 'A', 'B', 'B')
location <- c('London', 'Geneva', 'London', 'New York', 'London', 'London')
df <- data.frame(created, person, location)
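I want to create a variable called unique that gives, for each row, how many distinct places the person has lived in up to that point in time. I tried the following. Any suggestions?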
library(dplyr)
df %>% group_by(person, location) %>% arrange(Created,.by_group = TRUE) %>% mutate (unique=distinct (location))
The result I am hoping for is:
unique <- c(1, 2, 2, 3, 1, 1)
One way is to use cumsum and duplicated:
library(dplyr)
df %>% group_by(person) %>% mutate(unique = cumsum(!duplicated(location)))
# created person location unique
# <dbl> <fct> <fct> <int>
#1 2009 A London 1
#2 2010 A Geneva 2
#3 2010 A London 2
#4 2011 A New York 3
#5 2012 B London 1
#6 2011 B London 1
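To see why this works, it may help to run the two steps separately on one person's locations. The following is a minimal sketch in base R; the vector `loc` and the intermediate names `seen_before` and `running_distinct` are made up for illustration and simply copy person A's rows from the question.

```r
# Locations for person A, in the order they appear in df
loc <- c("London", "Geneva", "London", "New York")

# TRUE wherever a value has already appeared earlier in the vector
seen_before <- duplicated(loc)

# Negating marks the first occurrence of each value;
# cumsum() then counts how many first occurrences have been seen so far
running_distinct <- cumsum(!seen_before)

print(running_distinct)  # 1 2 2 3
```

Inside `group_by(person)`, `mutate()` applies the same computation to each person's rows separately.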
We can also use cummax:
library(dplyr)
df %>%
group_by(person) %>%
mutate(unique = cummax(match(location, unique(location))))
# A tibble: 6 x 4
# Groups: person [2]
# created person location unique
# <dbl> <fct> <fct> <int>
#1 2009 A London 1
#2 2010 A Geneva 2
#3 2010 A London 2
#4 2011 A New York 3
#5 2012 B London 1
#6 2011 B London 1
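The reasoning behind `cummax(match(...))` is that `unique()` keeps values in order of first appearance, so the position returned by `match()` says "this is the k-th distinct value to show up". The running maximum of those positions is therefore the number of distinct values seen so far. A small base R sketch, using illustrative names (`loc`, `first_seen`, `idx`) that are not part of the answer:

```r
# Locations for person A, in the order they appear in df
loc <- c("London", "Geneva", "London", "New York")

# Distinct values in order of first appearance
first_seen <- unique(loc)

# Position of each location within first_seen
idx <- match(loc, first_seen)

# Running maximum of those positions = distinct values seen so far
running_distinct <- cummax(idx)

print(idx)               # 1 2 1 3
print(running_distinct)  # 1 2 2 3
```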
Or in base R:
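# ave() writes its results back into a vector of the same type as its first
# argument, so pass integer codes rather than the factor/character column itself
df$unique <- with(df, ave(as.integer(factor(location)), person, FUN =
  function(x) cummax(match(x, unique(x)))))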
Data
df <- structure(list(created = c(2009, 2010, 2010, 2011, 2012, 2011
), person = structure(c(1L, 1L, 1L, 1L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), location = structure(c(2L, 1L, 2L, 3L,
2L, 2L), .Label = c("Geneva", "London", "New York"), class = "factor")),
class = "data.frame", row.names = c(NA,
-6L))
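Note that in this sample data person B's 2012 row comes before the 2011 row, and the approaches above count in row order rather than in `created` order. If the running count should follow time, one option is to sort first. The sketch below is base R only and assumes the same columns as the question; `df_sorted` is a name introduced here for illustration, and the data frame is rebuilt with plain character columns so the block runs on its own.

```r
df <- data.frame(
  created  = c(2009, 2010, 2010, 2011, 2012, 2011),
  person   = c("A", "A", "A", "A", "B", "B"),
  location = c("London", "Geneva", "London", "New York", "London", "London")
)

# Sort by person, then by year, so the running count follows time
df_sorted <- df[order(df$person, df$created), ]

# Running number of distinct locations within each person.
# ave() is given row positions so that the result stays integer.
df_sorted$unique <- ave(
  seq_len(nrow(df_sorted)), df_sorted$person,
  FUN = function(i) cumsum(!duplicated(df_sorted$location[i]))
)

print(df_sorted)
```

With dplyr, the equivalent would presumably be to add `arrange(created, .by_group = TRUE)` after `group_by(person)` and before `mutate()`.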