R Dataframe Regroup and Summarize
My data frame looks like this:
Year Person Office
2005 Peter Boston
2007 Peter Boston
2008 Peter Chicago
2009 Peter New York
2011 Peter New York
2003 Amy Seattle
2004 Amy Boston
2006 Amy Chicago
2007 Amy Chicago
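I want to compute a normalized measure (count) at the office-person level that captures how many offices a person has been through before coming to the current office. The measure is normalized by the total number of years elapsed before arriving at the current location. The ideal output is shown below. For Peter, Boston is his first office, so his normalized count for Boston is 0. Chicago is his second office, and he arrived there after 2008-2005=3 years, so his normalized count for Chicago is 1/3.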
Office Person Count
Boston Peter 0
Boston Amy 1
Chicago Peter 1/3
Chicago Amy 2/3
New York Peter 1/2
Seattle Amy 0
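The definition above can be checked by hand for Peter. This is only a minimal sketch and not part of the original post: the names `first_year`, `prior_offices`, `elapsed`, and `count` are illustrative, and it assumes the first arrival year at each office has already been picked out of the table.

```r
# Peter's first arrival year at each office, taken from the table above
first_year <- c(Boston = 2005, Chicago = 2008, `New York` = 2009)

# Number of offices visited before arriving at each one: 0, 1, 2
prior_offices <- seq_along(first_year) - 1

# Years elapsed since the first office: 0, 3, 4
elapsed <- first_year - first_year[1]

# Normalized count: 0 for the first office, otherwise prior offices per elapsed year
count <- ifelse(prior_offices == 0, 0, prior_offices / elapsed)
names(count) <- names(first_year)
count
```

The first office is handled separately because 0 / 0 evaluates to NaN in R.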
You can use
library(dplyr)

df %>%
  # keep each person's earliest year at every office
  group_by(Person, Office) %>%
  slice_min(Year) %>%
  arrange(Year) %>%
  add_count() %>%
  # within each person, cumsum(n) is the running number of offices
  group_by(Person) %>%
  mutate(Count = if_else(cumsum(n) == 1, 0, (cumsum(n) - 1) / (Year - first(Year))),
         .keep = "unused") %>%
  ungroup()
This returns
# A tibble: 6 x 3
Person Office Count
<chr> <chr> <dbl>
1 Amy Seattle 0
2 Amy Boston 1
3 Peter Boston 0
4 Amy Chicago 0.667
5 Peter Chicago 0.333
6 Peter New_York 0.5
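library(tidyverse)

cities %>%
  # keep the first row for each person-office pair
  group_by(Person, Office) %>%
  filter(row_number() == 1) %>%
  group_by(Person) %>%
  # x = offices visited before this one, y = years since the first office
  mutate(x = row_number() - 1, y = (Year - Year[1])) %>%
  # 0 / 0 gives NaN for the first office, so fall back to x (which is 0)
  mutate(count = ifelse(is.nan(x / y), x, x / y))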
# Year Person Office x y count
# <int> <chr> <chr> <dbl> <int> <dbl>
# 1 2005 Peter "Boston" 0 0 0
# 2 2008 Peter "Chicago" 1 3 0.333
# 3 2009 Peter "New York" 2 4 0.5
# 4 2003 Amy "Seattle " 0 0 0
# 5 2004 Amy "Boston" 1 1 1
# 6 2006 Amy "Chicago" 2 3 0.667
If you want the count expressed as a fraction, we can use the helper function gcd from the pracma package to reduce the fractions:
cities %>%
  group_by(Person, Office) %>%
  filter(row_number() == 1) %>%
  group_by(Person) %>%
  mutate(x = row_number() - 1, y = (Year - Year[1])) %>%
  mutate(count = ifelse(is.nan(x / y), x, x / y)) %>%
  mutate(frac = ifelse(x == 0,
                       0,
                       ifelse(x / y == 1, 1,
                              paste0(x / pracma::gcd(x, y), "/", y / pracma::gcd(x, y)))
                       )
         ) %>%
  select(-x, -y)
# Year Person Office count frac
# <int> <chr> <chr> <dbl> <chr>
# 1 2005 Peter "Boston" 0 0
# 2 2008 Peter "Chicago" 0.333 1/3
# 3 2009 Peter "New York" 0.5 1/2
# 4 2003 Amy "Seattle " 0 0
# 5 2004 Amy "Boston" 1 1
# 6 2006 Amy "Chicago" 0.667 2/3
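If installing pracma is not an option, the reduction step could be approximated in base R. The sketch below is an assumption-laden stand-in rather than the answer's method: `gcd2` and `as_fraction` are hypothetical helper names, and both work on single non-negative whole numbers rather than on whole columns.

```r
# Hypothetical helper: greatest common divisor via Euclid's algorithm
gcd2 <- function(a, b) {
  while (b != 0) {
    r <- a %% b
    a <- b
    b <- r
  }
  a
}

# Hypothetical helper: format x / y as a reduced fraction string
as_fraction <- function(x, y) {
  if (x == 0) return("0")            # first office: count is defined as 0
  g <- gcd2(x, y)
  if (y / g == 1) return(as.character(x / g))  # whole number, no denominator
  paste0(x / g, "/", y / g)
}

as_fraction(2, 4)
as_fraction(1, 3)
as_fraction(1, 1)
```

Because the helpers are scalar, applying them to the `x` and `y` columns inside `mutate()` would need something like `mapply(as_fraction, x, y)`.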
Data:
cities <- read.delim(text = "Year,Person,Office
2005,Peter,Boston
2007,Peter,Boston
2008,Peter,Chicago
2009,Peter,New York
2011,Peter,New York
2003,Amy,Seattle
2004,Amy,Boston
2006,Amy,Chicago
2007,Amy,Chicago", sep = ",")