R Dataframe Regroup and Summarize

My data frame looks like this:

Year   Person   Office
2005   Peter    Boston 
2007   Peter    Boston
2008   Peter    Chicago 
2009   Peter    New York
2011   Peter    New York 
2003   Amy      Seattle 
2004   Amy      Boston 
2006   Amy      Chicago 
2007   Amy      Chicago

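I want to compute a normalized measure (Count) at the office-person level that captures how many offices a person has been through before arriving at the current office. The measure is normalized by the total number of years that passed before the person arrived at the current office. The ideal output is below. For Peter, Boston is his first office, so his normalized count for Boston is 0. Chicago is his second office, and he arrived there 2008-2005=3 years after starting. So his normalized count for Chicago is 1/3.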

Office    Person  Count
Boston    Peter   0
Boston    Amy     1
Chicago   Peter   1/3
Chicago   Amy     2/3
New York  Peter   1/2
Seattle   Amy     0
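
As a sanity check on the definition above, here is a minimal base R sketch that reproduces Peter's numbers. `normalized_count` is a hypothetical helper name, not something from the post, and it assumes one person's rows are already sorted by year.

```r
# Hypothetical helper (not from the original post): normalized count for one
# person's office history. Assumes `year` is sorted in ascending order.
normalized_count <- function(year, office) {
  first_rows <- !duplicated(office)      # first row for each office
  arrival <- year[first_rows]            # year the person arrived at each office
  prior <- seq_along(arrival) - 1        # offices visited before this one
  elapsed <- arrival - arrival[1]        # years since the first office
  # The first office has prior = 0 and elapsed = 0, so set it to 0 explicitly
  count <- ifelse(prior == 0, 0, prior / elapsed)
  setNames(count, office[first_rows])
}

peter <- normalized_count(
  year = c(2005, 2007, 2008, 2009, 2011),
  office = c("Boston", "Boston", "Chicago", "New York", "New York")
)
print(peter)
```

For Peter this gives 0 for Boston, 1/3 for Chicago and 2/4 = 1/2 for New York, matching the table.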

You can use

library(dplyr)

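df %>% 
  group_by(Person, Office) %>% 
  slice_min(Year) %>%                 # keep the first year at each office
  arrange(Year) %>% 
  add_count() %>%                     # n = 1 for each Person/Office row
  group_by(Person) %>% 
  # cumsum(n) numbers each person's offices 1, 2, 3, ... in order of arrival
  mutate(Count = if_else(cumsum(n) == 1, 0, (cumsum(n) - 1) / (Year - first(Year))),
         .keep = "unused") %>%        # drop Year and n, which Count was built from
  ungroup()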

This returns

# A tibble: 6 x 3
  Person Office   Count
  <chr>  <chr>    <dbl>
1 Amy    Seattle  0    
2 Amy    Boston   1    
3 Peter  Boston   0    
4 Amy    Chicago  0.667
5 Peter  Chicago  0.333
6 Peter  New_York 0.5 

library(tidyverse)

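cities %>% 
  group_by(Person, Office) %>%
  filter(row_number() == 1) %>%       # first row per office (data is sorted by Year)
  group_by(Person) %>% 
  # x = offices visited before this one, y = years since the first office
  mutate(x = row_number()-1, y = (Year - Year[1])) %>% 
  # the first office gives 0/0 = NaN, so fall back to x (which is 0)
  mutate(count = ifelse(is.nan(x / y), x, x/y))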

#   Year Person Office         x     y count
#  <int> <chr>  <chr>      <dbl> <int> <dbl>
# 1  2005 Peter  "Boston"       0     0 0    
# 2  2008 Peter  "Chicago"      1     3 0.333
# 3  2009 Peter  "New York"     2     4 0.5  
# 4  2003 Amy    "Seattle "     0     0 0    
# 5  2004 Amy    "Boston"       1     1 1    
# 6  2006 Amy    "Chicago"      2     3 0.667

If you want the count expressed as a fraction, we can use a helper function from the pracma package to reduce the fractions:

cities %>% 
  group_by(Person, Office) %>%
  filter(row_number() == 1) %>% 
  group_by(Person) %>% 
  mutate(x = row_number()-1, y = (Year - Year[1])) %>% 
  mutate(count = ifelse(is.nan(x / y), x, x/y)) %>% 
  mutate(frac = ifelse(x == 0,
                       0,
                       ifelse(x/y == 1, 1,
                              paste0(x / pracma::gcd(x,y), "/", y / pracma::gcd(x,y)))
                       )
  ) %>% 
  select(-x, -y)

#   Year Person Office     count frac 
#  <int> <chr>  <chr>      <dbl> <chr>
# 1  2005 Peter  "Boston"   0     0    
# 2  2008 Peter  "Chicago"  0.333 1/3  
# 3  2009 Peter  "New York" 0.5   1/2  
# 4  2003 Amy    "Seattle " 0     0    
# 5  2004 Amy    "Boston"   1     1    
# 6  2006 Amy    "Chicago"  0.667 2/3 
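
The reduction step only needs a greatest common divisor, so a rough base R sketch without pracma might look like the following. `gcd_int` and `as_fraction` are hypothetical helper names, not part of pracma or the answer above, and the sketch assumes non-negative whole-number inputs handled one pair at a time.

```r
# Hypothetical helpers (not part of pracma or the original answer).

# Euclid's algorithm for two non-negative whole numbers
gcd_int <- function(a, b) {
  while (b != 0) {
    r <- a %% b
    a <- b
    b <- r
  }
  a
}

# Format x/y as a reduced fraction string, e.g. 2 and 4 become "1/2"
as_fraction <- function(x, y) {
  if (x == 0) return("0")                      # first office: nothing to reduce
  g <- gcd_int(x, y)
  if (y / g == 1) return(as.character(x / g))  # whole number, no denominator
  paste0(x / g, "/", y / g)
}

print(as_fraction(2, 4))
print(as_fraction(2, 3))
```

Because `as_fraction` works on a single pair, inside `mutate()` it would need to be applied element-wise, for example with `mapply(as_fraction, x, y)`.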

Data:

cities <- read.delim(text = "Year,Person,Office
2005,Peter,Boston
2007,Peter,Boston
2008,Peter,Chicago
2009,Peter,New York
2011,Peter,New York
2003,Amy,Seattle 
2004,Amy,Boston
2006,Amy,Chicago
2007,Amy,Chicago", sep = ",")
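
Note that the Seattle row carries a trailing space, which is why the printed output shows "Seattle " with a space inside the quotes. If that is unwanted, one option (an assumption about what you want, not part of the original answer) is to pass `strip.white = TRUE`, which `read.delim` forwards to `read.table`. The sketch below uses a shortened two-row sample and a hypothetical name `cities_clean`:

```r
# Shortened sample of the data; the Seattle row keeps its trailing space on purpose
cities_clean <- read.delim(text = "Year,Person,Office
2003,Amy,Seattle 
2004,Amy,Boston", sep = ",", strip.white = TRUE)

# Character counts confirm whether the trailing space was removed
print(nchar(as.character(cities_clean$Office)))
```

Calling `trimws()` on the Office column after reading is an alternative if the data frame already exists.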