Create multiple columns in a data frame based on another column

Question

I want to update a data frame by adding 10 columns whose values are based on another column.

Starting from this:

df <- data.frame(ID = 1:3, name = c("Bob", "Jim", "Fred"), endValue= c(3, 7, 4))

and ending with this:

|ID|name|endValue|A|B|C|D|E|F|G|H|I|J|
|1|Bob|3|Y|Y|Y|N|N|N|N|N|N|N|
|2|Jim|7|Y|Y|Y|Y|Y|Y|Y|N|N|N|
|3|Fred|4|Y|Y|Y|Y|N|N|N|N|N|N|

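So each new record needs:

the original content
ten new columns
a value in each new column based on whether that column's number is less than or equal to endValue ('Y' if so, otherwise 'N')

Any help is welcome...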

Answer 1

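One option is to create a list column: for each value of the 'endValue' column, use rep to repeat 'Y' that many times and 'N' for the remainder up to 10, then unnest it wider.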

library(dplyr)
library(purrr)
library(tidyr)

df %>%
   # per row, build a named vector: endValue "Y"s, then "N"s up to 10, named A..J
   mutate(new = map(endValue, ~ set_names(rep(c("Y", "N"), c(.x, 10 - .x)),
                                          LETTERS[1:10]))) %>%
   # spread the named vector into the columns A..J
   unnest_wider(new)

Output:

# A tibble: 3 x 13
#     ID name  endValue A     B     C     D     E     F     G     H     I     J    
#  <int> <chr>    <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1     1 Bob          3 Y     Y     Y     N     N     N     N     N     N     N    
#2     2 Jim          7 Y     Y     Y     Y     Y     Y     Y     N     N     N    
#3     3 Fred         4 Y     Y     Y     Y     N     N     N     N     N     N
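
To make the rep call in the pipeline above easier to follow, here is a minimal sketch for a single value. The names end_value and flags are illustrative only and do not come from the answer.

```r
# Illustrative single-row case: endValue = 3
end_value <- 3

# rep() with a vector of counts repeats "Y" end_value times,
# then "N" for the remaining 10 - end_value positions
flags <- rep(c("Y", "N"), c(end_value, 10 - end_value))

print(flags)
length(flags)
```

The vector always has length 10, which is why each row can be spread into exactly ten columns.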

Or use separate:

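library(stringr)
df %>%
  # build a 10-character string per row: endValue "Y"s followed by "N"s
  mutate(new = str_c(strrep('Y', endValue),
                     strrep('N', 10 - endValue))) %>%
  # split between every pair of adjacent letters into the columns A..J
  separate(new, into = LETTERS[1:10], sep = "(?<=[A-Z])(?=[A-Z])")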
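
As a rough illustration of the intermediate string that the separate approach works on, the sketch below builds it in base R. It uses paste0 and strsplit as stand-ins for str_c and separate, and end_values, codes and chars are hypothetical names chosen for this example.

```r
# The endValue column from the question, as a plain vector
end_values <- c(3, 7, 4)

# strrep() is vectorised over `times`, so each element gets its own run length
codes <- paste0(strrep("Y", end_values), strrep("N", 10 - end_values))
print(codes)

# Splitting on the empty string yields one character per future column
chars <- strsplit(codes, "")
lengths(chars)
```

Every string is exactly 10 characters long, so splitting between characters gives one piece per target column.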

Answer 2

This also uses the tidyverse family, with pmap and pivot_wider:

library(dplyr)
library(purrr)
library(tidyr)

df %>%
  # apply a function to each row using pmap
  # the function creates a long table with 10 rows per ID, name, endValue
  pmap(.f = function(...) { 
    x <- tibble(...)
    y <- tibble(
      ID = x$ID, name = x$name,
      endValue = x$endValue,
      LETTER = LETTERS[1:10],
      value = c(rep("Y", x$endValue), rep("N", 10 - x$endValue)))
  }) %>%
  # combine all the df together
  bind_rows() %>%
  # then pivot_wider to get final result
  pivot_wider(names_from = LETTER, values_from = value)
#> # A tibble: 3 x 13
#>      ID name  endValue A     B     C     D     E     F     G     H     I    
#>   <int> <chr>    <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1     1 Bob          3 Y     Y     Y     N     N     N     N     N     N    
#> 2     2 Jim          7 Y     Y     Y     Y     Y     Y     Y     N     N    
#> 3     3 Fred         4 Y     Y     Y     Y     N     N     N     N     N    
#> # … with 1 more variable: J <chr>
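
To show what the long intermediate table looks like for one row before pivoting, here is a small sketch under the assumption that tibble and tidyr are installed. The names long_bob and wide_bob are hypothetical and only used for this illustration.

```r
library(tibble)
library(tidyr)

# Long form for a single row (Bob, endValue = 3): one row per letter
long_bob <- tibble(
  ID = 1L, name = "Bob", endValue = 3,
  LETTER = LETTERS[1:10],
  value = c(rep("Y", 3), rep("N", 10 - 3))
)
nrow(long_bob)

# Pivoting turns the 10 LETTER rows into 10 columns on a single row
wide_bob <- pivot_wider(long_bob, names_from = LETTER, values_from = value)
dim(wide_bob)
```

The scalar columns ID, name and endValue are recycled across the ten rows, which is what lets pivot_wider collapse them back into one row per ID.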

Created on 2021-04-27 by the reprex package (v2.0.0)
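
For comparison, the rule stated in the question can also be sketched in base R without any packages. This is not part of either answer; it swaps in outer and ifelse, and flags and result are illustrative names.

```r
df <- data.frame(ID = 1:3, name = c("Bob", "Jim", "Fred"), endValue = c(3, 7, 4))

# Compare every endValue with every column position 1..10:
# TRUE where the column number is less than or equal to endValue
flags <- ifelse(outer(df$endValue, 1:10, ">="), "Y", "N")
colnames(flags) <- LETTERS[1:10]

# Bind the 3 x 10 character matrix onto the original columns
result <- cbind(df, flags)
print(result)
```

The comparison matrix makes the condition explicit, which can be handy for checking the output of the pipelines above.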
