How to tidy data with a nested data frame?
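I want to tidy a nested data frame, but I am running into some difficulty. I can reshape the data just fine for a single case, but I would like to go through the whole data frame case by case.
My data looks like this: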
library(tidyverse)
df <- tibble(
  case = c("a", "a", "b", "b", "c", "c"),
  year = c(1990, 2000, 1990, 2000, 1990, 2000),
  var1 = round(runif(6, 0, 1), 2),
  var2 = round(runif(6, 10, 20), 2)
)
I can do what I want with tidyr, but only for a single case:
df %>%
  filter(case == "a") %>%
  gather(var, value, -c(1:2)) %>%
  spread(year, value)
Output:
#   case  var   `1990` `2000`
#   <chr> <chr>  <dbl>  <dbl>
# 1 a     var1   0.850  0.540
# 2 a     var2  14.4   16.7
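How can I use purrr or other functional programming tools to vectorize this operation, apply it to all of my cases, and bind the results into one data frame? Some combination of "nest" and "map"?
Thanks!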
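Don't gather the case column. If both case and year are excluded from gather(), every case is reshaped in one pass, so no filtering, nesting, or mapping is needed.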
library(tidyverse)
set.seed(1234)
df <- tibble(
  case = c("a", "a", "b", "b", "c", "c"),
  year = c(1990, 2000, 1990, 2000, 1990, 2000),
  var1 = round(runif(6, 0, 1), 2),
  var2 = round(runif(6, 10, 20), 2)
)
# Keep the first two columns (case, year) out of the gather
df %>%
  gather(var, value, -c(1:2)) %>%
  spread(year, value)
# # A tibble: 6 x 4
#   case  var   `1990` `2000`
#   <chr> <chr>  <dbl>  <dbl>
# 1 a     var1   0.110  0.620
# 2 a     var2  10.1   12.3
# 3 b     var1   0.610  0.620
# 4 b     var2  16.7   15.1
# 5 c     var1   0.860  0.640
# 6 c     var2  16.9   15.4
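For completeness, the "nest" and "map" combination the question asks about can also be made to work, although it is more roundabout than the direct gather/spread. What follows is only a minimal sketch: it assumes that group_by() followed by nest() yields a list-column with the default name data, and reshape_case is a hypothetical helper written for this illustration, not a function from any package.

```r
library(tidyverse)

set.seed(1234)
df <- tibble(
  case = c("a", "a", "b", "b", "c", "c"),
  year = c(1990, 2000, 1990, 2000, 1990, 2000),
  var1 = round(runif(6, 0, 1), 2),
  var2 = round(runif(6, 10, 20), 2)
)

# Hypothetical helper: reshape the rows of one case
# (columns year, var1, var2) into one row per variable.
reshape_case <- function(d) {
  d %>%
    gather(var, value, -year) %>%
    spread(year, value)
}

nested_result <- df %>%
  group_by(case) %>%
  nest() %>%                                  # one tibble per case in `data`
  mutate(data = map(data, reshape_case)) %>%  # reshape each nested tibble
  unnest(data) %>%                            # bind the pieces back together
  ungroup()

nested_result
```

The result should have the same columns (case, var, 1990, 2000) and the same six rows as the direct version, which is why the simpler pipeline is usually preferable here.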
Another option could be to use dcast from the reshape2 package. But first we need to gather the var1 and var2 columns.
library(tidyverse)
library(reshape2)
set.seed(1234)
df <- tibble(
  case = c("a", "a", "b", "b", "c", "c"),
  year = c(1990, 2000, 1990, 2000, 1990, 2000),
  var1 = round(runif(6, 0, 1), 2),
  var2 = round(runif(6, 10, 20), 2)
)
# Use gather to combine var1 and var2, then apply dcast
gather(df, var, val, var1:var2) %>%
  dcast(case + var ~ year, value.var = "val")
# Result
#   case  var  1990  2000
# 1    a var1  0.11  0.62
# 2    a var2 10.09 12.33
# 3    b var1  0.61  0.62
# 4    b var2 16.66 15.14
# 5    c var1  0.86  0.64
# 6    c var2 16.94 15.45
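In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same reshape can be sketched with those functions, assuming such a version of tidyr is installed. The column names var and value are simply carried over from the code above.

```r
library(tidyverse)

set.seed(1234)
df <- tibble(
  case = c("a", "a", "b", "b", "c", "c"),
  year = c(1990, 2000, 1990, 2000, 1990, 2000),
  var1 = round(runif(6, 0, 1), 2),
  var2 = round(runif(6, 10, 20), 2)
)

# Assumes tidyr >= 1.0.0, where pivot_longer()/pivot_wider() are available.
pivoted <- df %>%
  # Stack var1 and var2 into a name column (var) and a value column (value)
  pivot_longer(c(var1, var2), names_to = "var", values_to = "value") %>%
  # Spread the years back out into one column per year
  pivot_wider(names_from = year, values_from = value)

pivoted
```

As with the gather/spread version, case is left out of the reshaping step, so all cases are handled at once.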