R: iterate over groups in an R data frame with a for loop

Question

I need some help with this. I have a data frame with 3 columns:

date <- c(2001:2015)
countries <- c("Afghanistan", "Afghanistan", "Afghanistan","Afghanistan", "Afghanistan", "Algeria", "Algeria", "Algeria", "Algeria", "Algeria", "Albania", "Albania", "Albania", "Albania", "Albania") 
value<- c(1:15)
df <- data.frame(date, countries, value)  # column must be named `countries` to match df$countries below

I want to apply a function, prep_plot, to each unique country and combine all the outputs into a new data frame. I tried a for loop like this:

library(dplyr)  # provides bind_rows()
data <- data.frame()
for (country in unique(df$countries)){
 data1 <- prep_plot(country)
 data2 <- bind_rows(data, data1)
}

But the output (data2) only has data for Albania.

Answer 1

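Going by your description, the following should work, and it runs straight off the countries vector without needing the data.frame at all.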


library(dplyr)
unique(countries) %>% lapply(prep_plot) %>% bind_rows()

Or, written out more explicitly so you can see what is happening:


## a list of all the prep_plot outputs:

l <- lapply(unique(countries), prep_plot) # applies the function to each unique country and returns a list with all the outputs

## combine these together with bind_rows:
data <- bind_rows(l)

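These statements can be chained as shown above. This style is common for simple function calls like these, where the output of each link in the chain becomes the first argument of the next function; that is exactly what %>% does.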
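A minimal runnable sketch of both forms, assuming dplyr is installed and using a hypothetical stand-in for prep_plot (the question never shows the real function) that returns a one-row summary per country. It also checks that the chained and unchained versions give the same result:

```r
library(dplyr)

countries <- rep(c("Afghanistan", "Algeria", "Albania"), each = 5)
df <- data.frame(date = 2001:2015, countries = countries, value = 1:15)

# Hypothetical stand-in for the asker's prep_plot(): returns a
# one-row data frame with the total value for one country
prep_plot <- function(country) {
  rows <- df[df$countries == country, ]
  data.frame(country = country, total = sum(rows$value))
}

# Chained form: unique(countries) becomes the first argument of lapply(),
# and the resulting list becomes the first argument of bind_rows()
piped <- unique(countries) %>% lapply(prep_plot) %>% bind_rows()

# Unchained form, step by step
l <- lapply(unique(countries), prep_plot)
unpiped <- bind_rows(l)

identical(piped, unpiped)  # TRUE
```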

Answer 2

You can use map_df from purrr:

result <- purrr::map_df(unique(df$countries), prep_plot)

Or in base R:

result <- do.call(rbind, lapply(unique(df$countries), prep_plot))
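A self-contained base-R sketch of the do.call(rbind, lapply(...)) version, again with a hypothetical prep_plot stand-in. do.call() passes every element of the list returned by lapply() as a separate argument to rbind(), so all per-country results are stacked in a single call:

```r
countries <- rep(c("Afghanistan", "Algeria", "Albania"), each = 5)
df <- data.frame(date = 2001:2015, countries = countries, value = 1:15)

# Hypothetical stand-in for prep_plot(): mean value for one country
prep_plot <- function(country) {
  rows <- df[df$countries == country, ]
  data.frame(country = country, mean_value = mean(rows$value))
}

pieces <- lapply(unique(df$countries), prep_plot)  # list of 3 one-row data frames
result <- do.call(rbind, pieces)  # same as rbind(pieces[[1]], pieces[[2]], pieces[[3]])
print(result)
```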

Answer 3

But the output (data2) only has data for Albania.

You should replace data2 <- bind_rows(data, data1) with data2 <- bind_rows(data2, data1), and initialise data2 instead of data. That is:

data2 <- data.frame() # notice that this is changed as well
for (country in unique(df$countries)){
 data1 <- prep_plot(country)
 data2 <- bind_rows(data2, data1)
}

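What you are currently doing amounts to data2 <- bind_rows(data.frame(), data1) on every iteration, because data itself never changes. Each pass overwrites data2, so you only end up with the data for the last unique country (in your case "Albania").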
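The difference can be reproduced without any packages. The sketch below swaps in base rbind() for bind_rows() (an assumption made only so it runs without dplyr) and uses a hypothetical prep_plot that returns the rows for one country. Note that unique() keeps first-appearance order, which is why "Albania" is the last country visited:

```r
countries <- rep(c("Afghanistan", "Algeria", "Albania"), each = 5)
df <- data.frame(date = 2001:2015, countries = countries, value = 1:15)

# Hypothetical stand-in for prep_plot(): the rows for one country
prep_plot <- function(country) df[df$countries == country, ]

unique(df$countries)  # "Afghanistan" "Algeria" "Albania"

# Buggy version: every iteration binds onto the same empty `data`,
# so data2 is overwritten and only keeps the last country
data <- data.frame()
for (country in unique(df$countries)) {
  data1 <- prep_plot(country)
  data2 <- rbind(data, data1)
}
unique(data2$countries)  # "Albania"

# Fixed version: data2 itself is the accumulator, so rows pile up
data2 <- data.frame()
for (country in unique(df$countries)) {
  data1 <- prep_plot(country)
  data2 <- rbind(data2, data1)
}
nrow(data2)  # 15
```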

As others have mentioned, solutions based on lapply combined with do.call(rbind, ...) or bind_rows are likely to be (much) faster, such as Ronak's answer.
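If you prefer to keep an explicit for loop, one common pattern (sketched here with the same hypothetical prep_plot stand-in) is to store each result in a preallocated list and bind once at the end. This avoids copying the ever-growing data frame on every iteration, which is what makes the rbind-in-a-loop version slow:

```r
countries <- rep(c("Afghanistan", "Algeria", "Albania"), each = 5)
df <- data.frame(date = 2001:2015, countries = countries, value = 1:15)

# Hypothetical stand-in for prep_plot(): the rows for one country
prep_plot <- function(country) df[df$countries == country, ]

groups <- unique(df$countries)
results <- vector("list", length(groups))  # one empty slot per country

for (i in seq_along(groups)) {
  results[[i]] <- prep_plot(groups[i])  # fill slot i; earlier results are not copied
}

data2 <- do.call(rbind, results)  # bind everything once, after the loop
```

With dplyr loaded, bind_rows(results) can replace the final do.call(rbind, results) step.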
