Loop to create one dataframe from multiple URLs
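I have a character vector of several URLs, each pointing to a CSV of crime data for a particular year. Is there a simple way to write a loop that runs read.csv and binds all the data frames together, without having to run read.csv 8 times? The URL vector is below: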
urls <- c('https://opendata.arcgis.com/datasets/73cd2f2858714cd1a7e2859f8e6e4de4_33.csv',
'https://opendata.arcgis.com/datasets/fdacfbdda7654e06a161352247d3a2f0_34.csv',
'https://opendata.arcgis.com/datasets/9d5485ffae914c5f97047a7dd86e115b_35.csv',
'https://opendata.arcgis.com/datasets/010ac88c55b1409bb67c9270c8fc18b5_11.csv',
'https://opendata.arcgis.com/datasets/5fa2e43557f7484d89aac9e1e76158c9_10.csv',
'https://opendata.arcgis.com/datasets/6eaf3e9713de44d3aa103622d51053b5_9.csv',
'https://opendata.arcgis.com/datasets/35034fcb3b36499c84c94c069ab1a966_27.csv',
'https://opendata.arcgis.com/datasets/bda20763840448b58f8383bae800a843_26.csv'
)
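The function map_dfr from the purrr package does exactly what you need. It applies a function to each element of the input (in this case, urls) and binds the results together row-wise.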
library(tidyverse)
map_dfr(urls, read_csv)
As a personal preference I use read_csv() instead of read.csv(), but either one works.
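map_dfr() still works, but since purrr 1.0.0 it is superseded in favour of map() followed by list_rbind(). The sketch below assumes purrr >= 1.0.0, readr >= 2.0 and R >= 4.1 (for |> and the \(p) lambda). So that it runs offline, it writes two small CSV files to a temporary directory as stand-ins for the URLs; the file contents, the columns and the year labels y2019/y2020 are made up. With the real data you would pass urls (optionally named by year) instead of paths. names_to turns the list names into a column that records which file each row came from.

```r
library(purrr)
library(readr)

# Hypothetical stand-ins for the real URLs: two small CSVs in a temp dir
paths <- c(y2019 = tempfile(fileext = ".csv"),
           y2020 = tempfile(fileext = ".csv"))
write.csv(data.frame(id = 1:2, offense = c("theft", "assault")),
          paths[["y2019"]], row.names = FALSE)
write.csv(data.frame(id = 3:4, offense = c("burglary", "theft")),
          paths[["y2020"]], row.names = FALSE)

# map() reads each file into a named list of tibbles (names come from paths);
# list_rbind() stacks them and stores those names in a "source" column
crime <- paths |>
  map(\(p) read_csv(p, show_col_types = FALSE)) |>
  list_rbind(names_to = "source")

crime
```

Because map() keeps the names of its input, naming the URL vector is all it takes to get a per-row source label.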
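I usually take this approach because I want to keep each CSV file as a separate object, in case I need to analyze any of them further later. Otherwise, you don't need a for loop.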
# Read each CSV into its own object (mycsv-1, mycsv-2, ...), then collect them into a named list
for (i in seq_along(urls)) assign(paste0("mycsv-", i), read.csv(url(urls[i]), header = TRUE))
df.list <- mget(ls(pattern = "^mycsv-"))
# Use plyr if the files have different column names, or if you need to know which CSV each row came from
library(plyr)
df <- ldply(df.list) # the first column (.id) holds the source object name; drop it if you don't need it
# Base R alternative to plyr: if all files share the same column names and you only want to rbind them
df <- do.call("rbind", df.list)
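If the point of the loop above is to keep each file available on its own, a named list does that without creating global variables with assign() and gathering them again with mget(). A minimal base-R sketch, again using two hypothetical local CSV files in place of the URLs; the names mycsv_1/mycsv_2 and the source column are illustrative choices, and do.call(rbind, ...) assumes every file has the same columns.

```r
# Hypothetical stand-ins for the downloaded files
files <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
write.csv(data.frame(id = 1:2, offense = c("theft", "assault")), files[1], row.names = FALSE)
write.csv(data.frame(id = 3:4, offense = c("burglary", "theft")), files[2], row.names = FALSE)

# One list element per file, named mycsv_1, mycsv_2, ... (no assign() needed)
df.list <- setNames(lapply(files, read.csv),
                    paste0("mycsv_", seq_along(files)))

# Each file is still available on its own for later analysis
head(df.list$mycsv_1)

# Prepend a "source" column to each data frame, then stack them
tagged <- Map(function(d, nm) cbind(source = nm, d), df.list, names(df.list))
df <- do.call(rbind, unname(tagged))
```

Underscores instead of hyphens keep the names syntactic, so df.list$mycsv_1 works without backticks.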
In base R:
result <- lapply(urls, read.csv, stringsAsFactors = FALSE)
result <- do.call(rbind, result)
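With plain lapply(), one unreachable URL makes the whole call fail. A hedged variant wraps read.csv() in tryCatch() so that a source that cannot be read is skipped with a message instead. read_or_null is a hypothetical helper name, and the example uses one real temporary file plus a deliberately missing path in place of the URLs. read.csv() on a missing file still emits its own warning before the error is caught.

```r
# One readable stand-in file plus one path that does not exist (hypothetical data)
good <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:2, offense = c("theft", "assault")), good, row.names = FALSE)
sources <- c(good, file.path(tempdir(), "no-such-file.csv"))

# Hypothetical helper: return NULL instead of stopping when a source can't be read
read_or_null <- function(src) {
  tryCatch(read.csv(src), error = function(e) {
    message("skipping ", src, ": ", conditionMessage(e))
    NULL
  })
}

result <- lapply(sources, read_or_null)
result <- Filter(Negate(is.null), result)  # drop the sources that failed
result <- do.call(rbind, result)
```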