List to dataframe using names as values for column in R
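I have 88 tab-separated files that I need to import into R.
They are named like "Study-1-12":
- Study: name of the study
- 1: subject number
- [1]2: day of the experiment (1 or 2)
- 1[2]: trial (1 or 2)
The data inside each file looks like this: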
START: dd.mm.yyy hh:mm:ss
WAITING 3780 ms REACTION 1230 ms
WAITING 9700 ms REACTION 377 ms
WAITING 5538 ms REACTION 310 ms
WAITING 4599 ms REACTION 361 ms
WAITING 9579 ms REACTION 338 ms
END: dd.mm.yyy hh:mm:ss
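So far I have imported them all into a list and summarised each one, so the end result for every file is a table with two columns, "waiting" and "reaction", each holding a single mean value.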
library(readr)  # read_tsv()
library(dplyr)  # %>%, select(), filter(), summarise()

# Load filepaths and names (pattern is a regular expression, not a glob)
filepath <- list.files(path = "rawdata/", pattern = "\\.dat$", all.files = TRUE, full.names = TRUE) # Load full path
filenames <- list.files(path = "rawdata/", pattern = "\\.dat$", all.files = TRUE, full.names = FALSE) # load names of files

# load all files into list with named col headers
ldf <- lapply(filepath, function(x) read_tsv(file = x, skip = 1,
  col_names = c("waiting", "valueW", "ms", "ws", "reaction", "valueR", "ms1")))
names(ldf) <- filenames # rename items in list

# select only relevant cols and do the math
ldf <- lapply(ldf, function(x) x %>%
  select(waiting, valueW, reaction, valueR) %>%
  filter(waiting == "WAITING") %>%
  summarise(waiting = mean(valueW), reaction = mean(valueR))
)
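What I would like to do now is build a data frame whose columns are derived from the file name (as above: study-1-12):
- id: the first 1
- exp (experiment day): 1 or 2
- trial: 1 or 2
- waiting: the value from each data frame in the list
- reaction: the value from each data frame in the list
Is there a way to do this in R?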
library(purrr)
library(stringi)

fils <- list.files("~/Data/so", full.names=TRUE)
fils
## [1] "/Some/path/to/data/studyA-1-12" "/Some/path/to/data/studyB-30-31"

map_df(fils, function(x) {

  # pull study name, subject id, experiment day and trial out of the file name
  stri_match_all_regex(x, "([[:alnum:]]+)-([[:digit:]]+)-([[:digit:]])([[:digit:]])")[[1]] %>%
    as.list() %>%
    .[2:5] %>%
    set_names(c("study_name", "subject_id", "experiment_day", "trial")) -> meta

  # keep only the WAITING lines and take the two numeric fields from each
  readLines(x) %>%
    grep("WAITING", ., value=TRUE) %>%
    map(~scan(text=., quiet=TRUE,
              what=list(character(), double(), character(),
                        character(), double(), character()))[c(2,5)]) %>%
    map_df(~set_names(as.list(.), c("waiting", "reaction"))) -> df

  # attach the file-name metadata to every row
  df$study_name <- meta$study_name
  df$subject_id <- meta$subject_id
  df$experiment_day <- meta$experiment_day
  df$trial <- meta$trial

  df

})
## # A tibble: 10 × 6
##    waiting reaction study_name subject_id experiment_day trial
##      <dbl>    <dbl>      <chr>      <chr>          <chr> <chr>
## 1     3780     1230     studyA          1              1     2
## 2     9700      377     studyA          1              1     2
## 3     5538      310     studyA          1              1     2
## 4     4599      361     studyA          1              1     2
## 5     9579      338     studyA          1              1     2
## 6     3780     1230     studyB         30              3     1
## 7     9700      377     studyB         30              3     1
## 8     5538      310     studyB         30              3     1
## 9     4599      361     studyB         30              3     1
## 10    9579      338     studyB         30              3     1
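The result above keeps one row per measurement. If one row per file is wanted instead, as the question describes, a possible alternative is to start from the already summarised, named list `ldf` and split its names into columns. The following is only a sketch under some assumptions: the list here is a toy stand-in built from plain data frames (the second file name and its values are made up for illustration), the output column names `study`, `id`, `exp` and `trial` are my own choice, and the parsing uses base R's `strcapture()` from the utils package rather than anything from the code above.

```r
# Toy stand-in for the summarised list `ldf` from the question:
# one single-row data frame per file, named after the file.
# (Hypothetical file names and values, for illustration only.)
ldf <- list(
  "Study-1-12.dat"  = data.frame(waiting = 6639.2, reaction = 523.2),
  "Study-30-21.dat" = data.frame(waiting = 5000,   reaction = 400)
)

# Split each file name into study name, subject id, day digit and trial digit.
# `proto` tells strcapture() the name and type of each captured group.
meta <- strcapture(
  pattern = "^([[:alnum:]]+)-([[:digit:]]+)-([[:digit:]])([[:digit:]])",
  x       = names(ldf),
  proto   = data.frame(study = character(), id = integer(),
                       exp = integer(), trial = integer())
)

# Stack the one-row summaries and attach the parsed name parts.
result <- cbind(meta, do.call(rbind, ldf))
rownames(result) <- NULL
result
```

A file name that does not follow the pattern would yield `NA` in the parsed columns rather than an error, so it may be worth checking `anyNA(result)` before going further.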