List to dataframe using names as values for column in R

Question

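I have 88 tab-separated files that I need to import into R.

They are named like "Study-1-12":

Study: name of the study
1: subject number
[1]2: day of the experiment (1 or 2)
1[2]: trial (1 or 2)

The data inside each file looks like this: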

START: dd.mm.yyy hh:mm:ss

WAITING 3780    ms      REACTION    1230  ms

WAITING 9700    ms      REACTION    377 ms


WAITING 5538    ms      REACTION    310 ms

WAITING 4599    ms      REACTION    361 ms

WAITING 9579    ms      REACTION    338 ms
END: dd.mm.yyy hh:mm:ss

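So far I have imported them all into a list and summarised each one, so the end result for each file is a table with two columns, "waiting" and "reaction", each holding a single mean value.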

library(readr) # read_tsv()
library(dplyr) # select(), filter(), summarise(), %>%

# Load file paths and names (pattern is a regular expression, not a glob)
filepath <- list.files(path = "rawdata/", pattern = "\\.dat$", all.files = TRUE, full.names = TRUE) # full paths
filenames <- list.files(path = "rawdata/", pattern = "\\.dat$", all.files = TRUE, full.names = FALSE) # file names only

# Load all files into a list, skipping the START line and naming the columns
ldf <- lapply(filepath, function(x) read_tsv(file = x, skip = 1,
              col_names = c("waiting", "valueW", "ms", "ws", "reaction", "valueR", "ms1")))

names(ldf) <- filenames # name each list item after its file

# Keep only the relevant columns, drop non-data rows (e.g. the END line), and take the means
ldf <- lapply(ldf, function(x) x %>% 
                select(waiting, valueW, reaction, valueR) %>%
                filter(waiting == "WAITING") %>%
                summarise(waiting = mean(valueW), reaction = mean(valueR))
              )

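What I would like to do now is build a single data frame whose columns are derived from the file name (as above: Study-1-12):

id: the first 1 (subject number)
exp: 1 or 2 (experiment day)
trial: 1 or 2
waiting: the value from each data frame in the list
reaction: the value from each data frame in the list

Is there a way to do this in R?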

Answer 1

library(purrr)   # map(), map_df(), set_names(); also re-exports %>%
library(stringi) # stri_match_all_regex()

fils <- list.files("~/Data/so", full.names=TRUE)

fils
## [1] "/Some/path/to/data/studyA-1-12"  "/Some/path/to/data/studyB-30-31"

map_df(fils, function(x) {

  # Pull study name, subject id, experiment day and trial out of the file name
  stri_match_all_regex(x, "([[:alnum:]]+)-([[:digit:]]+)-([[:digit:]])([[:digit:]])")[[1]] %>%
    as.list() %>%
    .[2:5] %>%
    set_names(c("study_name", "subject_id", "experiment_day", "trial")) -> meta

  # Keep only the WAITING lines and extract the two numeric fields (2nd and 5th)
  readLines(x) %>%
    grep("WAITING", ., value=TRUE) %>%
    map(~scan(text=., quiet=TRUE,
              what=list(character(), double(), character(),
                                character(), double(), character()))[c(2,5)]) %>%
    map_df(~set_names(as.list(.), c("waiting", "reaction"))) -> df

  # Attach the file-name metadata to every row of this file
  df$study_name <- meta$study_name
  df$subject_id <- meta$subject_id
  df$experiment_day <- meta$experiment_day
  df$trial <- meta$trial

  df

})
## # A tibble: 10 × 6
##    waiting reaction study_name subject_id experiment_day trial
##      <dbl>    <dbl>      <chr>      <chr>          <chr> <chr>
## 1     3780     1230     studyA          1              1     2
## 2     9700      377     studyA          1              1     2
## 3     5538      310     studyA          1              1     2
## 4     4599      361     studyA          1              1     2
## 5     9579      338     studyA          1              1     2
## 6     3780     1230     studyB         30              3     1
## 7     9700      377     studyB         30              3     1
## 8     5538      310     studyB         30              3     1
## 9     4599      361     studyB         30              3     1
## 10    9579      338     studyB         30              3     1
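
The output above keeps one row per measurement. If one row per file with the mean values is wanted instead, as the question describes, the already-summarised list can be combined directly. The following is a minimal base R sketch, not part of the original answer: the toy `ldf` and its values are hypothetical stand-ins for the question's list, and the column names `study`, `id`, `exp` and `trial` are assumptions chosen to match the question's wording.

```r
# Hypothetical stand-in for the question's summarised list:
# one single-row data frame per file, named after the file.
ldf <- list(
  "Study-1-12.dat"  = data.frame(waiting = 6639.2, reaction = 523.2),
  "Study-30-21.dat" = data.frame(waiting = 5000.0, reaction = 400.0)
)

# Drop the extension, then split <study>-<subject>-<day><trial>.
stems   <- sub("\\.dat$", "", names(ldf))
pattern <- "^(.+)-([0-9]+)-([0-9])([0-9])$"
parts   <- regmatches(stems, regexec(pattern, stems))

# Each match holds the full string plus 4 capture groups;
# stop early if any file name does not follow the scheme.
stopifnot(all(lengths(parts) == 5))

# Character matrix with one row per file; column 1 is the full match.
meta <- do.call(rbind, parts)

# Bind the parsed metadata to the per-file means.
result <- cbind(
  data.frame(
    study = meta[, 2],
    id    = as.integer(meta[, 3]),
    exp   = as.integer(meta[, 4]),
    trial = as.integer(meta[, 5])
  ),
  do.call(rbind, ldf)
)
rownames(result) <- NULL

print(result)
```

Because the metadata is parsed from `names(ldf)`, the rows stay aligned with the list order, and the `stopifnot()` check makes a misnamed file fail loudly rather than silently producing `NA` values.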

r

dplyr

tidyr

data-cleaning