Apply formatting changes to all tables in a list

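I am trying to change the format of multiple data frames in a list. I have used pipes before when formatting a single data frame, but I don't know whether that is still the most efficient way to repeat the process. With pipes, I don't know how to make the script refer to the column names inside the data frames rather than the names of the data frames in the list. I have included the script I currently have as well as the script I am trying to write.

My current script: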


    library(dataRetrieval)

    # Create a data frame to attach site names to site codes

    df = data.frame(
          siteNumber = c(   "383652091125002",  "383648091124501",  "383648091124502", "383631091124801", "383631091124802",    "383631091124803",  "383631091124804", "383640091130701",   "383640091130702",  "383621091130701",  "383621091130703",  "383621091130702",  "383624091130501",  "383624091130502",  "383616091130801",  "383616091130802","383644091131601",    "383627091130201",  "383622091130604",  "383622091130605",  "383506091132201",  "383508091132002",  "383508091132004",  "383519091133701",  "383557091132001",  "383614091132801"),
          siteName = c( "BW-00A",   "BW-01",    "BW-01A", "BW-04D","BW-04S",    "BW-04A-D", "BW-04A-S", "BW-08",    "BW-08A",   "BW-11",    "BW-11A-D", "BW-11A-S", "BW-13",    "BW-13A",   "BW-14",    "BW-14A", "BW4-15", "BW4-16",   "BW4-17",   "BW4-18",   "MW-04",    "MW-04A",   "MW-04B",   "MW-11",    "W3",   "W4")
        )

    # Function parameters
    parameterCode = c("34475", "34485","39180", "77093")
    parameterName = c("Tetrachloroethene", "Trichloroethene","2Trichloroethene", "cis-1,2-Dichloroethene")
    startDate = "2019-01-01"
    endDate = "2020-12-15"

    # Create the data tables and assign site names instead of site numbers
    results <- lapply(df$siteNumber, readNWISqw, parameterCode, startDate, endDate)
    names(results) <- df$siteName

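This code creates a "large list" containing multiple data frames. I have been trying to use the list (named results) in the script below, but it only picks up the names of the data frames in the list rather than the information inside the data frames themselves, the way results$'insertTableName' would.

The formatting script I am trying to write: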
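As a rough illustration of the difference between the names of the list elements and the column names inside each data frame, here is a minimal base R sketch. The list `toy` and its values are made up for the example and only stand in for `results`; `toy2` is likewise an illustrative name.

```r
# Toy stand-in for `results`: a named list of data frames (made-up values).
toy <- list(
  "BW-01"  = data.frame(site_no = "383648091124501", parm_cd = "34475"),
  "BW-04D" = data.frame(site_no = "383631091124801", parm_cd = "34485")
)

names(toy)             # names of the list elements, not column names
names(toy[["BW-01"]])  # column names of one data frame in the list

# lapply() calls the function once per data frame, so inside the function
# `x` is a single table and its columns can be used directly.
toy2 <- lapply(toy, function(x) {
  x$site_nm <- x$site_no
  x$parm_nm <- x$parm_cd
  x
})

names(toy2[["BW-04D"]])
```

The result is again a list with the same element names, so no explicit for loop is needed for this kind of per-table change.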

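  1. Copy the site_no and parm_cd columns and rename the copies. I am not sure how to do this for every data frame in the list. Do I need a for loop?
    dataTable$site_nm = dataTable$site_no
    dataTable$parm_nm = dataTable$parm_cd
  2. Unite the result_va and remark_cd columns
  3. Select the specific columns to display
  4. Replace the numeric codes in the duplicated columns with names
  5. Unite the parameter name and number columns
  6. Pivot wider by parameter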

I know that to call a specific table I need results$'insertTableName', but I don't know how to call all of the tables at once. Do I need to use a for loop?

    results %>%
      unite(result_va,remark_cd,result_va, sep = "", na.rm = TRUE) %>%
      select(site_no, sample_dt, sample_tm, parm_cd,result_va) %>%
      within(parm_cd <- factor(parm_cd, levels = parameterCode, labels = parameterName)) %>%
      within(site_no <- factor(site_no, levels = siteNumber, labels = siteName)) %>%
      unite(parm_nm, parm_nm, parm_cd, sep = " - ", na.rm = TRUE)
      pivot_wider(names_from = parm_cd, values_from = result_va, values_fn = NULL)

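This code would work if I had only one table/df, but trying to repeat it over a list of data frames has sent me down a Google rabbit hole. I hope this all makes sense; if there is a better way, please let me know. I really don't know much about coding. Thanks!

I couldn't get the final pivot_wider() to work, but you can in fact put the pipe inside an lapply() call, as shown below. I also used df$ to access siteNumber and siteName: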

    library(dplyr)
    library(tidyr)

    res <- lapply(results, function(x) {
      # Skip list elements that came back with no rows
      if (nrow(x) > 0) {
        x %>%
          mutate(parm_nm = parm_cd) %>%
          unite(result_va, remark_cd, result_va, sep = "", na.rm = TRUE) %>%
          select(site_no, sample_dt, sample_tm, parm_cd, result_va, parm_nm) %>%
          mutate(parm_cd = factor(parm_cd, levels = parameterCode, labels = parameterName)) %>%
          mutate(site_no = factor(site_no, levels = df$siteNumber, labels = df$siteName)) %>%
          unite(parm_nm, parm_nm, parm_cd, sep = " - ", na.rm = TRUE) # %>%
          # pivot_wider(names_from = parm_cd, values_from = result_va, values_fn = NULL)
      }
    })
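To see what the two unite() steps do to individual values, here is a minimal sketch on made-up rows. The tibble `toy` and its values are assumptions for illustration and only mimic the columns the pipeline touches; the parameter codes and names are taken from the question.

```r
library(dplyr)
library(tidyr)

# Made-up rows that only mimic the columns the pipeline touches.
toy <- tibble(
  parm_cd   = c("34475", "39180"),
  remark_cd = c(NA, "<"),
  result_va = c(6.1, 1)
)

out <- toy %>%
  mutate(parm_nm = parm_cd) %>%
  # remark_cd and result_va are pasted together; NA remarks are dropped
  unite(result_va, remark_cd, result_va, sep = "", na.rm = TRUE) %>%
  mutate(parm_cd = factor(parm_cd,
                          levels = c("34475", "39180"),
                          labels = c("Tetrachloroethene", "2Trichloroethene"))) %>%
  # code and name are joined into a single label column
  unite(parm_nm, parm_nm, parm_cd, sep = " - ", na.rm = TRUE)

out$result_va  # "6.1" "<1"
out$parm_nm    # "34475 - Tetrachloroethene" "39180 - 2Trichloroethene"
```

After the second unite() the parm_cd column no longer exists, because unite() removes its input columns by default.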

You can also build the pipeline as a function and then use map():

    library(purrr)  # for map()

    pipeline <- function(x) {
      # Skip list elements that came back with no rows
      if (nrow(x) > 0) {
        x %>%
          mutate(parm_nm = parm_cd) %>%
          unite(result_va, remark_cd, result_va, sep = "", na.rm = TRUE) %>%
          select(site_no, sample_dt, sample_tm, parm_cd, result_va, parm_nm) %>%
          mutate(parm_cd = factor(parm_cd, levels = parameterCode, labels = parameterName)) %>%
          mutate(site_no = factor(site_no, levels = df$siteNumber, labels = df$siteName)) %>%
          unite(parm_nm, parm_nm, parm_cd, sep = " - ", na.rm = TRUE)
      }
    }

    results %>% map(pipeline)

Note that you need the if(nrow(x) > 0) guard up front to stop the pipeline from running on list elements that contain no data.
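One consequence of that guard is that empty tables come back as NULL elements in the result. A minimal base R sketch of this, and of one way to drop the NULL placeholders afterwards, is below; `toy`, `guarded` and `res_kept` are illustrative names with made-up values, not part of the original scripts.

```r
# Toy list: one data frame with rows, one with none (made-up values).
toy <- list(
  has_rows = data.frame(site_no = "383648091124501", result_va = 6.1),
  empty    = data.frame(site_no = character(0), result_va = numeric(0))
)

# Same shape as the guard in the pipeline: an if() without else
# returns NULL when the condition is FALSE.
guarded <- function(x) {
  if (nrow(x) > 0) {
    x
  }
}

res <- lapply(toy, guarded)
is.null(res$empty)

# Drop the NULL placeholders, keeping only tables that had data.
res_kept <- Filter(Negate(is.null), res)
names(res_kept)
```

If purrr is already loaded, purrr::compact() should give the same result as the Filter() call.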

If you treat results as a list column of data frames, you can use tidy operations to filter and map everything within a single pipe chain:

    new_results <-
      tibble(dfs = results) %>%
      # Keep only list elements that contain rows
      filter(map_lgl(dfs, ~ nrow(.) > 0)) %>%
      # The argument is named x, not df, so df$siteNumber still refers to the site lookup table
      mutate(new_dfs = map(dfs, function(x) {
        x %>%
          mutate(parm_nm = parm_cd) %>%
          unite(result_va, remark_cd, result_va, sep = "", na.rm = TRUE) %>%
          select(site_no, sample_dt, sample_tm, parm_cd, result_va, parm_nm) %>%
          mutate(parm_cd = factor(parm_cd, levels = parameterCode, labels = parameterName)) %>%
          mutate(site_no = factor(site_no, levels = df$siteNumber, labels = df$siteName)) %>%
          unite(parm_nm, parm_nm, parm_cd, sep = " - ", na.rm = TRUE) %>%
          pivot_wider(names_from = parm_nm, values_from = result_va, values_fn = NULL)
      }))

Example result:

    new_results$new_dfs[[1]]

    # A tibble: 1 x 6
      site_no sample_dt  sample_tm `34475 - Tetrachloro… `39180 - 2Trichloro… `77093 - cis-1,2-Dich…
      <fct>   <date>     <chr>     <chr>                 <chr>                <chr>                 
    1 NA      2019-09-09 14:00     6.1                   <1                   <1                    

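Note: your pivot_wider appears to have a typo: after the unite step there is no parm_cd column any more, so I changed the names_from argument to parm_nm. Also, site_no is NA in the sample output because that output was produced with the anonymous function's argument named df, which hides the site lookup table df inside the function; with any other argument name (for example x), df$siteNumber and df$siteName resolve to the lookup table.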
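A minimal base R sketch of how an argument named df can shadow the site lookup table df and turn site_no into NA, as seen in the sample output. The two-row lookup table, the one-row table `tbl` and the function names `shadowed` and `not_shadowed` are made up for illustration.

```r
# Site lookup table, same name as in the question (two rows only).
df <- data.frame(
  siteNumber = c("383648091124501", "383631091124801"),
  siteName   = c("BW-01", "BW-04D")
)

# Made-up stand-in for one downloaded table.
tbl <- data.frame(site_no = "383648091124501")

# Argument named `df`: inside the function, df is the table being formatted,
# which has no siteNumber or siteName column, so levels and labels are NULL.
shadowed <- function(df) {
  factor(df$site_no, levels = df$siteNumber, labels = df$siteName)
}

# Argument named `x`: df still refers to the lookup table defined above.
not_shadowed <- function(x) {
  factor(x$site_no, levels = df$siteNumber, labels = df$siteName)
}

shadowed(tbl)      # NA, with no levels
not_shadowed(tbl)  # BW-01
```

Renaming either the lookup table or the function argument avoids the clash; the sketch keeps the lookup table's name and changes the argument.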