How to remove some variables in R with names that are dates

I have a data set in R in which some of the variable names are dates; see the simplified example of the input data below (in Excel):

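What I want to do with this data is remove the columns whose names are dates on or before a certain date, e.g. 2019-01-31. See the simplified example of the desired output data below (in Excel):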

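Currently I can do this by transposing the data, filtering out the rows with dates on or before January 31, 2019, and transposing the data back. But I would like to know whether there is a way to do this using only the column names, without pivoting back and forth.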

# Example data to copy and paste into R for easy reproduction of problem:

df <- data.frame (id = c("apples",  "pears",    "grapes",   "tomatoes", "carrots",  "cucumber", "rabbit",   "cat",  "dog"),
type    = c("fruit",    "fruit",    "fruit",    "veggies",  "veggies",  "veggies",  "pets", "pets", "pets"),
color   = c("red",  "green",    "purple",   "red",  "orange",   "green",    "grey", "black",    "brown"),
'2019-04-30'    = c(353,    91, 270,    2029,   107,    62, 30, 61, 137),
'2019-03-31'    = c(349,    90, 267,    2028,   104,    60, 29, 59, 133),
'2019-02-28'    = c(345,    89, 264,    2027,   101,    58, 28, 57, 129),
'2019-01-31'    = c(341,    88, 261,    2026,   98, 56, 27, 55, 125),
'2018-12-31'    = c(337,    87, 258,    2025,   95, 54, 26, 53, 121),
'2018-11-30'    = c(333,    86, 255,    2024,   92, 52, 25, 51, 117),
check.names = FALSE)

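Here is one approach:

  • Extract the column names
  • Convert each name to a Date where possible, and to NA otherwise
  • Build a logical vector that drops the columns whose dates are too old while keeping the non-date columns (the NAs from the previous step)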
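
As a small illustration of the second step, the sketch below shows why the names are converted one at a time. The vector nm is invented for the example. When as.Date() receives a whole character vector without an explicit format, it guesses the format from the first non-NA element, so a leading non-date name such as "id" makes every element come back as NA.

```r
# Invented example names: one non-date name followed by two dates
nm <- c("id", "2019-01-31", "2018-12-31")

# Element by element: the non-date name becomes NA, the dates are parsed
per_name <- vapply(nm, as.Date, numeric(1), optional = TRUE)
is.na(per_name)

# Whole vector at once, no format given: the format is guessed from the
# first non-NA element ("id"), so nothing is recognised as a date
all_at_once <- as.Date(nm, optional = TRUE)
is.na(all_at_once)
```

Supplying an explicit format (for example format = "%Y-%m-%d") avoids the guessing step, which is another way to get per-element NAs without vapply.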

Sample data

## sample data frame
m <- matrix(1, 3, 10)
colnames(m) <- c("a", "b", as.character(seq.Date(as.Date("2021-1-1"), length.out = 8, by = "days")))
(d <- as.data.frame(m))
#   a b 2021-01-01 2021-01-02 2021-01-03 2021-01-04 2021-01-05 2021-01-06 2021-01-07 2021-01-08
# 1 1 1          1          1          1          1          1          1          1          1
# 2 1 1          1          1          1          1          1          1          1          1
# 3 1 1          1          1          1          1          1          1          1          1

Filter

r <- vapply(names(d), as.Date, numeric(1), optional = TRUE)
d[, is.na(r) | r <= as.Date("2021-1-3")]
#   a b 2021-01-01 2021-01-02 2021-01-03
# 1 1 1          1          1          1
# 2 1 1          1          1          1
# 3 1 1          1          1          1

# applied to the question's data: keep the non-date columns and every date
# after 2019-01-31 (the question drops dates on or before the cutoff)
r <- vapply(names(df), as.Date, numeric(1), optional = TRUE)
df[, is.na(r) | r > as.Date("2019-1-31")]
#         id    type  color 2019-04-30 2019-03-31 2019-02-28
# 1   apples   fruit    red        353        349        345
# 2    pears   fruit  green         91         90         89
# 3   grapes   fruit purple        270        267        264
# 4 tomatoes veggies    red       2029       2028       2027
# 5  carrots veggies orange        107        104        101
# 6 cucumber veggies  green         62         60         58
# 7   rabbit    pets   grey         30         29         28
# 8      cat    pets  black         61         59         57
# 9      dog    pets  brown        137        133        129
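
If this filter is needed in more than one place, it could be wrapped in a small helper. The following is only a sketch under stated assumptions: the function name drop_dates_through and the toy data frame are hypothetical, the column names are assumed to be in YYYY-MM-DD format, and the cutoff is treated as inclusive for dropping (columns dated on or before it are removed, as the question asks).

```r
# Hypothetical helper (not part of the answer above): drop every column whose
# name parses as a date on or before `cutoff`; keep all non-date columns.
drop_dates_through <- function(data, cutoff, format = "%Y-%m-%d") {
  # With an explicit format, as.Date() handles the whole name vector and
  # returns NA for names that do not match (e.g. "id").
  col_dates <- as.Date(names(data), format = format)
  keep <- is.na(col_dates) | col_dates > as.Date(cutoff)
  data[, keep, drop = FALSE]
}

# Small stand-in data set, invented for illustration
toy <- data.frame(id = c("apples", "pears"),
                  "2019-02-28" = c(345, 89),
                  "2019-01-31" = c(341, 88),
                  "2018-12-31" = c(337, 87),
                  check.names = FALSE)

res <- drop_dates_through(toy, "2019-01-31")
names(res)
# [1] "id"         "2019-02-28"
```

Using drop = FALSE keeps the result a data frame even if only one column survives the filter.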

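Description

You can reshape the data to long format, filter on the date column, and then reshape it back to wide format.

Data

Same data as provided in the question's example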

Solution

library(dplyr)
library(tidyr)


df %>%
  tidyr::pivot_longer(cols = !c(id, type, color), names_to = 'date', values_to = 'value') %>%
  dplyr::mutate(date = as.Date(date, format = '%Y-%m-%d')) %>%
  dplyr::filter(date > as.Date('2019-01-31')) %>%  # drop dates on or before the cutoff
  tidyr::pivot_wider(names_from = 'date', values_from = 'value')

Desired output

  id       type    color  `2019-04-30` `2019-03-31` `2019-02-28`
  <chr>    <chr>   <chr>         <dbl>        <dbl>        <dbl>
1 apples   fruit   red             353          349          345
2 pears    fruit   green            91           90           89
3 grapes   fruit   purple          270          267          264
4 tomatoes veggies red            2029         2028         2027
5 carrots  veggies orange          107          104          101
6 cucumber veggies green            62           60           58
7 rabbit   pets    grey             30           29           28
8 cat      pets    black            61           59           57
9 dog      pets    brown           137          133          129

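We can do this in base R. Conveniently, your dates are in YYYY-MM-DD format, which means the >= and <= operators order them correctly even when they are compared as plain strings. We can also use a simple regular expression to keep any columns whose names are not in date format: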

df[!grepl('\\d{4}-\\d{2}-\\d{2}', colnames(df)) | colnames(df) >= '2019-02-28']

        id    type  color 2019-04-30 2019-03-31 2019-02-28
1   apples   fruit    red        353        349        345
2    pears   fruit  green         91         90         89
3   grapes   fruit purple        270        267        264
4 tomatoes veggies    red       2029       2028       2027
5  carrots veggies orange        107        104        101
6 cucumber veggies  green         62         60         58
7   rabbit    pets   grey         30         29         28
8      cat    pets  black         61         59         57
9      dog    pets  brown        137        133        129
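
The string comparison works only because zero-padded year-month-day text sorts in the same order as the dates it represents. The following sketch, using invented vectors, checks that claim and contrasts it with a day-first format, where comparing the raw strings gives a different answer from comparing the parsed dates.

```r
# Zero-padded YYYY-MM-DD strings: text order matches date order
iso <- c("2018-12-31", "2019-01-31", "2019-02-28")
iso_text_cmp <- iso > "2019-01-31"
iso_date_cmp <- as.Date(iso) > as.Date("2019-01-31")
identical(iso_text_cmp, iso_date_cmp)
# [1] TRUE

# Day-first strings (an invented counterexample): text order is misleading
dmy <- c("31-12-2018", "28-02-2019")
dmy_text_cmp <- dmy > "31-01-2019"
dmy_date_cmp <- as.Date(dmy, format = "%d-%m-%Y") >
  as.Date("31-01-2019", format = "%d-%m-%Y")
identical(dmy_text_cmp, dmy_date_cmp)
# [1] FALSE
```

For column names in any format other than YYYY-MM-DD, converting to Date before comparing is the safer route.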