How to remove some variables in R with names that are dates
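I have a dataset in R in which some of the variable names are dates. See the simplified example of the input data below (in Excel):
What I want to do with this data is remove the columns whose names are dates earlier than or equal to a given date, for example 2019-01-31. See the simplified example of the desired output data below (in Excel):
Right now I can do this by transposing the data, filtering out the rows with dates on or before 2019-01-31, and transposing back. But I wonder whether there is a different way that works on the column names alone, without pivoting back and forth?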
# Example data to copy and paste into R for easy reproduction of problem:
df <- data.frame (id = c("apples", "pears", "grapes", "tomatoes", "carrots", "cucumber", "rabbit", "cat", "dog"),
type = c("fruit", "fruit", "fruit", "veggies", "veggies", "veggies", "pets", "pets", "pets"),
color = c("red", "green", "purple", "red", "orange", "green", "grey", "black", "brown"),
'2019-04-30' = c(353, 91, 270, 2029, 107, 62, 30, 61, 137),
'2019-03-31' = c(349, 90, 267, 2028, 104, 60, 29, 59, 133),
'2019-02-28' = c(345, 89, 264, 2027, 101, 58, 28, 57, 129),
'2019-01-31' = c(341, 88, 261, 2026, 98, 56, 27, 55, 125),
'2018-12-31' = c(337, 87, 258, 2025, 95, 54, 26, 53, 121),
'2018-11-30' = c(333, 86, 255, 2024, 92, 52, 25, 51, 117),
check.names = FALSE)
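The approach is as follows:
- Extract the column names
- Convert each name to Date where possible, and to NA where it is not a date
- Create a boolean vector that filters out the columns whose dates are too old while keeping the non-date columns (i.e. the NAs from the previous step)
Sample data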
## sample data frame
m <- matrix(1, 3, 10)
colnames(m) <- c("a", "b", as.character(seq.Date(as.Date("2021-1-1"), length.out = 8, by = "days")))
(d <- as.data.frame(m))
# a b 2021-01-01 2021-01-02 2021-01-03 2021-01-04 2021-01-05 2021-01-06 2021-01-07 2021-01-08
# 1 1 1 1 1 1 1 1 1 1 1
# 2 1 1 1 1 1 1 1 1 1 1
# 3 1 1 1 1 1 1 1 1 1 1
Filter
r <- vapply(names(d), as.Date, numeric(1), optional = TRUE)
d[, is.na(r) | r <= as.Date("2021-1-3")]
# a b 2021-01-01 2021-01-02 2021-01-03
# 1 1 1 1 1 1
# 2 1 1 1 1 1
# 3 1 1 1 1 1
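To see why a numeric(1) template is enough in the vapply() call above, it can help to look at the intermediate vector. The sketch below uses made-up column names (nms is an illustrative variable, not part of the answer) and assumes an R version whose as.Date() supports the optional argument (3.5.0 or later, as far as I know).

```r
# Illustrative column names: one non-date and two dates
nms <- c("a", "2021-01-01", "2021-01-05")

# A Date is stored as a double (days since 1970-01-01), so each call
# returns a length-one double; vapply() drops the Date class.
r <- vapply(nms, as.Date, numeric(1), optional = TRUE)

# Non-date names become NA, date names become day counts
is_non_date <- is.na(r)

# Comparing the plain numbers with a Date still works, because the
# comparison is done on the underlying day counts
is_recent <- r > as.Date("2021-01-03")
```

Here is_non_date is TRUE only for "a", and is_recent is NA for "a", FALSE for "2021-01-01" and TRUE for "2021-01-05", which is why the filter combines the two conditions with |.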
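# Applied to the data from the question.
# Note: >= keeps the 2019-01-31 column; use > to drop that column as well.
r <- vapply(names(df), as.Date, numeric(1), optional = TRUE)
df[, is.na(r) | r >= as.Date("2019-1-31")]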
# id type color 2019-04-30 2019-03-31 2019-02-28 2019-01-31
# 1 apples fruit red 353 349 345 341
# 2 pears fruit green 91 90 89 88
# 3 grapes fruit purple 270 267 264 261
# 4 tomatoes veggies red 2029 2028 2027 2026
# 5 carrots veggies orange 107 104 101 98
# 6 cucumber veggies green 62 60 58 56
# 7 rabbit pets grey 30 29 28 27
# 8 cat pets black 61 59 57 55
# 9 dog pets brown 137 133 129 125
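The steps listed in this answer can also be wrapped in a small helper. This is only a sketch: the function name drop_date_cols_through and its arguments are hypothetical, and the toy data frame is made up. Unlike the call above, it uses a strict > so that the cutoff column itself is dropped, which is how I read the question.

```r
# Hypothetical helper (name and arguments are illustrative only).
# Drops every column whose name parses as a date on or before `cutoff`;
# columns whose names are not dates are always kept.
drop_date_cols_through <- function(data, cutoff) {
  cutoff <- as.Date(cutoff)
  # With an explicit format, names that are not dates become NA
  # instead of raising an error
  parsed <- as.Date(names(data), format = "%Y-%m-%d")
  keep <- is.na(parsed) | parsed > cutoff
  # drop = FALSE keeps a data frame even if only one column survives
  data[, keep, drop = FALSE]
}

# Small made-up data frame with the same kind of column names
toy <- data.frame(id = c("a", "b"),
                  "2019-02-28" = c(1, 2),
                  "2019-01-31" = c(3, 4),
                  "2018-12-31" = c(5, 6),
                  check.names = FALSE)

res <- drop_date_cols_through(toy, "2019-01-31")
names(res)
# [1] "id"         "2019-02-28"
```

Passing an explicit format to as.Date() makes the call vectorised over all names at once, so the vapply() loop is not needed in this variant.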
Description
You can reshape the data to long format and filter on the date column.
Data
Same data as provided in the question
df <- data.frame (id = c("apples", "pears", "grapes", "tomatoes", "carrots", "cucumber", "rabbit", "cat", "dog"),
type = c("fruit", "fruit", "fruit", "veggies", "veggies", "veggies", "pets", "pets", "pets"),
color = c("red", "green", "purple", "red", "orange", "green", "grey", "black", "brown"),
'2019-04-30' = c(353, 91, 270, 2029, 107, 62, 30, 61, 137),
'2019-03-31' = c(349, 90, 267, 2028, 104, 60, 29, 59, 133),
'2019-02-28' = c(345, 89, 264, 2027, 101, 58, 28, 57, 129),
'2019-01-31' = c(341, 88, 261, 2026, 98, 56, 27, 55, 125),
'2018-12-31' = c(337, 87, 258, 2025, 95, 54, 26, 53, 121),
'2018-11-30' = c(333, 86, 255, 2024, 92, 52, 25, 51, 117),
check.names = FALSE)
Solution
library(dplyr)
library(tidyr)
df %>%
tidyr::pivot_longer(cols = !c(id, type, color), names_to = 'date', values_to = 'value') %>%
dplyr::mutate(date = as.Date(date, format = '%Y-%m-%d')) %>%
dplyr::filter( date >= as.Date('2019-01-31')) %>%
tidyr::pivot_wider(names_from = 'date', values_from = 'value')
Desired output
id type color `2019-04-30` `2019-03-31` `2019-02-28` `2019-01-31`
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 apples fruit red 353 349 345 341
2 pears fruit green 91 90 89 88
3 grapes fruit purple 270 267 264 261
4 tomatoes veggies red 2029 2028 2027 2026
5 carrots veggies orange 107 104 101 98
6 cucumber veggies green 62 60 58 56
7 rabbit pets grey 30 29 28 27
8 cat pets black 61 59 57 55
9 dog pets brown 137 133 129 125
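We can do this in base R. It is convenient that your dates are in YYYY-MM-DD format, because zero-padded strings in that format compare correctly with the >= and <= operators. We can also use a simple regular expression to keep any columns whose names are not in date format (note that the backslashes must be doubled inside an R string):
df[!grepl('\\d{4}-\\d{2}-\\d{2}', colnames(df)) | colnames(df) >= '2019-02-28']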
id type color 2019-04-30 2019-03-31 2019-02-28
1 apples fruit red 353 349 345
2 pears fruit green 91 90 89
3 grapes fruit purple 270 267 264
4 tomatoes veggies red 2029 2028 2027
5 carrots veggies orange 107 104 101
6 cucumber veggies green 62 60 58
7 rabbit pets grey 30 29 28
8 cat pets black 61 59 57
9 dog pets brown 137 133 129
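The string comparison works because zero-padded YYYY-MM-DD names order the same way as the dates they represent. The check below is a sketch on made-up values: it compares the two approaches, shows the comparison going wrong once the zero padding is missing, and uses an anchored pattern (my addition, not part of the answer) so that names which merely contain a date are not treated as date columns.

```r
# Column names like those in the question (zero-padded ISO 8601 dates)
iso <- c("2018-11-30", "2018-12-31", "2019-01-31", "2019-02-28", "2019-04-30")

# Comparing as strings and comparing as dates give the same answer
by_string <- iso > "2019-01-31"
by_date   <- as.Date(iso) > as.Date("2019-01-31")
same_result <- identical(by_string, by_date)

# Without zero padding the string comparison is misleading:
# February sorts after October because "2" > "1"
unpadded_string <- "2019-2-28" > "2019-10-01"
unpadded_date   <- as.Date("2019-2-28") > as.Date("2019-10-01")

# ^ and $ restrict the match to names that are exactly a date
is_date_name <- grepl("^\\d{4}-\\d{2}-\\d{2}$",
                      c("id", "2019-04-30", "x2019-04-30"))
```

Here same_result is TRUE, unpadded_string is TRUE while unpadded_date is FALSE, and is_date_name is FALSE, TRUE, FALSE. If the column names might not be zero-padded, converting them with as.Date() first is the safer choice.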