How to have R create a data frame using the last two months
Below is a sample data set.
area periodyear period employment date
01 2020 08 100 2020-08-01
01 2020 09 105 2020-09-01
01 2020 10 110 2020-10-01
02 2020 08 101 2020-08-01
02 2020 09 102 2020-09-01
02 2020 10 103 2020-10-01
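The question is how to get R to return the last two rows for each area. I use the following code to create a date, so that there is a single value (rather than periodyear and period) on which a maximum can be found.
substate$date <- ymd(paste(substate$PERIODYEAR, substate$PERIOD, "1", sep = "-"))
I know how to find the maximum of a column (date, in this case), but I am not clear on how to have R create a data frame like the one below.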
area periodyear period employment date
01 2020 09 105 2020-09-01
01 2020 10 110 2020-10-01
02 2020 09 102 2020-09-01
02 2020 10 103 2020-10-01
The reason for wanting the last two is that one month is brand-new data and the month before it has been revised. From here, I update a SQL database.
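As a minimal sketch of the date-based approach described above, the following builds the date with base R and keeps every row that falls in the two most recent months. The `substate` table here is a hypothetical stand-in: only PERIODYEAR and PERIOD are named in the question, so the other column names are assumptions. Note that it picks the latest two months across the whole table, which matches a per-area selection only when every area reports the same months.

```r
# Hypothetical stand-in for the asker's `substate` table; only PERIODYEAR and
# PERIOD are named in the question, the other column names are assumed.
substate <- data.frame(
  AREA       = rep(c("01", "02"), each = 3),
  PERIODYEAR = rep(2020L, 6),
  PERIOD     = rep(8:10, times = 2),
  EMPLOYMENT = c(100L, 105L, 110L, 101L, 102L, 103L),
  stringsAsFactors = FALSE
)

# First day of each period as a Date (base R, no lubridate needed)
substate$date <- as.Date(sprintf("%d-%02d-01", substate$PERIODYEAR, substate$PERIOD))

# The two most recent distinct months present in the data
latest_two <- head(sort(unique(substate$date), decreasing = TRUE), 2)

# Keep only the rows that fall in those two months
last_two_months <- substate[substate$date %in% latest_two, ]
```

Using `head(..., 2)` rather than indexing with `[1:2]` avoids an NA entry if the table happens to contain a single month.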
One option is to slice after arranging by 'area' and 'date', with 'date' converted to the Date class (in case the rows are not already in order).
library(dplyr)
df1 %>%
arrange(area, as.Date(date)) %>%
group_by(area) %>%
slice_tail(n = 2) %>%
ungroup
Output:
# A tibble: 4 x 5
# area periodyear period employment date
# <chr> <int> <int> <int> <chr>
#1 01 2020 9 105 2020-09-01
#2 01 2020 10 110 2020-10-01
#3 02 2020 9 102 2020-09-01
#4 02 2020 10 103 2020-10-01
Data:
df1 <- structure(list(area = c("01", "01", "01", "02", "02", "02"),
periodyear = c(2020L, 2020L, 2020L, 2020L, 2020L, 2020L),
period = c(8L, 9L, 10L, 8L, 9L, 10L), employment = c(100L,
105L, 110L, 101L, 102L, 103L), date = c("2020-08-01", "2020-09-01",
"2020-10-01", "2020-08-01", "2020-09-01", "2020-10-01")),
row.names = c(NA,
-6L), class = "data.frame")
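A possible variant of the answer above, offered as a sketch rather than as the answerer's method: `slice_max()` (available in dplyr 1.0.0 and later) selects the rows with the largest dates per group directly, so the data does not have to be sorted first. The `df1` definition is repeated so the block runs on its own, and `with_ties = FALSE` is an assumption that exactly two rows per area are wanted even if dates repeat.

```r
library(dplyr)

# Same sample data as in the answer above
df1 <- data.frame(
  area       = c("01", "01", "01", "02", "02", "02"),
  periodyear = rep(2020L, 6),
  period     = c(8L, 9L, 10L, 8L, 9L, 10L),
  employment = c(100L, 105L, 110L, 101L, 102L, 103L),
  date       = c("2020-08-01", "2020-09-01", "2020-10-01",
                 "2020-08-01", "2020-09-01", "2020-10-01"),
  stringsAsFactors = FALSE
)

last_two <- df1 %>%
  group_by(area) %>%
  # Two latest dates per area; with_ties = FALSE caps each group at 2 rows
  slice_max(order_by = as.Date(date), n = 2, with_ties = FALSE) %>%
  ungroup() %>%
  # slice_max() returns rows in descending order, so re-sort for display
  arrange(area, as.Date(date))
```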
Maybe this:
library(dplyr)
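# Keep the last two rows of each area after sorting by date
# (2:n() would only drop the first row, which is right only for 3-row groups)
df %>% arrange(area, date) %>% group_by(area) %>% filter(row_number() > n() - 2)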
Output:
# A tibble: 4 x 5
# Groups: area [2]
area periodyear period employment date
<int> <int> <int> <int> <date>
1 1 2020 9 105 2020-09-01
2 1 2020 10 110 2020-10-01
3 2 2020 9 102 2020-09-01
4 2 2020 10 103 2020-10-01