Join two datasets under restrictions in R

Question

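I want to merge two different datasets based on month and year. Each of them has 3 columns: one for the year, one for the month, and a last one with the average of some quantity (a different quantity in each dataset). The first dataset covers 2010 and 2011, and the second covers 2015 and 2016. I want to build a third dataset where one column is the average for June 2010 and another is the average for October 2015. In other words, I want to join two different years and different months. I would like it to look like this: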

Year	Month	AVG
2010	October	15.7
2010	November	13.6
2010	December	13.9

Year	Month	AVG
2015	June	18.2
2015	July	18.4
2015	August	19.0

Year2	Month2	Year1	Month1	AVG2	AVG1
2015	June	2010	October	18.2	15.7
2015	July	2010	November	18.4	13.6
2015	August	2010	December	19.0	13.9

Part of dataset 1 looks like this:

structure(list(Year = c(2010, 2010, 2010, 2010, 2010, 2010, 2010, 
2010, 2010, 2010), Month = c("April", "August", "December", "February", 
"January", "July", "June", "March", "May", "November"), Log_AVG = c(4.95582705760126, 4.86753445045558, 
5.34233425196481, 5.35185813347607, 5.33753807970132, 4.82028156560504, 
4.69134788222914, 5.29831736654804, 4.75359019110636, 5.12989871492307
)), row.names = c(NA, -10L), groups = structure(list(
    Year = c(2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 
    2010, 2010), Month = c("April", "August", "December", "February", 
    "January", "July", "June", "March", "May", "November"), .rows = structure(list(
        1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

Part of dataset 2 looks like this:

structure(list(Year = c(2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 
2015), Month = structure(1:10, .Label = c("January", "February", 
"March", "April", "May", "June", "July", "August", "September", 
"October", "November", "December"), class = c("ordered", "factor"
)), Log_AVG = c(0, 0, 9.08398318309966, 8.76029622047005, 
7.13089883029635, 7.07834157955767, 7.95892649305011, 8.8146275553107, 
9.69572510326022, 10.5731101880491)), row.names = c(NA, -10L), groups = structure(list(
     Year = 2015, .rows = structure(list(
        1:10), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

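Is there any way to do this in R? Or is there a way to sort the two original datasets the way I want (for example, keep the first one as it is, and sort the second one by calendar month instead of alphabetically, with June 2015 as the first month and continuing through the remaining months of 2015 and 2016)? Thanks a lot!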

Answer 1

If you want a fixed offset, where date2 is 5 years and 4 months after date1, I would compute it directly. Something like this:

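library(dplyr)      # %>%, mutate(), left_join()
library(lubridate)  # ymd(), years(), months()

df1 %>%
  mutate(
    # first day of each month, e.g. "2010 October 01" -> 2010-10-01
    date1 = ymd(paste(Year, Month, "01")),
    # the partner month in df2 lies a fixed 5 years and 4 months later
    date2 = date1 + years(5) + months(4)
  ) %>%
  left_join(
    df2 %>% mutate(date2 = ymd(paste(Year, Month, "01"))),
    by = "date2",
    suffix = c("1", "2")
  )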
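The same fixed-offset idea can be sketched in base R without lubridate, assuming the months are full English names as in month.name. Each (Year, Month) pair becomes a single running month counter (Year * 12 + month number - 1), the first table's counter is shifted by an offset, and the counter is used as the merge key. The helper names month_index(), join_with_offset() and the parameter offset_months are illustrative, not part of the answer above. With the example tables in the question, October 2010 pairs with June 2015, which is an offset of 4 years and 8 months (56 months); the 5-year-4-month offset above fits the June 2010 / October 2015 pairing described in the question's prose.

```r
# Sketch: join two monthly tables on a fixed month offset, base R only.
# Assumes full English month names (as in month.name); the helper and
# parameter names are hypothetical, chosen for illustration.

# Convert Year + month name into a single running month index
month_index <- function(year, month) {
  year * 12 + match(as.character(month), month.name) - 1
}

# Pair each row of df1 with the row of df2 that lies `offset_months` later
join_with_offset <- function(df1, df2, offset_months) {
  df1$key <- month_index(df1$Year, df1$Month) + offset_months
  df2$key <- month_index(df2$Year, df2$Month)
  out <- merge(df1, df2, by = "key", suffixes = c("1", "2"))
  out <- out[order(out$key), ]
  out$key <- NULL
  rownames(out) <- NULL
  out
}

d1 <- data.frame(Year = 2010,
                 Month = c("October", "November", "December"),
                 AVG = c(15.7, 13.6, 13.9))
d2 <- data.frame(Year = 2015,
                 Month = c("June", "July", "August"),
                 AVG = c(18.2, 18.4, 19.0))

# October 2010 -> June 2015 is 4 years and 8 months = 56 months
res <- join_with_offset(d1, d2, offset_months = 4 * 12 + 8)
print(res)
```

The value column name does not matter to the helper, so the question's Log_AVG columns would come out as Log_AVG1 and Log_AVG2. Rows of the first table without a partner at the offset are dropped by merge(); passing all.x = TRUE would keep them with NA values instead.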

If you share your data in a copy/pasteable format, I'd be happy to test and debug this.

Answer 2

It sounds like you just want cbind to put the two data frames side by side:

d1<-data.frame(Year=rep(2010,3),
               Month = c("October","November","December"),
               AVG=c(15.7,13.6,13.9))
d2<-data.frame(Year=rep(2015,3),
               Month = c("June","July","August"),
               AVG=c(15.7,13.6,13.9))
d3<-cbind(d1,d2)

This gives us:

> d1
  Year    Month  AVG
1 2010  October 15.7
2 2010 November 13.6
3 2010 December 13.9
> d2
  Year  Month  AVG
1 2015   June 15.7
2 2015   July 13.6
3 2015 August 13.9
> d3<-cbind(d1,d2)
> d3
  Year    Month  AVG Year  Month  AVG
1 2010  October 15.7 2015   June 15.7
2 2010 November 13.6 2015   July 13.6
3 2010 December 13.9 2015 August 13.9
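To get exactly the column layout shown in the question (suffixed names, with the 2015 columns first), the columns can be renamed before binding. This is a minimal sketch assuming both data frames are already sorted chronologically and have the same number of rows; the 2015 averages are taken from the question's example. Because cbind() pairs rows purely by position, the stopifnot() guard catches a length mismatch instead of relying on what cbind() does with unequal row counts.

```r
# Sketch: place two equally long, already-sorted tables side by side
# with the column names from the desired output (Year2, Month2, ...).
d1 <- data.frame(Year = 2010,
                 Month = c("October", "November", "December"),
                 AVG = c(15.7, 13.6, 13.9))
d2 <- data.frame(Year = 2015,
                 Month = c("June", "July", "August"),
                 AVG = c(18.2, 18.4, 19.0))

# cbind() matches rows by position only, so insist on equal lengths
stopifnot(nrow(d1) == nrow(d2))

# Suffix the column names: "2" for the later table, "1" for the earlier one
names(d2) <- paste0(names(d2), "2")
names(d1) <- paste0(names(d1), "1")

# Reorder the columns into the layout used in the question
d3 <- cbind(d2, d1)[, c("Year2", "Month2", "Year1", "Month1", "AVG2", "AVG1")]
print(d3)
```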

If you need to sort by month, you first have to convert the column to a factor with month.name as its levels, like this:

d4 <- data.frame(Year = rep(2010, 3),
                 Month = c("November", "December", "October"),
                 AVG = c(13.6, 13.9, 15.7))
d4$Month <- factor(d4$Month, levels = month.name)
d4 <- d4[order(d4$Month), ]

This gives us:

> d4
  Year    Month  AVG
1 2010 November 13.6
2 2010 December 13.9
3 2010  October 15.7
> d4$Month <- factor(d4$Month, levels = month.name)
> d4 <- d4[order(d4$Month),]
> d4
  Year    Month  AVG
3 2010  October 15.7
1 2010 November 13.6
2 2010 December 13.9
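The second part of the question (starting the second dataset at June 2015 and continuing through 2016) needs Year in the sort key as well, since ordering by the month factor alone would interleave the two years. This is a sketch on a hypothetical df2 built with expand.grid(); its Log_AVG values are placeholders (1, 2, 3, ...), not real data.

```r
# Sketch: sort a Year/Month table chronologically and keep June 2015 onward.
# df2 is a hypothetical stand-in; Log_AVG values are placeholders only.
df2 <- expand.grid(Month = month.name, Year = c(2016, 2015),
                   stringsAsFactors = FALSE)
df2$Log_AVG <- seq_len(nrow(df2))

# Calendar order instead of alphabetical order
df2$Month <- factor(df2$Month, levels = month.name)

# Sort by year first, then by month within each year
df2 <- df2[order(df2$Year, df2$Month), c("Year", "Month", "Log_AVG")]

# Drop the months before June 2015
keep <- df2$Year > 2015 | (df2$Year == 2015 & as.integer(df2$Month) >= 6)
df2 <- df2[keep, ]
rownames(df2) <- NULL
head(df2)
```

In the question's second dataset Month is already an ordered factor with the levels January to December, so the factor() step is harmless there and only the order() and filtering steps matter.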
