Join two datasets under restrictions in R
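I want to merge two different datasets by month and year. Each has 3 columns: one for the year, one for the month, and one holding the average of a different quantity. The first dataset covers 2010 and 2011, and the second covers 2015 and 2016. I want to build a third dataset with the averages for June 2010 in one column and the averages for October 2015 in another. In other words, I want to join different years and different months. I would like it to look like this: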
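Year | Month | AVG |
---|---|---|
2010 | October | 15.7 |
2010 | November | 13.6 |
2010 | December | 13.9 |

Year | Month | AVG |
---|---|---|
2015 | June | 18.2 |
2015 | July | 18.4 |
2015 | August | 19.0 |

Year2 | Month2 | Year1 | Month1 | AVG2 | AVG1 |
---|---|---|---|---|---|
2015 | June | 2010 | October | 18.2 | 15.7 |
2015 | July | 2010 | November | 18.4 | 13.6 |
2015 | August | 2010 | December | 19.0 | 13.9 |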
Part of dataset 1 looks like this:
structure(list(Year = c(2010, 2010, 2010, 2010, 2010, 2010, 2010,
2010, 2010, 2010), Month = c("April", "August", "December", "February",
"January", "July", "June", "March", "May", "November"), Log_AVG = c(4.95582705760126, 4.86753445045558,
5.34233425196481, 5.35185813347607, 5.33753807970132, 4.82028156560504,
4.69134788222914, 5.29831736654804, 4.75359019110636, 5.12989871492307
)), row.names = c(NA, -10L), groups = structure(list(
Year = c(2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010,
2010, 2010), Month = c("April", "August", "December", "February",
"January", "July", "June", "March", "May", "November"), .rows = structure(list(
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
Part of the second dataset looks like this:
structure(list(Year = c(2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015,
2015), Month = structure(1:10, .Label = c("January", "February",
"March", "April", "May", "June", "July", "August", "September",
"October", "November", "December"), class = c("ordered", "factor"
)), Log_AVG = c(0, 0, 9.08398318309966, 8.76029622047005,
7.13089883029635, 7.07834157955767, 7.95892649305011, 8.8146275553107,
9.69572510326022, 10.5731101880491)), row.names = c(NA, -10L), groups = structure(list(
Year = 2015, .rows = structure(list(
1:10), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
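Is there any way to do this in R?
Or is there a way to order the two initial datasets the way I want (for example, leave the first as it is, and sort the second by month rather than alphabetically, with June 2015 as the first month followed by the remaining months of 2015 and 2016)?
Thank you very much!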
If you want a fixed offset, where "date 2" is 5 years and 4 months after "date 1", I would compute it directly. Something like this:
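library(dplyr)      # provides %>%, mutate() and left_join()
library(lubridate)  # provides ymd(), years() and months()
df1 %>%
  mutate(
    # first day of each month as a Date
    date1 = ymd(paste(Year, Month, "01")),
    # shift by the fixed offset to get the matching month in df2
    date2 = date1 + years(5) + months(4)
  ) %>%
  left_join(
    df2 %>% mutate(date2 = ymd(paste(Year, Month, "01"))),
    by = "date2",
    suffix = c("1", "2")
  )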
If you share the data in a copy/pasteable format, I'll be happy to test and debug.
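The same fixed-offset idea can be sketched in base R with no packages. This is only an illustration under stated assumptions. `df1` and `df2` are small stand-ins built from the example tables in the question, not the real data, and `month_index()` and `offset_months` are hypothetical names introduced here. The answer above shifts by 5 years and 4 months (64 months, June 2010 to October 2015). The example tables pair October 2010 with June 2015, which is 56 months, so the sketch uses 56. Set the offset to whichever pairing you actually need.

```r
# Stand-ins for the two datasets (values taken from the example tables).
df1 <- data.frame(Year = rep(2010, 3),
                  Month = c("October", "November", "December"),
                  AVG = c(15.7, 13.6, 13.9))
df2 <- data.frame(Year = rep(2015, 3),
                  Month = c("June", "July", "August"),
                  AVG = c(18.2, 18.4, 19.0))

# Hypothetical helper: count months on a single scale so that a month
# offset becomes plain arithmetic. as.character() also handles factors.
month_index <- function(year, month) {
  year * 12 + match(as.character(month), month.name)
}

# Assumption: October 2010 should line up with June 2015 (56 months later).
offset_months <- 56

# Suffix the column names so both sets survive the join unambiguously.
names(df1) <- paste0(names(df1), "1")
names(df2) <- paste0(names(df2), "2")

# Build the join key: df1 months shifted forward, df2 months as they are.
df1$key <- month_index(df1$Year1, df1$Month1) + offset_months
df2$key <- month_index(df2$Year2, df2$Month2)

# merge() does an inner join on the key and sorts the result by it.
joined <- merge(df2, df1, by = "key")
joined <- joined[, c("Year2", "Month2", "Year1", "Month1", "AVG2", "AVG1")]
print(joined)
```

Because the key is computed from year and month together, the rows match regardless of the order they arrive in, which is the main difference from simply placing the two data frames side by side.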
It sounds like you just want cbind to place the two data frames side by side:
d1<-data.frame(Year=rep(2010,3),
Month = c("October","November","December"),
AVG=c(15.7,13.6,13.9))
d2<-data.frame(Year=rep(2015,3),
Month = c("June","July","August"),
AVG=c(15.7,13.6,13.9))
d3<-cbind(d1,d2)
This gives us:
> d1
Year Month AVG
1 2010 October 15.7
2 2010 November 13.6
3 2010 December 13.9
> d2
Year Month AVG
1 2015 June 15.7
2 2015 July 13.6
3 2015 August 13.9
> d3<-cbind(d1,d2)
> d3
Year Month AVG Year Month AVG
1 2010 October 15.7 2015 June 15.7
2 2010 November 13.6 2015 July 13.6
3 2010 December 13.9 2015 August 13.9
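One thing to keep in mind with `cbind` is that it pairs rows purely by position and leaves the result with duplicated column names (`Year`, `Month`, `AVG` twice). A minimal sketch of one way to get the column layout from the question, assuming both data frames are already in the desired row order; the `1`/`2` suffixes follow the question's naming and the `AVG` values for `d2` are taken from the question's second table:

```r
d1 <- data.frame(Year = rep(2010, 3),
                 Month = c("October", "November", "December"),
                 AVG = c(15.7, 13.6, 13.9))
d2 <- data.frame(Year = rep(2015, 3),
                 Month = c("June", "July", "August"),
                 AVG = c(18.2, 18.4, 19.0))

# cbind() matches by row position, so the row counts must agree.
stopifnot(nrow(d1) == nrow(d2))

# Rename the columns before binding so every name is unique.
d3 <- cbind(setNames(d2, paste0(names(d2), "2")),
            setNames(d1, paste0(names(d1), "1")))

# Reorder the columns to match the layout asked for in the question.
d3 <- d3[, c("Year2", "Month2", "Year1", "Month1", "AVG2", "AVG1")]
print(d3)
```

With unique names, later code can refer to `d3$AVG1` and `d3$AVG2` without ambiguity.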
If you have to sort by month, you first need to convert the column to a factor whose levels are month.name, like this:
d4<-data.frame(Year=rep(2010,3),
Month = c("November","December","October"),
AVG=c(13.6,13.9,15.7))
d4$Month <- factor(d4$Month, levels = month.name)
d4 <- d4[order(d4$Month),]
This gives us:
> d4
Year Month AVG
1 2010 November 13.6
2 2010 December 13.9
3 2010 October 15.7
> d4$Month <- factor(d4$Month, levels = month.name)
> d4 <- d4[order(d4$Month),]
> d4
Year Month AVG
3 2010 October 15.7
1 2010 November 13.6
2 2010 December 13.9
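When the data spans more than one year, as in the question (June 2015 onward into 2016), sorting on the month factor alone would put January 2016 ahead of June 2015. A small sketch of sorting on year first and month second; `d5` is a made-up frame with only the two key columns, used purely for illustration:

```r
# Illustrative rows in scrambled order, spanning two years.
d5 <- data.frame(Year = c(2016, 2015, 2015, 2016, 2015),
                 Month = c("February", "December", "June", "January", "July"))

# Calendar-ordered factor, as in the answer above.
d5$Month <- factor(d5$Month, levels = month.name)

# Sort by year, then by month within each year.
d5 <- d5[order(d5$Year, d5$Month), ]
print(d5)
```

The rows come out in chronological order, starting at June 2015 and running through February 2016.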