How to merge two datasets using an if-function in R when the datasets are of variable length
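I have GPS points (latitude and longitude) recorded hourly for 60 days, and I have the sunrise and sunset times for the same period. I am trying to merge the two datasets so that I know whether the sun was up or down when GPS point x was recorded.
If the sun data frame is smaller than the gps data frame, I have managed to get it working. But if the sun data frame is larger (as in the example below), I get an error. Unfortunately, I need the code to work in both cases (i.e., regardless of which data frame is larger).
Example data and code: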
library(lubridate)
gps <- data.frame(lat = c(54.008, 54.009, 54.009, 54.008, 54.009, 54.009),
                  long = c(38.050, 38.051, 38.053, 38.050, 38.051, 38.053),
                  date = as.Date(c("2019-12-19", "2019-12-19", "2019-12-19", "2019-12-19","2019-12-20", "2019-12-20"), format = "%Y-%m-%d"),
                  time = as.numeric(hm("06:00", "07:00", "12:30", "13:00", "07:00", "24:00")))
sun <- data.frame(date = as.Date(c("2019-12-19", "2019-12-20", "2019-12-21", "2019-12-22", "2019-12-23", "2019-12-24", "2019-12-25"), format = "%Y-%m-%d"),
                  sunrise = as.numeric(hm("06:04", "06:05","06:06", "06:07","06:08", "06:09", "06:09")),
                  sunset = as.numeric(hm("23:06", "23:06","23:06", "23:06","23:06", "23:06", "23:08")))
gps$sunlight <- ifelse(sun$date == gps$date | sun$sunrise >= gps$time | sun$sunset <= gps$time, "N", "Y")
Error:
Error in `$<-.data.frame`(`*tmp*`, sunlight, value = c("N", "Y", "Y", :
replacement has 7 rows, data has 6
Desired output:
> gps
     lat   long       date  time sunlight
1 54.008 38.050 2019-12-19 21600        N
2 54.009 38.051 2019-12-19 25200        Y
3 54.009 38.053 2019-12-19 45000        Y
4 54.008 38.050 2019-12-19 46800        Y
5 54.009 38.051 2019-12-20 25200        Y
6 54.009 38.053 2019-12-20 86400        N
Any suggestions or ideas about where I might be going wrong?
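The error seems to come from R's vector recycling rather than from ifelse itself: comparing a length-7 vector with a length-6 vector returns a length-7 result, which cannot be assigned as a column of a 6-row data frame. The comparison is also positional, so rows are paired by index and not by date. A minimal sketch of this, using stand-in vectors (gps_time and sun_sunrise are illustrative names holding the same values as gps$time and sun$sunrise, in seconds):

```r
# Stand-ins for gps$time (6 values) and sun$sunrise (7 values), in seconds
gps_time    <- c(21600, 25200, 45000, 46800, 25200, 86400)
sun_sunrise <- c(21840, 21900, 21960, 22020, 22080, 22140, 22140)

# R recycles the shorter vector, so the result takes the length of the
# longer one; a warning is raised because 7 is not a multiple of 6
cmp <- suppressWarnings(sun_sunrise >= gps_time)

length(cmp)  # 7, one more than the 6 rows of gps
```

When sun happens to be shorter than gps, the result has the right length, so no error appears, but the values are still compared by position and may not belong to the same date.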
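You need to merge the two data frames first. With a left join, each row of gps is matched to the row of sun that has the same date. You can then apply the condition sunlight_cond to check whether the gps time falls between sunrise and sunset. Finally, convert the TRUE and FALSE values to the desired labels, Y and N: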
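library(tidyverse)
gps %>%
  # keep every gps row and attach the sunrise/sunset of the same date
  left_join(sun, by = "date") %>%
  # TRUE when the point was recorded between sunrise and sunset
  mutate(sunlight_cond = time >= sunrise & time <= sunset,
         # relabel TRUE/FALSE as Y/N
         sunlight = factor(sunlight_cond, levels = c(TRUE, FALSE), labels = c("Y", "N")))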
Result:
     lat   long       date  time sunrise sunset sunlight_cond sunlight
1 54.008 38.050 2019-12-19 21600   21840  83160         FALSE        N
2 54.009 38.051 2019-12-19 25200   21840  83160          TRUE        Y
3 54.009 38.053 2019-12-19 45000   21840  83160          TRUE        Y
4 54.008 38.050 2019-12-19 46800   21840  83160          TRUE        Y
5 54.009 38.051 2019-12-20 25200   21900  83160          TRUE        Y
6 54.009 38.053 2019-12-20 86400   21900  83160         FALSE        N
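If you would rather keep the ifelse approach and avoid extra packages, the same lookup can probably be done in base R with match(). This is only a sketch under two assumptions: sun has exactly one row per date, and the times are written directly in seconds (the same values that hm() produces in the question) so the block runs without lubridate. The helper name idx is made up for illustration.

```r
# Example data from the question, with times given directly in seconds
gps <- data.frame(
  lat  = c(54.008, 54.009, 54.009, 54.008, 54.009, 54.009),
  long = c(38.050, 38.051, 38.053, 38.050, 38.051, 38.053),
  date = as.Date(c("2019-12-19", "2019-12-19", "2019-12-19",
                   "2019-12-19", "2019-12-20", "2019-12-20")),
  time = c(21600, 25200, 45000, 46800, 25200, 86400)
)
sun <- data.frame(
  date    = as.Date("2019-12-19") + 0:6,   # 2019-12-19 through 2019-12-25
  sunrise = c(21840, 21900, 21960, 22020, 22080, 22140, 22140),
  sunset  = c(83160, 83160, 83160, 83160, 83160, 83160, 83280)
)

# For every gps row, the index of the sun row with the same date
# (NA if that date is missing from sun)
idx <- match(gps$date, sun$date)

# Compare each gps time with the sunrise/sunset looked up for its own date
gps$sunlight <- ifelse(gps$time >= sun$sunrise[idx] & gps$time <= sun$sunset[idx],
                       "Y", "N")
```

Because idx always has one entry per gps row, the result has the right length whichever data frame is larger. A gps date that does not appear in sun ends up with NA in sunlight, which mirrors what the left join does.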