Selecting particular rows from a data frame based on a column of another data frame
Friends,
I have a simple problem but can't solve it neatly. Here is what it looks like:
df1 --> this data frame has around 3 million rows
event lat long
e01010 10.1010 20.1010
e02020 10.1010 20.1010
e03030 10.1010 20.1010
e04040 10.1010 20.1010
.
.
.
df2 --> this data frame has around 60k rows
event start_date end_date
e01010 2016-01-10 2016-01-12
e04020 2017-10-12 2017-10-22
e03030 2015-01-10 2015-01-10
e06040 2018-01-22 2018-02-22
.
.
.
Now I expect the result below in "df2", with 2 new extra columns called "lat" and "long":
df2
event start_date end_date lat long
e01010 2016-01-10 2016-01-12 10.1010 20.1010
e04020 2017-10-12 2017-10-22 NA NA
e03030 2015-01-10 2015-01-10 10.1010 20.1010
e06040 2018-01-22 2018-02-22 NA NA
.
.
.
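As you can see, df2 is my main data frame, and I want to append the columns from df1 wherever the event matches.
Can anyone help me here? I tried "which" but couldn't get it to work!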
You can use dplyr::left_join:
dplyr::left_join(df2, df1, by = "event")
# event start_date end_date lat long
#1 e01010 2016-01-10 2016-01-12 10.101 20.101
#2 e04020 2017-10-12 2017-10-22 NA NA
#3 e03030 2015-01-10 2015-01-10 10.101 20.101
#4 e06040 2018-01-22 2018-02-22 NA NA
Or in base R:
merge(df2, df1, by = "event", all.x = TRUE)
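Since the question mentions trying "which", here is a minimal base R sketch of the same lookup using match(), which returns the position of the first match and keeps df2 in its original row order. The data frames are rebuilt from only the four rows shown in the question, and the names idx and result are hypothetical helpers introduced for illustration.

```r
# Sample data copied from the rows shown in the question (not the full data).
df1 <- data.frame(
  event = c("e01010", "e02020", "e03030", "e04040"),
  lat   = c(10.1010, 10.1010, 10.1010, 10.1010),
  long  = c(20.1010, 20.1010, 20.1010, 20.1010),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  event      = c("e01010", "e04020", "e03030", "e06040"),
  start_date = as.Date(c("2016-01-10", "2017-10-12", "2015-01-10", "2018-01-22")),
  end_date   = as.Date(c("2016-01-12", "2017-10-22", "2015-01-10", "2018-02-22")),
  stringsAsFactors = FALSE
)

# For each event in df2, find the row position of its first match in df1
# (NA when the event does not occur in df1).
idx <- match(df2$event, df1$event)

# Indexing with NA yields NA, so unmatched events get NA for lat and long.
result <- df2
result$lat  <- df1$lat[idx]
result$long <- df1$long[idx]

print(result)
```

Two differences from the joins are worth keeping in mind. merge() sorts the result by the key by default (sort = TRUE), so its rows may come back in a different order than df2, whereas left_join() and the match() approach keep df2's order. Also, if an event appears more than once in df1, left_join() and merge() return one row per match, while match() only picks up the first one.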