Merging Datetime Indices With Slightly Different Dates in Python
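I'm trying to merge two pandas DataFrames that have different datetime indices. DF1 holds company XYZ's quarterly financial statements, and DF2 holds the daily closing prices of XYZ's publicly traded stock.
The problem is that the report dates don't always match a date in the daily closing-price data (presumably because some reports are dated on a weekend, when there is no close).
I need a way to make the dates in DF2 "fuzzy" so that, when I merge them with DF1, the merge picks the closest date from DF2 instead of leaving the closing price blank.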
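Before fixing the merge, it can help to list which report dates have no exact match. This is a small sketch using `Series.isin` on the sample dates below; the DataFrame layout and the column name `date` are assumptions for illustration.

```python
import pandas as pd

# Sample dates from the question; the column name "date" is an assumption.
df1 = pd.DataFrame({"date": pd.to_datetime(
    ["2007-12-30", "2008-03-30", "2008-06-28", "2008-09-29", "2008-12-31"])})
df2 = pd.DataFrame({"date": pd.to_datetime(
    ["2007-12-30", "2008-03-30", "2008-06-27", "2008-09-29", "2008-12-30"])})

# True where a report date also appears in the price data.
has_exact_match = df1["date"].isin(df2["date"])

# Report dates with no same-day closing price, shown with their weekday.
unmatched = df1.loc[~has_exact_match, "date"]
print(pd.DataFrame({"date": unmatched, "weekday": unmatched.dt.day_name()}))
```

In this sample, 2008-06-28 is a Saturday, but 2008-12-31 is a Wednesday. A gap in the price data may therefore have causes other than weekends, which is one more reason to match on the nearest available date.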
Currently I'm using:
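import pandas as pd

# Quarterly report dates and figures (placeholders)
df1 = pd.DataFrame([['2007-12-30','$xxx,xxx'],
['2008-03-30','$xxx,xxx'],
['2008-06-28','$xxx,xxx'],
['2008-09-29','$xxx,xxx'],
['2008-12-31','$xxx,xxx']], columns=['date', 'xprice'])
# Daily closing prices (placeholders)
df2 = pd.DataFrame([['2007-12-30',''],
['2008-03-30',''],
['2008-06-27',''],
['2008-09-29',''],
['2008-12-30','']], columns=['date', 'price'])
df3 = pd.merge(df1, df2, how='outer', on='date')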
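This returns (rows for df1's dates only; the outer merge also adds rows for the df2-only dates 2008-06-27 and 2008-12-30, with the report column empty):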
df3 = [['2007-12-30','$xxx,xxx', ''],
['2008-03-30','$xxx,xxx', ''],
['2008-06-28','$xxx,xxx', 'NaN'],
['2008-09-29','$xxx,xxx', ''],
['2008-12-31','$xxx,xxx', 'NaN']]
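To see where the NaNs come from, this sketch rebuilds the question's lists as DataFrames (the column names `xprice` and `price` are assumptions) and runs the same outer merge, printing the full result.

```python
import pandas as pd

# Question's sample data; "$xxx,xxx" and "" are the question's placeholders.
df1 = pd.DataFrame({"date": ["2007-12-30", "2008-03-30", "2008-06-28", "2008-09-29", "2008-12-31"],
                    "xprice": ["$xxx,xxx"] * 5})
df2 = pd.DataFrame({"date": ["2007-12-30", "2008-03-30", "2008-06-27", "2008-09-29", "2008-12-30"],
                    "price": [""] * 5})

# An outer join keeps every date from BOTH frames, so the result has 7 rows:
# df1-only dates get NaN in "price", df2-only dates get NaN in "xprice".
df3 = pd.merge(df1, df2, how="outer", on="date")
print(df3)
```

An exact-key merge has no notion of "close enough", so any date present on only one side produces a row with missing values.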
Desired result:
df3 = [['2007-12-30','$xxx,xxx', ''],
['2008-03-30','$xxx,xxx', ''],
['2008-06-28','$xxx,xxx', ''],
['2008-09-29','$xxx,xxx', ''],
['2008-12-31','$xxx,xxx', '']]
Solution:
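Do an outer merge, sort by date, and forward-fill so each missing closing price takes the most recent earlier value. Then keep only the rows whose dates exist in df1. Note that this picks the previous available close, which is not always the nearest one.
df3 = (pd.merge(df1, df2, how='outer', on='date')
.sort_values('date')  # chronological order, so ffill copies the previous close forward
.ffill())             # same as fillna(method="ffill"), which newer pandas deprecates
# Drop the df2-only dates that the outer merge added
df3 = df3[df3['date'].isin(df1['date'])]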
date xprice price
0 2007-12-30 $xxx,xxx
1 2008-03-30 $xxx,xxx
2 2008-06-28 $xxx,xxx
3 2008-09-29 $xxx,xxx
4 2008-12-31 $xxx,xxx
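Here is a self-contained version of the forward-fill answer, using real datetimes and made-up numeric values. The numbers are hypothetical, since the question only shows placeholders.

```python
import pandas as pd

# Hypothetical numbers stand in for the question's "$xxx,xxx" placeholders.
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2007-12-30", "2008-03-30", "2008-06-28", "2008-09-29", "2008-12-31"]),
    "xprice": [100, 110, 120, 130, 140],      # quarterly report figure (hypothetical)
})
df2 = pd.DataFrame({
    "date": pd.to_datetime(["2007-12-30", "2008-03-30", "2008-06-27", "2008-09-29", "2008-12-30"]),
    "price": [10.0, 11.0, 12.0, 13.0, 14.0],  # daily close (hypothetical)
})

df3 = (
    pd.merge(df1, df2, how="outer", on="date")
    .sort_values("date")   # chronological order, so ffill copies the previous close
    .ffill()               # fill each missing close with the last available one
)
# Keep only the report dates from df1.
df3 = df3[df3["date"].isin(df1["date"])].reset_index(drop=True)
print(df3)
```

In this example, 2008-06-28 receives the 2008-06-27 close and 2008-12-31 receives the 2008-12-30 close.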
import pandas as pd
mylist1 = [['2007-12-30','$xxx,xxx'],
['2008-03-30','$xxx,xxx'],
['2008-06-28','$xxx,xxx'],
['2008-09-29','$xxx,xxx'],
['2008-12-31','$xxx,xxx']]
mylist2 = [['2007-12-30',''],
['2008-03-30',''],
['2008-06-27',''],
['2008-09-29',''],
['2008-12-30','']]
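df1 = pd.DataFrame.from_records(mylist1, columns=['date', 'value'])
df2 = pd.DataFrame.from_records(mylist2, columns=['date', 'value'])
# Joins on the default 0..n-1 index, i.e. pairs rows by position (row 0 with row 0, ...).
# This only works if both lists hold exactly one row per quarter in the same order;
# overlapping columns come back suffixed as date_x/value_x and date_y/value_y.
df3 = pd.merge(df1, df2, right_index=True, left_index=True)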
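The question asks for the *closest* date, and `pd.merge_asof` with `direction="nearest"` matches that requirement directly. The answers above instead use a forward fill or match rows by position. Note that `merge_asof` is a swapped-in technique, not one of the posted answers, and the 3-day `tolerance` and numeric values below are assumptions.

```python
import pandas as pd

# Hypothetical values; the question only shows placeholders.
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2007-12-30", "2008-03-30", "2008-06-28", "2008-09-29", "2008-12-31"]),
    "xprice": [100, 110, 120, 130, 140],
})
df2 = pd.DataFrame({
    "date": pd.to_datetime(["2007-12-30", "2008-03-30", "2008-06-27", "2008-09-29", "2008-12-30"]),
    "price": [10.0, 11.0, 12.0, 13.0, 14.0],
})

# merge_asof requires both frames to be sorted by the key column.
df3 = pd.merge_asof(
    df1.sort_values("date"),
    df2.sort_values("date"),
    on="date",
    direction="nearest",             # closest price date, before or after
    tolerance=pd.Timedelta(days=3),  # assumed cutoff; farther matches become NaN
)
print(df3)
```

Use `direction="backward"` instead if a report should only ever use a close from on or before its own date.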