Merging/Concat non-unique MultiIndex with Date

I have the following two DataFrames:

df1 =

City       Date           Data1
LA         2020-01-01     20
LA         2020-01-02     30
NY         2020-01-01     50

df2 = 

City       Date           Data2
LA         2020-01-01     2.5
LA         2020-01-02     1
LA         2020-01-03     7
NY         2020-01-01     6.5

I want to merge or concatenate them on 'City' and 'Date', so that the result is:

City       Date           Data1    Data2
LA         2020-01-01     20       2.5
LA         2020-01-02     30       1
NY         2020-01-01     50       6.5

What I tried:

pd.concat([df1.set_index(['City','Date']), df2.set_index(['City','Date'])], axis = 1)

I get the error: ValueError: cannot handle a non-unique multi-index!

I also can't use merge, because I have Date in the index.

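The idea is to make duplicated ('City', 'Date') pairs unique by adding a helper counter column created with GroupBy.cumcount. For example, if df2 contains a duplicated pair: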

print (df2)
  City        Date  Data2
0   LA  2020-01-01    2.5
1   LA  2020-01-02    1.0 <- duplicates
2   LA  2020-01-02    7.0 <- duplicates
3   NY  2020-01-01    6.5

df1 = (df1.assign(g = df1.groupby(['City','Date']).cumcount())
          .set_index(['City','Date','g']))
df2 = (df2.assign(g = df2.groupby(['City','Date']).cumcount())
          .set_index(['City','Date','g']))

df = pd.concat([df1, df2], axis = 1)
print (df)
                   Data1  Data2
City Date       g              
LA   2020-01-01 0   20.0    2.5
     2020-01-02 0   30.0    1.0
                1    NaN    7.0
NY   2020-01-01 0   50.0    6.5
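
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same idea. The sample frames are rebuilt by hand from the tables above (df2 is the variant with the duplicated LA / 2020-01-02 pair), and the names keys, counter, left, right and out are only illustrative.

```python
import pandas as pd

# Sample data typed in from the tables above
df1 = pd.DataFrame({
    "City": ["LA", "LA", "NY"],
    "Date": ["2020-01-01", "2020-01-02", "2020-01-01"],
    "Data1": [20, 30, 50],
})
df2 = pd.DataFrame({
    "City": ["LA", "LA", "LA", "NY"],
    "Date": ["2020-01-01", "2020-01-02", "2020-01-02", "2020-01-01"],
    "Data2": [2.5, 1.0, 7.0, 6.5],
})

keys = ["City", "Date"]

# cumcount numbers the rows inside each (City, Date) group: 0, 1, 2, ...
counter = df2.groupby(keys).cumcount()
print(counter.tolist())  # [0, 0, 1, 0]

left = df1.assign(g=df1.groupby(keys).cumcount()).set_index(keys + ["g"])
right = df2.assign(g=counter).set_index(keys + ["g"])

# With the helper level g, both indexes are unique, so concat can align them
print(left.index.is_unique, right.index.is_unique)  # True True

out = pd.concat([left, right], axis=1)
print(out)
```

The second LA / 2020-01-02 row exists only in df2, so its Data1 value comes out as NaN, which is also why Data1 is shown as float in the output.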

If you need to remove the helper level g:

df = pd.concat([df1, df2], axis = 1).reset_index(level=2, drop=True)
print (df)
                 Data1  Data2
City Date                    
LA   2020-01-01   20.0    2.5
     2020-01-02   30.0    1.0
     2020-01-02    NaN    7.0
NY   2020-01-01   50.0    6.5
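
Note that dropping g brings the duplicated ('City', 'Date') pair back into the index. A small sketch to check this, with the concatenated result rebuilt by hand from the output above (the names dropped and flat are only illustrative); turning the index levels back into ordinary columns is one possible alternative if a non-unique index is not wanted:

```python
import pandas as pd

# The concatenated result from above, typed in by hand
idx = pd.MultiIndex.from_tuples(
    [("LA", "2020-01-01", 0), ("LA", "2020-01-02", 0),
     ("LA", "2020-01-02", 1), ("NY", "2020-01-01", 0)],
    names=["City", "Date", "g"],
)
df = pd.DataFrame(
    {"Data1": [20.0, 30.0, None, 50.0], "Data2": [2.5, 1.0, 7.0, 6.5]},
    index=idx,
)

# Dropping the helper level leaves a non-unique (City, Date) index
dropped = df.reset_index(level="g", drop=True)
print(dropped.index.is_unique)  # False

# Alternative: move all levels back to columns and discard the helper column
flat = df.reset_index().drop(columns="g")
print(flat)
```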

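EDIT: I think it is necessary here to convert both Date columns to datetimes and then use DataFrame.merge, which performs an inner join by default. This starts from the original df1 and df2, with 'City' and 'Date' as regular columns: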

df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])

df = df1.merge(df2, on=['City','Date'])
print (df)
  City       Date  Data1  Data2
0   LA 2020-01-01     20    2.5
1   LA 2020-01-02     30    1.0
2   NY 2020-01-01     50    6.5
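
A runnable sketch of the merge route, using the data from the question typed in by hand. The second merge, with how="outer" and indicator=True, is an optional extra that is not part of the answer above; it just shows which rows the inner join leaves out. The names inner and check are illustrative.

```python
import pandas as pd

# Data from the question
df1 = pd.DataFrame({
    "City": ["LA", "LA", "NY"],
    "Date": ["2020-01-01", "2020-01-02", "2020-01-01"],
    "Data1": [20, 30, 50],
})
df2 = pd.DataFrame({
    "City": ["LA", "LA", "LA", "NY"],
    "Date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-01"],
    "Data2": [2.5, 1.0, 7.0, 6.5],
})

# Make sure both key columns have the same dtype before merging
for frame in (df1, df2):
    frame["Date"] = pd.to_datetime(frame["Date"])

# how="inner" is the default: only key pairs present in both frames are kept
inner = df1.merge(df2, on=["City", "Date"])
print(inner)

# Outer join with an indicator column to inspect the unmatched rows
check = df1.merge(df2, on=["City", "Date"], how="outer", indicator=True)
print(check[check["_merge"] != "both"])
```

If the key pairs are expected to be unique on both sides, passing validate="one_to_one" to merge makes pandas raise an error when they are not, instead of silently multiplying rows.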