Merging two datasets based on 2 columns, or finding the missing values in the first dataframe and filling them from the other

Question

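I have 2 pandas dataframes, each with 2 columns: index and date. Some dates are missing in the first dataframe, and those values can be obtained from the second dataframe by matching on index. I tried pd.concat, pd.merge, pd.join, etc., but none of them gave me the result I want. Here are the tables.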

df1 =

df2 =

Answer 1

Since the question has no reproducible dataframe, I tried the code below on generated data, but I think it will work on yours as well:

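import pandas as pd

# Sample data: df1 has missing dates, df2 holds the dates to fill in
df1 = pd.DataFrame({"date": [None, None, None, "01/01/2022"], "index": [402, 402, 403, 404]})
df2 = pd.DataFrame({"date": ["16/05/2020", "18/07/2021", "13/08/2022", "26/07/2020"], "index": [402, 405, 403, 404]})
df1.set_index("index", inplace=True)
df2.set_index("index", inplace=True)

# Fill each missing date in df1 with the date stored under the same index in df2.
# Assumes every index with a missing date also exists in df2 (otherwise KeyError).
for index, row in df1.iterrows():
    # None is caught by the identity check, NaN by the self-inequality check
    if row["date"] is None or row["date"] != row["date"]:
        df1.loc[index, "date"] = df2.loc[index, "date"]
df1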

Output

index	date
402	16/05/2020
402	16/05/2020
403	13/08/2022
404	01/01/2022

Note that row["date"] != row["date"] covers the case where the cell holds NaN, which is a float. NaN values are not even equal to themselves!
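To make the NaN point concrete, here is a small sketch comparing the hand-written check with pd.isna. The helper name is_missing is made up for illustration and is not part of pandas; it only mirrors the condition used in the loop above.

```python
import numpy as np
import pandas as pd


def is_missing(value):
    """Hypothetical helper mirroring the check used in the loop above."""
    return value is None or value != value


# NaN is a float that never compares equal to itself, while None does.
nan_is_self_unequal = np.nan != np.nan
none_is_self_unequal = None != None  # noqa: E711 (comparison kept on purpose)

# Compare the manual check with pandas' own missing-value test.
for value in (np.nan, float("nan"), None, "01/01/2022"):
    print(repr(value), is_missing(value), pd.isna(value))
```

For plain scalars like these, pd.isna(value) gives the same answer as the manual check, so it can replace the two-part condition if you prefer something more readable.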

Answer 2

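Have you tried df1.update(df2)? Note that update modifies df1 in place and returns None, so do not assign the result back (df1 = df1.update(df2) would leave df1 set to None).

update does not increase the size of df1: it aligns on the index and only changes rows that already exist in df1. By default it overwrites existing values as well as missing ones; pass overwrite=False to fill only the missing values.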
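A minimal sketch of how update behaves, using made-up data loosely based on Answer 1. It assumes the index column has been set as the DataFrame index and that the index values are unique in both frames; duplicated index labels are not covered here.

```python
import numpy as np
import pandas as pd

# Illustrative data only; the values are assumptions, not the asker's tables.
df1 = pd.DataFrame(
    {"index": [402, 403, 404], "date": [np.nan, np.nan, "01/01/2022"]}
).set_index("index")
df2 = pd.DataFrame(
    {
        "index": [402, 405, 403, 404],
        "date": ["16/05/2020", "18/07/2021", "13/08/2022", "26/07/2020"],
    }
).set_index("index")

# overwrite=False: only the missing dates in df1 are filled from df2.
filled = df1.copy()
result = filled.update(df2, overwrite=False)  # in place, returns None

# Default (overwrite=True): existing dates are replaced as well.
overwritten = df1.copy()
overwritten.update(df2)

print(filled)
print(overwritten)
```

In both cases index 405 never appears in the result, because update keeps the shape of the frame it is called on.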

Answer 3

You can try this solution:

import pandas as pd
import numpy as np

# initialize list of lists
df1 = [[402, '15/05/2020'], [408, np.nan], [408, '14/05/2020']]
df2 = [[402, '16/05/2020'], [408, '10/05/2020'], [409, '13/05/2020']]

# Create the pandas DataFrame
df1 = pd.DataFrame(df1, columns=['index', 'date'])
df2 = pd.DataFrame(df2, columns=['index', 'date'])

df1.set_index("index", inplace=True)
df2.set_index("index", inplace=True)

# iterrows() yields copies, so write back through df1.iloc by position;
# a label-based write would also overwrite the other row labelled 408
date_col = df1.columns.get_loc("date")
for pos, (index, row) in enumerate(df1.iterrows()):
    if row["date"] != row["date"]:  # True only for NaN
        df1.iloc[pos, date_col] = df2.loc[index, "date"]

print(df1)

Output:

             date
index            
402    15/05/2020
408    10/05/2020
408    14/05/2020

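With this solution, only the rows whose date is NaN are updated with the corresponding value from the other dataframe. A None value would not be caught by the != check; pd.isna covers both cases.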
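If you would rather avoid the row loop, the same fill can be sketched with Series.map and Series.fillna. This is only one possible approach, and it assumes index is still an ordinary column and that df2 should contribute one date per index value (duplicates in df2 are dropped, keeping the first). The name lookup is just an illustrative variable.

```python
import numpy as np
import pandas as pd

# Same sample data as above, with "index" kept as a regular column.
df1 = pd.DataFrame(
    {"index": [402, 408, 408], "date": ["15/05/2020", np.nan, "14/05/2020"]}
)
df2 = pd.DataFrame(
    {"index": [402, 408, 409], "date": ["16/05/2020", "10/05/2020", "13/05/2020"]}
)

# Build an index -> date lookup from df2 (one date per index value).
lookup = df2.drop_duplicates("index").set_index("index")["date"]

# Map each row's index to the df2 date and use it only where df1's date is missing.
df1["date"] = df1["date"].fillna(df1["index"].map(lookup))

print(df1)
```

Rows whose index does not appear in df2 simply stay missing instead of raising a KeyError, and duplicated index values in df1 are handled row by row.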


python

dataframe

python-3.x

pandas

data-science
