Merge two dataframes with some common columns, where combining the common columns needs a custom function

Question

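My problem is very similar to "Merge pandas dataframe, with column operation", but that question doesn't cover my needs.

Let's say I have two dataframes such as the following (note that the dataframe content could be float numbers instead of booleans):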

left = pd.DataFrame({0: [True, True, False], 0.5: [False, True, True]}, index=[12.5, 14, 15.5])
right = pd.DataFrame({0.7: [True, False, False], 0.5: [True, False, True]}, index=[12.5, 14, 15.5])

right

        0.5    0.7
12.5   True   True
14.0  False  False
15.5   True  False

left

        0.0    0.5
12.5   True  False
14.0   True   True
15.5  False   True

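As you can see, they have the same index and one of the columns is common. In real life there will likely be more common columns, such as 1.0 or other numbers not yet defined, and more unique columns on each side. I need to combine the two dataframes so that all unique columns are kept and the common columns are combined using a specific function, e.g. a boolean OR for this example. The indices are always identical for both dataframes.

So the result should be:

result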

        0.0   0.5    0.7
12.5   True  True   True
14.0   True  True  False
15.5  False  True  False

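In real life there will be more than two dataframes to combine, but they can be combined sequentially, one after the other, starting from an empty first dataframe.

I feel pandas.combine might do the trick, but I can't figure it out from the documentation. Does anyone have a suggestion on how to do this in one or more steps?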

Answer 1

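You can concatenate the dataframes and then group by the column names to apply an operation to the identically named columns. In this case you can get away with summing and then casting back to bool, which gives the OR operation.

import pandas as pd

# Pass axis by keyword: recent pandas versions no longer accept it positionally.
# Note: groupby(..., axis=1) is deprecated since pandas 2.1; the transposed
# form df.T.groupby(level=0).sum().T performs the same grouping.
df = pd.concat([left, right], axis=1)
df.groupby(df.columns, axis=1).sum().astype(bool)

Output: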

        0.0   0.5    0.7
12.5   True  True   True
14.0   True  True  False
15.5  False  True  False

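If you need to see how to do this in a less specific manner, then again group by the columns and apply something to the grouped object over axis=1:

df = pd.concat([left, right], axis=1)
df.groupby(df.columns, axis=1).apply(lambda x: x.any(axis=1))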
#        0.0   0.5    0.7
#12.5   True  True   True
#14.0   True  True  False
#15.5  False  True  False
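
Because grouping along axis=1 is deprecated in recent pandas releases, the same boolean OR can be sketched by transposing, grouping on the row labels, and transposing back. This is only a sketch under the assumption that all columns share one dtype (transposing mixed dtypes would produce object columns); the name `result` is chosen here for illustration.

```python
import pandas as pd

left = pd.DataFrame({0: [True, True, False], 0.5: [False, True, True]},
                    index=[12.5, 14, 15.5])
right = pd.DataFrame({0.7: [True, False, False], 0.5: [True, False, True]},
                     index=[12.5, 14, 15.5])

# Stack the frames side by side; the shared label 0.5 now appears twice.
df = pd.concat([left, right], axis=1)

# After transposing, the duplicated column labels are row labels, so an
# ordinary groupby on the index can OR the duplicates together.
result = df.T.groupby(level=0).any().T
print(result)
```

Swapping `.any()` for another groupby reduction such as `.sum()` or `.max()` follows the same pattern.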

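Further, you can define a custom combining function. Here is one that adds twice the left frame to 4 times the right frame. If there is only one column, it returns 2x the left frame.

Sample Data

left: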

      0.0  0.5
12.5    1   11
14.0    2   17
15.5    3   17

right:

      0.7  0.5
12.5    4    2
14.0    4   -1
15.5    5    5

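Code

def my_func(x):
    # x holds every column that shares one label (one or two columns here).
    try:
        res = x.iloc[:, 0]*2 + x.iloc[:, 1]*4
    except IndexError:
        # Only one column carries this label, so there is nothing to combine.
        res = x.iloc[:, 0]*2
    return res

df = pd.concat([left, right], axis=1)
df.groupby(df.columns, axis=1).apply(my_func)

Output: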

      0.0  0.5  0.7
12.5    2   30    8
14.0    4   30    8
15.5    6   54   10
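
If unique columns should pass through unchanged, as the question asks, rather than being doubled as `my_func` does, a pairwise helper is one option. The following is a minimal sketch, not part of the original answer: `combine_common` is a hypothetical name, and it assumes both frames share the same index and that `func` takes two Series and returns one.

```python
import pandas as pd

def combine_common(left, right, func):
    """Hypothetical helper: keep unique columns as they are and combine
    columns present in both frames with func(left_col, right_col)."""
    common = left.columns.intersection(right.columns)
    # All columns of left plus the columns that only right has.
    out = left.join(right.drop(columns=common))
    for col in common:
        out[col] = func(left[col], right[col])
    return out.sort_index(axis=1)

left = pd.DataFrame({0.0: [1, 2, 3], 0.5: [11, 17, 17]}, index=[12.5, 14, 15.5])
right = pd.DataFrame({0.7: [4, 4, 5], 0.5: [2, -1, 5]}, index=[12.5, 14, 15.5])

# Same weighting as my_func, applied only to the shared column 0.5.
combined = combine_common(left, right, lambda l, r: 2 * l + 4 * r)
print(combined)
```

For the boolean frames from the question, passing `lambda l, r: l | r` would give the OR-combined result.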

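Finally, if you want to do this in a consecutive manner, you should make use of reduce. Here I'll combine 5 DataFrames with the function above. (I'll just repeat the right frame 4x for the example.)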

from functools import reduce

def my_comb(df_l, df_r, func):
    """ Concatenate df_l and df_r along axis=1. Apply the
    specified function.
    """
    df = pd.concat([df_l, df_r], axis=1)
    return df.groupby(df.columns, axis=1).apply(func)

# Fold the list pairwise from the left: ((((left, right), right), right), right)
reduce(lambda dfl, dfr: my_comb(dfl, dfr, func=my_func), [left, right, right, right, right])
#      0.0  0.5  0.7
#12.5   16  296  176
#14.0   32  212  176
#15.5   48  572  220
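
When the combining operation does not depend on the order of the frames (for example any, sum, or max), the pairwise reduce can be replaced by a single concat over all frames. This is a sketch under that assumption only; it does not reproduce the order-dependent weighting of `my_func`, and `combine_many` is a hypothetical name.

```python
import pandas as pd

def combine_many(frames, agg):
    """Hypothetical helper: concatenate all frames once and reduce the
    columns that share a label with the given groupby aggregation."""
    stacked = pd.concat(frames, axis=1)
    # Transpose so that duplicated column labels become groupable row labels.
    return stacked.T.groupby(level=0).agg(agg).T

left = pd.DataFrame({0.0: [1, 2, 3], 0.5: [11, 17, 17]}, index=[12.5, 14, 15.5])
right = pd.DataFrame({0.7: [4, 4, 5], 0.5: [2, -1, 5]}, index=[12.5, 14, 15.5])

# Sum every column label across three frames in one pass.
total = combine_many([left, right, right], "sum")
print(total)
```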
