Merging Dataframes with Arrays?

Question

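I was hoping you could provide some guidance. I'm writing a script using the Pandas library in Python v2.7.

Part of the script merges two dataframes: one for revenue and one for performance data. Both DFs have daily entries and are linked by an ID column.

Performance dataframe: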

     RevID         Date       PartnerName        Performance        Revenue
     1,2           1/2/2015   Johndoe            0.02               0.00
     1             2/2/2015   Johndoe            0.12               0.00
     4             3/2/2015   Johndoe            0.10               0.00

Note that the '1,2' in the top row refers to two IDs whose revenue needs to be added together.

Revenue dataframe:

     RevID     Date      Revenue
     1         1/2/2015  24000.00
     2         1/2/2015  25000.00
     1         2/2/2015  10000.00
     4         3/2/2015  94000.00

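My question is: given that the performance DF sometimes contains comma-separated values (like an array) that have to be matched to two corresponding revenue rows, how can I merge it with those two rows from the revenue DF added together, matching on the date as well?

For example, how would I handle this so that the final table reads: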

     RevID         Date       PartnerName        Performance        Revenue
     1,2           1/2/2015   Johndoe            0.02               49000.00
     1             2/2/2015   Johndoe            0.12               10000.00
     4             3/2/2015   Johndoe            0.10               94000.00

Note that the revenue in the first row is the sum of the values for RevID 1 and 2. At this point, any help would be great!

Answer 1

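I would just duplicate the data, and then the problem with the commas goes away (here `g` is presumably `df.groupby('RevID')`, with `df` being the performance dataframe):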

In [11]: res = pd.concat([df.iloc[i] for val, i in g.groups.items() for v in val.split(',')], ignore_index=True)

In [12]: res['RevID'] = sum([val.split(',') for val in g.groups], [])

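And make sure RevID is numeric rather than a string (`convert_objects` has since been removed from pandas; on current versions `pd.to_numeric(res['RevID'])` does this job):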

In [13]: res['RevID'] = res['RevID'].convert_objects(convert_numeric=True)

In [14]: res
Out[14]:
  RevID      Date PartnerName  Performance  Revenue
0     1  2/2/2015     Johndoe         0.12        0
1     1  1/2/2015     Johndoe         0.02        0
2     2  1/2/2015     Johndoe         0.02        0
3     4  3/2/2015     Johndoe         0.10        0
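
For readers on a newer pandas (0.25 or later, which is an assumption, since the question targets Python 2.7), the same row duplication can be sketched with `str.split` and `DataFrame.explode`. The frame below is rebuilt by hand from the question's sample data, and the name `perf` is only illustrative:

```python
import pandas as pd

# Performance frame rebuilt from the question's sample data (illustrative name).
perf = pd.DataFrame({
    "RevID": ["1,2", "1", "4"],
    "Date": ["1/2/2015", "2/2/2015", "3/2/2015"],
    "PartnerName": ["Johndoe"] * 3,
    "Performance": [0.02, 0.12, 0.10],
    "Revenue": [0.0, 0.0, 0.0],
})

res = perf.copy()
# Turn "1,2" into ["1", "2"], then emit one row per list element.
res["RevID"] = res["RevID"].str.split(",")
res = res.explode("RevID").reset_index(drop=True)
# Make RevID numeric so it can match the revenue frame's key.
res["RevID"] = pd.to_numeric(res["RevID"])
print(res)
```

The rows come out in the original order here, rather than in the group order shown above, but the content is the same: one row per individual RevID.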

Now you can merge, and you're basically there:

In [21]: res.merge(df2, on=['RevID', 'Date'])
Out[21]:
   RevID      Date PartnerName  Performance  Revenue_x  Revenue_y
0      1  2/2/2015     Johndoe         0.12          0      10000
1      1  1/2/2015     Johndoe         0.02          0      24000
2      2  1/2/2015     Johndoe         0.02          0      25000
3      4  3/2/2015     Johndoe         0.10          0      94000

Note: you may want to drop the all-zero Revenue column before merging (that way you don't need to specify `on`).
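
A minimal sketch of that note, assuming a pandas version that supports `drop(columns=...)`. The two frames are typed in by hand from the outputs above; `res` stands for the already-expanded performance rows and `df2` for the revenue frame:

```python
import pandas as pd

# Expanded performance rows (one RevID per row), as in Out[14].
res = pd.DataFrame({
    "RevID": [1, 1, 2, 4],
    "Date": ["2/2/2015", "1/2/2015", "1/2/2015", "3/2/2015"],
    "PartnerName": ["Johndoe"] * 4,
    "Performance": [0.12, 0.02, 0.02, 0.10],
    "Revenue": [0.0] * 4,
})
# Revenue frame from the question.
df2 = pd.DataFrame({
    "RevID": [1, 2, 1, 4],
    "Date": ["1/2/2015", "1/2/2015", "2/2/2015", "3/2/2015"],
    "Revenue": [24000.0, 25000.0, 10000.0, 94000.0],
})

# Without the placeholder column, the only shared columns are RevID and Date,
# so merge() joins on both by default and no _x/_y suffixes are produced.
merged = res.drop(columns="Revenue").merge(df2)
print(merged)
```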

If you want to refer back to the original ID (something unique), you can group by it and sum the revenue to get the frame you want...
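
One way this last step might look end to end, as a sketch rather than the answer's exact code. It assumes pandas 0.25 or later for `explode`, and the column name `SingleID` is a hypothetical helper introduced here so that the original comma-separated `RevID` survives as the grouping key:

```python
import pandas as pd

perf = pd.DataFrame({
    "RevID": ["1,2", "1", "4"],
    "Date": ["1/2/2015", "2/2/2015", "3/2/2015"],
    "PartnerName": ["Johndoe"] * 3,
    "Performance": [0.02, 0.12, 0.10],
})
rev = pd.DataFrame({
    "RevID": [1, 2, 1, 4],
    "Date": ["1/2/2015", "1/2/2015", "2/2/2015", "3/2/2015"],
    "Revenue": [24000.0, 25000.0, 10000.0, 94000.0],
})

# One row per individual ID; the original RevID string is kept untouched.
expanded = perf.assign(SingleID=perf["RevID"].str.split(","))
expanded = expanded.explode("SingleID").reset_index(drop=True)
expanded["SingleID"] = pd.to_numeric(expanded["SingleID"])

# Look up the revenue for each individual ID on the matching date.
merged = expanded.merge(
    rev.rename(columns={"RevID": "SingleID"}),
    on=["SingleID", "Date"],
    how="left",
)

# Collapse back to one row per original performance row, summing revenue.
final = merged.groupby(
    ["RevID", "Date", "PartnerName", "Performance"],
    sort=False,
    as_index=False,
)["Revenue"].sum()
print(final)
```

With the sample data, the '1,2' row ends up with 24000 + 25000 = 49000, which matches the table the question asks for.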
