pandas dataframes multiplication with or without broadcasting

I have 2 dataframes:

>>> type(c)
Out[118]: pandas.core.frame.DataFrame
>>> type(N)
Out[119]: pandas.core.frame.DataFrame

>>> c
Out[114]: 
                       t
2017-06-01 01:06:00 1.00
2017-06-01 01:13:00 1.00
2017-06-01 02:09:00 1.00
2017-06-26 22:47:00 1.00

>>> N
Out[115]: 
                       0    1
2017-06-01 01:06:00 1.00 1.00
2017-06-01 01:13:00 1.00 1.00
2017-06-01 02:09:00 1.00 1.00
2017-06-26 22:47:00 1.00 1.00

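I need to multiply them to get a 4x2 DataFrame in which each column of N is multiplied element-wise by c. I tried the following four approaches without success: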

>>> N.multiply(c, axis='index')
Out[116]: 
                      0   1   t
2017-06-01 01:06:00 nan nan nan
2017-06-01 01:13:00 nan nan nan
2017-06-01 02:09:00 nan nan nan
2017-06-26 22:47:00 nan nan nan

>>> c[:]*N
Out[98]: 
                      0   1   t
2017-06-01 01:06:00 nan nan nan
2017-06-01 01:13:00 nan nan nan
2017-06-01 02:09:00 nan nan nan
2017-06-26 22:47:00 nan nan nan

>>> c*N
Out[99]: 
                      0   1   t
2017-06-01 01:06:00 nan nan nan
2017-06-01 01:13:00 nan nan nan
2017-06-01 02:09:00 nan nan nan
2017-06-26 22:47:00 nan nan nan

>>> c[:, None]*N
Traceback (most recent call last):

  File "C:\...pandas\core\frame.py", line 1797, in __getitem__
    return self._getitem_column(key)
  File "C:\...core\frame.py", line 1804, in _getitem_column
    return self._get_item_cache(key)
  File "C:\...core\generic.py", line 1082, in _get_item_cache
    res = cache.get(item)
TypeError: unhashable type

Is there an easy way to do this, with or without broadcasting?

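The problem is that you are passing a DataFrame, so pandas also tries to align on the column names. N has columns 0 and 1 while c has column t, so no labels match and every cell becomes NaN. If you select column t, it becomes a Series and broadcasts appropriately: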

N.mul(c['t'], axis=0)
Out: 
                       0    1
2017-06-01 01:06:00  1.0  1.0
2017-06-01 01:13:00  1.0  1.0
2017-06-01 02:09:00  1.0  1.0
2017-06-26 22:47:00  1.0  1.0
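
For anyone who wants to reproduce this, here is a minimal sketch. The index and values are copied from the example frames used in this answer; the names `misaligned` and `scaled` are only illustrative and not from the question.

```python
import pandas as pd

# Rebuild the two frames with a shared DatetimeIndex.
idx = pd.to_datetime([
    "2017-06-01 01:06:00",
    "2017-06-01 01:13:00",
    "2017-06-01 02:09:00",
    "2017-06-26 22:47:00",
])
N = pd.DataFrame([[1.0, 2.0], [6.0, 5.0], [4.0, 3.0], [4.0, 7.0]], index=idx)
c = pd.DataFrame({"t": [6.0, 2.0, 8.0, 2.0]}, index=idx)

# DataFrame * DataFrame aligns on the index AND the columns.
# The column labels (0, 1) and ("t") never match, so every cell is NaN.
misaligned = N * c

# DataFrame.mul with a Series and axis=0 aligns on the index only,
# so the Series is applied to each column of N in turn.
scaled = N.mul(c["t"], axis=0)
print(scaled)
```

`axis=0` and `axis='index'` mean the same thing here; what matters is that the second operand is a Series rather than a one-column DataFrame.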

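With numpy arrays you don't need to specify anything. Given shapes (4, 2) and (4, 1), numpy sees that the first axes have the same length and stretches the length-1 axis to match, broadcasting accordingly.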

Consider the following DataFrames:

N
Out: 
                       0    1
2017-06-01 01:06:00  1.0  2.0
2017-06-01 01:13:00  6.0  5.0
2017-06-01 02:09:00  4.0  3.0
2017-06-26 22:47:00  4.0  7.0


c
Out: 
                       t
2017-06-01 01:06:00  6.0
2017-06-01 01:13:00  2.0
2017-06-01 02:09:00  8.0
2017-06-26 22:47:00  2.0

You can access the underlying arrays with the .values attribute, so

N.values * c.values
Out: 
array([[  6.,  12.],
       [ 12.,  10.],
       [ 32.,  24.],
       [  8.,  14.]])

will give the same values as

N.mul(c['t'], axis=0)
Out: 
                        0     1
2017-06-01 01:06:00   6.0  12.0
2017-06-01 01:13:00  12.0  10.0
2017-06-01 02:09:00  32.0  24.0
2017-06-26 22:47:00   8.0  14.0

But since the whole operation is done in numpy, you lose the labels.
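
If you prefer the numpy route but still want the labels, one option is to wrap the result back into a DataFrame yourself. This is only a sketch: the name `relabeled` is made up for illustration, and it assumes both frames already have their rows in the same order, because numpy does no index alignment. It uses `.to_numpy()`, which newer pandas versions offer alongside `.values`.

```python
import pandas as pd

idx = pd.to_datetime([
    "2017-06-01 01:06:00",
    "2017-06-01 01:13:00",
    "2017-06-01 02:09:00",
    "2017-06-26 22:47:00",
])
N = pd.DataFrame([[1.0, 2.0], [6.0, 5.0], [4.0, 3.0], [4.0, 7.0]], index=idx)
c = pd.DataFrame({"t": [6.0, 2.0, 8.0, 2.0]}, index=idx)

# (4, 2) * (4, 1): numpy broadcasts the single column of c across N's columns.
product = N.to_numpy() * c.to_numpy()

# Reattach N's labels. Assumption: rows of N and c are in the same order,
# since the raw arrays are multiplied positionally, not by index label.
relabeled = pd.DataFrame(product, index=N.index, columns=N.columns)
print(relabeled)
```

Unlike `N.mul(c['t'], axis=0)`, this multiplies by position, so it silently gives wrong results if the two indexes are ordered differently.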