Pandas correlation error - decimal and float type mismatch

This issue has already been raised here, but it has not been answered. I'm providing more detail in this thread in the hope of getting the juices flowing.

I have a pandas DataFrame, master_frame, containing time series data:

     SUBMIT_DATE   CRUX_VOL        CRUX_RATE
0     2016-02-01   76.38733173161  0.02832710529
1     2016-01-31   76.68984699154  0.02720243998
2     2016-01-30   75.59094829615  0.02720243998
3     2016-01-29   75.91758975956  0.02720243998
4     2016-01-28   76.31809997200  0.02671927211
...          ...   ...            ...

I want the correlation between the CRUX_VOL and CRUX_RATE columns. Both hold Decimal values:

In [3]: print type(master_frame["CRUX_VOL"][0]), type(master_frame["CRUX_RATE"][0])
Out[3]: <class 'decimal.Decimal'> <class 'decimal.Decimal'>

When I use the corr function, I get a nasty error related to the input types.

print master_frame['CRUX_VOL'].corr(master_frame['CRUX_RATE'])

Traceback (most recent call last):
  File "U:/Programming/VolPathReport/VolPath.py", line 52, in <module>
    print master_frame['CRUX_VOL'].corr(master_frame['CRUX_RATE'])
  File "C:\Anaconda2\lib\site-packages\pandas\core\series.py", line 1312, in corr
    min_periods=min_periods)
  File "C:\Anaconda2\lib\site-packages\pandas\core\nanops.py", line 47, in _f
    return f(*args, **kwargs)
  File "C:\Anaconda2\lib\site-packages\pandas\core\nanops.py", line 644, in nancorr
    return f(a, b)
  File "C:\Anaconda2\lib\site-packages\pandas\core\nanops.py", line 652, in _pearson
    return np.corrcoef(a, b)[0, 1]
  File "C:\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 2145, in corrcoef
    c = cov(x, y, rowvar)
  File "C:\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 2065, in cov
    avg, w_sum = average(X, axis=1, weights=w, returned=True)
  File "C:\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 599, in average
    scl = np.multiply(avg, 0) + scl
TypeError: unsupported operand type(s) for +: 'Decimal' and 'float'

I've messed around with the types and can't get it to work. Help me, wizards of the internet!

The last line of the traceback points to

np.multiply(avg, 0) + scl

as the cause of

TypeError: unsupported operand type(s) for +: 'Decimal' and 'float'
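
For what it's worth, the same kind of TypeError can be reproduced without pandas or numpy. The following is only a minimal sketch: the helper try_add and the value given to scl are made up for illustration and are not taken from numpy's source.

```python
from decimal import Decimal


def try_add(left, right):
    """Return left + right, or the TypeError message if they cannot be added."""
    try:
        return left + right
    except TypeError as exc:
        return str(exc)


avg = Decimal("76.38733173161")  # a Decimal, like the values in CRUX_VOL
scl = 5.0                        # a plain float (illustrative value only)

# Decimal * int stays a Decimal, and Decimal + float is not supported.
mixed = try_add(avg * 0, scl)

# Converting to float first keeps both operands the same type.
same = try_add(float(avg) * 0, scl)

print(mixed)
print(same)
```

The first result is an error message about unsupported operand types, while the second addition goes through because both sides are floats.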

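I don't think numpy knows about the Decimal type: the columns are stored with object dtype, so the expression above ends up adding a Decimal to a float, and Python's decimal module does not support that mixed + operation. Since pandas relies on numpy, it is probably best to convert the DataFrame columns to float dtype, using either

master_frame.loc[:, ['CRUX_VOL', 'CRUX_RATE']].astype(float)

or

master_frame.convert_objects(convert_numeric=True)

Note that both calls return a new object rather than modifying master_frame in place, so keep the result. convert_objects has since been deprecated in favor of pd.to_numeric and removed from later pandas releases, so astype(float) is the more durable choice.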
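
As a sketch of the float conversion, the snippet below rebuilds the five rows shown in the question and computes the correlation on a float copy. The way the frame is constructed and the name numeric are assumptions for illustration, and the code is written for Python 3 rather than the Python 2 session in the question.

```python
from decimal import Decimal

import pandas as pd

# Rebuild the sample rows from the question, with Decimal values.
master_frame = pd.DataFrame({
    "SUBMIT_DATE": pd.to_datetime(
        ["2016-02-01", "2016-01-31", "2016-01-30", "2016-01-29", "2016-01-28"]
    ),
    "CRUX_VOL": [
        Decimal("76.38733173161"), Decimal("76.68984699154"),
        Decimal("75.59094829615"), Decimal("75.91758975956"),
        Decimal("76.31809997200"),
    ],
    "CRUX_RATE": [
        Decimal("0.02832710529"), Decimal("0.02720243998"),
        Decimal("0.02720243998"), Decimal("0.02720243998"),
        Decimal("0.02671927211"),
    ],
})

cols = ["CRUX_VOL", "CRUX_RATE"]

# Decimal columns are stored with object dtype.
print(master_frame[cols].dtypes)

# astype returns a new frame; the original is left untouched.
numeric = master_frame[cols].astype(float)
print(numeric.dtypes)

# With float64 columns, corr no longer mixes Decimal and float.
corr = numeric["CRUX_VOL"].corr(numeric["CRUX_RATE"])
print(corr)
```

Working on a separate float copy leaves the original Decimal values available if exact decimal arithmetic is still needed elsewhere.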