Pandas 0.15.2 MultiIndex vs. 0.14.1 (datetime.date vs. pandas.tslib.Timestamp)
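After upgrading from Pandas 0.14.1 to 0.15.2, my code broke. I traced it to MultiIndex construction, which now returns pandas.tslib.Timestamp where it previously returned datetime.date.
Has anyone run into something similar? Is this intended behavior or a bug in 0.15.2? Is there a recommended fix?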
import datetime as dt
import pandas as pd

i = [dt.date(2015, 1, 1), dt.date(2015, 1, 2), dt.date(2015, 1, 3)]
idx = pd.MultiIndex.from_product([['a', 'b'], i])
>>> idx
MultiIndex(levels=[[u'a', u'b'], [2015-01-01 00:00:00, 2015-01-02 00:00:00, 2015-01-03 00:00:00]],
labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]])
>>> type(idx[0][1])
pandas.tslib.Timestamp
>>> idx.levels[1]
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-11-23, ..., 2015-03-06]
Length: 834, Freq: None, Timezone: None
>>> type(idx.levels[1][0])
Out[29]: pandas.tslib.Timestamp
I get the following error when running this statement:
df2.merge(df, left_on=['identifier', 'date'],
right_index=True,
how='left',
suffixes=['', '_dup'])
File "/Users/user4589964/anaconda/envs/madrone_dev/lib/python2.7/site-packages/pandas/core/frame.py", line 3919, in merge
suffixes=suffixes, copy=copy)
File "/Users/user4589964/anaconda/envs/madrone_dev/lib/python2.7/site-packages/pandas/tools/merge.py", line 39, in merge
return op.get_result()
File "/Users/user4589964/anaconda/envs/madrone_dev/lib/python2.7/site-packages/pandas/tools/merge.py", line 187, in get_result
join_index, left_indexer, right_indexer = self._get_join_info()
File "/Users/user4589964/anaconda/envs/madrone_dev/lib/python2.7/site-packages/pandas/tools/merge.py", line 264, in _get_join_info
sort=self.sort)
File "/Users/user4589964/anaconda/envs/madrone_dev/lib/python2.7/site-packages/pandas/tools/merge.py", line 582, in _left_join_on_index
_get_multiindex_indexer(join_keys, right_ax, sort=sort)
File "/Users/user4589964/anaconda/envs/madrone_dev/lib/python2.7/site-packages/pandas/tools/merge.py", line 542, in _get_multiindex_indexer
llab, rlab, count = _factorize_keys(level, key, sort=False)
File "/Users/user4589964/anaconda/envs/madrone_dev/lib/python2.7/site-packages/pandas/tools/merge.py", line 622, in _factorize_keys
llab = rizer.factorize(lk)
TypeError: Argument 'values' has incorrect type (expected numpy.ndarray, got Index)
This is a bug in index construction; see here.
Here is an example of how to use actual datetime.date objects:
In [8]: pd.MultiIndex.from_arrays([pd.Index([datetime.date(2013,1,1)]),['a']])
Out[8]:
MultiIndex(levels=[[2013-01-01], [u'a']],
labels=[[0], [0]])
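A minimal sketch of the same idea applied to from_product, in case it helps to check which representation you actually got. Passing dtype=object explicitly is my own assumption here (it is not in the example above); it is meant to stop pandas from inferring datetimes, and the exact inference behavior may differ between versions.

```python
import datetime
import pandas as pd

dates = [datetime.date(2015, 1, 1), datetime.date(2015, 1, 2)]

# Assumption: wrapping the dates in an object-dtype Index keeps them as
# datetime.date instead of letting pandas convert them to Timestamps.
date_level = pd.Index(dates, dtype=object)
idx = pd.MultiIndex.from_product([["a", "b"], date_level])

# Inspect what the MultiIndex actually stores.
level_dtype = idx.levels[1].dtype   # dtype of the date level
first_date = idx[0][1]              # date component of the first tuple
print(level_dtype, type(first_date))
```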
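Keep in mind that datetime.date objects are really second-class citizens: they are represented with object dtype and are therefore not efficient. You should generally just use Timestamps.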
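For the merge in the question, one way to follow the "just use Timestamps" advice is to convert the date key on both sides before merging. This is only a sketch: the frames below are made-up stand-ins for df and df2 (the value column and its contents are hypothetical), and it assumes the identifier and date columns correspond to the two index levels.

```python
import datetime as dt
import pandas as pd

dates = [dt.date(2015, 1, 1), dt.date(2015, 1, 2), dt.date(2015, 1, 3)]

# Right-hand frame: (identifier, date) MultiIndex with the dates converted
# to Timestamps up front via pd.to_datetime.
idx = pd.MultiIndex.from_product(
    [["a", "b"], pd.to_datetime(dates)], names=["identifier", "date"]
)
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Left-hand frame: the key column starts out as datetime.date objects
# (object dtype) and is converted to datetime64 before the merge.
df2 = pd.DataFrame({"identifier": ["a", "b"], "date": [dates[0], dates[2]]})
df2["date"] = pd.to_datetime(df2["date"])

merged = df2.merge(
    df, left_on=["identifier", "date"], right_index=True, how="left"
)
print(merged)
```

If downstream code still expects datetime.date, calling .date() on an individual Timestamp converts it back at the boundary.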