How to avoid confusion between a column label and the DatetimeIndex when adding a column to a pandas DataFrame
I have a pandas DataFrame whose column headers are numeric strings and whose index is a DatetimeIndex. For example:
In:
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=pd.DatetimeIndex(['2019-01-01 00:00:00', '2019-01-01 00:05:00', '2019-01-01 00:10:00']),
                  columns=['010000', '010001', '010002'])
df
Out:
                     010000  010001  010002
2019-01-01 00:00:00       1       2       3
2019-01-01 00:05:00       4       5       6
2019-01-01 00:10:00       7       8       9
I can successfully add a column to the DataFrame with a plain assignment such as:
In:
df['010003'] = pd.Series([99,99,99], index= df.index)
df
Out:
                     010000  010001  010002  010003
2019-01-01 00:00:00       1       2       3      99
2019-01-01 00:05:00       4       5       6      99
2019-01-01 00:10:00       7       8       9      99
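However, if the column header can be mistaken for a date, pandas treats it as an index element, tries to add rows instead of a column, and raises an exception: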
In:
df['010119'] = pd.Series([99,99,99], index= df.index)
Out:
Traceback (most recent call last):
  File "C:\Users\...\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3325, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-16-1f55509f2987>", line 1, in <module>
    df['010119'] = pd.Series([99,99,99], index= df.index)
  File "C:\Users\...\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 3362, in __setitem__
    return self._setitem_slice(indexer, value)
  File "C:\Users\...\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 3374, in _setitem_slice
    self.loc._setitem_with_indexer(key, value)
  File "C:\Users\...\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 656, in _setitem_with_indexer
    value=value)
  File "C:\Users\...\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 510, in setitem
    return self.apply('setitem', **kwargs)
  File "C:\Users\...\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 395, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\Users\...\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\blocks.py", line 920, in setitem
    values[indexer] = value
ValueError: could not broadcast input array from shape (3) into shape (3,4)
To avoid this confusion, how should I rewrite the assignment to force pandas to treat the numeric string as the header of a new column?
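The ambiguity described above can be checked outside pandas. The sketch below assumes that the pandas version in the traceback falls back to dateutil when it tries to read a string key as a date. That is an assumption about pandas internals, although the (3,4) shape in the error is consistent with all three rows on 2019-01-01 being selected. parses_as_date is a hypothetical helper name, not a pandas or dateutil function.

```python
from dateutil import parser  # dateutil is installed as a pandas dependency


def parses_as_date(label):
    """Return True if dateutil can read `label` as a date/time string."""
    try:
        parser.parse(label)
        return True
    except (ValueError, OverflowError):
        return False


# Check the labels from the question: the two that work and the one that fails.
for label in ['010000', '010003', '010119']:
    print(label, parses_as_date(label))

# '010119' is read month-first, i.e. as 01/01/19.
parsed = parser.parse('010119')
print(parsed.date())
```

Labels such as '010003' do not parse (there is no day 0), which would explain why they are accepted as ordinary column names.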
pandas provides a dedicated method for inserting a column, DataFrame.insert:
df.insert(len(df.columns), column='010119', value=[99, 99, 99])
df
                     010000  010001  010002  010119
2019-01-01 00:00:00       1       2       3      99
2019-01-01 00:05:00       4       5       6      99
2019-01-01 00:10:00       7       8       9      99
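For completeness, here is a minimal self-contained sketch of two ways to add the column without going through df[...] = ... at all. add_column is a hypothetical helper name used only for illustration, and the pd.concat variant is an alternative that the answer above does not mention.

```python
import pandas as pd

index = pd.DatetimeIndex(['2019-01-01 00:00:00',
                          '2019-01-01 00:05:00',
                          '2019-01-01 00:10:00'])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=index,
                  columns=['010000', '010001', '010002'])


def add_column(frame, name, values):
    """Return a copy of `frame` with `values` appended as the last column `name`."""
    out = frame.copy()
    # insert() always treats `name` as a column label, never as an index key.
    out.insert(len(out.columns), name, values)
    return out


with_insert = add_column(df, '010119', [99, 99, 99])

# Alternative: build a named Series and concatenate along the column axis.
new_col = pd.Series([99, 99, 99], index=df.index, name='010119')
with_concat = pd.concat([df, new_col], axis=1)

print(with_insert)
print(with_concat)
```

Note that DataFrame.insert modifies the frame in place and raises a ValueError if the column already exists (unless allow_duplicates=True), which is why the helper works on a copy.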