Adding a new row to a MultiIndex pandas DataFrame with both values and lists
I have a MultiIndex DataFrame:
predicted_y actual_y predicted_full actual_full
subj_id org_clip
123 3 2 5 [1, 2, 3] [4, 5, 6]
I want to add a new row so that it becomes:
predicted_y actual_y predicted_full actual_full
subj_id org_clip
123 3 2 5 [1, 2, 3] [4, 5, 6]
321 4 20 50 [10, 20, 30] [40, 50, 60] # add this row
The following code does this:
df.loc[('321', 4),['predicted_y', 'actual_y']] = [20, 50]
df.loc[('321', 4),['predicted_full', 'actual_full']] = [[10,20,30], [40,50,60]]
But trying to add the new row in a single statement raises an error:
df.loc[('321', 4),['predicted_y', 'actual_y', 'predicted_full', 'actual_full']] = [20, 50, [10,20,30], [40,50,60]]
>>> ValueError: setting an array element with a sequence.
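Notes:
I believe this is related to the fact that the row I am trying to add contains both scalar values and lists (possibly a syntax issue). All my other attempts raised the same error; see the following examples: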
df.loc[('321', 4),['predicted_y', 'actual_y', ['predicted_full', 'actual_full']]] = [20, 50, [10,20,30], [40,50,60]]
df.loc[('321', 4),['predicted_y', 'actual_y', ['predicted_full'], ['actual_full']]] = [20, 50, [10,20,30], [40,50,60]]
df.loc[('321', 4),['predicted_y', 'actual_y', [['predicted_full'], ['actual_full']]]] = [20, 50, [10,20,30], [40,50,60]]
df.loc[('321', 4),['predicted_y', 'actual_y', 'predicted_full', 'actual_full']] = [20, 50, np.array([10,20,30]), np.array([40,50,60])]
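Code to construct the initial DataFrame:
import numpy as np
import pandas as pd
# Note: the 'labels' argument of pd.MultiIndex was renamed to 'codes' in pandas 0.24.
df = pd.DataFrame(index=pd.MultiIndex(levels=[[], []], codes=[[], []], names=['subj_id', 'org_clip']),
columns=['predicted_y', 'actual_y', 'predicted_full', 'actual_full'])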
df.loc[('123', 3),['predicted_y', 'actual_y']] = [2, 5]
df.loc[('123', 3),['predicted_full', 'actual_full']] = [[1,2,3], [4,5,6]]
Make at least one of the sublists an array of dtype object:
In [27]: df.loc[('321', 4),['predicted_y', 'actual_y', 'predicted_full', 'actual_full']] = (
[20, 50, np.array((10, 20, 30), dtype='O'), [40, 50, 60]])
In [28]: df
Out[28]:
predicted_y actual_y predicted_full actual_full
subj_id org_clip
123 3 2 5 [1, 2, 3] [4, 5, 6]
321 4 20 50 [10, 20, 30] [40, 50, 60]
Note that the error
ValueError: setting an array element with a sequence.
is raised on this line:
--> 643 arr_value = np.array(value)
and can be reproduced like this:
In [12]: np.array([20, 50, [10, 20, 30], [40, 50, 60]])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-f6122275ab9f> in <module>()
----> 1 np.array([20, 50, [10, 20, 30], [40, 50, 60]])
ValueError: setting an array element with a sequence.
But if one of the sublists is an array of dtype object, the result is an array of dtype object:
In [16]: np.array((20, 50, np.array((10, 20, 30), dtype='O'), (40, 50, 60)))
Out[16]: array([20, 50, array([10, 20, 30], dtype=object), (40, 50, 60)], dtype=object)
This is how the ValueError is avoided.
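The implicit conversion above depends on how np.array infers a dtype, and that may not behave identically on every NumPy version. As a minimal sketch (not part of the original answer), the object array can also be built explicitly, so that no inference is involved. The names row and values are only illustrative.

```python
import numpy as np

# The mixed row: two scalars followed by two lists.
row = [20, 50, [10, 20, 30], [40, 50, 60]]

# Pre-allocate a 1-D object array so NumPy never has to guess a dtype or shape.
values = np.empty(len(row), dtype=object)
for i, item in enumerate(row):
    values[i] = item  # each slot stores the Python object as-is

print(values.dtype)
print(values.shape)
print(values[2])
```

An array built this way is already of dtype object, so passing it through np.array again should not trigger the ragged-sequence error shown above.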
You can let pd.Series handle the dtypes:
row_to_append = pd.Series([20, 50, [10, 20, 30], [40, 50, 60]])
cols = ['predicted_y', 'actual_y', 'predicted_full', 'actual_full']
df.loc[('321', 4), cols] = row_to_append.values  # '321' as a string, to match the index used in the question
df
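Another route, which is not taken from either answer above, is to sidestep .loc enlargement entirely: build the new row as its own one-row DataFrame and concatenate it. The following is a hedged sketch under the assumption that a fresh frame is acceptable; new_row and index_names are illustrative names, and the starting frame is rebuilt here only so the example is self-contained.

```python
import pandas as pd

index_names = ['subj_id', 'org_clip']

# Starting frame with the single existing row from the question.
df = pd.DataFrame(
    {
        'predicted_y': [2],
        'actual_y': [5],
        'predicted_full': [[1, 2, 3]],
        'actual_full': [[4, 5, 6]],
    },
    index=pd.MultiIndex.from_tuples([('123', 3)], names=index_names),
)

# The new row as a one-row frame; each list is wrapped in an outer list
# so that it becomes a single cell rather than several rows.
new_row = pd.DataFrame(
    {
        'predicted_y': [20],
        'actual_y': [50],
        'predicted_full': [[10, 20, 30]],
        'actual_full': [[40, 50, 60]],
    },
    index=pd.MultiIndex.from_tuples([('321', 4)], names=index_names),
)

# Concatenate along the rows; the MultiIndex of both frames is preserved.
df = pd.concat([df, new_row])
print(df)
```

The trade-off is that pd.concat returns a new DataFrame rather than modifying df in place, which may matter if rows are added one at a time in a loop.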