Python pandas MultiIndex columns
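First of all, I am using Python 3.5 in a Jupyter notebook.
I want to create a DataFrame to display some data in a report. I want it to have two index levels on the rows and on the columns (apologies if that is not the right terminology; I am not used to working with pandas).
I have this sample code, which works: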
frame = pd.DataFrame(np.arange(12).reshape(( 4, 3)),
index =[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
columns =[['Ohio', 'Ohio', 'Ohio'], ['Green', 'Red', 'Green']])
But when I try to adapt it to my case, it raises an error:
cell_rise_Inv= pd.DataFrame([[0.00483211, 0.00511619, 0.00891821, 0.0449637, 0.205753],
[0.00520049, 0.00561577, 0.010993, 0.0468998, 0.207461],
[0.00357213, 0.00429087, 0.0132186, 0.0536389, 0.21384],
[-0.0021868, -0.0011312, 0.0120546, 0.0647213, 0.224749],
[-0.0725403, -0.0700884, -0.0382486, 0.0899121, 0.313639]],
index =[['transition [ns]','transition [ns]','transition [ns]','transition [ns]','transition [ns]'],
[0.0005, 0.001, 0.01, 0.1, 0.5]],
columns =[[0.01, 0.02, 0.05, 0.1, 0.5],['capacitance [pF]','capacitance [pF]','capacitance [pF]','capacitance [pF]','capacitance [pF]']])
cell_rise_Inv
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-89-180a1ad88403> in <module>()
6 index =[['transition [ns]','transition [ns]','transition [ns]','transition [ns]','transition [ns]'],
7 [0.0005, 0.001, 0.01, 0.1, 0.5]],
----> 8 columns =[[0.01, 0.02, 0.05, 0.1, 0.5],['capacitance [pF]','capacitance [pF]','capacitance [pF]','capacitance [pF]','capacitance [pF]']])
9 cell_rise_Inv
C:\Users\Josele\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
261 if com.is_named_tuple(data[0]) and columns is None:
262 columns = data[0]._fields
--> 263 arrays, columns = _to_arrays(data, columns, dtype=dtype)
264 columns = _ensure_index(columns)
265
C:\Users\Josele\Anaconda3\lib\site-packages\pandas\core\frame.py in _to_arrays(data, columns, coerce_float, dtype)
5350 if isinstance(data[0], (list, tuple)):
5351 return _list_to_arrays(data, columns, coerce_float=coerce_float,
-> 5352 dtype=dtype)
5353 elif isinstance(data[0], collections.Mapping):
5354 return _list_of_dict_to_arrays(data, columns,
C:\Users\Josele\Anaconda3\lib\site-packages\pandas\core\frame.py in _list_to_arrays(data, columns, coerce_float, dtype)
5429 content = list(lib.to_object_array(data).T)
5430 return _convert_object_array(content, columns, dtype=dtype,
-> 5431 coerce_float=coerce_float)
5432
5433
C:\Users\Josele\Anaconda3\lib\site-packages\pandas\core\frame.py in _convert_object_array(content, columns, coerce_float, dtype)
5487 # caller's responsibility to check for this...
5488 raise AssertionError('%d columns passed, passed data had %s '
-> 5489 'columns' % (len(columns), len(content)))
5490
5491 # provide soft conversion of object dtypes
AssertionError: 2 columns passed, passed data had 5 columns
Any ideas? I don't understand why the example works and mine doesn't. :S
Thanks in advance :).
That does look inconsistent. I would use the pd.MultiIndex constructor from_arrays:
idx = pd.MultiIndex.from_arrays([['transition [ns]'] * 5,
[0.0005, 0.001, 0.01, 0.1, 0.5]])
col = pd.MultiIndex.from_arrays([[0.01, 0.02, 0.05, 0.1, 0.5],
['capacitance [pF]'] * 5])
cell_rise_Inv= pd.DataFrame([[0.00483211, 0.00511619, 0.00891821, 0.0449637, 0.205753],
[0.00520049, 0.00561577, 0.010993, 0.0468998, 0.207461],
[0.00357213, 0.00429087, 0.0132186, 0.0536389, 0.21384],
[-0.0021868, -0.0011312, 0.0120546, 0.0647213, 0.224749],
[-0.0725403, -0.0700884, -0.0382486, 0.0899121, 0.313639]],
index=idx,
columns=col)
cell_rise_Inv
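As a quick check that this approach gives two levels on both axes, here is a minimal sketch on a smaller table. The 2x3 values are made up for illustration and are not from the question; only the label strings are reused.

```python
import pandas as pd

# Hypothetical 2x3 data standing in for the real timing table.
data = [[0.1, 0.2, 0.3],
        [0.4, 0.5, 0.6]]

# Build both axes explicitly, so pandas never has to guess
# whether a list of lists means "two levels" or "two labels".
idx = pd.MultiIndex.from_arrays([['transition [ns]'] * 2,
                                 [0.0005, 0.001]])
col = pd.MultiIndex.from_arrays([['capacitance [pF]'] * 3,
                                 [0.01, 0.02, 0.05]])

df = pd.DataFrame(data, index=idx, columns=col)

# A single cell is addressed with one tuple per axis.
value = df.loc[('transition [ns]', 0.001), ('capacitance [pF]', 0.02)]
print(df)
print(value)
```

Passing ready-made MultiIndex objects keeps the result independent of how the data itself is supplied (nested list or ndarray).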
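There is one major difference between your code and the example: the example passes a numpy array as input rather than a nested list. In fact, wrapping your list in np.array(...) works fine: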
cell_rise_Inv= pd.DataFrame(
np.array([[0.00483211, 0.00511619, 0.00891821, 0.0449637, 0.205753],
[0.00520049, 0.00561577, 0.010993, 0.0468998, 0.207461],
[0.00357213, 0.00429087, 0.0132186, 0.0536389, 0.21384],
[-0.0021868, -0.0011312, 0.0120546, 0.0647213, 0.224749],
[-0.0725403, -0.0700884, -0.0382486, 0.0899121, 0.313639]]),
index=[['transition [ns]'] * 5,
[0.0005, 0.001, 0.01, 0.1, 0.5]],
columns=[['capacitance [pF]'] * 5,
[0.01, 0.02, 0.05, 0.1, 0.5]])
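I shortened the repeated strings in the index and swapped the order of the index levels, but neither of those changes is significant.
Edit
Did a little digging. If you pass in a nested list (without the np.array call), the call works when columns is omitted, and even when columns is a one-dimensional list. For some reason, a nested list of two elements is not interpreted as a MultiIndex unless the input is an ndarray.
I filed pandas issue #14467 based on this question.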
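For comparison, the sketch below builds the same small frame both ways, once by wrapping the data in np.array(...) and once with explicit MultiIndex objects, and checks that the labels agree. The data values and the variable names (via_ndarray, via_multiindex) are illustrative assumptions, not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical values; only the label layout mirrors the question.
data = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]
row_labels = [['transition [ns]'] * 2, [0.0005, 0.001]]
col_labels = [['capacitance [pF]'] * 3, [0.01, 0.02, 0.05]]

# Route 1: ndarray input, lists of lists as labels.
via_ndarray = pd.DataFrame(np.array(data),
                           index=row_labels,
                           columns=col_labels)

# Route 2: nested-list input, explicit MultiIndex labels.
via_multiindex = pd.DataFrame(
    data,
    index=pd.MultiIndex.from_arrays(row_labels),
    columns=pd.MultiIndex.from_arrays(col_labels),
)

same_columns = via_ndarray.columns.equals(via_multiindex.columns)
same_values = np.array_equal(via_ndarray.values, via_multiindex.values)
print(same_columns, same_values)
```

Both routes avoid the code path that raised the AssertionError, so either can serve as a workaround on a pandas version that shows the behavior described above.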