Appending a level (with fixed value) to pandas Series/DataFrame
I have a pandas MultiIndex Series that looks like this:
category_1  number
A           0         1.764052
            1         0.400157
            2         0.978738
            3         2.240893
            4         1.867558
C           0        -0.977278
            1         0.950088
            2        -0.151357
            3        -0.103219
            4         0.410599
It was generated by this code:
import pandas as pd
import numpy as np
idx = pd.MultiIndex.from_product([['A','C'],range(5)], names=['category_1','number'])
np.random.seed(0)
s = pd.Series(index=idx, data = np.random.randn(len(idx)))
I would like to add another level named category_2 to the index, with a fixed value (namely D), to get the following result:
category_1  category_2  number
A           D           0         1.764052
                        1         0.400157
                        2         0.978738
                        3         2.240893
                        4         1.867558
C           D           0        -0.977278
                        1         0.950088
                        2        -0.151357
                        3        -0.103219
                        4         0.410599
I have been doing this in a rather hacky way:
df = s.to_frame('dummy')
df['category_2'] = 'D'
df.set_index('category_2', append=True, inplace=True)
df = df.reorder_levels([0, 2, 1])
res = df['dummy']
Is there a better (more succinct/pythonic) way to add a level with a fixed value between the existing index levels of a pandas Series/DataFrame?
You need to create a new MultiIndex and then replace the old one:
# build the new index tuples, with the constant 'D' as the middle level
new_index = list(zip(s.index.get_level_values('category_1'),
                     ['D'] * len(s.index),
                     s.index.get_level_values('number')))
print(new_index)
[('A', 'D', 0), ('A', 'D', 1),
('A', 'D', 2), ('A', 'D', 3),
('A', 'D', 4), ('C', 'D', 0),
('C', 'D', 1), ('C', 'D', 2),
('C', 'D', 3), ('C', 'D', 4)]
s.index = pd.MultiIndex.from_tuples(new_index,
                                    names=['category_1', 'category_2', 'number'])
print(s)
category_1  category_2  number
A           D           0         1.764052
                        1         0.400157
                        2         0.978738
                        3         2.240893
                        4         1.867558
C           D           0        -0.977278
                        1         0.950088
                        2        -0.151357
                        3        -0.103219
                        4         0.410599
dtype: float64
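The same idea can be wrapped in a small helper so that the position, name and value of the new level are parameters. This is only a sketch: insert_constant_level is a hypothetical name, not a pandas function, and it builds the index with MultiIndex.from_arrays instead of zipping tuples. It should work for a DataFrame as well, since it only touches obj.index.

```python
import numpy as np
import pandas as pd


def insert_constant_level(obj, value, name, position):
    """Return a copy of obj whose index has one extra constant-valued level.

    Hypothetical helper for illustration; obj is a Series or DataFrame.
    """
    index = obj.index
    # One array of labels per existing level, in row order.
    arrays = [index.get_level_values(i) for i in range(index.nlevels)]
    names = list(index.names)
    # Splice the constant level in at the requested position.
    arrays.insert(position, [value] * len(index))
    names.insert(position, name)
    result = obj.copy()
    result.index = pd.MultiIndex.from_arrays(arrays, names=names)
    return result


# Same sample data as in the question.
idx = pd.MultiIndex.from_product([['A', 'C'], range(5)],
                                 names=['category_1', 'number'])
np.random.seed(0)
s = pd.Series(np.random.randn(len(idx)), index=idx)

res = insert_constant_level(s, 'D', 'category_2', position=1)
print(res)
```

Because the helper returns a copy, the original s keeps its two-level index.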
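Another nice solution, slightly changed, uses MultiIndex.from_product. It only works because the existing index is the full Cartesian product of its levels, in the same order that from_product generates them: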
s.index = pd.MultiIndex.from_product([s.index.levels[0],
                                      ['D'],
                                      s.index.levels[1]],
                                     names=['c1', 'c2', 'number'])
print(s)
c1  c2  number
A   D   0         1.764052
        1         0.400157
        2         0.978738
        3         2.240893
        4         1.867558
C   D   0        -0.977278
        1         0.950088
        2        -0.151357
        3        -0.103219
        4         0.410599
dtype: float64
Or, using the unique values of each level:
s.index = pd.MultiIndex.from_product([s.index.get_level_values('category_1').unique(),
                                      ['D'],
                                      s.index.get_level_values('number').unique()],
                                     names=['c1', 'c2', 'number'])
print(s)
c1  c2  number
A   D   0         1.764052
        1         0.400157
        2         0.978738
        3         2.240893
        4         1.867558
C   D   0        -0.977278
        1         0.950088
        2        -0.151357
        3        -0.103219
        4         0.410599
dtype: float64
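One caveat with the from_product variants: they rebuild the index from the level values alone, so they assume every combination of labels is present. The sketch below uses a made-up three-row Series (not the data from the question) where one combination is missing, and shows an alternative that does not depend on that assumption: pd.concat with a dict adds the dict key as a new outermost level, and reorder_levels then moves it into place.

```python
import pandas as pd

# Illustrative data: the index is NOT a full product, ('C', 1) is missing.
idx = pd.MultiIndex.from_tuples([('A', 0), ('A', 1), ('C', 0)],
                                names=['category_1', 'number'])
s = pd.Series([1.0, 2.0, 3.0], index=idx)

# from_product would generate 2 * 1 * 2 = 4 labels for only 3 rows,
# so assigning it to s.index would fail with a length mismatch.
product = pd.MultiIndex.from_product([s.index.levels[0],
                                      ['D'],
                                      s.index.levels[1]])
print(len(product), len(s))

# The dict key 'D' becomes a new outermost level named 'category_2';
# the existing level names are kept.
res = pd.concat({'D': s}, names=['category_2'])
# Move the new level between the two existing ones.
res = res.reorder_levels(['category_1', 'category_2', 'number'])
print(res)
```

The concat approach leaves the row order and values untouched, which is why it is safe for irregular indexes.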