Creating a new column in a DataFrame based on the sum of three other columns
I wrote the following code:
data['Customer_segment'] = np.where(((data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])<=5,1),
np.where((data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])>5 & (data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])<=8,2),
np.where((data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])>8 & (data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])<=11,3),
np.where((data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])>11 & (data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])<=14,4),5)
I get the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
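The ValueError most likely comes from operator precedence, not from np.where itself. In Python, & binds more tightly than > and <=, so x > 5 & x <= 8 is parsed as the chained comparison x > (5 & x) <= 8, and evaluating that chain needs the truth value of a whole Series. A minimal sketch on a small hypothetical Series s:

```python
import pandas as pd

s = pd.Series([6, 7, 14])  # hypothetical row sums

# Without parentheses this parses as s > (5 & s) <= 8, a chained comparison
# that calls bool() on a Series and therefore raises ValueError.
error_message = None
try:
    s > 5 & s <= 8
except ValueError as exc:
    error_message = str(exc)

# With each comparison in parentheses, & combines two boolean Series elementwise.
mask = (s > 5) & (s <= 8)
print(mask.tolist())  # [True, True, False]
```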
I would be grateful for help reaching the best solution; I suspect the approach I am trying may not be the best one.
Example input:
MOVC % segment   order_size_seg   Order frequency segment
1                2                3
5                2                1
5                5                5
I am trying to add a column based on the sum of each row, as follows:
if 3-5 then 1
if 6-8 then 2
if 9-11 then 3
if 12-14 then 4
if 15+ then 5
Any help would be really appreciated.
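The mapping above amounts to four inclusive upper edges (5, 8, 11, 14) plus a catch-all segment 5. A scalar sketch using the standard-library bisect module makes the rule explicit and can serve as a reference when checking the vectorised answers; the helper name segment_for is hypothetical:

```python
from bisect import bisect_left

# Inclusive upper edges of segments 1-4; anything above 14 falls into segment 5.
UPPER_EDGES = [5, 8, 11, 14]

def segment_for(total):
    """Map a row sum to its customer segment (hypothetical helper)."""
    # bisect_left counts the edges strictly below total, so a total equal to
    # an edge stays in that edge's segment (e.g. 8 -> segment 2).
    return bisect_left(UPPER_EDGES, total) + 1

print([segment_for(t) for t in (3, 5, 6, 8, 9, 11, 12, 14, 15)])
# [1, 1, 2, 2, 3, 3, 4, 4, 5]
```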
How about the query method? It seems to have a very powerful syntax:
import pandas as pd
d = pd.DataFrame([[1,2,3],[5,2,1],[5,5,5]], columns=['M','O','F'])
# rows whose sum falls in the 6-8 band (segment 2)
d.query("5 < M+O+F <= 8")
Out[4]:
   M  O  F
0  1  2  3
1  5  2  1
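query only filters rows, so building the whole column this way needs one query per band. A sketch on the same toy frame, assuming the band strings below mirror the question's ranges and that rows matching no band default to 5; the column name segment is hypothetical:

```python
import pandas as pd

d = pd.DataFrame([[1, 2, 3], [5, 2, 1], [5, 5, 5]], columns=['M', 'O', 'F'])

# (segment label, query string) pairs mirroring the question's bands;
# rows that match none of them keep the default label 5
bands = [
    (1, "M + O + F <= 5"),
    (2, "5 < M + O + F <= 8"),
    (3, "8 < M + O + F <= 11"),
    (4, "11 < M + O + F <= 14"),
]

d['segment'] = 5
for label, expr in bands:
    # query returns the matching rows; only their index is needed to assign
    d.loc[d.query(expr).index, 'segment'] = label

print(d['segment'].tolist())  # [2, 2, 5]
```

This scans the frame once per band, which is fine for a handful of segments but less direct than a single vectorised call.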
I think you need numpy.select instead of multiple np.where calls:
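import numpy as np

# sum the three columns only once
a = data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment']
# wrap each comparison in (), because & binds more tightly than > and <=
m1 = a<=5
m2 = (a>5) & (a<=8)
m3 = (a>8) & (a<=11)
m4 = (a>11) & (a<=14)
# the first matching condition wins; sums above 14 get the default 5
data['Customer_segment'] = np.select([m1, m2, m3, m4],[1,2,3,4], default=5)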
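Because np.select uses the choice of the first condition that is True, the lower bounds become redundant once the conditions are ordered. A sketch on the sample input from the question (this simplification is an editorial variation, not part of the answer):

```python
import numpy as np
import pandas as pd

# Sample input from the question
data = pd.DataFrame({
    'MOVC % segment': [1, 5, 5],
    'order_size_seg': [2, 2, 5],
    'Order frequency segment': [3, 1, 5],
})

a = data[['Order frequency segment', 'order_size_seg', 'MOVC % segment']].sum(axis=1)

# Conditions are checked in order, so a row with a == 8 fails a <= 5
# and is caught by a <= 8 before the later conditions are considered.
conditions = [a <= 5, a <= 8, a <= 11, a <= 14]
data['Customer_segment'] = np.select(conditions, [1, 2, 3, 4], default=5)
print(data['Customer_segment'].tolist())  # [2, 2, 5]
```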
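Another solution is to use cut:
import numpy as np
import pandas as pd
a = data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment']
bins = [-np.inf,5,8,11,14, np.inf]  # right-closed: (-inf,5], (5,8], (8,11], (11,14], (14,inf]
data['Customer_segment'] = pd.cut(a, bins=bins, labels=[1,2,3,4,5])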
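To check the band boundaries, the same bins can be applied to sums sitting at and around each edge. pd.cut returns a categorical result; the sketch below also assumes a plain integer column is wanted and converts it with astype(int):

```python
import numpy as np
import pandas as pd

bins = [-np.inf, 5, 8, 11, 14, np.inf]
labels = [1, 2, 3, 4, 5]

# Row sums at and around the band boundaries
totals = pd.Series([3, 5, 6, 8, 9, 11, 12, 14, 15])

# pd.cut uses right-closed intervals by default: (-inf, 5], (5, 8], ..., (14, inf]
segments = pd.cut(totals, bins=bins, labels=labels)

# The result has dtype category; astype(int) turns it into plain integers
print(segments.astype(int).tolist())  # [1, 1, 2, 2, 3, 3, 4, 4, 5]
```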
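How about trying pd.cut?
import numpy as np
import pandas as pd
df = pd.DataFrame([[1,2,3],[5,2,1],[5,5,5]], columns=['M','O','F'])
# row sums are 6, 8 and 15; bins cover every band from the question
pd.cut(df.sum(axis=1), [-np.inf, 5, 8, 11, 14, np.inf], labels=[1, 2, 3, 4, 5])
Out[1180]:
0    2
1    2
2    5
dtype: category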
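If a plain integer column is preferred over the category dtype, numpy.digitize with right=True applies the same right-closed edges. This is a swapped-in alternative, not taken from the answers, sketched on the same toy frame (the column name segment is hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [5, 2, 1], [5, 5, 5]], columns=['M', 'O', 'F'])

# Inclusive upper edges of segments 1-4
edges = [5, 8, 11, 14]

# With right=True, digitize returns i such that edges[i-1] < x <= edges[i],
# i.e. 0 for sums <= 5 and 4 for sums > 14; adding 1 gives segments 1-5.
df['segment'] = np.digitize(df.sum(axis=1), edges, right=True) + 1
print(df['segment'].tolist())  # [2, 2, 5]
```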