Evaluating multiple conditions in an if-then-else block in a Pandas DataFrame
I want to create a new column in a Pandas DataFrame by evaluating multiple conditions in an if-then-else block.
if events.hour <= 6:
    events['time_slice'] = 'night'
elif events.hour <= 12:
    events['time_slice'] = 'morning'
elif events.hour <= 18:
    events['time_slice'] = 'afternoon'
elif events.hour <= 23:
    events['time_slice'] = 'evening'
When I run this, I get the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
So I tried to work around it by adding .any() to each condition, as shown below:
if (events.hour <= 6).any():
    events['time_slice'] = 'night'
elif (events.hour <= 12).any():
    events['time_slice'] = 'morning'
elif (events.hour <= 18).any():
    events['time_slice'] = 'afternoon'
elif (events.hour <= 23).any():
    events['time_slice'] = 'evening'
Now I don't get any error, but when I check the unique values of time_slice, it only shows 'night':
np.unique(events.time_slice)
array(['night'], dtype=object)
How can I fix this? My data contains samples that should get 'morning', 'afternoon' or 'evening'. Thanks!
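A note on why the second attempt yields only 'night': `.any()` reduces the whole column to a single boolean, so the `if` runs once for the entire frame rather than once per row, and assigning a scalar string fills every row with it. The sketch below reproduces this on a small made-up frame (the sample hours are an assumption for illustration, not the asker's data).

```python
import pandas as pd

# Made-up sample hours, for illustration only.
events = pd.DataFrame({'hour': [5, 17, 12, 2, 20]})

# .any() collapses the boolean Series to one value: True as soon as
# at least one row has hour <= 6.
any_night = (events.hour <= 6).any()

# The branch is taken once for the whole frame, and the scalar 'night'
# is broadcast to every row. The elif branches are never reached.
if any_night:
    events['time_slice'] = 'night'

unique_slices = events['time_slice'].unique().tolist()
print(unique_slices)
```

The answers that follow avoid this by computing the label per row instead of testing the column as a whole.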
You can use the pd.cut() method to categorize your data:
Demo:
In [66]: events = pd.DataFrame(np.random.randint(0, 23, 10), columns=['hour'])
In [67]: events
Out[67]:
   hour
0     5
1    17
2    12
3     2
4    20
5    22
6    20
7    11
8    14
9     8
In [71]: events['time_slice'] = pd.cut(events.hour, bins=[-1, 6, 12, 18, 23], labels=['night','morning','afternoon','evening'])
In [72]: events
Out[72]:
   hour time_slice
0     5      night
1    17  afternoon
2    12    morning
3     2      night
4    20    evening
5    22    evening
6    20    evening
7    11    morning
8    14  afternoon
9     8    morning
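The bin edges in the demo deserve a closer look. With the default `right=True`, pd.cut() treats each bin as open on the left and closed on the right, which is presumably why the first edge is -1 rather than 0. A minimal sketch on hand-picked boundary hours (the test values are an assumption chosen to hit every edge):

```python
import pandas as pd

# Boundary hours picked to exercise every bin edge.
hours = pd.Series([0, 6, 7, 12, 13, 18, 19, 23], name='hour')
labels = ['night', 'morning', 'afternoon', 'evening']

# Default right=True: the bins are (-1, 6], (6, 12], (12, 18], (18, 23],
# which matches the <= comparisons in the question.
slices = pd.cut(hours, bins=[-1, 6, 12, 18, 23], labels=labels)

# Starting the first bin at 0 instead gives (0, 6], so hour 0 falls
# outside every bin and becomes NaN.
slices_from_zero = pd.cut(hours, bins=[0, 6, 12, 18, 23], labels=labels)

result = slices.tolist()
print(result)
print(slices_from_zero.isna().tolist())
```

Passing `include_lowest=True` together with a first edge of 0 is another way to keep hour 0 in the first bin.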
You could create a function:
def time_slice(hour):
    if hour <= 6:
        return 'night'
    elif hour <= 12:
        return 'morning'
    elif hour <= 18:
        return 'afternoon'
    elif hour <= 23:
        return 'evening'
Then events['time_slice'] = events.hour.apply(time_slice) should do the trick.
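The same if/elif chain can also be written without a per-row Python call. The sketch below uses np.select(), which is not part of the answer above but follows the same first-match-wins logic; the sample hours and the 'unknown' default are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample hours, for illustration only.
events = pd.DataFrame({'hour': [5, 17, 12, 2, 20, 23, 0]})

# Conditions are evaluated in order and the first True one wins for
# each row, mirroring the if/elif chain in time_slice().
conditions = [
    events.hour <= 6,
    events.hour <= 12,
    events.hour <= 18,
    events.hour <= 23,
]
choices = ['night', 'morning', 'afternoon', 'evening']

# default covers rows that match no condition (hour > 23);
# 'unknown' is a placeholder chosen for this sketch.
events['time_slice'] = np.select(conditions, choices, default='unknown')
print(events)
```

Note that time_slice() itself returns None for an hour above 23, so the two versions differ only in how they mark out-of-range values.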
Here's a NumPy approach -
tags = ['night','morning','afternoon','evening']
events['time_slice'] = np.take(tags,((events.hour.values-1)//6).clip(min=0))
Sample run -
In [130]: events
Out[130]:
   hour time_slice
0     0      night
1     8    morning
2    16  afternoon
3    20    evening
4     2      night
5    14  afternoon
6     7    morning
7    18  afternoon
8     8    morning
9    22    evening
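The index arithmetic in the NumPy approach is compact, so it may help to see it checked against the plain if/elif rules for every hour from 0 to 23. This is only a verification sketch; the `time_slice` helper here restates the thresholds from the question and is included purely for comparison.

```python
import numpy as np

tags = np.array(['night', 'morning', 'afternoon', 'evening'])
hours = np.arange(24)

# (hour - 1) // 6 maps 1..6 -> 0, 7..12 -> 1, 13..18 -> 2, 19..23 -> 3.
# Hour 0 gives -1, which clip(min=0) moves into slot 0 ('night').
idx = ((hours - 1) // 6).clip(min=0)
by_index = tags[idx]

def time_slice(hour):
    """Reference rules, restating the thresholds from the question."""
    if hour <= 6:
        return 'night'
    elif hour <= 12:
        return 'morning'
    elif hour <= 18:
        return 'afternoon'
    return 'evening'

expected = np.array([time_slice(h) for h in hours])
matches = bool((by_index == expected).all())
print(matches)
```

The trick relies on the slices being evenly spaced six hours apart; with irregular boundaries an explicit list of bin edges would be needed instead.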