Python: how to use string values for a custom sort?
I have a DataFrame like this:
time_posted
0 5 days ago
1 an hour ago
2 a day ago
3 6 hours ago
4 4 hours ago
I tried df.sort_values(by='time_posted', ascending=True)
and got this result:
time_posted
4 4 hours ago
0 5 days ago
3 6 hours ago
2 a day ago
1 an hour ago
But I want to sort by how long ago each entry was posted, so that my DataFrame looks like this:
time_posted
1 an hour ago
4 4 hours ago
3 6 hours ago
2 a day ago
0 5 days ago
One possible answer is as follows.
Set up the example data:
import pandas as pd
#your dataframe
df = pd.DataFrame(dict(time_posted=['5 days ago', 'an hour ago', 'a day ago', '6 hours ago', '4 hours ago']))
Conversion function
You have to split the string and interpret its parts (here x[0] is the value and x[1] is the unit):
def to_hours(s):
    x = s.split(' ')
    # 'a' / 'an' stand for a single unit
    if x[0].lower() in ['a', 'an']:
        a = 1
    else:
        a = float(x[0])
    x1 = x[1].lower()
    b = 1  # 1 hour
    if x1.startswith('day'):
        b = b * 24  # 1 day = 24 hours
    return a * b
Apply it:
df['hours'] = df.time_posted.apply(to_hours)  # apply the hours conversion
df = df.sort_values('hours', ascending=True)[['time_posted']]  # sort, then drop the helper column
print(df)
Output:
time_posted
1 an hour ago
4 4 hours ago
3 6 hours ago
2 a day ago
0 5 days ago
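The function above only tells hours apart from days. As a rough sketch of how it might be generalized, the version below looks the unit up in a table instead. The names `UNIT_HOURS` and `to_hours_table`, the extra units (minutes, weeks), and the added '30 minutes ago' row are assumptions for illustration only; they are not part of the original data.

```python
import pandas as pd

# Hours per unit. Keys are singular; plural forms are handled by
# stripping a trailing 's' from the unit before the lookup.
UNIT_HOURS = {'minute': 1 / 60, 'hour': 1, 'day': 24, 'week': 24 * 7}

def to_hours_table(s):
    """Convert strings such as '5 days ago' or 'an hour ago' to hours."""
    count, unit = s.lower().split()[:2]
    # 'a' / 'an' stand for a single unit
    amount = 1.0 if count in ('a', 'an') else float(count)
    return amount * UNIT_HOURS[unit.rstrip('s')]

# Sample data from the question plus one hypothetical extra row
df = pd.DataFrame(dict(time_posted=['5 days ago', 'an hour ago', 'a day ago',
                                    '6 hours ago', '4 hours ago',
                                    '30 minutes ago']))

# Sort on a temporary 'hours' column without modifying df itself
result = df.assign(hours=df['time_posted'].map(to_hours_table)).sort_values('hours')
print(result[['time_posted']])
```

An unknown unit raises a KeyError here, which is probably preferable to silently treating it as hours.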
If you remove "ago" and replace "a/an" with 1, you can pass the values to pandas.to_timedelta:
(pd.to_timedelta(df['time_posted']
                 .str.replace(r'\ban?\b', '1', regex=True)
                 .str.replace(' ago', '', regex=False))
)
Output:
0 5 days 00:00:00
1 0 days 01:00:00
2 1 days 00:00:00
3 0 days 06:00:00
4 0 days 04:00:00
Name: time_posted, dtype: timedelta64[ns]
This lets you obtain the sort order:
idx = (pd.to_timedelta(df['time_posted']
                       .str.replace(r'\ban?\b', '1', regex=True)
                       .str.replace(' ago', '', regex=False))
         .sort_values()
         .index
      )
df.loc[idx]
Output:
time_posted
1 an hour ago
4 4 hours ago
3 6 hours ago
2 a day ago
0 5 days ago
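The same idea can be written without building the index by hand. This is a minimal sketch, assuming pandas 1.1 or later, where sort_values accepts a key argument; the helper name `as_timedelta` is made up for this example.

```python
import pandas as pd

df = pd.DataFrame(dict(time_posted=['5 days ago', 'an hour ago', 'a day ago',
                                    '6 hours ago', '4 hours ago']))

def as_timedelta(col):
    """Turn a column of '... ago' strings into timedeltas for sorting."""
    cleaned = (col.str.replace(r'\ban?\b', '1', regex=True)  # 'a'/'an' -> '1'
                  .str.replace(' ago', '', regex=False))     # drop the suffix
    return pd.to_timedelta(cleaned)

# key receives the whole column as a Series and must return
# a Series of the same length; the original strings are kept in the result.
sorted_df = df.sort_values('time_posted', key=as_timedelta)
print(sorted_df)
```

Because the key only affects the ordering, no helper column has to be created or dropped afterwards.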