Python 3 pandas DataFrame: using fillna(method='bfill') with groupby
I am new to Python and pandas and am stuck on the problem described below.
The data in my pandas DataFrame is:
time_stamp dish_id table_no order_id
2017-10-05 22:11 122 A1
2017-10-05 22:14 127 A1
2017-10-05 22:17 129 A5
2017-10-05 22:19 122 A1 X_001
2017-10-05 22:17 129 A5 X_002
I am filling in the missing order_id values with:
output_sort[['new_order_id']] = output_sort[['order_id']].fillna(method='bfill')
This gives me the following result:
time_stamp dish_id table_no order_id
2017-10-05 22:11 122 A1 X_001
2017-10-05 22:14 127 A1 X_001
2017-10-05 22:17 129 A5 X_001
2017-10-05 22:19 122 A1 X_001
2017-10-05 22:17 129 A5 X_002
But I want a result like this:
time_stamp dish_id table_no order_id
2017-10-05 22:11 122 A1 X_001
2017-10-05 22:14 127 A1 X_001
2017-10-05 22:17 129 A5 X_002
2017-10-05 22:19 122 A1 X_001
2017-10-05 22:17 129 A5 X_002
The order_id is not being matched to the correct table.
I have not found a way to do this yet.
Any help would be greatly appreciated.
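For anyone who wants to reproduce this, the sample data can be rebuilt roughly as follows. This is only a sketch: the dtypes (timestamps kept as strings, dish_id as integers) are assumptions, and Series.bfill() is used here as the method form of fillna(method='bfill'). It shows why the plain back-fill goes wrong: it ignores table_no, so row 2 (table A5) picks up X_001 from the A1 row below it.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the dtypes are assumed.
df = pd.DataFrame({
    "time_stamp": ["2017-10-05 22:11", "2017-10-05 22:14", "2017-10-05 22:17",
                   "2017-10-05 22:19", "2017-10-05 22:17"],
    "dish_id": [122, 127, 129, 122, 129],
    "table_no": ["A1", "A1", "A5", "A1", "A5"],
    "order_id": [np.nan, np.nan, np.nan, "X_001", "X_002"],
})

# A plain back-fill takes the next non-missing value, regardless of table_no.
plain = df["order_id"].bfill()
print(plain.tolist())  # ['X_001', 'X_001', 'X_001', 'X_001', 'X_002']
```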
df.groupby('table_no')['order_id'].apply(lambda x: x.ffill().bfill())
Out[529]:
0 X_001
1 X_001
2 X_002
3 X_001
4 X_002
Name: order_id, dtype: object
df['order_id'] = df.groupby('table_no')['order_id'].apply(lambda x: x.ffill().bfill())
df
Out[530]:
time_stamp dish_id table_no order_id
0 2017-10-05 22:11 122 A1 X_001
1 2017-10-05 22:14 127 A1 X_001
2 2017-10-05 22:17 129 A5 X_002
3 2017-10-05 22:19 122 A1 X_001
4 2017-10-05 22:17 129 A5 X_002
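A possible variant of the same idea uses transform instead of apply. As far as I know, newer pandas releases (2.0 and later) may prepend the group keys to the index of an apply result, which can break the direct assignment back to the column, whereas transform always returns a result indexed like the original frame. The sketch below assumes only the two relevant columns from the question's data.

```python
import numpy as np
import pandas as pd

# Only the two columns that matter here, taken from the question's data.
df = pd.DataFrame({
    "table_no": ["A1", "A1", "A5", "A1", "A5"],
    "order_id": [np.nan, np.nan, np.nan, "X_001", "X_002"],
})

# transform keeps the original index, so the result can be assigned directly.
filled = df.groupby("table_no")["order_id"].transform(lambda s: s.ffill().bfill())
df["order_id"] = filled
print(df["order_id"].tolist())  # ['X_001', 'X_001', 'X_002', 'X_001', 'X_002']
```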
df.assign(order_id=df.groupby('table_no').order_id.bfill())
time_stamp dish_id table_no order_id
0 2017-10-05 22:11 122 A1 X_001
1 2017-10-05 22:14 127 A1 X_001
2 2017-10-05 22:17 129 A5 X_002
3 2017-10-05 22:19 122 A1 X_001
4 2017-10-05 22:17 129 A5 X_002
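One thing to keep in mind with a back-fill alone: it only copies values from later rows in the same group, so a row that comes after the last known order_id of its table stays missing. The sketch below uses hypothetical data (an extra A1 row at the end, which is not in the question) to show this, and chains a grouped ffill as one possible way to cover that case.

```python
import numpy as np
import pandas as pd

# Hypothetical variant: table A1 gets one more dish after its order id appears.
df = pd.DataFrame({
    "table_no": ["A1", "A5", "A1", "A5", "A1"],
    "order_id": [np.nan, np.nan, "X_001", "X_002", np.nan],
})

# Back-fill within each table: the trailing A1 row has nothing after it to copy.
back_only = df.groupby("table_no")["order_id"].bfill()

# Back-fill first, then forward-fill within each table to cover trailing rows.
both = (
    df.assign(order_id=back_only)
      .groupby("table_no")["order_id"]
      .ffill()
)
print(back_only.isna().tolist())  # [False, False, False, False, True]
print(both.tolist())              # ['X_001', 'X_002', 'X_001', 'X_002', 'X_001']
```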
Although it is not as idiomatic as bfill, map should be a good option.
m = dict(df[['table_no', 'order_id']].dropna().values)
print(m)
{'A1': 'X_001', 'A5': 'X_002'}
df['order_id'] = df.table_no.map(m)
print(df)
time_stamp dish_id table_no order_id
0 2017-10-05 22:11 122 A1 X_001
1 2017-10-05 22:14 127 A1 X_001
2 2017-10-05 22:17 129 A5 X_002
3 2017-10-05 22:19 122 A1 X_001
4 2017-10-05 22:17 129 A5 X_002
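The mapping approach assumes every table has exactly one order_id. If a table is reused for several orders, dict() silently keeps only the last pair it sees. The sketch below uses hypothetical data (the X_007 order is made up for illustration) to show that behaviour and one way to detect such tables before mapping.

```python
import numpy as np
import pandas as pd

# Hypothetical data: table A1 is used for two different orders.
df = pd.DataFrame({
    "table_no": ["A1", "A1", "A1", "A1"],
    "order_id": [np.nan, "X_001", np.nan, "X_007"],
})

known = df[["table_no", "order_id"]].dropna()

# Later (table_no, order_id) pairs overwrite earlier ones for the same key.
m = dict(known.values)

# Count distinct order ids per table to spot ambiguous mappings.
ids_per_table = known.groupby("table_no")["order_id"].nunique()
ambiguous = ids_per_table[ids_per_table > 1].index.tolist()

print(m)          # {'A1': 'X_007'}
print(ambiguous)  # ['A1']
```

If ambiguous is not empty, a grouped fill on table_no alone is probably not enough either, and some ordering by time_stamp would be needed.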
You can also use replace:
df['order_id'] = df.table_no.replace(m)
print(df)
time_stamp dish_id table_no order_id
0 2017-10-05 22:11 122 A1 X_001
1 2017-10-05 22:14 127 A1 X_001
2 2017-10-05 22:17 129 A5 X_002
3 2017-10-05 22:19 122 A1 X_001
4 2017-10-05 22:17 129 A5 X_002
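map and replace differ in how they treat a table that has no entry in m, which matters when choosing between them. The sketch below uses a hypothetical table "B2" that is not in the question's data: map turns it into NaN, while replace leaves the table number itself in the result.

```python
import pandas as pd

m = {"A1": "X_001", "A5": "X_002"}

# "B2" is a hypothetical table with no known order id.
tables = pd.Series(["A1", "A5", "B2"])

mapped = tables.map(m)        # unmapped value becomes NaN
replaced = tables.replace(m)  # unmapped value is kept as "B2"

print(mapped.isna().tolist())  # [False, False, True]
print(replaced.tolist())       # ['X_001', 'X_002', 'B2']
```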
Another way to build m is:
m = df[['table_no', 'order_id']].dropna().set_index('table_no').order_id
print(m)
table_no
A1 X_001
A5 X_002
Name: order_id, dtype: object
series_ = df.table_no.tolist()

def fill_(table_no):
    # Return the order id for one table number (values hard-coded for this data).
    if table_no == 'A1':
        return 'X_001'
    else:
        return 'X_002'

# Assign with bracket syntax so an actual column is written.
df['order_id'] = list(map(fill_, series_))