Counting how many items arrive the next day over time
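I have a problem. I want to calculate how many items arrived at the customer on the following day. For example, for the customer with customerId == 1, I want to look at the day 2022-05-04 and find out how many packages arrive on the next day, 2022-05-05. That customer has two rows dated 2022-05-05, so the count is 2.
The last date for each customer should have no value, e.g. 2022-05-08 == None.
I have already tried to calculate the next date. But how do I count how many items arrived on the next day?
Dataframe: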
customerId fromDate
0 1 2022-05-04
1 1 2022-05-05
2 1 2022-05-05
3 1 2022-05-06
4 1 2022-05-08
5 2 2022-05-10
6 2 2022-05-12
Code:
import pandas as pd
import datetime
d = {'customerId': [1, 1, 1, 1, 1, 2, 2],
     'fromDate': ['2022-05-04', '2022-05-05', '2022-05-05', '2022-05-06', '2022-05-08', '2022-05-10', '2022-05-12']
    }
df = pd.DataFrame(data=d)
def nearest(items, pivot):
    try:
        return min(items, key=lambda x: abs(x - pivot))
    except:
        return None
df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce').dt.date
df["count_next_date"] = df['fromDate'].apply(lambda x: nearest(df['fromDate'], x))
[OUT]
customerId fromDate count_next
0 1 2022-05-04 2022-05-04
1 1 2022-05-05 2022-05-05
2 1 2022-05-05 2022-05-05
3 1 2022-05-07 2022-05-07
4 2 2022-05-10 2022-05-10
5 2 2022-05-12 2022-05-12
What I want:
customerId fromDate count_next
0 1 2022-05-04 2
1 1 2022-05-05 1
2 1 2022-05-05 1
3 1 2022-05-06 0
4 1 2022-05-08 None
5 2 2022-05-10 0
6 2 2022-05-12 None
Annotated code
# Convert the column to datetime
df['fromDate'] = pd.to_datetime(df['fromDate'])
# Group by custid and prev date to calculate
# number of items arriving next day
date = df['fromDate'] - pd.DateOffset(days=1)
items = df.groupby(['customerId', date], as_index=False).size()
# Merge the item count with original df
out = df.merge(items, how='left')
# Fill the nan values with 0
out['size'] = out['size'].fillna(0)
# mask the item count corresponding to last date for each customerid
out['size'] = out['size'].mask(~out['customerId'].duplicated(keep='last'))
Result
print(out)
customerId fromDate size
0 1 2022-05-04 2.0
1 1 2022-05-05 1.0
2 1 2022-05-05 1.0
3 1 2022-05-06 0.0
4 1 2022-05-08 NaN
5 2 2022-05-10 0.0
6 2 2022-05-12 NaN
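As a possible variation on the approach above, the sketch below looks up next-day counts directly instead of merging, and keeps the result as nullable integers rather than floats. Two things here are assumptions and not part of the original answer: the output column name count_next (taken from the question's desired output), and the choice to blank every row that falls on a customer's last date, so that the result does not depend on row order.

```python
import pandas as pd

df = pd.DataFrame({
    "customerId": [1, 1, 1, 1, 1, 2, 2],
    "fromDate": pd.to_datetime([
        "2022-05-04", "2022-05-05", "2022-05-05", "2022-05-06",
        "2022-05-08", "2022-05-10", "2022-05-12",
    ]),
})

# Number of items per (customerId, fromDate), indexed by that pair.
per_day = df.groupby(["customerId", "fromDate"]).size()

# For every row, look up the count at (customerId, fromDate + 1 day).
# Pairs that do not exist in per_day mean no items arrived, hence 0.
next_day = pd.MultiIndex.from_arrays(
    [df["customerId"], df["fromDate"] + pd.Timedelta(days=1)]
)
count_next = per_day.reindex(next_day, fill_value=0).to_numpy()

# Assumption: blank out every row on the customer's last date,
# regardless of how the rows are sorted.
last_date = df.groupby("customerId")["fromDate"].transform("max")
df["count_next"] = (
    pd.Series(count_next, index=df.index)
    .astype("Int64")  # nullable integer dtype, so missing values show as <NA>
    .mask(df["fromDate"].eq(last_date))
)

print(df)
```

On the sample data this should give the same counts as the merge-based version, with <NA> in place of NaN for the last date of each customer.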