How to fill the missing values so that each group starts on the first day of the month?
I have a dataframe like this:
tst=
Date % on Merchant % on Customer Merchants Location
2021-08-04 0.0 0.10 Zwarma - The Shawarma Maker Palani
2021-08-05 0.0 0.10 Zwarma - The Shawarma Maker Palani
2021-08-06 0.0 0.10 Zwarma - The Shawarma Maker Palani
2021-08-01 0.0 0.12 Zwarma - The Shawarma Maker Pollachi
2021-08-02 0.0 0.12 Zwarma - The Shawarma Maker Pollachi
2021-08-03 0.0 0.12 Zwarma - The Shawarma Maker Pollachi
2021-08-04 0.0 0.12 Zwarma - The Shawarma Maker Pollachi
2021-08-05 0.0 0.12 Zwarma - The Shawarma Maker Pollachi
2021-08-06 0.0 0.12 Zwarma - The Shawarma Maker Pollachi
uni_ind= ['% on Merchant','% on Customer','Merchants','Location']
I am looking for this output:
Date % on Merchant % on Customer Merchants Location
2021-08-01 0.0 0.10 Zwarma - The Shawarma Maker Palani
2021-08-02 0.0 0.10 Zwarma - The Shawarma Maker Palani
2021-08-03 0.0 0.10 Zwarma - The Shawarma Maker Palani
2021-08-04 0.0 0.10 Zwarma - The Shawarma Maker Palani
2021-08-05 0.0 0.10 Zwarma - The Shawarma Maker Palani
2021-08-06 0.0 0.10 Zwarma - The Shawarma Maker Palani
2021-08-01 0.0 0.12 Zwarma - The Shawarma Maker Pollachi
2021-08-02 0.0 0.12 Zwarma - The Shawarma Maker Pollachi
2021-08-03 0.0 0.12 Zwarma - The Shawarma Maker Pollachi
2021-08-04 0.0 0.12 Zwarma - The Shawarma Maker Pollachi
2021-08-05 0.0 0.12 Zwarma - The Shawarma Maker Pollachi
2021-08-06 0.0 0.12 Zwarma - The Shawarma Maker Pollachi
tst.groupby(uni_ind).resample('D').bfill().reset_index(level=(0,1,2,3), drop=True).reset_index()
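A likely reason the attempt above does not add the missing rows: `resample('D')` only covers the span between the first and last observation of each group, so there is nothing before the first date to back-fill. The minimal sketch below illustrates this on a toy series (the name `s` and its values are made up for illustration and only mirror the Palani rows).

```python
import pandas as pd

# Toy series mirroring the Palani rows: the first observation is on the 4th.
s = pd.Series(
    [0.10, 0.10, 0.10],
    index=pd.to_datetime(["2021-08-04", "2021-08-05", "2021-08-06"]),
)

daily = s.resample("D").bfill()

# The daily index still starts at the first observed date, not at the 1st.
print(daily.index.min())
```

So the month-start dates have to be added explicitly before (or instead of) resampling.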
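- create a date range for the days that are missing at the start of the month for each merchant
- outer join it to the original dataframe and back-fill with fillna(method="bfill")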
import io

import pandas as pd

# Columns are separated by two or more spaces, so values with single spaces stay intact.
df = pd.read_csv(io.StringIO("""Date  % on Merchant  % on Customer  Merchants  Location
2021-08-04  0.0  0.10  Zwarma - The Shawarma Maker  Palani
2021-08-05  0.0  0.10  Zwarma - The Shawarma Maker  Palani
2021-08-06  0.0  0.10  Zwarma - The Shawarma Maker  Palani
2021-08-01  0.0  0.12  Zwarma - The Shawarma Maker  Pollachi
2021-08-02  0.0  0.12  Zwarma - The Shawarma Maker  Pollachi
2021-08-03  0.0  0.12  Zwarma - The Shawarma Maker  Pollachi
2021-08-04  0.0  0.12  Zwarma - The Shawarma Maker  Pollachi
2021-08-05  0.0  0.12  Zwarma - The Shawarma Maker  Pollachi
2021-08-06  0.0  0.12  Zwarma - The Shawarma Maker  Pollachi"""), sep=r"\s\s+", engine="python")
df["Date"] = pd.to_datetime(df["Date"])

df = (
    df.merge(
        # first recorded date per merchant/location in each calendar month
        # (a monthly period is equivalent to grouping by year and month)
        df.groupby(
            [df["Date"].dt.to_period("M").rename("Month"), "Merchants", "Location"], as_index=False
        )
        .agg({"Date": "min"})
        # keep only the groups whose first date is not the 1st of the month
        .loc[lambda d: d["Date"].dt.day.gt(1)]
        # list the days from the 1st of the month up to the day before the first date
        .apply(
            lambda r: pd.Series(
                {
                    "Date": list(
                        pd.date_range(
                            r["Date"] - pd.offsets.MonthBegin(1),
                            r["Date"] - pd.Timedelta(days=1),
                        )
                    ),
                    "Merchants": r["Merchants"],
                    "Location": r["Location"],
                }
            ),
            axis=1,
        )
        .explode("Date")
        # explode leaves an object column; cast back so the merge keys match
        .astype({"Date": "datetime64[ns]"}),
        on=["Date", "Merchants", "Location"],
        how="outer",
    )
    .sort_values(["Merchants", "Location", "Date"])
    .bfill()  # same as fillna(method="bfill"), which newer pandas deprecates
)
df
| | Date | % on Merchant | % on Customer | Merchants | Location |
|---|---|---|---|---|---|
| 9 | 2021-08-01 00:00:00 | 0 | 0.1 | Zwarma - The Shawarma Maker | Palani |
| 10 | 2021-08-02 00:00:00 | 0 | 0.1 | Zwarma - The Shawarma Maker | Palani |
| 11 | 2021-08-03 00:00:00 | 0 | 0.1 | Zwarma - The Shawarma Maker | Palani |
| 0 | 2021-08-04 00:00:00 | 0 | 0.1 | Zwarma - The Shawarma Maker | Palani |
| 1 | 2021-08-05 00:00:00 | 0 | 0.1 | Zwarma - The Shawarma Maker | Palani |
| 2 | 2021-08-06 00:00:00 | 0 | 0.1 | Zwarma - The Shawarma Maker | Palani |
| 3 | 2021-08-01 00:00:00 | 0 | 0.12 | Zwarma - The Shawarma Maker | Pollachi |
| 4 | 2021-08-02 00:00:00 | 0 | 0.12 | Zwarma - The Shawarma Maker | Pollachi |
| 5 | 2021-08-03 00:00:00 | 0 | 0.12 | Zwarma - The Shawarma Maker | Pollachi |
| 6 | 2021-08-04 00:00:00 | 0 | 0.12 | Zwarma - The Shawarma Maker | Pollachi |
| 7 | 2021-08-05 00:00:00 | 0 | 0.12 | Zwarma - The Shawarma Maker | Pollachi |
| 8 | 2021-08-06 00:00:00 | 0 | 0.12 | Zwarma - The Shawarma Maker | Pollachi |
There is a simpler answer below.
Step 1: Get the first date of each month by resampling to month start ('MS')
tst1 = tst.groupby(uni_ind).resample('MS').bfill().reset_index(level=(0,1,2,3), drop=True).reset_index()
Step 2: Append the month-start rows to the original df
tst3 = pd.concat([tst.reset_index(), tst1])
Step 3: Drop duplicates, since there may be several month-start rows
tst3.drop_duplicates(inplace=True, ignore_index=False, keep='first')
Step 4: Set Date as the index so the resample function can be used
tst3.set_index('Date', inplace=True)
Step 5: Resample the df to daily frequency
tst3.groupby(uni_ind, dropna=False).resample('D').ffill().reset_index(
    level=(0,1,2,3), drop=True).reset_index()
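For readers who want something self-contained to run, here is a rough sketch of the same idea (extend each group back to the first day of its first month, then fill daily). It is not the code from either answer: it swaps `groupby(...).resample(...)` for an explicit `pd.date_range` per group, and it assumes, as in the sample data, that every non-date column is listed in `uni_ind` and is therefore constant within a group. The names `pieces`, `full` and `result` are made up for illustration.

```python
import pandas as pd

# Sample data from the question, with Date as a regular column.
tst = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2021-08-04", "2021-08-05", "2021-08-06",
             "2021-08-01", "2021-08-02", "2021-08-03",
             "2021-08-04", "2021-08-05", "2021-08-06"]
        ),
        "% on Merchant": 0.0,
        "% on Customer": [0.10] * 3 + [0.12] * 6,
        "Merchants": "Zwarma - The Shawarma Maker",
        "Location": ["Palani"] * 3 + ["Pollachi"] * 6,
    }
)
uni_ind = ["% on Merchant", "% on Customer", "Merchants", "Location"]

pieces = []
for keys, grp in tst.groupby(uni_ind, sort=False):
    first, last = grp["Date"].min(), grp["Date"].max()
    # Move the first date back to the first day of its month.
    start = first.to_period("M").to_timestamp()
    # One row per day from the month start to the last observed date.
    full = pd.DataFrame({"Date": pd.date_range(start, last, freq="D")})
    # Assumption: all uni_ind columns are constant within a group,
    # so the group keys can be written onto every generated row.
    for col, val in zip(uni_ind, keys):
        full[col] = val
    pieces.append(full)

result = pd.concat(pieces, ignore_index=True)[["Date"] + uni_ind]
print(result)
```

If the frame had value columns outside `uni_ind`, this sketch would not carry them over; a merge back onto `tst` followed by a per-group `bfill()` would be needed in that case.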