Need help filtering the top 3 counts from a pandas DataFrame
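Hi, I would like to get the top 3 string counts under a DataFrame column header, broken down over a timeline. The code below extracts the counts for all strings, but how can I apply a top-3 or top-5 filter so that I only get those?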
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])
table1 = pd.crosstab([df['name'], df['city']], df['Date'].dt.to_period('Q'))
# Split the quarterly columns into (year, quarter) levels
table1.columns = [table1.columns.year, table1.columns.strftime('Q%q')]
print(table1)
# The data looks like this
name age city Date country hight MessageList gender
Tom 10 NewYork 1/1/2021 08:35:58Z US NaN X List Male
Mark 5 London 5/1/2021 08:35:58Z UK NaN X List Male
Pam 7 London 3/6/2021 08:35:58Z UK NaN Y List Female
Tom 18 California 4/6/2021 08:35:58Z US 163 Y List Male
Lena 23 NewYork 12/12/2020 08:35:58Z US NaN Y List Female
Ben 17 Colombo 11/12/2020 08:35:58Z Srilanka NaN X List Male
Lena 23 Paris 8/1/2020 08:35:58Z France NaN Y List Female
Ben 51 Colombo 7/1/2020 08:35:58Z Srilanka NaN Z List Male
Tom 18 Paris 1/1/2021 08:35:58Z France NaN Z List Male
Mark 5 Paris 5/1/2021 08:35:58Z Japan NaN Z List Male
Tom 18 London 3/6/2021 08:35:58Z UK NaN X List Male
Tom 18 Paris 4/6/2021 08:35:58Z France 163 Z List Male
Tom 10 NewYork 1/1/2021 08:35:58Z US NaN X List Male
Mark 5 London 5/1/2021 08:35:58Z UK NaN X List Male
Pam 7 London 3/6/2021 08:35:58Z UK NaN Y List Female
Tom 18 California 4/6/2021 08:35:58Z US 163 Y List Male
Lena 23 NewYork 12/12/2020 08:35:58Z US NaN Y List Female
Ben 17 Colombo 11/12/2020 08:35:58Z India NaN X List Male
Lena 23 Paris 8/1/2020 08:35:58Z France NaN Y List Female
Ben 51 Colombo 7/1/2020 08:35:58Z India NaN Z List Male
Tom 18 Paris 1/1/2021 08:35:58Z France NaN Z List Male
Mark 5 Paris 5/1/2021 08:35:58Z Japan NaN Z List Male
Tom 18 London 3/6/2021 08:35:58Z UK NaN X List Male
Tom 18 Paris 4/6/2021 08:35:58Z France 163 Z List Male
Tom 10 NewYork 1/1/2021 08:35:58Z US NaN X List Male
Mark 5 London 5/1/2021 08:35:58Z UK NaN X List Male
Pam 7 London 3/6/2021 08:35:58Z UK NaN Y List Female
Tom 18 California 4/6/2021 08:35:58Z US 163 Y List Male
Lena 23 NewYork 12/12/2020 08:35:58Z US NaN Y List Female
Ben 17 Colombo 11/12/2020 08:35:58Z Srilanka NaN X List Male
Lena 23 Paris 8/1/2020 08:35:58Z France NaN Y List Female
Ben 51 Colombo 7/1/2020 08:35:58Z Srilanka NaN Z List Male
Tom 18 Paris 1/1/2021 08:35:58Z France NaN Z List Male
Mark 5 Paris 5/1/2021 08:35:58Z Japan NaN Z List Male
Tom 18 London 3/6/2021 08:35:58Z UK NaN X List Male
Tom 18 California 4/6/2021 08:35:58Z US 163 Y List Male
Lena 23 NewYork 12/12/2020 08:35:58Z US NaN Y List Female
Ben 17 Colombo 11/12/2020 08:35:58Z India NaN X List Male
Lena 23 Paris 8/1/2020 08:35:58Z France NaN Y List Female
Ben 51 Colombo 7/1/2020 08:35:58Z India NaN Z List Male
Tom 18 Paris 1/1/2021 08:35:58Z France NaN Z List Male
Mark 5 Paris 5/1/2021 08:35:58Z Japan NaN Z List Male
Tom 18 London 3/6/2021 08:35:58Z UK NaN X List Male
Tom 18 Paris 4/6/2021 08:35:58Z France 163 Z List Male
Tom 10 NewYork 1/1/2021 08:35:58Z US NaN X List Male
Mark 5 London 5/1/2021 08:35:58Z UK NaN X List Male
Pam 7 London 3/6/2021 08:35:58Z UK NaN Y List Female
Tom 18 California 4/6/2021 08:35:58Z US 163 Y List Male
Lena 23 NewYork 12/12/2020 08:35:58Z US NaN Y List Female
Ben 17 Colombo 11/12/2020 08:35:58Z Srilanka NaN X List Male
Lena 23 Paris 8/1/2020 08:35:58Z France NaN Y List Female
Ben 51 Colombo 7/1/2020 08:35:58Z Srilanka NaN Z List Male
Tom 18 Paris 1/1/2021 08:35:58Z France NaN Z List Male
Mark 5 Paris 5/1/2021 08:35:58Z Japan NaN Z List Male
Tom 18 London 3/6/2021 08:35:58Z UK NaN X List Male
# Expected output
Quarter  Q1  Q2  Q3  Q4  Total
city
US       12   8  24  11     55
Japan     6   7   5   3     21
Italy     8   3   2   5     18
How can I keep filters on both rows and columns, like in an Excel pivot table? Please help.
I would do the following. Create a quarter column:
df["quarter"] = df["Date"].dt.to_period("Q")
Then pivot the DataFrame, drop the row of column totals, sort by the row totals, and return the top 3 rows:
df.pivot_table(
index="city",
columns="quarter",
values="name",
aggfunc="count",
fill_value=0,
margins=True,
)[:-1].sort_values(by="All", ascending=False)[:3]
Output:
quarter 2020Q3 2020Q4 2021Q1 2021Q2 All
city
Paris 5 0 5 8 18
London 0 0 9 4 13
Colombo 5 5 0 0 10
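If you want top 5 instead of top 3, the same idea can be wrapped in a small function. This is only a sketch: `top_n_cities` is a made-up helper name, the DataFrame holds just eight rows copied from the question's sample, and I convert the quarters to strings and use `nlargest` instead of sorting and slicing.

```python
import pandas as pd

# Eight rows copied from the question's sample data (only the columns used here).
df = pd.DataFrame(
    {
        "name": ["Tom", "Mark", "Pam", "Tom", "Ben", "Lena", "Tom", "Mark"],
        "city": ["NewYork", "London", "London", "California",
                 "Colombo", "Paris", "Paris", "Paris"],
        "Date": ["1/1/2021", "5/1/2021", "3/6/2021", "4/6/2021",
                 "11/12/2020", "8/1/2020", "1/1/2021", "5/1/2021"],
    }
)
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
# Quarter labels as plain strings such as "2021Q1".
df["quarter"] = df["Date"].dt.to_period("Q").astype(str)


def top_n_cities(frame, n=3):
    """Hypothetical helper: count rows per city and quarter, keep the n largest totals."""
    counts = frame.pivot_table(
        index="city",
        columns="quarter",
        values="name",
        aggfunc="count",
        fill_value=0,
        margins=True,
    )
    # Drop the "All" totals row by label, then keep the n rows with the largest "All".
    return counts.drop(index="All").nlargest(n, "All")


top2 = top_n_cities(df, n=2)
print(top2)
```

Dropping the totals row by label rather than with `[:-1]` does not rely on it being the last row.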
Similar to @Henrik Bo's answer, but using crosstab the way you did:
table1 = pd.crosstab([ df['city']], df['Date'].dt.to_period('q'))
table1["total"] = table1.sum(axis=1)
table1.sort_values(by="total",ascending=False)[:3]
Date 2020Q3 2020Q4 2021Q1 2021Q2 total
city
Paris 5 0 5 8 18
London 0 0 9 4 13
Colombo 5 5 0 0 10
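One thing to watch with any top-N cut is ties on the total. As a sketch (again on eight rows copied from the question's sample, with `margins_name="total"` chosen to match the column name above), `crosstab` can build the total itself and `nlargest(..., keep="all")` keeps every row tied with the last place instead of cutting arbitrarily:

```python
import pandas as pd

# Eight rows copied from the question's sample data (only the columns used here).
df = pd.DataFrame(
    {
        "city": ["NewYork", "London", "London", "California",
                 "Colombo", "Paris", "Paris", "Paris"],
        "Date": ["1/1/2021", "5/1/2021", "3/6/2021", "4/6/2021",
                 "11/12/2020", "8/1/2020", "1/1/2021", "5/1/2021"],
    }
)
# Quarter labels as plain strings such as "2021Q1".
quarters = pd.to_datetime(df["Date"], format="%m/%d/%Y").dt.to_period("Q").astype(str)

# margins=True adds a "total" column and a "total" row.
table = pd.crosstab(df["city"], quarters, margins=True, margins_name="total")

# Drop the totals row, then keep the top 3 totals including any ties for third place.
top3_with_ties = table.drop(index="total").nlargest(3, "total", keep="all")
print(top3_with_ties)
```

On this subset three cities share a total of 1, so the result has five rows rather than three. With the default `keep="first"` you would get exactly three rows.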
If you are not interested in the quarter periods (year plus quarter) and only want the quarter number:
table2 = pd.crosstab([ df['city']], df['Date'].dt.quarter.apply(lambda x: "Q" + str(x)))
table2["total"] = table2.sum(axis=1)
table2.sort_values(by="total",ascending=False)[:3]
Date Q1 Q2 Q3 Q4 total
city
Paris 5 8 5 0 18
London 9 4 0 0 13
Colombo 0 0 5 5 10
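To get closer to the layout in the question's expected output (Q1 to Q4 always present, then a Total column), you could also pin the columns with `reindex`. A rough sketch on eight rows copied from the question's sample. The `Quarter` and `Total` labels are taken from the expected output, everything else is my assumption:

```python
import pandas as pd

# Eight rows copied from the question's sample data (only the columns used here).
df = pd.DataFrame(
    {
        "city": ["NewYork", "London", "London", "California",
                 "Colombo", "Paris", "Paris", "Paris"],
        "Date": ["1/1/2021", "5/1/2021", "3/6/2021", "4/6/2021",
                 "11/12/2020", "8/1/2020", "1/1/2021", "5/1/2021"],
    }
)
dates = pd.to_datetime(df["Date"], format="%m/%d/%Y")
# Quarter number only, as "Q1".."Q4", ignoring the year.
quarter = ("Q" + dates.dt.quarter.astype(str)).rename("Quarter")

table = pd.crosstab(df["city"], quarter)
# Force all four quarter columns in a fixed order; missing quarters become 0.
table = table.reindex(columns=["Q1", "Q2", "Q3", "Q4"], fill_value=0)
table["Total"] = table.sum(axis=1)

top2 = table.nlargest(2, "Total")
print(top2)
```

To filter columns as well, select the quarters you want before adding the total, for example `table[["Q1", "Q2"]]`.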