pandas equivalent of fct_reorder
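Is there a way to reorder a pandas data frame column based on its relationship to another, categorical column in the same data frame, similar to fct_reorder from the forcats package in R?
A friend of mine wants to run a Python script that draws the plots with plotnine.
The reprex data frame is below: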
Group Name Height
0 3 Abigail 151.09962170955896
1 2 Amelia 144.53368144215813
2 1 Ava 150.84441176683055
3 2 Charlotte 144.2526003986535
4 3 Emily 150.01613555140298
5 1 Emma 127.9293425061458
6 3 Evelyn 154.35548000906718
7 3 Harper 155.22807300246453
8 1 Isabella 116.54302297370651
9 2 Mia 155.0605589215757
10 1 Olivia 142.7742924211066
11 2 Sophia 154.2912468881105
I have also made a CSV file available for download:
https://github.com/Biomiha/factors/blob/master/Fct_reorder_reprex.csv
To read it into an R session as a tibble:
df <- structure(list(Group = c(3, 2, 1, 2, 3, 1, 3, 3, 1, 2, 1, 2),
Name = c("Abigail", "Amelia", "Ava", "Charlotte", "Emily",
"Emma", "Evelyn", "Harper", "Isabella", "Mia", "Olivia",
"Sophia"), Height = c(151.099621709559, 144.533681442158,
150.844411766831, 144.252600398653, 150.016135551403, 127.929342506146,
154.355480009067, 155.228073002465, 116.543022973707, 155.060558921576,
142.774292421107, 154.29124688811)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, -12L), spec = structure(list(
cols = list(Group = structure(list(), class = c("collector_double",
"collector")), Name = structure(list(), class = c("collector_character",
"collector")), Height = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
To read it into a Python session as a pandas DataFrame, copy the table above and run:
df = pd.read_clipboard()
My R code is:
library(tidyverse)
# The unordered plot that is the default looks like:
plot_without <- df %>%
dplyr::mutate(Group = as.factor(Group)) %>%
ggplot(aes(x = Name, y = Height, fill = Group)) +
geom_bar(stat = "identity") +
labs(title = "Plot without ordering")
plot_without
# To order the 'Name' variable, using fct_reorder (this is what I want but from python):
plot_with <- df %>%
dplyr::mutate(Group = as.factor(Group),
Name = fct_reorder(Name, Group, identity)) %>%
ggplot(aes(x = Name, y = Height, fill = Group)) +
geom_bar(stat = "identity") +
labs(title = "Ordered plot")
plot_with
The equivalent Python code so far is:
import sys
import pandas as pd
from plotnine import *
df=pd.read_csv('Fct_reorder_reprex.csv')
df['Group'] = df['Group'].astype('category')
ggplot(df) + geom_bar(aes(x = 'Name', y = 'Height', fill = 'Group', col = 'Group'), stat = 'identity') + labs(title='Python unordered plot')
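The plotnine output looks like this:
The question is: how do I tell pandas to reorder the Name column according to the Group column (i.e. so that the colours are grouped together)?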
Two years on, there is now a good solution in Python, using the datar package:
import pandas as pd
from datar.all import f, mutate, fct_reorder, as_factor, identity
from plotnine import ggplot, geom_bar, labs, aes
df = pd.read_csv("https://github.com/Biomiha/factors/raw/master/Fct_reorder_reprex.csv")
df
Group Name Height
<int64> <object> <float64>
0 3 Abigail 151.099622
1 2 Amelia 144.533681
2 1 Ava 150.844412
3 2 Charlotte 144.252600
4 3 Emily 150.016136
5 1 Emma 127.929343
6 3 Evelyn 154.355480
7 3 Harper 155.228073
8 1 Isabella 116.543023
9 2 Mia 155.060559
10 1 Olivia 142.774292
11 2 Sophia 154.291247
plot_without = (
df
>> mutate(Group=as_factor(f.Group))
>> ggplot(aes(x="Name", y="Height", fill="Group"))
+ geom_bar(stat="identity")
+ labs(title="Plot without ordering")
)
plot_without
plot_with = (
df
>> mutate(Group=as_factor(f.Group), Name=fct_reorder(f.Name, f.Group, _fun=identity))
>> ggplot(aes(x="Name", y="Height", fill="Group"))
+ geom_bar(stat="identity")
+ labs(title="Ordered plot")
)
plot_with
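For readers who would rather not add a dependency, the same reordering can be approximated in plain pandas by turning Name into a Categorical whose category order is derived from Group. The sketch below is only an illustration under stated assumptions: the helper name fct_reorder_pd and its parameters are hypothetical (they are not part of pandas, datar, or forcats), the data frame is rebuilt inline from the table in the question with heights as printed in the answer, and ties are assumed to keep their order of first appearance.

```python
import pandas as pd


def fct_reorder_pd(values, by, fun="median", desc=False):
    """Hypothetical helper, loosely modelled on forcats::fct_reorder.

    Returns `values` as a Categorical whose categories are sorted by
    `fun` applied to `by` within each distinct value.
    """
    values = pd.Series(values).reset_index(drop=True)
    by = pd.Series(by).reset_index(drop=True)
    # One summary number per distinct value, in order of first appearance.
    summary = by.groupby(values, sort=False).agg(fun)
    # A stable sort keeps tied values in their order of first appearance.
    order = summary.sort_values(ascending=not desc, kind="stable").index
    return pd.Categorical(values, categories=list(order))


# Data rebuilt from the table in the question.
df = pd.DataFrame({
    "Group": [3, 2, 1, 2, 3, 1, 3, 3, 1, 2, 1, 2],
    "Name": ["Abigail", "Amelia", "Ava", "Charlotte", "Emily", "Emma",
             "Evelyn", "Harper", "Isabella", "Mia", "Olivia", "Sophia"],
    "Height": [151.099622, 144.533681, 150.844412, 144.252600,
               150.016136, 127.929343, 154.355480, 155.228073,
               116.543023, 155.060559, 142.774292, 154.291247],
})

# Reorder Name while Group is still numeric, then make Group categorical
# so that it maps to a discrete fill scale.
df["Name"] = fct_reorder_pd(df["Name"], df["Group"])
df["Group"] = df["Group"].astype("category")

print(list(df["Name"].cat.categories))
```

The row order of df is unchanged; only the category order of Name differs. Because plotnine lays out a discrete axis in category order, passing this df to the ggplot call from the question should place the bars of each group next to each other.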