Pipe or sequence of functions in Python pandas: filter then summarize (as in dplyr)
Some context: I am a heavy R user, but I am currently switching to Python with pandas. Suppose I have this data frame:
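import pandas as pd

data = {'participant': ['p1','p1','p2','p3'],
        'metadata': ['congruent_1','congruent_2','incongruent_1','incongruent_2'],
        'reaction': [22000,25000,27000,35000]
        }
df_s1 = pd.DataFrame(data, columns = ['participant','metadata', 'reaction'])
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the same 16 copies
# (the original frame plus the 15 appended ones)
df_s1 = pd.concat([df_s1]*16, ignore_index=True)
df_s1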
I would like to reproduce what I can easily do in R with the pipe, along these lines:
df_s1[(df_s1.metadata == "congruent_1") | (df_s1.metadata == "incongruent_1")].df_s1["reaction"].mean()
This does not work. I only succeed when I split the code into parts/variables:
x = df_s1[(df_s1.metadata == "congruent_1") | (df_s1.metadata == "incongruent_1")]
x = x["reaction"].mean()
x
In dplyr, I would write:
ds_s1 %>%
filter(metadata == "congruent_1" | metadata == "incongruent_1") %>%
summarise(mean(reaction))
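Note: I would really appreciate a concise reference to a site that helps translate my R code to Python. There is plenty of material available, but it comes in mixed formats and varying styles.
Thanks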
Do you mean:
df_s1.loc[(df_s1.metadata == "congruent_1") | (df_s1.metadata == "incongruent_1"), "reaction"].mean()
Or, more simply, with isin:
df_s1.loc[df_s1.metadata.isin(["congruent_1", "incongruent_1"]), "reaction"].mean()
Output:
24500.0
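For something closer to the dplyr pipe, pandas methods can be chained inside parentheses, one step per line. The following is a minimal sketch rather than a definitive translation: it rebuilds the question's data with pd.concat (an assumption, since DataFrame.append is not available in pandas 2.0 and later), and the name `targets` is introduced here only for illustration.

```python
import pandas as pd

data = {
    "participant": ["p1", "p1", "p2", "p3"],
    "metadata": ["congruent_1", "congruent_2", "incongruent_1", "incongruent_2"],
    "reaction": [22000, 25000, 27000, 35000],
}
# 16 stacked copies: the original frame plus the 15 appended in the question
df_s1 = pd.concat([pd.DataFrame(data)] * 16, ignore_index=True)

targets = ["congruent_1", "incongruent_1"]  # illustrative name, not from the thread

mean_reaction = (
    df_s1
    # roughly filter() followed by pull(reaction): keep matching rows, one column
    .loc[lambda d: d["metadata"].isin(targets), "reaction"]
    # roughly summarise(mean(reaction)), returned as a scalar
    .mean()
)
print(mean_reaction)  # 24500.0
```

The lambda receives the frame as it exists at that point in the chain, so no intermediate variable is needed. The one-line attempt in the question fails because the filtered frame has no attribute named `df_s1`; indexing the filtered result directly is enough.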
Here it is with .loc:
df_s1.loc[(df_s1.metadata == "congruent_1") | (df_s1.metadata == "incongruent_1"), 'reaction'].mean()
Out[117]: 24500.0
You can also switch to isin, as Quang mentioned, to shorten the code.
In base R:
mean(ds_s1$reaction[ds_s1$metadata%in%c('congruent_1','incongruent_1')])
In addition to the other suggested solutions:
df_s1.query('metadata==["congruent_1","incongruent_1"]').agg({"reaction": "mean"})
reaction 24500.0
dtype: float64
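Note that `agg` with a dict returns a Series, as the output above shows. If a plain number is wanted, selecting the column and calling `mean()` after `query` is one option. This is a small sketch on a four-row version of the data; the variable name `wanted` is hypothetical, and the `@` prefix is how `query` refers to a local Python variable.

```python
import pandas as pd

df_s1 = pd.DataFrame({
    "metadata": ["congruent_1", "congruent_2", "incongruent_1", "incongruent_2"],
    "reaction": [22000, 25000, 27000, 35000],
})

wanted = ["congruent_1", "incongruent_1"]  # hypothetical name for illustration

# agg with a dict keeps the column label and returns a one-element Series
as_series = df_s1.query("metadata in @wanted").agg({"reaction": "mean"})

# selecting the column first returns a scalar instead
as_scalar = df_s1.query("metadata in @wanted")["reaction"].mean()

print(type(as_series).__name__)  # Series
print(as_scalar)                 # 24500.0
```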
With datar in Python (I am the author), you can easily port your code from R to Python:
from datar.all import *

data = tibble(
    participant=['p1','p1','p2','p3'],
    metadata=['congruent_1','congruent_2','incongruent_1','incongruent_2'],
    reaction=[22000,25000,27000,35000]
)
df_s1 = data >> uncount(15)
df_s1 = df_s1 >> \
    filter((f.metadata == "congruent_1") | (f.metadata == "incongruent_1")) >> \
    group_by(f.metadata) >> \
    summarise(reaction_mean=mean(f.reaction))
print(df_s1)
Output:
metadata reaction_mean
0 congruent_1 22000.0
1 incongruent_1 27000.0
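For comparison, the same grouped summary can be written in plain pandas with `groupby` and named aggregation. This is a sketch under the assumption that 15 stacked copies of the four rows (mirroring `uncount(15)`) are an acceptable stand-in for the data; it is not part of datar.

```python
import pandas as pd

data = {
    "participant": ["p1", "p1", "p2", "p3"],
    "metadata": ["congruent_1", "congruent_2", "incongruent_1", "incongruent_2"],
    "reaction": [22000, 25000, 27000, 35000],
}
# 15 stacked copies, mirroring uncount(15)
df_s1 = pd.concat([pd.DataFrame(data)] * 15, ignore_index=True)

result = (
    df_s1
    # like filter(): keep only the two conditions of interest
    .loc[lambda d: d["metadata"].isin(["congruent_1", "incongruent_1"])]
    # like group_by(metadata); as_index=False keeps metadata as a column
    .groupby("metadata", as_index=False)
    # like summarise(reaction_mean=mean(reaction)), using named aggregation
    .agg(reaction_mean=("reaction", "mean"))
)
print(result)
```

Dropping the `groupby` step and calling `["reaction"].mean()` instead gives the single overall mean that the question's dplyr code computes.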