More efficient way of applying where in a loop to different sets of columns - python
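I want crosstabs of the columns 'month', 'week' and 'year' against column 'a', but I only want to replace the values 0 and 1 for the columns 'month' and 'week'. The code below technically works, but I am wondering whether there is a more efficient way to write it. Any pointers would be great! Thanks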
import pandas as pd
import numpy as np

out = {}
df = pd.DataFrame({'a': ['a','b','b','a','b','b','a','b','a'],
                   'month': ['march','march','january', 'march','january','january', 'may','march','march'],
                   'week': ['1','1','1', '1','2','3', '3','2','1'],
                   'year': ['5','3','4', '3','1','1', '1','1','1']})

cols_a = ['month', 'week']
cols_b = ['year']
out1 = {}
out2 = {}

for col in cols_a:
    ct1 = pd.crosstab(df.a, df[col])
    ct2 = pd.DataFrame(ct1.where(ct1 >= 2, 'group_a'))
    out1[f'{(col)}'] = ct2

for col in cols_b:
    ct3 = pd.crosstab(df.a, df[col])
    out2[f'{(col)}'] = ct3

out3 = {**out1, **out2}
The current output looks like this, which is correct:
{'month': month  january  march      may
a
a      group_a      3  group_a
b            3      2  group_a,
'week': week  1        2        3
a
a     3  group_a  group_a
b     2        2  group_a,
'year': year  1  3  4  5
a
a     2  1  0  1
b     3  1  1  0}
The idea is to join the two lists and apply where only when the column is in cols_a:
cols_a = ['month', 'week']
cols_b = ['year']
out = {}

for col in cols_a + cols_b:
    ct1 = pd.crosstab(df.a, df[col])
    if col in cols_a:
        # keep counts >= 2, replace everything else with the label
        ct1 = ct1.where(ct1 >= 2, 'group_a')
    out[col] = ct1
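If the same pattern is needed in more than one place, the loop above could be wrapped in a small helper. This is only a sketch: the function name `crosstabs_by_column` and its parameters (`index_col`, `masked_cols`, `threshold`, `label`) are made up for illustration and are not part of pandas.

```python
import pandas as pd


def crosstabs_by_column(df, index_col, cols, masked_cols, threshold=2, label="group_a"):
    """Cross-tabulate index_col against every column in cols.

    Counts below `threshold` are replaced by `label`, but only for the
    columns listed in `masked_cols` (hypothetical helper, not a pandas API).
    """
    masked = set(masked_cols)  # set lookup instead of scanning a list each time
    out = {}
    for col in cols:
        ct = pd.crosstab(df[index_col], df[col])
        if col in masked:
            # where() keeps cells that satisfy the condition, replaces the rest
            ct = ct.where(ct >= threshold, label)
        out[col] = ct
    return out


df = pd.DataFrame({'a': ['a', 'b', 'b', 'a', 'b', 'b', 'a', 'b', 'a'],
                   'month': ['march', 'march', 'january', 'march', 'january',
                             'january', 'may', 'march', 'march'],
                   'week': ['1', '1', '1', '1', '2', '3', '3', '2', '1'],
                   'year': ['5', '3', '4', '3', '1', '1', '1', '1', '1']})

result = crosstabs_by_column(df, 'a', ['month', 'week', 'year'],
                             masked_cols=['month', 'week'])
```

With the sample frame from the question, `result` should hold the same three tables as `out`, keyed by 'month', 'week' and 'year'.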
Not revolutionarily different, but here is a solution in the form of a dict comprehension:
# Note: DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map
{k: pd.crosstab(df['a'], df[k]).applymap(lambda x: 'group_a' if x < 2 else x)
    if k in cols_a else
    pd.crosstab(df['a'], df[k])
 for k in cols_a + cols_b
}
Output:
{'month': month  january  march      may
a
a      group_a      3  group_a
b            3      2  group_a,
'week': week  1        2        3
a
a     3  group_a  group_a
b     2        2  group_a,
'year': year  1  3  4  5
a
a     2  1  0  1
b     3  1  1  0}
Here is an alternative that avoids applymap:
def group_a(df):
    return df.where(df >= 2, 'group_a')

{k: pd.crosstab(df['a'], df[k]).transform(group_a)
    if k in cols_a else
    pd.crosstab(df['a'], df[k])
 for k in cols_a + cols_b}
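As a side note, `where` keeps the cells that satisfy the condition, so "replace 0 and 1" has to be written as "keep everything >= 2". If the inverted condition reads awkwardly, `DataFrame.mask` states it directly. A minimal sketch on the 'month' crosstab only, using the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'b', 'a', 'b', 'b', 'a', 'b', 'a'],
                   'month': ['march', 'march', 'january', 'march', 'january',
                             'january', 'may', 'march', 'march']})

ct = pd.crosstab(df['a'], df['month'])

# where: keep cells with count >= 2, replace the others
via_where = ct.where(ct >= 2, 'group_a')

# mask: replace cells with count < 2, keep the others
via_mask = ct.mask(ct < 2, 'group_a')

# both spellings should produce the same table
same = via_where.equals(via_mask)
```

Which one to use is mostly a matter of taste; the work done per crosstab should be the same either way.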