Retain values only in those rows of a column based on a condition on other columns in pandas
I have a dataframe df_in that contains column names starting with pi and pm.
df_in = pd.DataFrame([[1,2,3,4,"",6,7,8,9],["",1,32,43,59,65,"",83,97],["",51,62,47,58,64,74,86,99],[73,51,42,67,54,65,"",85,92]], columns=["piabc","pmed","pmrde","pmret","pirtc","pmere","piuyt","pmfgf","pmthg"])
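If a row is empty in a column whose name starts with pi, the same row should also be made empty in the following columns that start with pm, up to the next column that starts with pi. The same process should then be repeated for the remaining groups of columns.
Expected output: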
df_out = pd.DataFrame([[1,2,3,4,"","",7,8,9],["","","","",59,65,"","",""],["","","","",58,64,74,86,99],[73,51,42,67,54,65,"","",""]], columns=["piabc","pmed","pmrde","pmret","pirtc","pmere","piuyt","pmfgf","pmthg"])
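How can I do this?
You can create groups by comparing the column names with str.startswith and taking the cumulative sum, so that each pi column and the pm columns after it share one group number. Then compare the values with the empty string, use groupby to broadcast the result of the first (pi) column to every column of its group, and pass that boolean mask to DataFrame.mask to set the matching cells to empty strings: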
g = df_in.columns.str.startswith('pi').cumsum()
df = df_in.mask(df_in.eq('').groupby(g, axis=1).transform(lambda x: x.iat[0]), '')
# transform('first') failed for me in pandas 1.2.3, hence the lambda above
#df = df_in.mask(df_in.eq('').groupby(g, axis=1).transform('first'), '')
print(df)
  piabc pmed pmrde pmret pirtc pmere piuyt pmfgf pmthg
0     1    2     3     4                 7     8     9
1                           59    65
2                           58    64    74    86    99
3    73   51    42    67    54    65
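In recent pandas versions the axis argument of groupby is deprecated, so the line above may warn or fail there. Below is a minimal sketch of the same idea without groupby, offered as an alternative rather than the original answer's method. It assumes the first column starts with pi (so every column belongs to a group); the names is_pi, pi_pos, leader and empty are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df_in = pd.DataFrame(
    [[1, 2, 3, 4, "", 6, 7, 8, 9],
     ["", 1, 32, 43, 59, 65, "", 83, 97],
     ["", 51, 62, 47, 58, 64, 74, 86, 99],
     [73, 51, 42, 67, 54, 65, "", 85, 92]],
    columns=["piabc", "pmed", "pmrde", "pmret", "pirtc",
             "pmere", "piuyt", "pmfgf", "pmthg"],
)

# True for the columns that open a new group
is_pi = df_in.columns.str.startswith("pi")
# Group number per column: 1 for the first pi block, 2 for the second, ...
g = is_pi.cumsum()
# Positions of the pi columns
pi_pos = np.flatnonzero(is_pi)
# For every column, the position of the pi column that leads its group
# (assumes g >= 1 everywhere, i.e. the first column starts with pi)
leader = pi_pos[g - 1]
# Boolean array: True where the leading pi column is empty in that row
empty = df_in.eq("").to_numpy()[:, leader]

# Blank out every cell whose leading pi cell is empty
df = df_in.mask(empty, "")
print(df)
```

Indexing the boolean array by leader plays the role of transform(lambda x: x.iat[0]): each column takes the "is empty" flags of the first column in its group.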