Retain the values only in those rows of the column based on the condition on other columns in pandas

Question

I have a dataframe df_in whose column names start with pi and pm.

df_in = pd.DataFrame([[1,2,3,4,"",6,7,8,9],["",1,32,43,59,65,"",83,97],["",51,62,47,58,64,74,86,99],[73,51,42,67,54,65,"",85,92]], columns=["piabc","pmed","pmrde","pmret","pirtc","pmere","piuyt","pmfgf","pmthg"])

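If a row is empty in a column whose name starts with pi, make the same row empty in the following columns whose names start with pm, up to the next column that starts with pi. Repeat the same process for the remaining columns.

Expected output: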

df_out = pd.DataFrame([[1,2,3,4,"","",7,8,9],["","","","",59,65,"","",""],["","","","",58,64,74,86,99],[73,51,42,67,54,65,"","",""]], columns=["piabc","pmed","pmrde","pmret","pirtc","pmere","piuyt","pmfgf","pmthg"])

How can I do this?

Answer 1

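You can build group labels from the column names with str.startswith and a cumulative sum, so that each column starting with pi opens a new group. Then, within each group, compare the values with the empty string and spread the result for the group's first column across the whole group. That gives the mask that DataFrame.mask uses to set values to empty strings.

Create the groups by comparing the column names: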

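# group labels: the counter increases at every column whose name starts with 'pi'
g = df_in.columns.str.startswith('pi').cumsum()
# within each group, spread the emptiness of the first (pi) column, then blank those cells
df = df_in.mask(df_in.eq('').groupby(g, axis=1).transform(lambda x: x.iat[0]), '')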

# transform('first') failed for me in pandas 1.2.3:
# df = df_in.mask(df_in.eq('').groupby(g, axis=1).transform('first'), '')


print (df)
  piabc pmed pmrde pmret pirtc pmere piuyt pmfgf pmthg
0     1    2     3     4                 7     8     9
1                           59    65                  
2                           58    64    74    86    99
3    73   51    42    67    54    65
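
Since groupby(..., axis=1) is deprecated as of pandas 2.1, here is a minimal sketch of the same idea that skips groupby and instead looks up each column's leading pi column with NumPy indexing. This is not the method from the answer above, only an alternative under one assumption: the first column's name starts with pi, so every column has a leading column. The names is_pi, lead_idx, lead_is_empty and result are made up for illustration.

```python
import numpy as np
import pandas as pd

df_in = pd.DataFrame(
    [[1, 2, 3, 4, "", 6, 7, 8, 9],
     ["", 1, 32, 43, 59, 65, "", 83, 97],
     ["", 51, 62, 47, 58, 64, 74, 86, 99],
     [73, 51, 42, 67, 54, 65, "", 85, 92]],
    columns=["piabc", "pmed", "pmrde", "pmret", "pirtc",
             "pmere", "piuyt", "pmfgf", "pmthg"],
)

# True for every column that opens a new group (name starts with "pi").
is_pi = np.asarray(df_in.columns.str.startswith("pi"), dtype=bool)

# Assumption: the first column starts with "pi", so every column has a leader.
assert is_pi[0]

# Group labels, same as g in the answer: [1 1 1 1 2 2 3 3 3]
g = is_pi.cumsum()

# Position of the leading "pi" column for each column: [0 0 0 0 4 4 6 6 6]
lead_idx = np.flatnonzero(is_pi)[g - 1]

# For each cell: is the leading "pi" cell in the same row empty?
empty = df_in.eq("").to_numpy()
lead_is_empty = empty[:, lead_idx]

# Cast to object first so integer columns can hold "" without changing dtype.
result = df_in.astype(object).mask(lead_is_empty, "")
print(result)
```

For the sample data, result holds the same values as df_out from the question. If the first column might not start with pi, the lookup g - 1 would need separate handling for the columns before the first pi column.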

python

dataframe

python-2.7

python-3.x

pandas