Merging a DataFrame across Rows while dropping the NaN values

I have this DataFrame:

I built it with this code:

import pandas as pd

can = [{'ta1': ('atpcinfolamp_co', '3')}, {'ta2': ('xyz_signal', '4')}, {'ta2': ('abc_signal', '5')}]
keys = []
rows = []  # collect one dict per row; DataFrame.append was removed in pandas 2.0
for can_signals in can:
    for key, value in can_signals.items():
        if key not in keys:
            # first time this step appears: emit a row holding only its name
            keys.append(key)
            rows.append({'Step Number': key})
        # the signal name and its value each go on a row of their own (other cells are NaN)
        rows.append({'CAN_Send': value[0]})
        rows.append({'CAN_Values': value[1]})
df = pd.DataFrame(rows, columns=['Step Number', 'CAN_Send', 'CAN_Values'])
df

I need a DataFrame that looks like this:

I can't figure out how to merge these rows across the columns while dropping the NaNs at the same time.

I tried:

df = df.groupby('Step Number')[['CAN_Send' , 'CAN_Values']]

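But this doesn't work: my values are strings, so there is no numeric aggregation I can apply to turn the groupby object back into a DataFrame, and every way I've tried of dropping the NaNs ends up wiping out my entire DataFrame.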
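One likely reason the attempt above yields nothing useful: `groupby` drops rows whose key is NaN by default (`dropna=True`), so every row holding a signal name or value is discarded before any aggregation runs. The sketch below rebuilds the question's long-format frame by hand (values copied from the loop's output) and also shows that once the key is forward-filled, a non-numeric aggregation such as `first()` works on strings, although it keeps only one signal per step.

```python
import numpy as np
import pandas as pd

# The question's long layout, typed in by hand: one value per row, NaN elsewhere
df = pd.DataFrame({
    'Step Number': ['ta1', np.nan, np.nan, 'ta2', np.nan, np.nan, np.nan, np.nan],
    'CAN_Send': [np.nan, 'atpcinfolamp_co', np.nan, np.nan, 'xyz_signal', np.nan, 'abc_signal', np.nan],
    'CAN_Values': [np.nan, np.nan, '3', np.nan, np.nan, '4', np.nan, '5'],
})

# Rows whose 'Step Number' is NaN are excluded from the groups: only 1 row per step survives
sizes = df.groupby('Step Number').size()
print(sizes)

# After forward-filling the key, first() returns the first non-null string of each column,
# so ta2 keeps xyz_signal but loses abc_signal
filled = df.copy()
filled['Step Number'] = filled['Step Number'].ffill()
first_per_step = filled.groupby('Step Number').first()
print(first_per_step)
```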

Any help with this would be much appreciated!

Thanks in advance!

You can first fill in the missing values of Step Number with .ffill(). Then groupby() Step Number and aggregate() the remaining 2 columns with dropna(), as follows:

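# forward-fill so every signal/value row carries the step it belongs to
df['Step Number'] = df['Step Number'].ffill()

# per step, drop the NaNs from each column, then explode multi-valued cells back into rows
df_out = (df.groupby('Step Number', as_index=False)
            .agg(lambda x: x.dropna(how='all'))
            .apply(pd.Series.explode)
         )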

Result:

print(df_out)

  Step Number         CAN_Send CAN_Values
0         ta1  atpcinfolamp_co          3
1         ta2       xyz_signal          4
1         ta2       abc_signal          5
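For comparison, here is a self-contained sketch of the same ffill/groupby/explode idea that collects each column's non-NaN values into a list and then explodes both columns together. It assumes pandas 1.3 or later, where `DataFrame.explode` accepts a list of columns; the input frame is typed in by hand to match the question's loop output.

```python
import numpy as np
import pandas as pd

# The question's long layout, typed in by hand: one value per row, NaN elsewhere
df = pd.DataFrame({
    'Step Number': ['ta1', np.nan, np.nan, 'ta2', np.nan, np.nan, np.nan, np.nan],
    'CAN_Send': [np.nan, 'atpcinfolamp_co', np.nan, np.nan, 'xyz_signal', np.nan, 'abc_signal', np.nan],
    'CAN_Values': [np.nan, np.nan, '3', np.nan, np.nan, '4', np.nan, '5'],
})

# Label every row with its step
df['Step Number'] = df['Step Number'].ffill()

df_out = (df.groupby('Step Number', as_index=False)
            .agg(lambda s: s.dropna().tolist())   # non-NaN strings of each column, per step
            .explode(['CAN_Send', 'CAN_Values'])  # one row per list element (pandas >= 1.3)
            .reset_index(drop=True))
print(df_out)
```

Exploding both columns in one call keeps each signal aligned with its value, because the two lists in a row always have the same length here.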

Edit

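For your new dataset, you can use the following code. It also works for the previous dataset and, in general, for the structure your program logic creates.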

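df['Step Number'] = df['Step Number'].ffill()        # label every row with its step
df['CAN_Send'] = df['CAN_Send'].ffill(limit=1)       # copy each signal name down onto its value row
df['CAN_Values'] = df['CAN_Values'].bfill(limit=1)   # copy each value up onto its signal row
df = df.dropna().drop_duplicates()                   # keep one complete row per signal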
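The `limit=1` arguments are what pair each signal with its value: in this layout a signal name is always directly followed by its value, so copying the name one row down and the value one row up puts the pair on both of those rows. A minimal sketch on the `ta1` part of the new data (columns typed in by hand; rows 0 and 5 stand in for step-name rows):

```python
import numpy as np
import pandas as pd

send = pd.Series([np.nan, 'atpcinfolamp_co', np.nan, 'hdcinfolamp_co', np.nan, np.nan])
vals = pd.Series([np.nan, np.nan, '3', np.nan, '5', np.nan])

filled_send = send.ffill(limit=1)  # each name fills at most one row below it
filled_vals = vals.bfill(limit=1)  # each value fills at most one row above it
print(filled_send.tolist())
print(filled_vals.tolist())

pairs = pd.DataFrame({'CAN_Send': filled_send, 'CAN_Values': filled_vals})
# rows 1-2 and rows 3-4 are identical pairs; rows 0 and 5 still contain NaN
print(pairs.dropna().drop_duplicates())
```

One side effect to keep in mind: if the same signal with the same value is sent twice within one step, `drop_duplicates()` collapses those into a single row as well.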

Demo

Data preparation:

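Your code is slightly adjusted so that your logic works properly. Otherwise, if a key appears more than once with another key in between (e.g. keys in the sequence ta1, ta2, ta1), your existing logic fails to add a new Step Number row for a key that already exists in the list keys (e.g. the last ta1).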
import pandas as pd

#can = [{'ta1': ('atpcinfolamp_co', '3')}, {'ta2': ('xyz_signal', '4')}, {'ta2': ('abc_signal', '5')}]
can = [{'ta1': ('atpcinfolamp_co', '3')}, {'ta1': ('hdcinfolamp_co', '5')}, {'ta2': ('xyz_signal', '4')}, {'ta2': ('abc_signal', '5')}]
rows = []      # DataFrame.append was removed in pandas 2.0, so collect rows first
last_key = ''  # replaces the `keys` list: compare only with the previous key
for can_signals in can:
    for key, value in can_signals.items():
        if key != last_key:
            # the key changed, so start a new step (this also covers ta1, ta2, ta1)
            last_key = key
            rows.append({'Step Number': key})
        # every signal gets its own CAN_Send row and CAN_Values row
        rows.append({'CAN_Send': value[0]})
        rows.append({'CAN_Values': value[1]})

df = pd.DataFrame(rows, columns=['Step Number', 'CAN_Send', 'CAN_Values'])
df

  Step Number         CAN_Send CAN_Values
0         ta1              NaN        NaN
1         NaN  atpcinfolamp_co        NaN
2         NaN              NaN          3
3         NaN   hdcinfolamp_co        NaN
4         NaN              NaN          5
5         ta2              NaN        NaN
6         NaN       xyz_signal        NaN
7         NaN              NaN          4
8         NaN       abc_signal        NaN
9         NaN              NaN          5
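To check the failure case described above, the sketch below feeds a made-up `can` list in which `ta1` reappears after `ta2` through both versions of the loop. `long_frame` and `step_of` are hypothetical helper names used only for this illustration. With the original `keys` list no new Step Number row is written for the second `ta1`, so after forward-filling, `hdcinfolamp_co` ends up labelled `ta2`.

```python
import pandas as pd

# Made-up input in which ta1 reappears after ta2
can = [{'ta1': ('atpcinfolamp_co', '3')},
       {'ta2': ('xyz_signal', '4')},
       {'ta1': ('hdcinfolamp_co', '5')}]


def long_frame(can, only_on_change):
    """Hypothetical helper: rebuild the long layout with either loop variant."""
    rows, seen, last_key = [], [], ''
    for can_signals in can:
        for key, value in can_signals.items():
            if only_on_change:
                new_step = key != last_key   # revised logic: compare with the previous key
                last_key = key
            else:
                new_step = key not in seen   # original logic: skip any key seen before
                if new_step:
                    seen.append(key)
            if new_step:
                rows.append({'Step Number': key})
            rows.append({'CAN_Send': value[0]})
            rows.append({'CAN_Values': value[1]})
    return pd.DataFrame(rows, columns=['Step Number', 'CAN_Send', 'CAN_Values'])


def step_of(df, signal):
    """Hypothetical helper: the step a signal is assigned to after forward-filling."""
    filled = df['Step Number'].ffill()
    return filled[df['CAN_Send'] == signal].iloc[0]


print(step_of(long_frame(can, only_on_change=False), 'hdcinfolamp_co'))  # ta2
print(step_of(long_frame(can, only_on_change=True), 'hdcinfolamp_co'))   # ta1
```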

Running the new code:

df['Step Number'] = df['Step Number'].ffill()
df['CAN_Send'] = df['CAN_Send'].ffill(limit=1)
df['CAN_Values'] = df['CAN_Values'].bfill(limit=1)
df = df.dropna().drop_duplicates()

Result:

print(df)

  Step Number         CAN_Send CAN_Values
1         ta1  atpcinfolamp_co          3
3         ta1   hdcinfolamp_co          5
6         ta2       xyz_signal          4
8         ta2       abc_signal          5

Edit 2

Actually, given the structure of your source data can, you can get to the desired DataFrame directly in a simpler way:

import pandas as pd

can = [{'ta1': ('atpcinfolamp_co', '3')}, {'ta1': ('hdcinfolamp_co', '5')}, {'ta2': ('xyz_signal', '4')}, {'ta2': ('abc_signal', '5')}]

# each dict in `can` holds a single {step: (signal, value)} entry
data = {'Step Number': [list(x.keys())[0] for x in can],
        'CAN_Send': [list(x.values())[0][0] for x in can],
        'CAN_Values': [list(x.values())[0][1] for x in can]}
df = pd.DataFrame(data)
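If a dict in `can` could ever hold more than one step key, `list(x.keys())[0]` would silently keep only the first one. A sketch that unpacks every `(step, (signal, value))` pair instead, assuming each value is always a two-element tuple as in the example data:

```python
import pandas as pd

can = [{'ta1': ('atpcinfolamp_co', '3')}, {'ta1': ('hdcinfolamp_co', '5')},
       {'ta2': ('xyz_signal', '4')}, {'ta2': ('abc_signal', '5')}]

# One (step, signal, value) tuple per entry, across every key of every dict
records = [(step, signal, value)
           for item in can
           for step, (signal, value) in item.items()]
df = pd.DataFrame(records, columns=['Step Number', 'CAN_Send', 'CAN_Values'])
print(df)
```

For the example data this produces the same four rows as the result of the first edit, with a fresh 0 to 3 index.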