Merging a DataFrame across Rows while dropping the NaN values
I have this data frame.
I created it with the following code:
import pandas as pd

# Note: DataFrame.append was removed in pandas 2.0, so this runs as written only on older versions.
df = pd.DataFrame(columns=['Step Number', 'CAN_Send', 'CAN_Values'])
can = [{'ta1': ('atpcinfolamp_co', '3')}, {'ta2': ('xyz_signal', '4')}, {'ta2': ('abc_signal', '5')}]
keys = []
for can_signals in can:
    for key, value in can_signals.items():
        if key not in keys:
            # First time this key is seen: add a row holding only the step name
            keys.append(key)
            df = df.append({'Step Number': key}, ignore_index=True)
            df = df.append({'CAN_Send': value[0]}, ignore_index=True)
            df = df.append({'CAN_Values': value[1]}, ignore_index=True)
        else:
            df = df.append({'CAN_Send': value[0]}, ignore_index=True)
            df = df.append({'CAN_Values': value[1]}, ignore_index=True)
df
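I need a data frame that looks like this.
I can't figure out how to merge across the columns while dropping the NaN values at the same time.
I tried: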
df = df.groupby('Step Number')[['CAN_Send' , 'CAN_Values']]
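But this doesn't work, because there is no numeric operation that would turn the groupby object back into a frame. Since my values are strings, every way of dropping the NaN values that I tried ends up wiping out my entire data frame.
Any help with this is much appreciated!
Thanks in advance!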
You can first fill the missing values of Step Number with .ffill(). Then groupby() on Step Number and aggregate() the remaining 2 columns with dropna(), as follows:
df['Step Number'] = df['Step Number'].ffill()
df_out = (df.groupby('Step Number', as_index=False)
            .agg(lambda x: x.dropna(how='all'))
            .apply(pd.Series.explode)
         )
Result:
print(df_out)
  Step Number         CAN_Send CAN_Values
0         ta1  atpcinfolamp_co          3
1         ta2       xyz_signal          4
1         ta2       abc_signal          5
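The same three steps (forward-fill, group, drop NaN per column) can also be spelled out with an explicit loop over the groups. This is only a sketch under assumptions: the names rows and pieces are mine, the sparse frame is built from a list of dicts rather than with append, and it assumes every step has the same number of CAN_Send and CAN_Values entries.

```python
import pandas as pd

# Sparse frame in the same shape as the question's: one filled cell per row.
rows = [
    {'Step Number': 'ta1'},
    {'CAN_Send': 'atpcinfolamp_co'},
    {'CAN_Values': '3'},
    {'Step Number': 'ta2'},
    {'CAN_Send': 'xyz_signal'},
    {'CAN_Values': '4'},
    {'CAN_Send': 'abc_signal'},
    {'CAN_Values': '5'},
]
df = pd.DataFrame(rows, columns=['Step Number', 'CAN_Send', 'CAN_Values'])

# Step 1: propagate each step name down to the rows that belong to it.
df['Step Number'] = df['Step Number'].ffill()

# Steps 2 and 3: per step, keep only the non-NaN cells of each column
# and line them up side by side.
pieces = []
for step, group in df.groupby('Step Number', sort=False):
    pieces.append(pd.DataFrame({
        'Step Number': step,
        'CAN_Send': group['CAN_Send'].dropna().to_list(),
        'CAN_Values': group['CAN_Values'].dropna().to_list(),
    }))
df_out = pd.concat(pieces, ignore_index=True)
print(df_out)
```

Unlike the explode-based version, this one renumbers the index from 0 because of ignore_index=True.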
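Edit
For your new dataset, you can use the following code. It also works for the previous dataset and, in general, for the structure your program logic creates.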
df['Step Number'] = df['Step Number'].ffill()
df['CAN_Send'] = df['CAN_Send'].ffill(limit=1)
df['CAN_Values'] = df['CAN_Values'].bfill(limit=1)
df = df.dropna().drop_duplicates()
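Demo
Data preparation:
This is your code with minor tweaks to make your logic work. Without them, if a key appears more than once with other keys in between (e.g. keys arriving in the sequence ta1, ta2, ta1), your existing logic fails to add a new Step Number row for a key that is already in the list keys (e.g. the last ta1).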
df = pd.DataFrame(columns = ['Step Number' , 'CAN_Send' , 'CAN_Values'])
#can = [{'ta1': ('atpcinfolamp_co', '3')}, {'ta2': ('xyz_signal', '4')}, {'ta2': ('abc_signal', '5')}]
can = [{'ta1': ('atpcinfolamp_co', '3')}, {'ta1': ('hdcinfolamp_co', '5')}, {'ta2': ('xyz_signal', '4')}, {'ta2': ('abc_signal', '5')}]
#keys = []
last_key = ''
for can_signals in can:
    for key,value in can_signals.items():
        if key != last_key:
#            keys.append(key)
            last_key = key
            df = df.append({'Step Number' : key} , ignore_index = True)
#            df = df.append({'CAN_Send' : value[0]} , ignore_index = True)
#            df = df.append({'CAN_Values' : value[1]} , ignore_index = True)
#        else:
#            df = df.append({'CAN_Send' : value[0]} , ignore_index = True)
#            df = df.append({'CAN_Values' : value[1]} , ignore_index = True)
        df = df.append({'CAN_Send' : value[0]} , ignore_index = True)
        df = df.append({'CAN_Values' : value[1]} , ignore_index = True)
df
  Step Number         CAN_Send CAN_Values
0         ta1              NaN        NaN
1         NaN  atpcinfolamp_co        NaN
2         NaN              NaN          3
3         NaN   hdcinfolamp_co        NaN
4         NaN              NaN          5
5         ta2              NaN        NaN
6         NaN       xyz_signal        NaN
7         NaN              NaN          4
8         NaN       abc_signal        NaN
9         NaN              NaN          5
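To make the ta1, ta2, ta1 case described above concrete, here is a small pandas-free sketch that only records at which positions a Step Number row would be written under each rule. The two function names are hypothetical and exist purely for this illustration.

```python
def header_positions_seen_list(keys_in_order):
    """Original rule: write a header only the first time a key is ever seen."""
    seen, positions = [], []
    for i, key in enumerate(keys_in_order):
        if key not in seen:
            seen.append(key)
            positions.append(i)
    return positions


def header_positions_last_key(keys_in_order):
    """Tweaked rule: write a header whenever the key differs from the previous one."""
    last_key, positions = '', []
    for i, key in enumerate(keys_in_order):
        if key != last_key:
            last_key = key
            positions.append(i)
    return positions


sequence = ['ta1', 'ta2', 'ta1']
print(header_positions_seen_list(sequence))  # [0, 1]
print(header_positions_last_key(sequence))   # [0, 1, 2]
```

With the seen-list rule, the second ta1 gets no header row, so the later forward-fill would label its signals as ta2.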
Run the new code:
df['Step Number'] = df['Step Number'].ffill()
df['CAN_Send'] = df['CAN_Send'].ffill(limit=1)
df['CAN_Values'] = df['CAN_Values'].bfill(limit=1)
df = df.dropna().drop_duplicates()
Result:
print(df)
  Step Number         CAN_Send CAN_Values
1         ta1  atpcinfolamp_co          3
3         ta1   hdcinfolamp_co          5
6         ta2       xyz_signal          4
8         ta2       abc_signal          5
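Since DataFrame.append was removed in pandas 2.0, here is a self-contained sketch of the same demo that should run on current versions. The assumption is that collecting the rows in a plain list (rows, my own name) and building the frame once gives the same sparse layout as the repeated append calls. The names sparse and result are also mine.

```python
import pandas as pd

can = [
    {'ta1': ('atpcinfolamp_co', '3')},
    {'ta1': ('hdcinfolamp_co', '5')},
    {'ta2': ('xyz_signal', '4')},
    {'ta2': ('abc_signal', '5')},
]

# Same loop as the data preparation, but appending dicts to a list.
rows = []
last_key = ''
for can_signals in can:
    for key, value in can_signals.items():
        if key != last_key:
            last_key = key
            rows.append({'Step Number': key})
        rows.append({'CAN_Send': value[0]})
        rows.append({'CAN_Values': value[1]})

# Keys missing from a dict become NaN, which reproduces the sparse frame.
sparse = pd.DataFrame(rows, columns=['Step Number', 'CAN_Send', 'CAN_Values'])

result = sparse.copy()
result['Step Number'] = result['Step Number'].ffill()
# Copy each signal name one row down, onto the row holding its value.
result['CAN_Send'] = result['CAN_Send'].ffill(limit=1)
# Copy each value one row up, onto the row holding its signal name.
result['CAN_Values'] = result['CAN_Values'].bfill(limit=1)
# Each pair now appears on two adjacent rows; header rows still contain NaN.
result = result.dropna().drop_duplicates()
print(result)
```

One thing to keep in mind: drop_duplicates() also collapses a pair that legitimately occurs twice within the same step, so this relies on (Step Number, CAN_Send, CAN_Values) combinations being unique.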
Edit 2
Actually, given the structure of your source data can, you can arrive at the desired data frame directly in a simpler way, as follows:
can = [{'ta1': ('atpcinfolamp_co', '3')}, {'ta1': ('hdcinfolamp_co', '5')}, {'ta2': ('xyz_signal', '4')}, {'ta2': ('abc_signal', '5')}]
data = {'Step Number': [list(x.keys())[0] for x in can], 'CAN_Send': [list(x.values())[0][0] for x in can], 'CAN_Values': [list(x.values())[0][1] for x in can]}
df = pd.DataFrame(data)
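A variant of the same idea, offered as a sketch: flatten can into one tuple per signal and hand the list to the DataFrame constructor. The name records is mine. Because it iterates over .items(), it would also cope with a dict that holds more than one key, which the [0] indexing above does not.

```python
import pandas as pd

can = [
    {'ta1': ('atpcinfolamp_co', '3')},
    {'ta1': ('hdcinfolamp_co', '5')},
    {'ta2': ('xyz_signal', '4')},
    {'ta2': ('abc_signal', '5')},
]

# One (step, signal, value) tuple per dictionary entry.
records = [
    (key, send, value)
    for item in can
    for key, (send, value) in item.items()
]
df = pd.DataFrame(records, columns=['Step Number', 'CAN_Send', 'CAN_Values'])
print(df)
```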