Is this the right way to move rows from one dataframe to another with a condition?
I want to move some rows from df1 to df2 when the calories in df1 and df2 are the same. Both dfs have the same columns.
import numpy as np
import pandas as pd
np.random.seed(0)
df1 = pd.DataFrame(data = {
"calories": [420, 80, 90, 10],
"duration": [50, 4, 5, 3]
})
df2 = pd.DataFrame(data = {
"calories": [420, 380, 390],
"duration": [60, 40, 45]
})
print(df1)
print(df2)
   calories  duration
0       420        50
1        80         4
2        90         5
3        10         3
   calories  duration
0       420        60
1       380        40
2       390        45
rows = df1.loc[df1.calories == df2.calories, :]
df2 = df2.append(rows, ignore_index=True)
df1.drop(rows.index, inplace=True)
print('df1:')
print(df1)
print('df2:')
print(df2)
Then I get this error:
raise ValueError("Can only compare identically-labeled Series objects")
ValueError: Can only compare identically-labeled Series objects
Edit: solution
import numpy as np
import pandas as pd
np.random.seed(0)
df1 = pd.DataFrame(data = {
"mid": [420, 380, 90, 420],
"A": [50, 4, 5, 3],
"B": [420, 4, 5, 3]
})
df2 = pd.DataFrame(data = {
"mid": [420, 380, 390],
"A": [60, 40, 80],
"B": [150, 24, 25]
})
print('df1:')
print(df1)
print('df2:')
print(df2)
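new_df1 = df1[~df1.mid.isin(df2.mid)]  # rows whose mid does not occur in df2
dup_df1 = df1[df1.mid.isin(df2.mid)]   # rows whose mid occurs somewhere in df2
new_df2 = pd.concat([df2, dup_df1], ignore_index=True)  # DataFrame.append was removed in pandas 2.0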
print('dup:')
print(dup_df1)
print('df1:')
print(new_df1)
print('df2:')
print(new_df2)
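The same isin logic can be wrapped in a small reusable function. This is a minimal sketch, not the asker's code: the name `move_matching_rows` is hypothetical, and it uses `pd.concat` because `DataFrame.append` was removed in pandas 2.0. Note that `isin` moves a row whenever its value appears anywhere in the other frame's column, not only at the same row position (here the last df1 row, mid 420, is moved because 420 is in df2's first row).

```python
import pandas as pd


def move_matching_rows(src, dst, col):
    """Hypothetical helper: move rows of src whose col value occurs anywhere in dst[col].

    Returns (new_src, new_dst); the input frames are not modified.
    """
    mask = src[col].isin(dst[col])                             # True where the value occurs in dst[col]
    new_src = src[~mask]                                       # rows that stay in src
    new_dst = pd.concat([dst, src[mask]], ignore_index=True)   # append the moved rows to dst
    return new_src, new_dst


df1 = pd.DataFrame({"mid": [420, 380, 90, 420], "A": [50, 4, 5, 3], "B": [420, 4, 5, 3]})
df2 = pd.DataFrame({"mid": [420, 380, 390], "A": [60, 40, 80], "B": [150, 24, 25]})

new_df1, new_df2 = move_matching_rows(df1, df2, "mid")
print(new_df1)
print(new_df2)
```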
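Comparing two Series with == only works when they have identical index labels, and since your dataframes have different lengths, their indexes differ. Instead, use `merge` to find rows with common `calories` values. You need to merge on both the index and the `calories` values; that is most easily achieved by using `reset_index` to temporarily add an `index` column to merge on: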
dftemp = df1.reset_index().merge(df2.reset_index(), on=['index', 'calories'], suffixes=['', '_y'])
Output:
   index  calories  duration  duration_y
0      0       420        50          60
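An alternative to the merge (a sketch, not part of the original answer) is to make the labels match before comparing: reindex `df2.calories` onto `df1`'s index. Label 3 has no counterpart in df2 and becomes NaN, which never compares equal, so that row is simply not selected. This gives the same position-wise match as merging on `index` and `calories`.

```python
import pandas as pd

df1 = pd.DataFrame({"calories": [420, 80, 90, 10], "duration": [50, 4, 5, 3]})
df2 = pd.DataFrame({"calories": [420, 380, 390], "duration": [60, 40, 45]})

# The original comparison fails: labels 0-3 vs 0-2 are not identical.
comparison_failed = False
try:
    _ = df1.calories == df2.calories
except ValueError as exc:
    comparison_failed = True
    print("ValueError:", exc)

# Reindex df2's calories onto df1's labels; label 3 has no partner and becomes NaN.
aligned = df2.calories.reindex(df1.index)

# Same labels now, so == works; NaN never equals anything, so row 3 is False.
mask = df1.calories == aligned

rows = df1[mask]                                      # rows to move
new_df2 = pd.concat([df2, rows], ignore_index=True)   # append them to df2
new_df1 = df1[~mask].reset_index(drop=True)           # keep the rest in df1

print(new_df1)
print(new_df2)
```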
You can now `concat` the `calories` and `duration` values from `dftemp` onto `df2` (again using `reset_index` to reset the index):
df2 = pd.concat([df2, dftemp[['calories', 'duration']]]).reset_index(drop=True)
Output (for your sample data):
   calories  duration
0       420        60
1       380        40
2       390        45
3       420        50
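The `reset_index(drop=True)` matters for the index-only merge used to remove the copied rows: without it, the concatenated frame keeps its original labels (0, 1, 2, 0), so df1's row with label 3 finds no partner and is silently dropped. A small sketch of the difference, with the matched row from `dftemp` hard-coded as `moved` (a name chosen for illustration); `ignore_index=True` in `pd.concat` is an equivalent shorthand for the reset.

```python
import pandas as pd

df1 = pd.DataFrame({"calories": [420, 80, 90, 10], "duration": [50, 4, 5, 3]})
df2 = pd.DataFrame({"calories": [420, 380, 390], "duration": [60, 40, 45]})
moved = pd.DataFrame({"calories": [420], "duration": [50]})  # the matched row from dftemp

kept_labels = pd.concat([df2, moved])                    # index: 0, 1, 2, 0
fresh = pd.concat([df2, moved]).reset_index(drop=True)   # index: 0, 1, 2, 3
shorthand = pd.concat([df2, moved], ignore_index=True)   # same result as fresh


def rows_left_in_df1(new_df2):
    """Apply the answer's index-only merge and return what would remain of df1."""
    merged = df1.merge(new_df2, left_index=True, right_index=True, suffixes=("", "_y"))
    return merged.query("calories != calories_y")[["calories", "duration"]]


print(rows_left_in_df1(kept_labels))  # label 3 (calories 10) is lost
print(rows_left_in_df1(fresh))        # 80, 90 and 10 remain
```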
To remove the rows that were copied from `df1` to `df2`, we merge on the index only and then keep only the rows where the two `calories` values differ:
dftemp = df1.merge(df2, left_index=True, right_index=True, suffixes=['', '_y']).query('calories != calories_y')
df1 = dftemp[['calories', 'duration']].reset_index(drop=True)
Output (for your sample data):
   calories  duration
0        80         4
1        90         5
2        10         3
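Because the first merge was done on `reset_index()`, `dftemp['index']` already holds the labels of the df1 rows that matched. A simpler alternative (a sketch, not the original answer's method) is to drop those labels directly. Unlike the index-only merge, it does not depend on the new df2 having at least as many rows as df1; if it had fewer, the inner merge would silently discard the extra df1 rows.

```python
import pandas as pd

df1 = pd.DataFrame({"calories": [420, 80, 90, 10], "duration": [50, 4, 5, 3]})
df2 = pd.DataFrame({"calories": [420, 380, 390], "duration": [60, 40, 45]})

# Same positional match as in the answer: merge on the row label and calories.
dftemp = df1.reset_index().merge(df2.reset_index(), on=["index", "calories"], suffixes=("", "_y"))

# Append the matched rows to df2.
df2 = pd.concat([df2, dftemp[["calories", "duration"]]], ignore_index=True)

# dftemp["index"] holds the original df1 labels of the matched rows, so drop them directly.
df1 = df1.drop(index=dftemp["index"]).reset_index(drop=True)

print(df1)
print(df2)
```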