Perform an operation on selected columns of a DataFrame based on thresholds stored in another DataFrame
I have a DataFrame:
import numpy as np
import pandas as pd
df1 = pd.DataFrame([["A",1,98,56,61,1,4,6], ["B",1,79,54,36,2,5,7], ["C",1,97,32,83,3,6,8],["B",1,96,31,90,4,7,9], ["C",1,45,32,12,5,8,10], ["A",1,67,33,55,6,9,11]], columns=["id","date","c1","c2","c3","x","y","z"])
I have another DataFrame that holds the conditions (thresholds) for selected columns:
df2 = pd.DataFrame([["c2",40], ["c1",80], ["c3",90]], columns=["col","condition"])
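I want to modify df1 according to the conditions in df2. For example, df2 gives c1 a threshold of 80: any value in df1's c1 column below 80 should become -1, and any value of 80 or above should become 1. The other columns listed in df2 should be handled the same way.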
Expected output:
df_out = pd.DataFrame([["A",1,1,1,-1,1,4,6], ["B",1,-1,1,-1,2,5,7], ["C",1,1,-1,-1,3,6,8],["B",1,1,-1,1,4,7,9], ["C",1,-1,-1,-1,5,8,10], ["A",1,-1,-1,-1,6,9,11]], columns=["id","date","c1","c2","c3","x","y","z"])
How can I do this?
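First convert df2 to a Series indexed by column name. Then build a boolean mask of the df1 columns whose names appear in that index, compare those columns against the Series with DataFrame.ge (greater than or equal), and pass the result to numpy.where: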
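# Series mapping column name -> threshold
s = df2.set_index('col')['condition']
# Boolean mask of the df1 columns that have a threshold
m = df1.columns.isin(s.index)
# 1 where value >= threshold, otherwise -1
df1.loc[:, m] = np.where(df1.loc[:, m].ge(s), 1, -1)
print(df1)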
  id  date  c1  c2  c3  x  y   z
0  A     1   1   1  -1  1  4   6
1  B     1  -1   1  -1  2  5   7
2  C     1   1  -1  -1  3  6   8
3  B     1   1  -1   1  4  7   9
4  C     1  -1  -1  -1  5  8  10
5  A     1  -1  -1  -1  6  9  11
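The mask-based approach relies on the names in df2 matching df1's column names exactly. If that is not guaranteed (for example "C3" in df2 versus "c3" in df1), one option is to normalise the names and select the matching columns explicitly. The following is only a sketch under that assumption: the helper name apply_thresholds and the lower-casing step are illustrative and not part of the original answer, and it returns a copy rather than modifying df1 in place.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [["A", 1, 98, 56, 61, 1, 4, 6], ["B", 1, 79, 54, 36, 2, 5, 7],
     ["C", 1, 97, 32, 83, 3, 6, 8], ["B", 1, 96, 31, 90, 4, 7, 9],
     ["C", 1, 45, 32, 12, 5, 8, 10], ["A", 1, 67, 33, 55, 6, 9, 11]],
    columns=["id", "date", "c1", "c2", "c3", "x", "y", "z"],
)
# "C3" is deliberately upper-case here to show the case mismatch.
df2 = pd.DataFrame([["c2", 40], ["c1", 80], ["C3", 90]],
                   columns=["col", "condition"])


def apply_thresholds(df, thresholds):
    """Hypothetical helper: return a copy of df with 1 / -1 in thresholded columns."""
    # Series mapping lower-cased column name -> threshold.
    # Lower-casing is an assumption that suits this data (df's columns are lower-case).
    s = thresholds.set_index(thresholds["col"].str.lower())["condition"]
    # Columns of df that have a threshold, kept in df's own order.
    cols = [c for c in df.columns if c in s.index]
    out = df.copy()
    # 1 where value >= threshold, otherwise -1; s[cols] puts thresholds in the same order.
    out[cols] = np.where(df[cols].ge(s[cols]), 1, -1)
    return out


df_out = apply_thresholds(df1, df2)
print(df_out)
```

Selecting s[cols] before the comparison keeps the thresholds in the same column order as the data, so the array returned by numpy.where lines up with the columns being assigned.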