pandas merge without using SQL
I have these two data frames.
jj1a
,driver_trip,prob,rprob
0,1_3,1.0,1
1,1_5,1.0,1
2,1_9,1.0,1
3,1_11,1.0,1
4,1_12,1.0,1
5,1_15,1.0,1
6,1_17,1.0,1
7,1_31,1.0,1
8,1_33,1.0,1
9,1_43,1.0,1
jjra
,driver_trip,prob,rprob
0,1_1,1.0,0.0
1,1_2,1.0,0.0
2,1_3,1.0,0.0
3,1_4,1.0,0.0
4,1_5,1.0,0.0
5,1_6,1.0,0.0
6,1_7,1.0,0.0
7,1_8,1.0,0.0
8,1_9,1.0,0.0
9,1_10,1.0,0.0
This is the output I want:
rrss3
,driver_trip,prob
0,1_1,0.0
1,1_10,0.0
2,1_2,0.0
3,1_3,1.0
4,1_4,0.0
5,1_5,1.0
6,1_6,0.0
7,1_7,0.0
8,1_8,0.0
9,1_9,1.0
I managed to do it, but the result is clumsy. I'm looking for a better solution.
My solution:
import pandas as pd
from pandasql import sqldf

kkm = pd.merge(jj1a, jjra, left_on='driver_trip', right_on='driver_trip', how='right', sort=True)

qqq = """select driver_trip, rprob_x prob from kkm where rprob_x=1 and rprob_y=0 union select driver_trip, rprob_y prob from kkm where rprob_x is null and rprob_y = 0;"""

rrss3 = sqldf(qqq, locals())
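I would like to drop sqldf and use only merge, but I couldn't figure out how.
I found some examples of filtering nulls after the merge, but I'm not sure what the next step is: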
kkm[pd.isnull(kkm).any(axis=1)]
kkm[~pd.isnull(kkm).any(axis=1)]
Is it also possible to sort by the driver_trip column in numeric order?
For example, 2_1 should come before 100_1, but I don't know how to do it. I can do it in Oracle SQL:
with mk3 as (
select '1_1' driver_trip, 1 prob from dual
union
select '2_100' driver_trip ,0 prob from dual
union
select '100_1' driver_trip, 0 prob from dual
union
select '2_2' driver_trip, 0 prob from dual
union
select '1_100' driver_trip,1 prob from dual
)
select driver_trip,prob
from mk3 order by (
to_number(substr(driver_trip,1,instr(driver_trip,'_')-1))
),to_number(substr(driver_trip,instr(driver_trip,'_')+1,length(driver_trip)-instr(driver_trip,'_')))
Continuing with the merge-based solution, you can multiply the prob values:
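import pandas as pd

# Right merge keeps every row of jjra; unmatched rows get NaN, filled with 0.0
rrss3 = pd.merge(jj1a, jjra, on='driver_trip', how='right').fillna(0.0)
rrss3['prob'] = rrss3['prob_x'] * rrss3['prob_y']
rrss3[['driver_trip', 'prob']].sort_values('driver_trip')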
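To make the multiplication approach easier to try out, here is a minimal self-contained sketch. It assumes shortened copies of the two frames from the question (only a few rows each, typed in by hand), and the names `merged` and `result` are just illustrative.

```python
import pandas as pd

# Shortened, hand-typed copies of the question's frames (assumption: a few rows suffice).
jj1a = pd.DataFrame({
    "driver_trip": ["1_3", "1_5", "1_9", "1_11"],
    "prob": [1.0, 1.0, 1.0, 1.0],
    "rprob": [1, 1, 1, 1],
})
jjra = pd.DataFrame({
    "driver_trip": ["1_1", "1_2", "1_3", "1_4", "1_5", "1_9", "1_10"],
    "prob": [1.0] * 7,
    "rprob": [0.0] * 7,
})

# Right merge keeps every row of jjra. Trips missing from jj1a get NaN in the
# *_x columns, which fillna turns into 0.0. Trips only in jj1a (1_11) are dropped.
merged = pd.merge(jj1a, jjra, on="driver_trip", how="right").fillna(0.0)

# prob_x is 1.0 only for trips present in jj1a and prob_y is 1.0 everywhere,
# so the product is 1.0 exactly for the matched trips.
merged["prob"] = merged["prob_x"] * merged["prob_y"]

# Plain string sort, which is why 1_10 lands before 1_2.
result = (
    merged[["driver_trip", "prob"]]
    .sort_values("driver_trip")
    .reset_index(drop=True)
)
print(result)
```

With this sample, only 1_3, 1_5 and 1_9 end up with prob 1.0, which matches the pattern in the desired output.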
Or you can use numpy.where:
import numpy as np

rrss3 = pd.merge(jj1a, jjra, on='driver_trip', how='right').fillna(0.0)
rrss3['prob'] = np.where((rrss3['prob_x'] == 1) & (rrss3['prob_y'] == 1), 1, 0)
rrss3[['driver_trip', 'prob']].sort_values('driver_trip')
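As for the ordering part of the question (2_1 before 100_1), one possible approach is to split driver_trip at the underscore and sort on the two numeric parts, much like the Oracle query does. The sketch below uses the five sample values from that query; the column names `driver` and `trip` are hypothetical helpers, and it assumes every value has exactly the form `<int>_<int>`.

```python
import pandas as pd

# Sample values taken from the Oracle query in the question.
df = pd.DataFrame({
    "driver_trip": ["1_1", "2_100", "100_1", "2_2", "1_100"],
    "prob": [1, 0, 0, 0, 1],
})

# Split "driver_trip" at the underscore into two integer columns.
# Assumption: every value looks like "<int>_<int>".
parts = df["driver_trip"].str.split("_", expand=True).astype(int)
parts.columns = ["driver", "trip"]  # hypothetical helper column names

# Sort on the numeric parts, then reorder the original rows accordingly.
order = parts.sort_values(["driver", "trip"]).index
sorted_df = df.loc[order].reset_index(drop=True)
print(sorted_df)
```

Sorting the helper frame and reusing its index keeps the original frame free of extra columns. If the helper columns are useful later, they could be assigned onto `df` instead.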