Find the first value in a larger column that is greater than or equal to each value in a shorter search column
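I have been trying to find a vectorized way to get, for each value in a shorter column (~9k rows), the index of the first value in a large column (> 500k rows) that is greater than or equal to it.
Currently I loop over every value in the shorter column and compare it against the entire larger column, so the number of loop iterations equals the length of the shorter column.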
import numpy as np
import pandas as pd

np.random.seed(2)
veclong = np.random.randint(0, 1000, 100000)
vecshort = np.random.randint(0, 1000, 500)
dfShort = pd.DataFrame(data=vecshort / 10000, columns=['Short'])
dfLong = pd.DataFrame(data=veclong / 10000, columns=['Long'])

# For each value in the short column, scan the whole long column
# for the first entry that is >= that value.
c1 = len(dfShort)
out2 = []
for n1 in range(c1):
    val = dfShort['Short'].iloc[n1]
    dfAns = dfLong[dfLong >= val].dropna()
    ans = dfAns['Long'].iloc[0]
    idx = dfAns.index[0]
    out = [ans, idx]
    out2.append(out)
out2 = np.asarray(out2)
dfShort['Location'] = out2[:, 1]
dfShort['Value'] = out2[:, 0]
You could consider the following:
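def myfunc(x):
    # Index of the first row in dfLong whose value is >= x,
    # or None when no such row exists.
    try:
        return dfLong[dfLong.Long >= x].index[0]
    except IndexError:
        return None

dfShort['Location'] = dfShort.Short.apply(myfunc)
# A missing Location (None/NaN) yields a missing Value.
dfShort['Value'] = dfShort.Location.apply(lambda x: dfLong.iloc[int(x), 0] if pd.notna(x) else None)
print(dfShort.head())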
Output
+----+---------+-----------+--------+
| | Short | Location | Value |
+----+---------+-----------+--------+
| 0 | 0.0636 | 10 | 0.0674 |
| 1 | 0.0876 | 27 | 0.0938 |
| 2 | 0.0799 | 16 | 0.0831 |
| 3 | 0.0977 | 95 | 0.0997 |
| 4 | 0.0602 | 10 | 0.0674 |
+----+---------+-----------+--------+
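Note that `apply` still runs a Python-level loop over the short column. If a fully vectorized version is wanted, one possible sketch (not part of the code above) is to take the running maximum of the long column, which is non-decreasing, and binary-search it with `np.searchsorted`. The first position where the running maximum reaches a value is also the first position where the long column itself reaches it. The helper name `first_at_least` and the conventions of `-1` / `NaN` for "no match" are assumptions made for this illustration, and the returned locations are positions, which match the index labels only for a default `RangeIndex`.

```python
import numpy as np
import pandas as pd

def first_at_least(long_values, short_values):
    """For each short value, return the position and value of the
    first entry in long_values that is >= it (hypothetical helper)."""
    long_values = np.asarray(long_values)
    short_values = np.asarray(short_values)
    # The running maximum is non-decreasing, so it can be binary-searched.
    running_max = np.maximum.accumulate(long_values)
    # side='left' returns the first position where running_max >= v.
    pos = np.searchsorted(running_max, short_values, side='left')
    # pos == len(long_values) means no entry is large enough.
    found = pos < len(long_values)
    values = np.full(len(short_values), np.nan)
    values[found] = long_values[pos[found]]
    # Assumed convention: -1 marks "no match" in the positions.
    return np.where(found, pos, -1), values

np.random.seed(2)
veclong = np.random.randint(0, 1000, 100000)
vecshort = np.random.randint(0, 1000, 500)
dfShort = pd.DataFrame(data=vecshort / 10000, columns=['Short'])
dfLong = pd.DataFrame(data=veclong / 10000, columns=['Long'])

loc, val = first_at_least(dfLong['Long'].to_numpy(), dfShort['Short'].to_numpy())
dfShort['Location'] = loc
dfShort['Value'] = val
print(dfShort.head())
```

This does one pass over the long column plus one binary search per short value, rather than a full scan of the long column for every short value.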