Finding value in list in pandas column by index
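I have a dataframe with many columns, some of which hold lists of float values. I want to pick one or more elements out of each list by position, where the position is given by one integer or a list of integers in a separate column. For example: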
   Results                                            Index_1  Index_2
0  [2.347, 140.8, 1010.8, 723.7, 7, 0, 2.898, 9.1...  0        [0, 4, 6]
1  [93794, 112.7, 5.014, 0, 1778.1, 3473.82, 0, 3...  1        [1, 5]
2  [2.927, 12.647, 9047, 0, 1204.5, 13.4, 6.3, 4....  2        [2, 0]
3  [0, 1.801, 7.104, 2121.2, 20.375, 6.348, 11.35...  2        [2, 9, 3]
I want to create two columns holding the values from Results selected by Index_1 and Index_2. For example:
   Results                                            Index_1  Index_2    Outcome  Outcome_2
0  [2.347, 140.8, 1010.8, 723.7, 7, 0, 2.898, 9.1...  0        [0, 4, 6]  2.347    [2.347, 7, 2.898]
1  [93794, 112.7, 5.014, 0, 1778.1, 3473.82, 0, 3...  1        [1, 5]     112.700  [112.7, 3473.82]
2  [2.927, 12.647, 9047, 0, 1204.5, 13.4, 6.3, 4....  2        [2, 0]     9047     [9047, 2.927]
3  [0, 1.801, 7.104, 2121.2, 20.375, 6.348, 11.35...  2        [2, 9, 3]  7.104    [7.104, 12531.5, 20.375]
The Results lists vary in length, as do the lists in one of the index columns.
For some reason, indexing with index does not work, and the errors I get include:
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
That is what I get when I run df['Outcome'] = df['Results'].index['Index_1']. I have played around with the code but cannot seem to get it to work.
I have checked the types of the values in the columns: they are list for Results and numpy.int64 for Index_1.
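A minimal sketch of why that line fails, assuming a default integer row index and a small made-up stand-in frame (not the real data): df['Results'].index is the row-label Index of the Series, not a per-list lookup, so subscripting it with the string 'Index_1' raises the IndexError. Pairing each list with its own integer position is what was presumably intended.

```python
import pandas as pd

# Made-up stand-in data for illustration, not the asker's real frame.
df = pd.DataFrame({
    "Results": [[2.347, 140.8, 1010.8], [93794, 112.7, 5.014]],
    "Index_1": [0, 1],
})

# .index is the row-label Index of the Series (here a default RangeIndex).
row_labels = df["Results"].index

try:
    row_labels["Index_1"]  # subscripting the Index itself with a string
    error_name = None
except IndexError as exc:
    error_name = type(exc).__name__

# Pair each row's list with that row's integer position instead.
outcome = [values[i] for values, i in zip(df["Results"], df["Index_1"])]
```

Here error_name records that an IndexError was raised, and outcome holds one element per row, taken from that row's own list.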
You can apply a lambda function that does this:
import pandas as pd
import numpy as np

# Rebuild the example frame from the question
df = pd.DataFrame(columns=['Results', 'Index_1', 'Index_2'],
                  data=[[[2.347, 140.8, 1010.8, 723.7, 7, 0, 2.898, 9.1], 0, [0, 4, 6]],
                        [[93794, 112.7, 5.014, 0, 1778.1, 3473.82, 0, 3], 1, [1, 5]],
                        [[2.927, 12.647, 9047, 0, 1204.5, 13.4, 6.3, 4], 2, [2, 0]],
                        [[0, 1.801, 7.104, 2121.2, 20.375, 6.348, 11.35], 2, [2, 6, 3]]])

# A NumPy array accepts either a single integer or a list of integers as an
# index, so the same lambda serves both Index_1 and Index_2
for x in [1, 2]:
    df[f'Outcome_{x}'] = df.apply(lambda row: (np.array(row['Results'])[row[f'Index_{x}']]).tolist(), axis=1)
Output:
print(df.to_string())
   Results                                          Index_1  Index_2    Outcome_1  Outcome_2
0  [2.347, 140.8, 1010.8, 723.7, 7, 0, 2.898, 9.1]  0        [0, 4, 6]  2.347      [2.347, 7.0, 2.898]
1  [93794, 112.7, 5.014, 0, 1778.1, 3473.82, 0, 3]  1        [1, 5]     112.700    [112.7, 3473.82]
2  [2.927, 12.647, 9047, 0, 1204.5, 13.4, 6.3, 4]   2        [2, 0]     9047.000   [9047.0, 2.927]
3  [0, 1.801, 7.104, 2121.2, 20.375, 6.348, 11.35]  2        [2, 6, 3]  7.104      [7.104, 11.35, 2121.2]
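Note that going through np.array converts every element to float, which is why 7 shows up as 7.0 in the output above. If that matters, one possible alternative is plain list indexing. The sketch below assumes each index cell is either a single integer or a list of integers; pick is a hypothetical helper name, not part of pandas, and the frame is a two-row subset of the example.

```python
import pandas as pd

def pick(values, idx):
    """Return values[idx] for one integer, or a list of picks for a list of integers."""
    if isinstance(idx, (list, tuple)):
        return [values[i] for i in idx]
    return values[idx]

# Two-row subset of the example frame, for illustration only.
df = pd.DataFrame(
    columns=["Results", "Index_1", "Index_2"],
    data=[
        [[2.347, 140.8, 1010.8, 723.7, 7, 0, 2.898, 9.1], 0, [0, 4, 6]],
        [[93794, 112.7, 5.014, 0, 1778.1, 3473.82, 0, 3], 1, [1, 5]],
    ],
)

# Walk each Results list alongside its index cell, row by row.
for x in [1, 2]:
    df[f"Outcome_{x}"] = [
        pick(values, idx)
        for values, idx in zip(df["Results"], df[f"Index_{x}"])
    ]
```

This keeps the original element types (the integer 7 stays 7). Like the NumPy version, it raises IndexError if a position is past the end of a list.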