Compare Misaligned Series columns Pandas

Question

Comparing 2 Series objects of different sizes:

IN[248]:df['Series value 1']
Out[249]: 
0     70
1     66.5
2     68
3     60
4     100
5     12
Name: Stu_perc, dtype: float64

IN[250]:benchmark_value 
# benchmark is a subset of data from df2, selected using certain filters
Out[251]: 
0    70
Name: Stu_perc, dtype: int64

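Basically, I want to compare df['Series value 1'] with benchmark_value and return, in a 'Matching list' column, the values that are at least 95% of the benchmark value. Both are pandas Series, but they have different sizes, so they cannot be compared.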

Given input:

IN[252]:df['Matching list']=(df2['Series value 1']>=0.95*benchmark_value)
OUT[253]: ValueError: Can only compare identically-labeled Series objects
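A minimal reproduction of the error, assuming the six values shown above and a one-row benchmark. pandas only compares two Series element-wise when their index labels match, so a six-row column (index 0 to 5) against a one-row benchmark (index 0 only) is rejected, while a plain scalar is broadcast to every row. The variable names are illustrative.

```python
import pandas as pd

# Six-row column (index 0..5), mirroring the question's data
stu_perc = pd.Series([70, 66.5, 68, 60, 100, 12], name="Stu_perc")
# One-row benchmark (index 0 only)
benchmark_value = pd.Series([70], name="Stu_perc")

error_message = None
try:
    # Series-vs-Series comparison requires identically labeled indexes
    stu_perc >= 0.95 * benchmark_value
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Comparing against a scalar broadcasts it to every row instead
mask = stu_perc >= 0.95 * benchmark_value.iloc[0]
print(mask.tolist())  # [True, True, True, False, True, False]
```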

Desired output:

 [IN]:
 df['Matching list']=(df2['Stu_perc']>=0.95*benchmark_value)
 #0.95*Benchmark value is 66.5 in this case.

 df['Matching list']
 [OUT]:
0     70
1     66.5
2     68
3     NULL
4     100
5     NULL

Answer 1

Because benchmark_value is a Series, you need to select its first value with Series.iat to get a scalar, and then set NaNs with Series.where:

import pandas as pd

df2 = pd.DataFrame({'Stu_perc': [70, 66.5, 68, 60, 100, 12]})
benchmark_value = pd.Series([70], index=[0])

val = benchmark_value.iat[0]
df2['Matching list']= df2['Stu_perc'].where(df2['Stu_perc']>=0.95*val)
print (df2)
     Stu_perc Matching list
0       70.0           70.0
1       66.5           66.5
2       68.0           68.0
3       60.0            NaN
4      100.0          100.0
5       12.0            NaN

A more general solution, which also works when benchmark_value is empty, uses next with iter to return the first value of the Series, or a default value if there is none (here 0):

benchmark_value = pd.Series([], dtype=float)

val = next(iter(benchmark_value), 0)
df2['Matching list']= df2['Stu_perc'].where(df2['Stu_perc']>=0.95*val)
print (df2)
    Stu_perc  Matching list
0       70.0           70.0
1       66.5           66.5
2       68.0           68.0
3       60.0           60.0
4      100.0          100.0
5       12.0           12.0
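Both variants can be wrapped in one helper. This is a sketch: the function name `matching_list` and its `ratio`/`default` parameters are hypothetical, chosen to mirror the 0.95 factor and the fallback of 0 used above.

```python
import pandas as pd


def matching_list(values: pd.Series, benchmark: pd.Series,
                  ratio: float = 0.95, default: float = 0) -> pd.Series:
    """Hypothetical helper: keep values >= ratio * first benchmark value, NaN elsewhere.

    Falls back to `default` when `benchmark` is empty.
    """
    # First element of the benchmark, or `default` if it has none
    val = next(iter(benchmark), default)
    # where() keeps rows where the condition holds and sets the rest to NaN
    return values.where(values >= ratio * val)


df2 = pd.DataFrame({"Stu_perc": [70, 66.5, 68, 60, 100, 12]})
with_benchmark = matching_list(df2["Stu_perc"], pd.Series([70]))
without_benchmark = matching_list(df2["Stu_perc"], pd.Series([], dtype=float))
print(with_benchmark.tolist())     # [70.0, 66.5, 68.0, nan, 100.0, nan]
print(without_benchmark.tolist())  # [70.0, 66.5, 68.0, 60.0, 100.0, 12.0]
```

With an empty benchmark the threshold becomes 0, so every non-negative value is kept, matching the output above.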

Answer 2

benchmark_value seems to be a one-row Series rather than an actual number, so I believe you need to access its value first.

But the comparison on its own would return a list of booleans. To get only the values you want, you can use the where function.

Try this:

df['Matching list'] = df2['Stu_perc'].where(df2['Stu_perc'] >= 0.95*benchmark_value.iloc[0])
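To illustrate the point about booleans: the comparison alone produces a boolean mask, and where uses that mask to keep the matching values and put NaN everywhere else. The sketch assumes df2 holds the question's six values and extracts the benchmark as a scalar with .iloc[0].

```python
import pandas as pd

df2 = pd.DataFrame({"Stu_perc": [70, 66.5, 68, 60, 100, 12]})
benchmark_value = pd.Series([70])

threshold = 0.95 * benchmark_value.iloc[0]  # scalar 66.5
mask = df2["Stu_perc"] >= threshold         # boolean Series
kept = df2["Stu_perc"].where(mask)          # values where True, NaN elsewhere

print(mask.tolist())  # [True, True, True, False, True, False]
print(kept.tolist())  # [70.0, 66.5, 68.0, nan, 100.0, nan]
```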

Answer 3

Is your benchmark value a single value?

If so, you could use the following, where .values[0] converts the Series benchmark_value to a plain number (without an index):

df['Matching list']=(df['Stu_perc']>=0.95*benchmark_value.values[0])
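Note that .values on its own returns a one-element NumPy array, not a number; pandas compares a Series with an array element-wise and raises a ValueError when their lengths differ. Indexing into the array, or using Series.iloc[0] or Series.item(), yields a scalar that broadcasts. A sketch using the question's data (variable names are illustrative):

```python
import pandas as pd

stu_perc = pd.Series([70, 66.5, 68, 60, 100, 12], name="Stu_perc")
benchmark_value = pd.Series([70])

arr = benchmark_value.values  # NumPy array of shape (1,), not a scalar
length_mismatch = False
try:
    stu_perc >= 0.95 * arr    # 6 rows vs 1 element: lengths differ
except ValueError:
    length_mismatch = True

# Three ways to obtain a plain number from a one-row Series
scalars = [benchmark_value.values[0], benchmark_value.iloc[0], benchmark_value.item()]
# item() additionally raises ValueError unless the Series has exactly one element
mask = stu_perc >= 0.95 * benchmark_value.item()
print(length_mismatch, scalars, mask.tolist())
```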


python

series

pandas

valueerror