Slicing a Pandas DataFrame with a logical (boolean) expression
I get an exception when I try to slice my Pandas DataFrame with a logical expression.
My data has the following form:
df
GDP_norm SP500_Index_deflated_norm
Year
1980 2.121190 0.769400
1981 2.176224 0.843933
1982 2.134638 0.700833
1983 2.233525 0.829402
1984 2.395658 0.923654
1985 2.497204 0.922986
1986 2.584896 1.09770
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 38 entries, 1980 to 2017
Data columns (total 2 columns):
GDP_norm 38 non-null float64
SP500_Index_deflated_norm 38 non-null float64
dtypes: float64(2)
memory usage: 912.0 bytes
The command is:
df[((df['GDP_norm'] >=3.5 & df['GDP_norm'] <= 4.5) & (df['SP500_Index_deflated_norm'] > 3)) | (
(df['GDP_norm'] >= 4.0 & df['GDP_norm'] <= 5.0) & (df['SP500_Index_deflated_norm'] < 3.5))]
The error message is:
TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool]
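I suggest creating the boolean masks separately, for better readability and easier error handling.
In your code, the conditions that become m1 and m2 below are missing their parentheses (). The problem is operator precedence: according to the Python docs, section 6.16 "Operator precedence", & has higher precedence than >=. The table lists operators from lowest to highest precedence: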
Operator                               Description
lambda                                 Lambda expression
if – else                              Conditional expression
or                                     Boolean OR
and                                    Boolean AND
not x                                  Boolean NOT
in, not in, is, is not,                Comparisons, including membership tests
<, <=, >, >=, !=, ==                   and identity tests
|                                      Bitwise OR
^                                      Bitwise XOR
&                                      Bitwise AND
(expressions...), [expressions...],    Binding or tuple display, list display,
{key: value...}, {expressions...}      dictionary display, set display
m1 = (df['GDP_norm'] >=3.5) & (df['GDP_norm'] <= 4.5)
m2 = (df['GDP_norm'] >= 4.0) & (df['GDP_norm'] <= 5.0)
m3 = m1 & (df['SP500_Index_deflated_norm'] > 3)
m4 = m2 & (df['SP500_Index_deflated_norm'] < 3.5)
df[m3 | m4]
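The precedence rule can be checked without pandas. The following is a minimal sketch that uses the standard-library ast module to show how Python groups the unparenthesized condition from the question; the integer example at the end is only an illustration and its values are made up.

```python
import ast

# The unparenthesized condition from the question, parsed but not executed.
expr = "df['GDP_norm'] >= 3.5 & df['GDP_norm'] <= 4.5"
tree = ast.parse(expr, mode="eval").body

# The whole expression is one chained comparison: left >= middle <= right.
print(type(tree).__name__)                     # Compare
print([type(op).__name__ for op in tree.ops])  # ['GtE', 'LtE']

# The middle operand is the bitwise AND, because & binds tighter than >=.
middle = tree.comparators[0]
print(type(middle).__name__, type(middle.op).__name__)  # BinOp BitAnd

# Same grouping with plain integers (illustrative values only).
unparenthesized = 2 >= 3 & 4 <= 6    # 2 >= (3 & 4) <= 6, i.e. 2 >= 0 <= 6
parenthesized = (2 >= 3) & (4 <= 6)  # False & True
print(unparenthesized, parenthesized)          # True False
```

With the parentheses in place, each comparison is evaluated first and & then combines the two boolean results, which is what the masks m1 to m4 rely on.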
You are running into chained comparisons. What happens is that the expression df['GDP_norm'] >= 3.5 & df['GDP_norm'] <= 4.5
is evaluated as:
df['GDP_norm'] >= (3.5 & df['GDP_norm']) <= 4.5
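Of course, this fails, because a float cannot be compared with a bool, as your error message says. Instead, use parentheses to isolate each boolean mask and assign it to a variable: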
m1 = (df['GDP_norm'] >= 3.5) & (df['GDP_norm'] <= 4.5)
m2 = df['SP500_Index_deflated_norm'] > 3
m3 = (df['GDP_norm'] >= 4.0) & (df['GDP_norm'] <= 5.0)
m4 = df['SP500_Index_deflated_norm'] < 3.5
res = df[(m1 & m2) | (m3 & m4)]
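To try the parenthesized masks end to end, here is a small self-contained sketch. The DataFrame values are hypothetical (the sample rows in the question never reach 3.5, so they would select nothing), and Series.between is shown only as an optional alternative for the range checks; it is inclusive on both ends by default.

```python
import pandas as pd

# Hypothetical data, chosen so that some rows satisfy each branch.
df = pd.DataFrame(
    {
        "GDP_norm": [3.0, 3.6, 4.2, 4.8, 5.5],
        "SP500_Index_deflated_norm": [2.0, 3.2, 3.4, 3.1, 4.0],
    },
    index=pd.Index([2010, 2011, 2012, 2013, 2014], name="Year"),
)

gdp = df["GDP_norm"]
sp = df["SP500_Index_deflated_norm"]

# Every comparison is wrapped in parentheses before combining with & and |.
explicit = ((gdp >= 3.5) & (gdp <= 4.5) & (sp > 3)) | (
    (gdp >= 4.0) & (gdp <= 5.0) & (sp < 3.5)
)

# Alternative: between() covers both bounds of a range check in one call.
with_between = (gdp.between(3.5, 4.5) & (sp > 3)) | (
    gdp.between(4.0, 5.0) & (sp < 3.5)
)

res = df[with_between]
print(res.index.tolist())  # [2011, 2012, 2013]
```

Both masks select the same rows; between() simply removes one place where a missing parenthesis could creep in.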