Pandas: what is the truth value condition when dealing with missing data?
I have a function that creates a ratio. It is defined as:
```python
def create_ratio(data,num,den):
    if data[num].isnull():
        ratio = -9997
    if data[den].isnull():
        ratio = -9998
    if data[num].isnull() & data[den].isnull():
        ratio = -9999
    else:
        ratio = data[num]/data[den]
    return ratio
```
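I have a pandas DataFrame (df_credit) that includes the credit card balance (cc_bal) and limit (cc_limit), and I want to compute credit card utilization, i.e. balance divided by limit: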
```python
df_credit['cc_util'] = create_ratio(df_credit,'cc_bal','cc_limit')
```
I get the following error:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-66-d53809a7690d> in <module>
----> 1 data['ratio_cc_util'] = create_ratio(data,'open_credit_card_credit_limit_nomiss','open_credit_card_credit_limit_nomiss')
      2 data['ratio_cc_util'].hist()

<ipython-input-65-99bc55b184ed> in create_ratio(data, num, den)
      1 def create_ratio(data,num,den):
----> 2     if data[num].isnull():
      3         ratio = -9997
      4     if data[den].isnull():
      5         ratio = -9998

/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __nonzero__(self)
   1441     def __nonzero__(self):
   1442         raise ValueError(
-> 1443             f"The truth value of a {type(self).__name__} is ambiguous. "
   1444             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1445         )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```
How do I fix this error? Thanks.
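The error comes from Python's `if` statement: `if data[num].isnull():` asks for a single True/False, but `isnull()` returns a boolean Series with one value per row, and pandas refuses to guess whether that should mean "any row is missing" or "every row is missing". A minimal illustration, using a small hypothetical Series `s` rather than the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value
s = pd.Series([1.0, np.nan, 3.0])
mask = s.isnull()  # boolean Series: [False, True, False]

try:
    if mask:  # Python calls bool(mask) here, which pandas rejects
        pass
    ambiguous = False
except ValueError:
    ambiguous = True

print(ambiguous)      # True
print(mask.any())     # True: at least one value is missing
print(mask.all())     # False: not every value is missing
print(mask.tolist())  # [False, True, False]: the element-wise answer
```

`.any()` and `.all()` collapse the mask into one answer for the whole column, which is not what a per-row ratio needs; the per-row answer is the mask itself, so the fix has to stay element-wise.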
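- You are mixing scalars and Series: `if` needs a single True/False, but `data[num].isnull()` returns a boolean Series with one value per row. Given how the function is called, it needs to return a Series or array of ratios.
- The simplest way to implement this conditional logic is `np.select()`.
- Here is simulated data, including missing values, that covers your use case: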
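```python
import numpy as np
import pandas as pd

# Simulated data: 200 rows, with up to 30 missing values per column
df = pd.DataFrame({
    "cc_bal": np.random.uniform(200, 1000, 200),
    "cc_limit": np.random.uniform(800, 1200, 200),
})
df.loc[np.unique(np.random.choice(range(len(df)), 30)), "cc_bal"] = None
df.loc[np.unique(np.random.choice(range(len(df)), 30)), "cc_limit"] = None

def create_ratio(df, num, den):
    # Conditions are checked in order and the first match wins,
    # so the "both missing" case must come first
    return np.select(
        [
            df[num].isnull() & df[den].isnull(),
            df[num].isnull(),
            df[den].isnull(),
        ],
        [-9999, -9997, -9998],
        df[num] / df[den],  # default when no condition matches: the actual ratio
    )

df["ratio"] = create_ratio(df, "cc_bal", "cc_limit")
df
```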
Sample output (first 10 rows; the values differ on each run because the data is random):
|   | cc_bal | cc_limit | ratio |
|---|---|---|---|
| 0 | 372.633 | 981.996 | 0.379465 |
| 1 | 845.541 | 1133.69 | 0.745831 |
| 2 | 449.406 | 975.903 | 0.460503 |
| 3 | 209.827 | 922.829 | 0.227374 |
| 4 | 237.347 | 936.654 | 0.253398 |
| 5 | 351.154 | nan | -9998 |
| 6 | nan | 873.671 | -9997 |
| 7 | 803.396 | 861.791 | 0.93224 |
| 8 | 591.136 | 807.176 | 0.732352 |
| 9 | 675.397 | 847.059 | 0.797344 |
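The random sample above never shows the case where both values are missing. The sketch below uses a small hand-made DataFrame (illustrative values, not from the question) so that every branch appears, and compares the `np.select` version with a row-by-row version of the question's scalar logic. `create_ratio_row` is a hypothetical helper written for this comparison; it uses `if`/`elif` so the branches cannot overwrite each other, mirroring the first-match-wins order of `np.select`.

```python
import numpy as np
import pandas as pd

def create_ratio(df, num, den):
    # Vectorized: conditions are checked in order, first match wins
    return np.select(
        [
            df[num].isnull() & df[den].isnull(),
            df[num].isnull(),
            df[den].isnull(),
        ],
        [-9999, -9997, -9998],
        df[num] / df[den],
    )

def create_ratio_row(row, num, den):
    # Hypothetical scalar version: row[num] is a single value, so `if` works
    if pd.isnull(row[num]) and pd.isnull(row[den]):
        return -9999
    elif pd.isnull(row[num]):
        return -9997
    elif pd.isnull(row[den]):
        return -9998
    return row[num] / row[den]

# Illustrative data covering all four cases: both present, bal missing,
# limit missing, both missing
df = pd.DataFrame({
    "cc_bal":   [500.0, np.nan, 300.0, np.nan],
    "cc_limit": [1000.0, 800.0, np.nan, np.nan],
})

df["ratio"] = create_ratio(df, "cc_bal", "cc_limit")
df["ratio_row"] = df.apply(create_ratio_row, axis=1, args=("cc_bal", "cc_limit"))
print(df)
```

Both columns should agree. `df.apply(..., axis=1)` calls the Python function once per row, so on large frames it is typically much slower than the vectorized `np.select`; it is shown here only to connect the answer back to the scalar-style logic in the question.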