Dealing with outliers in Pandas
Good day. The problem is as follows: when I try to remove outliers from one column of a table with this code:
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from scipy.stats import norm
from scipy import stats
import numpy as np
df = pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/LargeData/m2_survey_data.csv")
df["ConvertedComp"].plot(kind="box", figsize=(10,10))
z_scores = stats.zscore(df["ConvertedComp"])
abs_z_scores = np.abs(z_scores)
filtered_entries = (abs_z_scores < 3).all(axis=1)
new_df = df[filtered_entries]
it crashes with the following error:
---------------------------------------------------------------------------
AxisError Traceback (most recent call last)
<ipython-input-133-7811da442811> in <module>
4 z_scores
5 abs_z_scores = np.abs(z_scores)
----> 6 filtered_entries = (abs_z_scores < 3).all(axis=1)
7 #new_df = df[filtered_entries]
C:\ProgramData\WatsonStudioDesktop\miniconda3\envs\desktop\lib\site-packages\numpy\core\_methods.py in _all(a, axis, dtype, out, keepdims)
44
45 def _all(a, axis=None, dtype=None, out=None, keepdims=False):
---> 46 return umr_all(a, axis, dtype, out, keepdims)
47
48 def _count_reduce_items(arr, axis):
AxisError: axis 1 is out of bounds for array of dimension 1
Thanks in advance for any advice; I am nearly out of ideas.
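Your zscore is computed on only one column, so the result is a one-dimensional array. A 1-D array has no axis 1 to reduce over, which is exactly what the AxisError says. Drop the .all(axis=1) call and use the boolean mask directly: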
z_scores = stats.zscore(df["ConvertedComp"])
new_df = df[np.abs(z_scores) < 3]
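To check the single-column fix without downloading the survey file, here is a minimal sketch on made-up data. The frame `toy` and its values are assumptions for illustration only; just the column name is borrowed from the question.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for the survey data: 99 ordinary salaries
# plus one extreme value in row 0.
toy = pd.DataFrame(
    {"ConvertedComp": np.tile([50_000.0, 60_000.0, 70_000.0, 80_000.0], 25)}
)
toy.loc[0, "ConvertedComp"] = 2_000_000.0

z_scores = stats.zscore(toy["ConvertedComp"])
print(np.ndim(z_scores))  # 1: only axis 0 exists, so .all(axis=1) cannot work

# One boolean per row; no reduction across columns is needed.
mask = np.abs(z_scores) < 3
toy_filtered = toy[mask]
print(len(toy), len(toy_filtered))  # 100 99
```

Only the row holding the extreme value is dropped; the other 99 rows stay.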
Now, if you run zscore on multiple columns, the result is two-dimensional and your original code works:
# "AnotherColumn" is a placeholder for a second numeric column
z_scores = stats.zscore(df[["ConvertedComp", "AnotherColumn"]])
new_df = df[(np.abs(z_scores) < 3).all(axis=1)]
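One caveat that may matter with real survey data: by default `stats.zscore` propagates NaN, so a column with missing values yields NaN z-scores, and every `NaN < 3` comparison is False. The sketch below assumes the columns may contain missing values and drops those rows first. The data and the `toy`, `cols`, and `clean` names are made up for illustration, and dropping rows is only one possible way to handle missing values.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical two-column stand-in for the survey data.
toy = pd.DataFrame({
    "ConvertedComp": np.tile([50_000.0, 60_000.0, 70_000.0, 80_000.0], 25),
    "AnotherColumn": np.tile([38.0, 40.0, 42.0, 45.0], 25),
})
toy.loc[0, "ConvertedComp"] = 2_000_000.0  # an outlier
toy.loc[1, "AnotherColumn"] = np.nan       # a missing value

cols = ["ConvertedComp", "AnotherColumn"]

# Drop rows with missing values in the scored columns, so the
# z-scores are not turned into NaN.
clean = toy.dropna(subset=cols)

# Two columns in, 2-D result out: now axis=1 exists.
z_scores = stats.zscore(clean[cols])

# Keep a row only if every scored column is within 3 standard deviations.
mask = (np.abs(z_scores) < 3).all(axis=1)
toy_filtered = clean[mask]
print(len(toy), len(clean), len(toy_filtered))  # 100 99 98
```

Here `.all(axis=1)` collapses the per-column comparisons into one boolean per row, which is the shape that `df[...]` expects for row filtering.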