How to define the quartile range for multiple variables and plot a box plot
How can I use a box plot to show the outliers in the following data?
no,store_id,revenue,profit,state,country
0,101,779183,281257,WD,India
1,101,144829,838451,WD,India
2,101,766465,757565,AL,Japan
My code is below. It goes as far as transforming the data with StandardScaler (MinMaxScaler would work just as well). After that, how do I define the quartile range in order to identify outliers?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
df = pd.read_csv(r'anomaly.csv', index_col=False)
df1 = pd.get_dummies(data=df)
df2 = StandardScaler().fit_transform(df1)
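By convention, a box-and-whisker plot shows the 25th and 75th percentiles of the data as the edges of the box, with the median drawn as a line inside it.
These values are computed automatically from the data you pass in.
For example, for the following data: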
no,store_id,revenue,profit,state,country
0,101,779183,281257,WD,India
1,101,144829,838451,WD,India
2,101,766465,757565,AL,Japan
2,101,1000000,757565,AL,Italy
You can display a box plot for the revenue column like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
df = pd.read_csv(r'anomaly.csv', index_col=False)
# One-hot encode the categorical columns, then standardize (as in the question)
df1 = pd.get_dummies(data=df)
df2 = StandardScaler().fit_transform(df1)
# Draw outliers (fliers) as green diamonds
green_diamond = dict(markerfacecolor='g', marker='D')
fig1, ax1 = plt.subplots()
ax1.set_title('Box plot')
# Box plot of the raw revenue column
ax1.boxplot(df['revenue'], flierprops=green_diamond)
plt.show()
The outliers appear in the plot as green diamonds.
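If you also want the quartile range as numbers for several columns at once, rather than only reading it off the plot, something like the following may help. It is a minimal sketch, not the only way to do it: the helper name `iqr_bounds` and the multiplier `k` are my own choices for illustration, and the sample rows are copied from the example data above instead of being read from anomaly.csv.

```python
import pandas as pd

# Sample rows copied from the example data above (numeric columns only)
df = pd.DataFrame({
    "revenue": [779183, 144829, 766465, 1000000],
    "profit": [281257, 838451, 757565, 757565],
})

def iqr_bounds(frame, k=1.5):
    """Hypothetical helper: per-column Q1, Q3, IQR and the outlier fences."""
    q1 = frame.quantile(0.25)
    q3 = frame.quantile(0.75)
    iqr = q3 - q1
    return pd.DataFrame({
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower": q1 - k * iqr,  # values below this are flagged
        "upper": q3 + k * iqr,  # values above this are flagged
    })

bounds = iqr_bounds(df)

# Boolean frame: True where a value falls outside its column's fences
outliers = df.lt(bounds["lower"], axis=1) | df.gt(bounds["upper"], axis=1)

print(bounds)
print(outliers)
```

The multiplier 1.5 matches the default whisker length (`whis=1.5`) of matplotlib's `boxplot`, so the flagged values should correspond to the fliers drawn in the plot. Because StandardScaler and MinMaxScaler only shift and rescale each column, the same rows are flagged whether you apply this before or after scaling.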