Why does Seaborn keep drawing non-existent range values on the x-axis?
Snippet:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
test = pd.DataFrame({'value':[1,2,5,7,8,10,11,12,15,16,18,20,36,37,39]})
test['range'] = pd.cut(test.value, np.arange(0,45,5)) # generate range
test = test.groupby('range')['value'].count().to_frame().reset_index() # count occurrences in each range
test = test[test.value!=0] # filter out rows where value == 0
plt.figure(figsize=(10,5))
plt.xticks(rotation=90)
plt.yticks(np.arange(0,10, 1))
sns.barplot(x=test.range, y=test.value)
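Output: [bar plot]
If we look at the contents of test:
      range  value
0    (0, 5]      3
1   (5, 10]      3
2  (10, 15]      3
3  (15, 20]      3
7  (35, 40]      3
The ranges (20, 25], (25, 30] and (30, 35] have been filtered out, but they still appear in the plot. Why is that? How can I produce a plot without the empty ranges?
P.S. @jezrael's solution works perfectly with the snippet above. When I tried it on my real dataset:
Snippet: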
test['range'] = test['range'].cat.remove_unused_categories()
Warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
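I used the following to avoid the warning:
test['range'].cat.remove_unused_categories(inplace=True) # note: inplace was deprecated in pandas 1.3 and removed in 2.0
The warning was caused by using more than one variable, so watch out for this pattern: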
test = blah blah blah
test_df = test[test.value!=0]
test_df['range'] = test_df['range'].cat.remove_unused_categories() # warning!
The problem is that the range column is categorical, so by design its categories are not removed by filtering, just as with other operations.
You need Series.cat.remove_unused_categories:
...
test = test[test.value!=0] #filter out rows with value = 0
print (test['range'])
0 (0, 5]
1 (5, 10]
2 (10, 15]
3 (15, 20]
7 (35, 40]
Name: range, dtype: category
Categories (8, interval[int64]):
[(0, 5] < (5, 10] < (10, 15] < (15, 20] < (20, 25] < (25, 30] < (30, 35] < (35, 40]]
test['range'] = test['range'].cat.remove_unused_categories()
print (test['range'])
0 (0, 5]
1 (5, 10]
2 (10, 15]
3 (15, 20]
7 (35, 40]
Name: range, dtype: category
Categories (5, interval[int64]):
[(0, 5] < (5, 10] < (10, 15] < (15, 20] < (35, 40]]
plt.figure(figsize=(10,5))
plt.xticks(rotation=90)
plt.yticks(np.arange(0,10, 1))
sns.barplot(x=test.range, y=test.value)
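To check the same thing without drawing a plot, here is a minimal sketch that only counts categories before and after the cleanup. It assumes a pandas version whose groupby accepts the observed argument; observed=False is spelled out so the empty bins are kept as in the question. The names df, counts and nonzero are made up for this illustration.

```python
import numpy as np
import pandas as pd

values = [1, 2, 5, 7, 8, 10, 11, 12, 15, 16, 18, 20, 36, 37, 39]
df = pd.DataFrame({"value": values})
df["range"] = pd.cut(df["value"], np.arange(0, 45, 5))  # 8 bins: (0, 5] ... (35, 40]

# Count values per bin; observed=False keeps the empty bins as zero counts.
counts = df.groupby("range", observed=False)["value"].count().reset_index()

# Drop the empty bins. The rows disappear, but the categorical dtype
# still lists all 8 intervals, and that list is what ends up on the x-axis.
nonzero = counts[counts["value"] != 0].copy()
n_rows = len(nonzero)
n_cats_before = len(nonzero["range"].cat.categories)

# Remove the categories that no remaining row uses.
nonzero["range"] = nonzero["range"].cat.remove_unused_categories()
n_cats_after = len(nonzero["range"].cat.categories)

print(n_rows, n_cats_before, n_cats_after)
```

If the number of categories matches the number of rows after the cleanup, there are no leftover empty ranges for the plot to draw.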
EDIT:
You need copy:
test_df = test[test.value!=0].copy()
test_df['range'] = test_df['range'].cat.remove_unused_categories() # no warning!
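If you modify values in test_df later, you will find that the modifications do not propagate back to the original data (test), and that pandas issues a warning.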
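A small sketch of what the explicit copy gives you, using a made-up three-row frame rather than the data from the question (the label column and the value 99 are only for illustration): after .copy(), writes to test_df stay in test_df and test is left untouched.

```python
import pandas as pd

# Hypothetical toy frame, not the data from the question.
test = pd.DataFrame({"value": [3, 0, 3], "label": ["a", "b", "c"]})

# Filter and take an explicit copy so test_df owns its own data.
test_df = test[test["value"] != 0].copy()

# Overwrite the values in the copy only.
test_df.loc[:, "value"] = 99

original_values = test["value"].tolist()
copied_values = test_df["value"].tolist()
print(original_values, copied_values)
```

Because test_df is a separate object here, pandas has no reason to warn about setting values on a slice of test.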