Why does Seaborn keep drawing non-existent range values on the x-axis?

Question

Snippet:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

test = pd.DataFrame({'value':[1,2,5,7,8,10,11,12,15,16,18,20,36,37,39]})
test['range'] = pd.cut(test.value, np.arange(0,45,5)) # bin the values into ranges
test = test.groupby('range')['value'].count().to_frame().reset_index() # count occurrences in each range
test = test[test.value!=0] # filter out rows with value == 0

plt.figure(figsize=(10,5))
plt.xticks(rotation=90)
plt.yticks(np.arange(0,10, 1))
sns.barplot(x=test.range, y=test.value)

Output:

If we look at the contents of test:

     range   value
0   (0, 5]      3
1   (5, 10]     3
2   (10, 15]    3
3   (15, 20]    3
7   (35, 40]    3

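The ranges (20, 25], (25, 30] and (30, 35] have been filtered out, but they still appear in the plot. Why is that? How can I produce a plot without the empty ranges?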

P.S. @jezrael's solution works perfectly with the snippet above. When I tried it on my real dataset, however, I got a warning.

Snippet:

test['range'] = test['range'].cat.remove_unused_categories()

Warning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I used the following to avoid the warning:

test['range'].cat.remove_unused_categories(inplace=True) # note: inplace= was later deprecated and is gone as of pandas 2.0

The warning was caused by using more than one variable, so watch out for this:

test = blah blah blah
test_df = test[test.value!=0]
test_df['range'] = test_df['range'].cat.remove_unused_categories() # warning!

Answer 1

The problem is that the range column is categorical, so by design the unused categories are not removed when rows are filtered out, just as with other operations.
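A minimal sketch of this behavior, assuming the same toy values as in the question (the variable names here are only illustrative): filtering drops rows, but the categorical dtype still carries all eight intervals, and those are what end up on the axis.

```python
import numpy as np
import pandas as pd

values = pd.Series([1, 2, 5, 7, 8, 10, 11, 12, 15, 16, 18, 20, 36, 37, 39])
ranges = pd.cut(values, np.arange(0, 45, 5))  # categorical with 8 interval categories

# Keep only the values above 30: three rows survive, all in (35, 40].
filtered = ranges[values > 30]

n_rows = len(filtered)                       # rows that remain after filtering
n_categories = len(filtered.cat.categories)  # categories still attached to the dtype
n_after_cleanup = len(filtered.cat.remove_unused_categories().cat.categories)

print(n_rows, n_categories, n_after_cleanup)
```

The row count shrinks, but the category count only shrinks once remove_unused_categories is called.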

You need Series.cat.remove_unused_categories:

...
test = test[test.value!=0] #filter out rows with value = 0
print (test['range'])
0      (0, 5]
1     (5, 10]
2    (10, 15]
3    (15, 20]
7    (35, 40]
Name: range, dtype: category
Categories (8, interval[int64]):
[(0, 5] < (5, 10] < (10, 15] < (15, 20] < (20, 25] < (25, 30] < (30, 35] < (35, 40]]

test['range'] = test['range'].cat.remove_unused_categories()
print (test['range'])
0      (0, 5]
1     (5, 10]
2    (10, 15]
3    (15, 20]
7    (35, 40]
Name: range, dtype: category
Categories (5, interval[int64]): 
[(0, 5] < (5, 10] < (10, 15] < (15, 20] < (35, 40]]

plt.figure(figsize=(10,5))
plt.xticks(rotation=90)
plt.yticks(np.arange(0,10, 1))
sns.barplot(x=test.range, y=test.value)
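On a real dataset there may be several categorical columns. The following is a sketch only, not part of the original answer: drop_unused_categories is a hypothetical helper (it is not a pandas function) that applies the same call to every categorical column and returns a new DataFrame, and the small frame below is made-up illustration data.

```python
import pandas as pd

def drop_unused_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with unused categories removed from each categorical column.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()  # work on an independent copy so the caller's frame is untouched
    for col in out.select_dtypes(include="category").columns:
        out[col] = out[col].cat.remove_unused_categories()
    return out

# Made-up data: category "c" is declared but never used.
df = pd.DataFrame({
    "range": pd.Categorical(["a", "b", "d"], categories=["a", "b", "c", "d"]),
    "value": [3, 3, 3],
})
cleaned = drop_unused_categories(df)

print(list(df["range"].cat.categories))       # original still lists "c"
print(list(cleaned["range"].cat.categories))  # "c" is gone in the cleaned copy
```

Returning a copy rather than modifying the input is a design choice here; it also sidesteps the copy-of-a-slice warning mentioned in the question.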

Edit:

You need copy:

test_df = test[test.value!=0].copy()
test_df['range'] = test_df['range'].cat.remove_unused_categories() # no warning!

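Without the copy, if you later modify values in test_df, you will find that the modifications do not propagate back to the original data (test), and pandas raises a warning.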
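A small sketch of the copy pattern, using made-up data rather than the real dataset: it records any warnings raised while modifying the copied frame and then shows that the original frame is unchanged. The check on the warning class name is an assumption made so that the sketch does not depend on where a given pandas version defines that class.

```python
import warnings
import pandas as pd

test = pd.DataFrame({
    "range": pd.Categorical(["a", "b", "c"], categories=["a", "b", "c"]),
    "value": [3, 0, 3],
})

# .copy() makes test_df an independent DataFrame instead of a slice of test.
test_df = test[test["value"] != 0].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning raised in this block
    test_df["range"] = test_df["range"].cat.remove_unused_categories()
    test_df.loc[test_df.index[0], "value"] = 99

# Match by class name, since the class location differs between pandas versions.
copy_warnings = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]

print(len(copy_warnings))          # 0: no copy-of-a-slice warning
print(test["value"].tolist())      # [3, 0, 3]: the original is untouched
print(test_df["value"].tolist())   # [99, 3]: only the copy changed
```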

Tags: pandas, seaborn