Why does Seaborn keep drawing non-existent range values on the x-axis?

Snippet:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

test = pd.DataFrame({'value':[1,2,5,7,8,10,11,12,15,16,18,20,36,37,39]})
test['range'] = pd.cut(test.value, np.arange(0,45,5)) # bin the values into ranges of width 5
test = test.groupby('range')['value'].count().to_frame().reset_index() # count occurrences in each range
test = test[test.value!=0] # filter out rows with value = 0

plt.figure(figsize=(10,5))
plt.xticks(rotation=90)
plt.yticks(np.arange(0,10, 1))
sns.barplot(x=test.range, y=test.value)

Output:

If we look at what is in test:

     range   value
0   (0, 5]      3
1   (5, 10]     3
2   (10, 15]    3
3   (15, 20]    3
7   (35, 40]    3

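The ranges (20, 25], (25, 30] and (30, 35] have already been filtered out, yet they still appear in the plot. Why is that, and how can I draw the plot without the empty ranges?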


P.S. @jezrael's solution works perfectly with the snippet above. When I tried it on my real dataset, I got a warning:

Snippet:

test['range'] = test['range'].cat.remove_unused_categories()

Warning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

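I used the following to avoid the warning (note that the inplace argument was deprecated in pandas 1.3 and removed in pandas 2.0, so this only works on older versions):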

test['range'].cat.remove_unused_categories(inplace=True)

The warning was caused by using multiple variables, so watch out for this pattern:

test = blah blah blah
test_df = test[test.value!=0]
test_df['range'] = test_df['range'].cat.remove_unused_categories() # warning!

The problem is that the range column is categorical, so by design its unused categories are not removed by filtering, just as with other operations.

You need Series.cat.remove_unused_categories:

...
test = test[test.value!=0] #filter out rows with value = 0
print (test['range'])
0      (0, 5]
1     (5, 10]
2    (10, 15]
3    (15, 20]
7    (35, 40]
Name: range, dtype: category
Categories (8, interval[int64]):
[(0, 5] < (5, 10] < (10, 15] < (15, 20] < (20, 25] < (25, 30] < (30, 35] < (35, 40]]

test['range'] = test['range'].cat.remove_unused_categories()
print (test['range'])
0      (0, 5]
1     (5, 10]
2    (10, 15]
3    (15, 20]
7    (35, 40]
Name: range, dtype: category
Categories (5, interval[int64]): 
[(0, 5] < (5, 10] < (10, 15] < (15, 20] < (35, 40]]

plt.figure(figsize=(10,5))
plt.xticks(rotation=90)
plt.yticks(np.arange(0,10, 1))
sns.barplot(x=test.range, y=test.value)
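
The same behaviour can be checked without plotting. The sketch below rebuilds the data from the question and counts the categories before and after the cleanup. The names df and counts are illustrative, and observed=False is passed explicitly on the assumption that the empty bins should show up in the grouped result, as they do in the question.

```python
import numpy as np
import pandas as pd

values = [1, 2, 5, 7, 8, 10, 11, 12, 15, 16, 18, 20, 36, 37, 39]
df = pd.DataFrame({"value": values})
df["range"] = pd.cut(df["value"], np.arange(0, 45, 5))  # 8 bins of width 5

# observed=False keeps the empty bins as rows with a count of 0
counts = df.groupby("range", observed=False)["value"].count().reset_index()

# Filtering drops the rows, but the categorical dtype still knows all 8 bins
counts = counts[counts["value"] != 0].copy()
n_before = len(counts["range"].cat.categories)

# Drop the categories that no longer occur in the data
counts["range"] = counts["range"].cat.remove_unused_categories()
n_after = len(counts["range"].cat.categories)

print(n_before, n_after)
```

Seaborn draws one position per category rather than per row, which is why the filtered-out ranges keep their place on the x-axis until the categories themselves are removed.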

EDIT:

You need copy:

test_df = test[test.value!=0].copy()
test_df['range'] = test_df['range'].cat.remove_unused_categories() # no warning!

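Without copy, if you modify values in test_df later, you will find that the modifications do not propagate back to the original data (test), which is what pandas is warning about.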
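
A minimal sketch of the copy approach on a small made-up frame (the column values and the names df and subset are illustrative only). It records any warnings raised during the assignment and checks that the original frame keeps all of its categories.

```python
import warnings
import pandas as pd

df = pd.DataFrame({
    "value": [3, 0, 3],
    "range": pd.Categorical(["a", "b", "c"]),
})

# .copy() makes subset an independent DataFrame rather than a slice of df
subset = df[df["value"] != 0].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    subset["range"] = subset["range"].cat.remove_unused_categories()

# Keep only warnings whose class name mentions SettingWithCopy
copy_warnings = [w for w in caught if "SettingWithCopy" in w.category.__name__]

print(len(copy_warnings))
print(list(subset["range"].cat.categories))
print(list(df["range"].cat.categories))
```

The subset ends up with only the categories it uses, while df is left untouched.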