How can I convert regression data into classification data?

Question

I have a dataset with the following columns:

   ['symboling', 'Company', 'fueltype', 'aspiration', 'doornumber',
   'carbody', 'drivewheel', 'enginelocation', 'carlength', 'carwidth',
   'curbweight', 'enginetype', 'cylindernumber', 'enginesize',
   'fuelsystem', 'horsepower', 'price', 'total_mpg']

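The goal is to predict the car price. Right now the price data is continuous. I would like to know how to convert it so that I can use a classification model.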

After searching, I found that I can do this by defining ranges, but I could not understand how. Please help.

Answer 1

Suppose we have a dataframe with 2 continuous columns, named x1 and x2:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

x1 = np.random.rand(100)
x2 = np.random.rand(100)
df = pd.DataFrame({"x1":x1,"x2":x2})
df.head()

#        x1       x2
#0  0.049202    0.131046
#1  0.606525    0.756687
#2  0.910932    0.944692
#3  0.904655    0.439637
#4  0.565204    0.418432

# Plot values
sns.scatterplot(x=range(100),y=df["x1"])
sns.scatterplot(x=range(100),y=df["x2"])

Then we can create some buckets (bins) like this:

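# Bins are (0, 0.2], (0.2, 0.4], ..., (0.8, inf); include_lowest=True also keeps an exact 0.0
x1_cat = pd.cut(df['x1'], bins=[0.,0.2,0.4,0.6,0.8,np.inf], labels=[0,1,2,3,4], include_lowest=True)
x2_cat = pd.cut(df['x2'], bins=[0.,0.2,0.4,0.6,0.8,np.inf], labels=[0,1,2,3,4], include_lowest=True)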
df_cat = pd.concat([x1_cat,x2_cat],axis=1)
df_cat.head()

#   x1  x2
#0  0   0
#1  3   3
#2  4   4
#3  4   2
#4  2   2

# Plot values
sns.scatterplot(x=range(100),y=df_cat["x1"])
sns.scatterplot(x=range(100),y=df_cat["x2"])
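
The same idea can be applied to the price column from the question. The following is only a sketch under stated assumptions: the prices are randomly generated stand-ins for the real `price` column, and the bin edges and class names (`budget`, `mid`, `premium`, `luxury`) are made up for illustration. It shows two options: hand-picked ranges with `pd.cut`, and quantile-based bins with `pd.qcut`, which give classes of roughly equal size.

```python
import numpy as np
import pandas as pd

# Hypothetical prices; with the real data this would be the existing "price" column.
rng = np.random.default_rng(0)
cars = pd.DataFrame({"price": rng.uniform(5000, 45000, size=200)})

# Option 1: fixed ranges chosen by hand (edges and names are assumptions).
edges = [0, 10000, 20000, 30000, np.inf]
names = ["budget", "mid", "premium", "luxury"]
cars["price_class"] = pd.cut(cars["price"], bins=edges, labels=names)

# Option 2: quantile-based bins, so each class holds about the same number of rows.
cars["price_quartile"] = pd.qcut(cars["price"], q=4, labels=[0, 1, 2, 3])

# Inspect how many rows fall into each class.
print(cars["price_class"].value_counts().sort_index())
print(cars["price_quartile"].value_counts().sort_index())
```

Either new column can then be used as the target of a classification model in place of the continuous price. Quantile bins tend to avoid heavily imbalanced classes, while fixed ranges are easier to interpret.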

