How to increase the importance of a column in a decision tree?

Question

I have a dataset with the columns name, ratings, ratings_count, and genres.

For example, Movies_Data.csv:

    Name             ratings  ratings_count  Action  Adventure  Horror  Musical  Thriller
    Mad-Max          2        7              1       0          0       0        1
    Mitchell[1975]   3.25     2              1       0          0       0        1
    John Wick        4.23     4              1       0          0       0        0
    Insidious        3.75     10             0       0          1       0        0

I split it into features and labels, then applied label encoding to the Name column.

This is the feature dataset after the split.

Features:

    ratings  ratings_count  Action  Adventure  Horror  Musical  Thriller
    2        7              1       0          0       0        1
    3.25     2              1       0          0       0        1
    4.23     4              1       0          0       0        0
    3.75     10             0       0          1       0        0

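The problem is that I have about 18 genre columns, so I think my decision tree gives those columns more weight than ratings and ratings_count.

For example, suppose I ask the tree to predict a movie with the following parameters: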

ratings:3 ratings_count:2 Action:1 Adventure:0 Horror:0 Musical:0 Thriller:1

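It should obviously predict Mitchell[1975], because ratings:3 is close to 3.25 and ratings_count matches my input. Instead, it predicts Mad-Max. How can I increase the importance of the ratings and ratings_count columns?

I am new to machine learning. Is there any other method or algorithm I could use to get better recommendations?

P.S. I know we could use a neural network, but I need to stick to basic ML algorithms.

Thanks!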

Answer 1

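First, a random forest almost always gives better results than a single decision tree. It has more hyperparameters to tune, but tuning them can also help you get better results. It is an ensemble algorithm, and it works well because it averages the predictions of many decision trees. It is less prone to overfitting, so it should perform better.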
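As a rough illustration of the suggestion above, the following sketch fits both a single decision tree and a random forest on the four sample rows from the question and prints the feature importances each model reports. It assumes scikit-learn and pandas are available; `n_estimators=200` and `random_state=0` are arbitrary choices made for this example, not values from the question. With one row per movie the numbers mean very little, so treat this only as a way to check which columns the models actually split on.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# The four sample rows from the question (the real data has about 18 genre columns).
movies = pd.DataFrame(
    {
        "Name": ["Mad-Max", "Mitchell[1975]", "John Wick", "Insidious"],
        "ratings": [2.0, 3.25, 4.23, 3.75],
        "ratings_count": [7, 2, 4, 10],
        "Action": [1, 1, 1, 0],
        "Adventure": [0, 0, 0, 0],
        "Horror": [0, 0, 0, 1],
        "Musical": [0, 0, 0, 0],
        "Thriller": [1, 1, 0, 0],
    }
)

# Split into features and label-encoded names, as described in the question.
features = movies.drop(columns="Name")
encoder = LabelEncoder()
labels = encoder.fit_transform(movies["Name"])

# Assumed settings: the seed and the number of trees are illustrative only.
tree = DecisionTreeClassifier(random_state=0).fit(features, labels)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(features, labels)

# Side-by-side view of how much each column contributes to the splits.
importances = pd.DataFrame(
    {"tree": tree.feature_importances_, "forest": forest.feature_importances_},
    index=features.columns,
)
print(importances)

# The query from the question: ratings 3, ratings_count 2, Action and Thriller set.
query = pd.DataFrame([[3, 2, 1, 0, 0, 0, 1]], columns=features.columns)
tree_pick = encoder.inverse_transform(tree.predict(query))[0]
forest_pick = encoder.inverse_transform(forest.predict(query))[0]
print("tree:", tree_pick, "| forest:", forest_pick)
```

Comparing the two importance columns on the full dataset should show whether the genre columns really dominate ratings and ratings_count, which is the assumption behind the question.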

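If you still have problems, you could try merging some of the genre categories (or getting more data), so that your algorithm can properly infer the importance of the ratings.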
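One possible way to merge categories is sketched below with pandas. The grouping in `genre_groups` is purely hypothetical and chosen only for illustration; which genres belong together is a judgment call that depends on the real data. A merged column is set to 1 when any of its source genres is 1.

```python
import pandas as pd

# Feature rows from the question.
features = pd.DataFrame(
    {
        "ratings": [2.0, 3.25, 4.23, 3.75],
        "ratings_count": [7, 2, 4, 10],
        "Action": [1, 1, 1, 0],
        "Adventure": [0, 0, 0, 0],
        "Horror": [0, 0, 0, 1],
        "Musical": [0, 0, 0, 0],
        "Thriller": [1, 1, 0, 0],
    }
)

# Hypothetical grouping: these group names and memberships are assumptions.
genre_groups = {
    "Action_or_Adventure": ["Action", "Adventure"],
    "Horror_or_Thriller": ["Horror", "Thriller"],
    "Musical": ["Musical"],
}

# Keep the numeric columns and replace the genre flags with one flag per group.
merged = features[["ratings", "ratings_count"]].copy()
for group, columns in genre_groups.items():
    # A row belongs to the group if any of the original genre flags is set.
    merged[group] = features[columns].max(axis=1)

print(merged)
```

With fewer genre columns, ratings and ratings_count make up a larger share of the candidate split features, which is the effect the suggestion above is aiming for.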

Also, this question may be a better fit for Cross Validated, where you can ask more theoretical questions.

Good luck!

python

machine-learning

decision-tree