Training a model with a list-of-points column

Question

I want to classify cracks by depth. To do this, I have the following features stored in a data frame:

WindowsDf = pd.DataFrame(dataForWindowsDf, columns=['IsCrack', 'CheckTypeEncode', 'DepthCrack',
                                                    'WindowOfInterest'])
# dataForWindowsDf is a list built iteratively from CSV files;
# WindowsDf is the data frame constructed from that list.

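So my target column is 'DepthCrack', and the other columns make up the feature vector. WindowOfInterest is a column of 2D lists (lists of points), each representing the plot of a completed test (based on the electromagnetic wave returned from the surface as a function of time):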

[[0.9561600000000001, 0.10913097635410397], [0.95621,0.1100000]...]

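The problem I am facing is how to train a model on a column of 2D lists (I tried passing it in as-is, but that did not work). What approach would you suggest?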
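One likely reason the column cannot be passed in as-is is that pandas stores lists as generic Python objects, while most estimators expect a numeric array with one scalar per feature. The sketch below only illustrates that diagnosis; the `toy` frame and its values are made up, and it assumes every window has the same number of points.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two windows, each a list of three [time, amplitude] points.
toy = pd.DataFrame({
    "DepthCrack": [0.4, 0.2],
    "WindowOfInterest": [
        [[0.0, 0.10], [0.5, 0.30], [1.0, 0.20]],
        [[0.0, 0.05], [0.5, 0.15], [1.0, 0.25]],
    ],
})

# The column holds Python lists, so pandas keeps it as dtype object.
column_dtype = toy["WindowOfInterest"].dtype

# Stacking works here only because all windows have the same length.
stacked = np.stack([np.asarray(w, dtype=float) for w in toy["WindowOfInterest"]])
stacked_shape = stacked.shape  # (rows, points per window, 2)

print(column_dtype, stacked_shape)
```

The stacked array is three-dimensional, so it still has to be reduced to one row of numbers per sample before it can serve as a feature matrix.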

I have considered extracting features from the 2D list to obtain one-dimensional features (an integral, etc.).
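The feature-extraction idea could be sketched roughly as follows. The `toy` data, the helper `window_features`, and the `woi_*` column names are all hypothetical, and the chosen summaries (area under the curve, peak, mean, peak position) are just examples of what "integral, etc." might cover.

```python
import numpy as np
import pandas as pd

def window_features(window):
    """Reduce one list of [x, y] points to a few scalar features (illustrative choice)."""
    pts = np.asarray(window, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    # Area under the curve via the trapezoidal rule.
    area = float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))
    return {
        "woi_area": area,
        "woi_peak": float(y.max()),
        "woi_mean": float(y.mean()),
        "woi_peak_x": float(x[np.argmax(y)]),
    }

# Hypothetical toy data standing in for the real WindowsDf.
toy = pd.DataFrame({
    "DepthCrack": [0.4, 0.2],
    "WindowOfInterest": [
        [[0.0, 0.10], [0.5, 0.30], [1.0, 0.20]],
        [[0.0, 0.05], [0.5, 0.15], [1.0, 0.25]],
    ],
})

# One row of scalar features per window, aligned on the original index.
features = pd.DataFrame(
    [window_features(w) for w in toy["WindowOfInterest"]], index=toy.index
)

# Replace the list column with the scalar features.
model_input = pd.concat([toy.drop(columns="WindowOfInterest"), features], axis=1)
print(model_input)
```

Because each window collapses to a fixed set of scalars, this works even when the windows contain different numbers of points.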

Answer 1

You can split this feature into two, so that WindowOfInterest becomes:

WindowOfInterest_x1 and WindowOfInterest_x2

For example, starting from your DataFrame:

>>> import pandas as pd

>>> df = pd.DataFrame({'IsCrack': [1, 1, 1, 1, 1], 
...                    'CheckTypeEncode': [0, 1, 0, 0, 0], 
...                    'DepthCrack': [0.4, 0.2, 1.4, 0.7, 0.1], 
...                    'WindowOfInterest': [[0.9561600000000001, 0.10913097635410397], [0.95621,0.1100000], [0.459561, 0.635410397], [0.4495621,0.32], [0.621,0.2432]]}, 
...                   index = [0, 1, 2, 3, 4])
>>> df
    IsCrack CheckTypeEncode DepthCrack  WindowOfInterest
0   1       0               0.4         [0.9561600000000001, 0.10913097635410397]
1   1       1               0.2         [0.95621, 0.11]
2   1       0               1.4         [0.459561, 0.635410397]
3   1       0               0.7         [0.4495621, 0.32]
4   1       0               0.1         [0.621, 0.2432]

We can split the list like this:

>>> df[['WindowOfInterest_x1','WindowOfInterest_x2']] = pd.DataFrame(df['WindowOfInterest'].tolist(), index=df.index)
>>> df

        IsCrack  CheckTypeEncode    DepthCrack          WindowOfInterest                           WindowOfInterest_x1  WindowOfInterest_x2
0       1        0                  0.4                 [0.9561600000000001, 0.10913097635410397]  0.956160             0.109131
1       1        1                  0.2                 [0.95621, 0.11]                            0.956210             0.110000
2       1        0                  1.4                 [0.459561, 0.635410397]                    0.459561             0.635410
3       1        0                  0.7                 [0.4495621, 0.32]                          0.449562             0.320000
4       1        0                  0.1                 [0.621, 0.2432]                            0.621000             0.243200

Finally, we can drop the WindowOfInterest column:

>>> df = df.drop(['WindowOfInterest'], axis=1)
>>> df
    IsCrack CheckTypeEncode DepthCrack  WindowOfInterest_x1 WindowOfInterest_x2
0   1       0               0.4         0.956160            0.109131
1   1       1               0.2         0.956210            0.110000
2   1       0               1.4         0.459561            0.635410
3   1       0               0.7         0.449562            0.320000
4   1       0               0.1         0.621000            0.243200

You can now use WindowOfInterest_x1 and WindowOfInterest_x2 as features for your model.
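The split above treats each cell as a single [x1, x2] pair. If each cell instead holds a whole list of points, possibly of varying length, one way to extend the same idea is to resample every curve onto a fixed number of positions and give each position its own column. This is only a sketch: `N_SAMPLES`, `resample_window`, the `toy` data, and the `WindowOfInterest_y*` column names are assumptions, not part of the original data.

```python
import numpy as np
import pandas as pd

N_SAMPLES = 4  # hypothetical number of resampled positions per window

def resample_window(window, n_samples=N_SAMPLES):
    """Linearly interpolate one list of [x, y] points onto an even grid."""
    pts = np.asarray(window, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    order = np.argsort(x)  # np.interp expects increasing x values
    # The grid spans this window's own x-range, so absolute x offsets are discarded.
    grid = np.linspace(x.min(), x.max(), n_samples)
    return np.interp(grid, x[order], y[order])

# Hypothetical toy data with windows of different lengths.
toy = pd.DataFrame({
    "DepthCrack": [0.4, 0.2],
    "WindowOfInterest": [
        [[0.0, 0.10], [0.5, 0.30], [1.0, 0.20]],
        [[0.0, 0.05], [0.25, 0.10], [0.75, 0.20], [1.0, 0.25]],
    ],
})

# Every window becomes a row of exactly N_SAMPLES numbers.
resampled = np.vstack([resample_window(w) for w in toy["WindowOfInterest"]])
columns = [f"WindowOfInterest_y{i}" for i in range(N_SAMPLES)]

wide = pd.concat(
    [toy.drop(columns="WindowOfInterest"),
     pd.DataFrame(resampled, columns=columns, index=toy.index)],
    axis=1,
)
print(wide)
```

Each resampled column can then be used as a feature in the same way as WindowOfInterest_x1 and WindowOfInterest_x2.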
