Creating NumPy array with Pandas

Question

I am trying to use scikit with some data from a spreadsheet (.xlsx). To do this, I read the spreadsheet with Pandas and then convert the result to numpy so that I can pass it to scikit.

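The problem is that when I convert my DF structure to numpy, I lose almost all of my data! I think it is because the sheet has no column names, only raw data. For example: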

28.7967 16.0021 2.6449 0.3918 0.1982

31.6036 11.7235 2.5185 0.5303 0.3773

162.052 136.031 4.0612 0.0374 0.0187

My code so far:

def split_data():
    test_data = pd.read_excel('magic04.xlsx', sheetname=0, skip_footer=16020)
    #code below prints correctly the data
    print test_data.iloc[:, 0:10] 

    #none of the code below work as expected 
    test1 = np.array(test_data.iloc[:, 0:10])
    test2 = test_data.as_matrix()

I'm really lost. Any help would be very welcome...

Answer 1

I suggest you use header=None in read_excel. Without it, the first row of data is taken as the column names. See the following:

df = pd.read_excel('stuff.xlsx')
>> df
    28.7967 16.0021 2.6449  0.3918  0.1982
0   31.6036 11.7235 2.5185  0.5303  0.3773
1   162.0520    136.0310    4.0612  0.0374  0.0187

>> df.ix[:, 1: 2]

0
1

Versus:

df = pd.read_excel('stuff.xlsx', header=None)
>> df

0   1   2   3   4
0   28.7967 16.0021 2.6449  0.3918  0.1982
1   31.6036 11.7235 2.5185  0.5303  0.3773
2   162.0520    136.0310    4.0612  0.0374  0.0187

>> df.ix[:, 1: 2]
    1   2
0   16.0021 2.6449
1   11.7235 2.5185
2   136.0310    4.0612
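
For readers on a current pandas release, the same idea can be sketched as below. This is only an illustration under a few assumptions: an in-memory CSV stands in for the spreadsheet so the snippet runs without an Excel file (read_excel accepts the same header=None argument), and the variable names are made up for the example. It also uses .iloc and to_numpy(), because .ix and as_matrix() were removed in pandas 1.0; likewise, the question's sheetname and skip_footer arguments are now spelled sheet_name and skipfooter.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the spreadsheet: the three header-less rows from the question.
raw = io.StringIO(
    "28.7967,16.0021,2.6449,0.3918,0.1982\n"
    "31.6036,11.7235,2.5185,0.5303,0.3773\n"
    "162.052,136.031,4.0612,0.0374,0.0187\n"
)

# Default behaviour: the first row is consumed as column names,
# so only two data rows remain.
with_header = pd.read_csv(raw)
raw.seek(0)

# header=None: every row is treated as data; columns are labelled 0..4.
no_header = pd.read_csv(raw, header=None)

# Convert to a NumPy array (to_numpy() replaces the removed as_matrix()).
features = no_header.to_numpy()
assert isinstance(features, np.ndarray)

# Positional slice matching the answer's df.ix[:, 1: 2] (columns 1 and 2).
subset = no_header.iloc[:, 1:3].to_numpy()

print(with_header.shape)  # (2, 5)
print(features.shape)     # (3, 5)
print(subset.shape)       # (3, 2)
```

The features array can then be passed to scikit directly, since it holds plain float values with no row lost to the header.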
