PyImport_Import segmentation fault after reading in TSV with C++

Question

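I am using C++ as a wrapper around a Python module. First, I read in a TSV file and convert it to a numpy array, then I import my Python module and pass the numpy array to Python for further analysis. When I first wrote the program, I tested everything with a randomly generated array, and it worked fine. However, once I replaced the randomly generated array with the array read from the TSV file, I got a segmentation fault when trying to import the Python module. Here is some of my code: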

#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
#define PY_SSIZE_T_CLEAN

#include <python3.8/Python.h>
#include "./venv/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h"

#include <stdio.h>
#include <iostream>
#include <stdlib.h>
#include <random>
#include <fstream>
#include <sstream>

int main(int argc, char* argv[]) {

    setenv("PYTHONPATH", ".", 0);

    Py_Initialize();
    import_array();

    static const int numberRows = 1000;
    static const int numberColumns = 500;

    npy_intp dims[2]{ numberRows, numberColumns };

    static const int numberDims = 2;

    double(*c_arr)[numberColumns]{ new double[numberRows][numberColumns] };

    // *********************************************************** 
    // THIS PART OF THE CODE GENERATES A RANDOM ARRAY AND WORKS WITH THE REST OF THE CODE
    // // initialize random number generation
    // typedef std::mt19937 MyRNG;
    // std::random_device r;
    // MyRNG rng{r()};
    // std::lognormal_distribution<double> lognormalDistribution(1.6, 0.25);

    // //populate array
    // for (int i=0; i < numberRows; i++) {
    //     for (int j=0; j < numberColumns; j++) {
    //         c_arr[i][j] = lognormalDistribution(rng);
    //     }
    // }
    // ***********************************************************

    // *********************************************************** 
    // THIS PART OF THE CODE INGESTS AN ARRAY FROM TSV AND CAUSES CODE TO FAIL AT PyImport_Import
    std::ifstream data("data.mat");
    std::string line;
    int row = 0;
    int column = 0;
    while (std::getline(data, line)) {
        std::stringstream lineStream(line);
        std::string cell;
        while (std::getline(lineStream, cell, '\t')) {
            c_arr[row][column] = std::stod(cell);
            column++;
        }
        row++;
        column = 0;
        if (row > numberRows) {
            break;
        }
    }
    // *********************************************************** 

    PyArrayObject *npArray = reinterpret_cast<PyArrayObject*>(
        PyArray_SimpleNewFromData(numberDims, dims, NPY_DOUBLE, reinterpret_cast<void*>(c_arr))
        );

    const char *moduleName = "cpp_test";
    PyObject *pname = PyUnicode_FromString(moduleName);

    // ***********************************************************
    // CODE FAILS HERE - SEGMENTATION FAULT
    PyObject *pyModule = PyImport_Import(pname);


    // .......
    // THERE IS MORE CODE BELOW NOT INCLUDED HERE
}

So I am not sure why the code fails when it pulls the data from the TSV file but not when I use randomly generated data.

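EDIT: (very silly mistake incoming) I was using row > numberRows as the stopping condition in the while loop, and this affected the row index used for the last row of the array. Once I changed the condition to row == numberRows, everything worked. Who knew that being precise about rows mattered so much when building an array? I'll leave this up as a testament to silly programming mistakes; maybe someone will learn something from it.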
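
To spell out why the original check can crash: the test runs after row++, so with row > numberRows the loop still continues when row equals numberRows, and the next line of the file is written to c_arr[numberRows], one row past the end of the allocation. That corrupts the heap, and the crash may only surface later, for example inside PyImport_Import. The sketch below shows one way to keep every write in bounds by checking both the row and the column index before writing. The function name readTsvBounded and its parameters are hypothetical, not part of the original program, and the code assumes every cell parses as a double.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post).
// Reads at most maxRows x maxCols tab-separated doubles from `in` into the
// row-major buffer `out`, which must hold maxRows * maxCols values.
// Extra rows and extra columns in the input are ignored.
// Returns the number of rows actually stored.
// Note: std::stod throws if a cell is empty or not a number.
std::size_t readTsvBounded(std::istream& in, double* out,
                           std::size_t maxRows, std::size_t maxCols) {
    std::string line;
    std::size_t row = 0;
    // The row bound is tested before any write, so out[row * maxCols + ...]
    // can never land past the last row.
    while (row < maxRows && std::getline(in, line)) {
        std::stringstream lineStream(line);
        std::string cell;
        std::size_t column = 0;
        // The column bound protects against lines with too many cells.
        while (column < maxCols && std::getline(lineStream, cell, '\t')) {
            out[row * maxCols + column] = std::stod(cell);
            ++column;
        }
        ++row;
    }
    return row;
}
```

With the array from the question, a call could look like readTsvBounded(data, &c_arr[0][0], numberRows, numberColumns), since c_arr is one contiguous block of numberRows * numberColumns doubles.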

Answer 1

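Note that you don't have to use a built-in array to store the data (such as double values) in two dimensions; you can use a dynamically sized container such as std::vector instead, as shown below. The advantage of std::vector is that you don't need to know the number of rows and columns in the input file (data.mat) in advance, so you don't have to allocate memory for the rows and columns up front. You can add values dynamically.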

#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include<fstream>
int main() {
    std::string line;
    double word;

    
    std::ifstream inFile("data.mat");
    
    //create/use a std::vector instead of a built-in array
    std::vector<std::vector<double>> vec;
    
    if(inFile)
    {
        while(getline(inFile, line, '\n'))        
        {
            //create a temporary vector that will contain all the columns
            std::vector<double> tempVec;
            
            
            std::istringstream ss(line);
            
            //read word by word(or double by double) 
            while(ss >> word)
            {
                //std::cout<<"word:"<<word<<std::endl;
                //add the word to the temporary vector 
                tempVec.push_back(word);
            
            }      
            
            //now all the words from the current line have been added to the temporary vector
            vec.emplace_back(tempVec);
        }    
    }
    
    else 
    {
        std::cout<<"file cannot be opened"<<std::endl;
    }
    
    inFile.close();
    
    //let's print the elements of the 2D vector so that we can confirm it contains all the right elements (rows and columns)
    for(std::vector<double> &newvec: vec)
    {
        for(const double &elem: newvec)
        {
            std::cout<<elem<<" ";
        }
        std::cout<<std::endl;
    }
    
    
    
    return 0;
}

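The output of the program above can be seen here. Since you did not provide the data.mat file, I created a sample data.mat file and used it in my program; it can be found at the link mentioned above.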
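
One caveat if the vector approach is combined with the code from the question: PyArray_SimpleNewFromData wraps a single contiguous buffer, while a std::vector<std::vector<double>> stores each row in its own allocation. A minimal sketch of one way to bridge the two is to copy the rows into a flat row-major std::vector<double> first. The helper name flattenRowMajor is hypothetical, and the code assumes every row has the same number of columns (it throws otherwise).

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not from the original answer).
// Copies a rectangular 2D vector into one contiguous row-major buffer.
// Throws std::invalid_argument if the rows differ in length.
std::vector<double> flattenRowMajor(const std::vector<std::vector<double>>& rows) {
    std::vector<double> flat;
    if (rows.empty()) {
        return flat;
    }
    const std::size_t numCols = rows.front().size();
    flat.reserve(rows.size() * numCols);
    for (const std::vector<double>& r : rows) {
        if (r.size() != numCols) {
            throw std::invalid_argument("rows have different lengths");
        }
        // Append this row right after the previous one.
        flat.insert(flat.end(), r.begin(), r.end());
    }
    return flat;
}
```

The resulting flat.data() pointer, together with dimensions { vec.size(), vec[0].size() }, could then be handed to PyArray_SimpleNewFromData. The flat vector has to outlive the numpy array, because that function does not copy the data.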

