Read a text file with a mix of floats, integers and strings in the same column

Question

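Loading a well-formatted, cleanly delimited text file in Matlab is relatively simple, but I am struggling with a text file that I have to read in. Unfortunately I cannot change the structure of the source file, so I have to work with what I have.

The basic file structure is: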

123 180 (two integers, white space delimited)
1.5674e-8
.
.
(floating point numbers in column 1, column 2 empty)
.
.
100 4501 (another two integers)
5.3456e-4 (followed by even more floating point numbers)
.
. 
.
.
45 String (an integer in column 1, a string in column 2)
.
.
.

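A simple

[data1,data2]=textread('filename.txt','%f %s', ...
                    'emptyvalue', NaN)

does not work. How can I properly filter the input data? All the examples I have found so far, online and in the Matlab help, deal with well-structured data, so I am a bit lost as to where to start.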

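Because I have to read a whole bunch of these files (more than 100), I would rather not iterate through every line in every file. I am hoping for a faster approach.

EDIT: I have made a sample file available here: test.txt (Google Drive)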

Answer 1

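You can read the file line by line with the lower-level functions and then parse each line manually.

You open a file handle just as you would in C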

fid = fopen(filename);

Then you can use fgetl to read a line

line = fgetl(fid);

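Tokenizing the line on spaces is probably the best first step, storing each piece in a cell array (because a matrix does not support ragged arrays)

colnum = 1;
remainder = line;   % start from the line just read; avoids shadowing the built-in rem
while ~isempty(remainder)
    [token, remainder] = strtok(remainder, ' ');
    if isempty(token), break; end   % only trailing spaces were left
    entries{linenum, colnum} = token;
    colnum = colnum + 1;
end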
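
As an alternative to the strtok loop, the same split can be sketched with strsplit, which is a swapped-in technique and not part of the original answer. The sample line below is a made-up value for illustration.

```matlab
% Hypothetical sample line with repeated and trailing spaces
line = '123   180 ';

% strtrim drops leading/trailing whitespace; strsplit with no delimiter
% argument splits on whitespace and collapses consecutive delimiters
tokens = strsplit(strtrim(line));
```

The result is a 1-by-N cell array of char vectors, so it can be stored per line in an outer cell array, for example lines{linenum} = tokens.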

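You can then wrap all of this in another while loop to iterate over the lines

linenum = 1;
while ~feof(fid)
    line = fgetl(fid);
    % strtok loop and index bookkeeping as above
    linenum = linenum + 1;
end
fclose(fid);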
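
Putting the pieces above together, a minimal sketch could look like the following. The function name readTokens and the error identifier are hypothetical, and the end-of-file test uses the return value of fgetl instead of feof, which is a small variation on the loop shown above.

```matlab
function entries = readTokens(filename)
% READTOKENS  Hypothetical helper: read a text file line by line and
% split every line on spaces into a ragged cell array of strings.
    fid = fopen(filename, 'r');
    if fid == -1
        error('readTokens:fileOpen', 'Could not open %s', filename);
    end
    entries = {};
    linenum = 0;
    while true
        line = fgetl(fid);
        if ~ischar(line)      % fgetl returns -1 at end of file
            break;
        end
        linenum = linenum + 1;
        colnum = 1;
        remainder = line;
        while ~isempty(remainder)
            [token, remainder] = strtok(remainder, ' ');
            if isempty(token)
                break;        % only trailing spaces were left
            end
            entries{linenum, colnum} = token;
            colnum = colnum + 1;
        end
    end
    fclose(fid);
end
```

Cells that a short line does not fill are left empty, so entries{k, 2} is empty for a line that holds only a single float.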

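Whether you parse the file as you read it, or first read it into a cell array and then go over it, is up to you.

Your cell entries will all be strings (char arrays), so you will need to convert them to numbers with str2num. It does a good job of working out the format, so that may be all you need.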
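
A sketch of the conversion step, using str2double in place of str2num: str2num evaluates its input with eval, whereas str2double simply returns NaN for text it cannot parse, which also gives a way to separate numbers from strings. The token values are made up for illustration, and a token that literally reads NaN would be classified as text by this approach.

```matlab
% Hypothetical tokens, as they would come out of the tokenizing step
tokens = {'123', '180', '1.5674e-8', 'String'};

% str2double returns NaN for anything it cannot parse as a number
values = str2double(tokens);

isText  = isnan(values);       % true where the token was not numeric
numbers = values(~isText);     % numeric tokens as doubles
strings = tokens(isText);      % remaining tokens, still char arrays
```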

Answer 2

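I had a look at the text file you provided and tried to draw some general conclusions:

When a line has two integers, the second integer corresponds to the number of lines that follow.
You always have (two integers (A, B) followed by "B" floating-point numbers), repeated twice.
After that you have some free-form text (or at least, I could not infer any useful format after that point).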

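It is a messy format, so I doubt there is a nice solution. Some useful general principles are:

Use fgetl when you need to read a single line (it reads up to the next newline).
Use textscan when you can read several lines at once; it is much faster than reading one line at a time. It has a lot of options for how to parse, which are worth getting to know (I suggest typing doc textscan and reading the whole thing).
When in doubt, just read the line as a string and analyse it in MATLAB.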
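
To illustrate the first two principles together, here is a minimal sketch that reads one header line with fgetl and then the following block of floats with a single textscan call. The temporary file and its contents are made up, and sscanf is used for the header purely for brevity.

```matlab
% Write a tiny hypothetical sample in the layout described above
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '123 3\n1.5e-8\n2.5e-8\n3.5e-8\n45 String\n');
fclose(fid);

fid = fopen(fname, 'r');
header = sscanf(fgetl(fid), '%d %d');     % single line: fgetl
block  = textscan(fid, '%f', header(2));  % header(2) floats in one call
fclose(fid);
delete(fname);

floats = block{1};   % column vector of doubles
```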

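With that in mind, here is a simple file parser. It will probably need some modification as you are able to infer more about the file structure, but it is reasonably fast on the ~700 line test file you provided.

I have just given the variables dummy names such as "a", "b", "floats" and so on. You should change them to something more specific to your needs.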

function output = readTestFile(filename)

    fid = fopen(filename, 'r');
    if fid == -1
        error('readTestFile:fileOpen', 'Could not open %s', filename);
    end

    % Read the first line
    line = '';
    while isempty(line)
        line = fgetl(fid);
    end
    nums = textscan(line, '%d %d', 'CollectOutput', 1);

    a = nums{1}(1);
    b = nums{1}(2);

    % Read 'b' of the next lines:
    contents = textscan(fid, '%f', b);
    floats1 = contents{1};

    % Read the next line:
    line = '';
    while isempty(line)
        line = fgetl(fid);
    end
    nums = textscan(line, '%d %d', 'CollectOutput', 1);

    c = nums{1}(1);
    d = nums{1}(2);

    % Read 'd' of the next lines:
    contents = textscan(fid, '%f', d);
    floats2 = contents{1};

    % Read the rest:
    rest = textscan(fid, '%s', 'Delimiter', '\n');

    % Release the file handle
    fclose(fid);

    output.a = a;
    output.b = b;
    output.c = c;
    output.d = d;
    output.floats1 = floats1;
    output.floats2 = floats2;
    output.rest = rest{1};

end
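
Since the question mentions more than 100 files, a parser like this would typically be applied in a loop. The following is a rough sketch only: the function name readAllFiles, the folder argument and the '*.txt' pattern are assumptions, and the reader is passed in as a function handle so the helper does not depend on any particular parser.

```matlab
function results = readAllFiles(folder, reader)
% READALLFILES  Hypothetical helper: apply a reader function to every
% .txt file in a folder and collect the outputs in a cell array.
    files = dir(fullfile(folder, '*.txt'));
    results = cell(numel(files), 1);
    for k = 1:numel(files)
        results{k} = reader(fullfile(folder, files(k).name));
    end
end
```

With the parser above on the path, a call might look like results = readAllFiles('data', @readTestFile), where 'data' is a placeholder folder name.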


matlab

ascii

text-files