Importing .rtf data into MATLAB

Question

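I am working with huge rich text data files (.rtf). The data in each file consists of two columns of numbers laid out in a table-like format. The numbers are either very large or very small, so they need to be stored with very high precision.

How do I assign the data in the first column to "A" and the data in the second column to "B"? (Are these vectors?) My current problem is that rich text format does not cooperate with MATLAB's import tools, and converting the .rtf file to .txt (and then importing) merges the two columns into a single column of alternating values.

Once I have "A", I need to take a single specified value, compare it against the first column, find the closest value, and then return the corresponding value from the second column.

Suppose my file contains this sample of data: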

1.0E-5      78.29777
1.0625E-5   75.9674
1.125E-5    73.83424
1.1875E-5   71.87197
1.25E-5     70.05895
1.375E-5    66.8116
1.5E-5      63.9797
1.625E-5    61.48167

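and my single specified value is 1.123E-5. This value is closest to 1.125E-5, so the desired output is 73.83424.

How do I go about this? I am not familiar with MATLAB's data import syntax and don't know where to start.

Thanks in advance for everyone's help!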

Answer 1

Here is what I would do: copy the contents into Excel or a Google spreadsheet and save it as .csv. From there it is easy:

T = readtable('path/to/my/data.csv');

T now holds your numbers as double-precision floating-point values in a table data type.

A = T{:, 1}; % column 1

B = T{:, 2}; % column 2
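
As a minimal sketch of the whole round trip, the snippet below writes three of the sample rows to a temporary CSV and reads them back. The temporary file name and the 'ReadVariableNames' setting are assumptions for illustration; your exported file may include a header row, in which case that option should be dropped.

```matlab
% Write a small CSV so the sketch is self-contained (hypothetical temp file)
csvfile = [tempname '.csv'];
fid = fopen(csvfile, 'w');
fprintf(fid, '1.0E-5,78.29777\n1.0625E-5,75.9674\n1.125E-5,73.83424\n');
fclose(fid);

% Assumption: the file has no header row, so the first line is data
T = readtable(csvfile, 'ReadVariableNames', false);

A = T{:, 1};   % column 1 as a double column vector
B = T{:, 2};   % column 2 as a double column vector

format long    % display more digits; the stored double values are unchanged
disp([A B])

delete(csvfile);   % clean up the temporary file
```

Both A and B come back as column vectors of doubles, which carry roughly 15 to 16 significant decimal digits.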

Good luck!

Answer 2

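You can read your *.rtf file using low-level I/O with regular expressions and get the data without any conversion. Using your sample data and a *.rtf file, I cobbled together a clunky parser that gets the data for you. If you open the *.rtf file in a text editor, you will notice (at least in my file) that it has 2 header lines: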

{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
{\*\generator Riched20 6.3.9600}\viewkind4\uc1

followed by some more header mixed in with your data (possibly just WordPad misbehaving):

\pard\sa200\sl276\slmult1\f0\fs22\lang9 1.0E-5      78.29777\par

So we skip the first two lines, treat the third line differently, and then process the rest:

fID = fopen('test.rtf', 'r'); % Open our data file

nheaders = 2; % Number of full header lines
npartialheaders = 1; % Number of header lines with your data mixed in

ii = 1;
mydata = [];
while ~feof(fID) % Loop until we reach the end of the file
    if ii <= nheaders
        % Do nothing
        tline = fgetl(fID); % Read in a line of data, discard it
        ii = ii + 1;
    else
        tline = fgetl(fID); % Read in a line of data
        out = regexp(tline, '([\s\d.E-])', 'match');

        if ~isempty(out) % Our regex found some data
            % The regexp returns every character in a cell, concatenate them
            % and split them along the spaces
            data_str = strsplit([out{:}], ' ');

            if ii > nheaders && ii <= (nheaders + npartialheaders)
                % Header is mixed with your data
                % We should only want the second and third matches
                data_num = str2double(data_str(2:3));
                mydata = [mydata; data_num];
            else
                % Just your data on these lines
                data_num = str2double(data_str(1:2));
                mydata = [mydata; data_num];
            end
        end

        ii = ii + 1;
    end
end

fclose(fID);

Which returns:

mydata =

    1.00000000000000e-05    78.2977700000000
    1.06250000000000e-05    75.9674000000000
    1.12500000000000e-05    73.8342400000000
    1.18750000000000e-05    71.8719700000000
    1.25000000000000e-05    70.0589500000000
    1.37500000000000e-05    66.8116000000000
    1.50000000000000e-05    63.9797000000000
    1.62500000000000e-05    61.4816700000000

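Admittedly, this is ugly, inefficient code. I'm sure plenty of changes could be made to make it more robust and efficient, but it should get you started.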
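
One possible change along those lines, offered as a sketch rather than a tested replacement: match each pair of numbers directly with regexp tokens instead of counting header lines. It assumes every data row ends in \par, as the sample line above does, and the names rtf_text, num and mydata_alt are made up for this example. The RTF text is held in memory here; with a real file you would load it with fileread instead.

```matlab
% Sample RTF content based on the lines shown above, held in memory.
% With a real file: rtf_text = fileread('test.rtf');
rtf_lines = {
    '{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Calibri;}}'
    '{\*\generator Riched20 6.3.9600}\viewkind4\uc1'
    '\pard\sa200\sl276\slmult1\f0\fs22\lang9 1.0E-5      78.29777\par'
    '1.0625E-5   75.9674\par'
    '1.125E-5    73.83424\par'
    '}'
    };
rtf_text = strjoin(rtf_lines.', char(10));   % join the lines with newlines

% A number with an optional sign, fraction and exponent
num = '[-+]?\d+\.?\d*(?:[eE][-+]?\d+)?';

% Assumption: a data row is two numbers separated by whitespace, then \par
pattern = ['(' num ')\s+(' num ')\s*\\par'];

% Each match yields a 1x2 cell of captured strings
tok = regexp(rtf_text, pattern, 'tokens');

% Stack the matches into an N-by-2 cell array and convert to double
mydata_alt = str2double(vertcat(tok{:}));

disp(mydata_alt)
```

Because the pattern only accepts two numbers immediately followed by \par, the digits inside control words such as \sa200 or \fs22 are skipped without any header bookkeeping. If your files mark row ends differently, the pattern would need adjusting.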

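Now that you have your data, I think you can take a stab at the second part. If you haven't already, take a look at MATLAB's matrix indexing documentation. As a hint for one implementation, take a look at the outputs of min and think about how you would subtract a constant from a vector.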

% What is this doing? It's a mystery!
[~, matchidx] = min(abs(mydata(:,1) - querypoint));
disp(mydata(matchidx, 2))
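
To spell the hint out on the sample data from the question, here is a minimal sketch. A and B are typed in by hand for illustration; in practice they would be mydata(:,1) and mydata(:,2). The interp1 call is an alternative not mentioned in the answer, and it assumes the values in A are distinct.

```matlab
% Sample data from the question, entered by hand for illustration
A = [1.0E-5; 1.0625E-5; 1.125E-5; 1.1875E-5; 1.25E-5; 1.375E-5; 1.5E-5; 1.625E-5];
B = [78.29777; 75.9674; 73.83424; 71.87197; 70.05895; 66.8116; 63.9797; 61.48167];

querypoint = 1.123E-5;   % the single specified value

% Subtracting a scalar from a vector gives the distance to every entry;
% the second output of min is the index of the smallest distance
[~, matchidx] = min(abs(A - querypoint));

closest_A = A(matchidx);   % nearest value in the first column
result_B  = B(matchidx);   % corresponding value in the second column

fprintf('Closest A = %g, corresponding B = %.5f\n', closest_A, result_B);

% Alternative: nearest-neighbour lookup (assumes the values in A are distinct)
result_interp = interp1(A, B, querypoint, 'nearest');
```

For this query the nearest entry is 1.125E-5, so both approaches select 73.83424, matching the output requested in the question.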
