Convert a text file with mixed data types to a matrix
I am working with the iris dataset, which looks like this...
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
...
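As you can see, the data contains mixed types: the first four fields are floats and the last one is a string. Because of that I can't use dlmread; I get an error when I try.
I tried using fscanf, but my solution doesn't give me what I want...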
filename = "train.txt"
A = fopen(filename, 'r')
data = fscanf(A, '%f %f %f %f %s')
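This gives data as a 1x1 array.
What I want is to convert the data into a matrix whose values I can access by row and column, so that data(1,1) would be 5.4. I'm not very familiar with I/O in Octave, so any help is much appreciated.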
Regular expressions can be very helpful in problems like this. They let you search for a particular pattern or patterns. For example, using regexp you can find all instances of a pattern in your data file and read them into an array with out = regexp(str, expression, 'match'). With the 'match' option the result is a 1xn cell array of matched strings, so numeric matches still need converting, for example with str2double. If you know the number of columns in each row, you can then turn the result into a matrix with something like vec2mat or reshape.
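To make the regexp approach concrete, here is a minimal sketch. It assumes the file contents have already been read into a string (the two sample lines below stand in for that), that every numeric field has the form digits.digits, and that each row has four numeric columns. The patterns and variable names are illustrative, not part of the original answer.

```octave
% Sample lines standing in for the contents of the file (assumption)
str = sprintf('5.4,3.7,1.5,0.2,Iris-setosa\n4.8,3.4,1.6,0.2,Iris-setosa\n');

% Match every decimal number; 'match' returns a 1xn cell array of strings
tokens = regexp(str, '\d+\.\d+', 'match');

% Convert the matched strings to doubles
nums = str2double(tokens);

% reshape fills column by column, so build a 4xN matrix and transpose it
% to get one row per line of the file
data = reshape(nums, 4, []).';

% Match the class label at the end of each line
labels = regexp(str, '[A-Za-z][A-Za-z-]*', 'match');
```

With this layout data(1,1) is the first value of the first row, and labels{k} is the class name for row k. reshape is used here instead of vec2mat because it is part of the core language in both MATLAB and Octave.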
The following worked for me in both MATLAB R2017a and Octave 4.2.1. See the textscan documentation for details.
fid = fopen('filename.txt');
x = textscan(fid, '%f,%f,%f,%f,%s');
fclose(fid);
x_num = [x{1:4}];
x_str = x{5};
This gives:
x_num =
5.400000000000000 3.700000000000000 1.500000000000000 0.200000000000000
4.800000000000000 3.400000000000000 1.600000000000000 0.200000000000000
4.800000000000000 3.000000000000000 1.400000000000000 0.100000000000000
4.300000000000000 3.000000000000000 1.100000000000000 0.100000000000000
5.800000000000000 4.000000000000000 1.200000000000000 0.200000000000000
x_str =
5×1 cell array
'Iris-setosa'
'Iris-setosa'
'Iris-setosa'
'Iris-setosa'
'Iris-setosa'
You can easily achieve this with the textscan function by setting the CollectOutput parameter to true:
Logical indicator determining data concatenation, specified as the
comma-separated pair consisting of 'CollectOutput' and either true or
false. If true, then the importing function concatenates consecutive
output cells of the same fundamental MATLAB® class into a single
array.
Example:
filename = 'train.txt';
fid = fopen(filename, 'r');
data = textscan(fid,'%f%f%f%f%s','CollectOutput',true,'Delimiter',',');
fclose(fid);
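The data variable is returned as a cell array in which the file contents are grouped by underlying type. The first cell contains the numeric values and the second cell contains the string values. You can retrieve them separately as follows: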
numerics = data{1};
texts = data{2};
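As a quick check that this gives the row and column access asked for in the question, here is a small sketch. It passes a string to textscan instead of a file identifier so that it runs without train.txt; the two sample lines and the variable names first_value and second_label are assumptions made for illustration.

```octave
% Two sample lines standing in for train.txt (assumption)
str = sprintf('5.4,3.7,1.5,0.2,Iris-setosa\n4.8,3.4,1.6,0.2,Iris-setosa\n');

% Same call as above, but reading from a string rather than a file id
data = textscan(str, '%f%f%f%f%s', 'CollectOutput', true, 'Delimiter', ',');

numerics = data{1};   % N-by-4 numeric matrix
texts = data{2};      % cell array holding the class labels

% Index the numeric matrix by row and column
first_value = numerics(1,1);

% Labels live in a cell array, so use curly braces to get the string
second_label = texts{2};
```

The numeric part behaves like an ordinary matrix, while the labels stay in a cell array because a numeric matrix cannot hold strings.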