Construct a struct from textscan output in MATLAB
I have a tab-delimited file like this:
refseq gene symb locus_id chr strand start end cds_start cds_end status chrm
ENST00000456328.2 ENST00000456328.2 DDX11L1 00000456328 chr1 1 11868 14409 14409 14409 Reviewed 1
ENST00000515242.2 ENST00000515242.2 DDX11L1 00000515242 chr1 1 11871 14412 14412 14412 Reviewed 1
ENST00000518655.2 ENST00000518655.2 DDX11L1 00000518655 chr1 1 11873 14409 14409 14409 Reviewed 1
ENST00000450305.2 ENST00000450305.2 DDX11L1 00000450305 chr1 1 12009 13670 13670 13670 Reviewed 1
ENST00000438504.2 ENST00000438504.2 WASH7P 00000438504 chr1 0 14362 29370 29370 29370 Reviewed 1
I'd like to read it into MATLAB as a struct, like this:
I tried this:
fid = fopen('gencode.v19.pseudogene_gistic.txt');
headers = textscan(fid,'%s%s%s%s%s%s%s%s%s%s%s%s',1,'delimiter','\t')
data = textscan(fid,'%s%s%s%d%s%d%d%d%d%d%s%d','delimiter','\t')
fclose(fid);
cdata = struct('refseq',data{1}, 'gene',data{2}, 'symb',data{3}, 'locus_id',data{4}, 'chr',data{5}, 'strand',data{6}, 'start',transpose(data{7}), 'end',data{8}, 'cds_start',data{9}, 'cds_end',data{10}, 'status',data{11}, 'chrn',data{12});
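However, this returns a struct whose fields contain unexpected cells, and all of the numeric fields behave differently. NOTE: I want a 1x17149 struct, not a 17149x1 struct.
Can anyone help? Thanks.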
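The problem is that textscan returns a cell array in which each %d column is a numeric (int32) column vector and each %s column is a cell array of character vectors. You need to convert one kind into the other so that all columns have the same type.
Here is some code that works for the data you showed: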
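To see the mismatch on a small scale, here is a minimal sketch that parses a hypothetical two-row sample held in a char vector (only three of the file's columns, standing in for the real file) and passes the result to struct the way the question does. struct expands a cell-array argument into a struct array, but copies any non-cell value unchanged into every element, so each element ends up holding the whole numeric column.

```matlab
% Hypothetical two-row sample with three of the file's columns (symb, strand, start)
txt = sprintf('DDX11L1 1 11868\nWASH7P 0 14362\n');
c = textscan(txt, '%s%d%d');

class(c{1})   % 'cell':  a %s column is a cell array of character vectors
class(c{2})   % 'int32': a %d column is a plain numeric column vector

% struct() turns the cell column into a 2x1 struct array, but copies the
% numeric columns unchanged into every element
s = struct('symb', c{1}, 'strand', c{2}, 'start', c{3});
size(s)       % 2 1
s(1).start    % the whole column [11868; 14362], not a single value
```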
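%// Open the file (filename taken from the question)
fid = fopen('gencode.v19.pseudogene_gistic.txt');
%// Load the headers
headers = textscan(fid,'%s%s%s%s%s%s%s%s%s%s%s%s', 1);
%// Load the data (default whitespace delimiting works for the data shown)
data = textscan(fid,'%s%s%s%d%s%d%d%d%d%d%s%d');
fclose(fid);
%// Find which columns are numeric arrays rather than nested cell arrays
isarray = ~cellfun(@iscell, data);
%// Convert each numeric column to a cell array with one number per cell
data(isarray) = cellfun(@num2cell, data(isarray), 'UniformOutput', false);
%// Create a structure using the headers as field names, then transpose to 1xN
cdata = cell2struct(cat(2, data{:}).', [headers{:}], 1).';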
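As a quick self-contained check of this approach, the sketch below runs the same conversion on a small hypothetical sample (four of the file's columns, held in a char vector instead of read from the file). Individual fields can then be collected back into a row vector with [cdata.start]. If a single 1x1 struct whose fields hold entire columns would also work for you, cell2struct with dim 2 gives that layout directly from the raw textscan output, without creating one small struct per row.

```matlab
% Hypothetical sample: header line plus two rows, four of the file's columns
txt = sprintf(['refseq symb strand start\n' ...
               'ENST00000456328.2 DDX11L1 1 11868\n' ...
               'ENST00000438504.2 WASH7P 0 14362\n']);

headers = textscan(txt, '%s%s%s%s', 1);                  % first line only
data    = textscan(txt, '%s%s%d%d', 'HeaderLines', 1);   % skip the header line

% Same conversion as in the answer: numeric columns -> cell columns
isarray = ~cellfun(@iscell, data);
data(isarray) = cellfun(@num2cell, data(isarray), 'UniformOutput', false);
fields = [headers{:}];
cdata = cell2struct(cat(2, data{:}).', fields, 1).';

size(cdata)               % 1 2
cdata(2).symb             % 'WASH7P'
starts = [cdata.start];   % 1x2 int32 row vector

% Alternative layout: one 1x1 struct whose fields are whole columns
% (%s columns stay cell arrays, %d columns stay numeric vectors)
raw  = textscan(txt, '%s%s%d%d', 'HeaderLines', 1);
cols = cell2struct(raw, fields, 2);
cols.start                % [11868; 14362]
```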