MATLAB: Equal rows of table OR Equal words of strings

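I want to build separate tables from different strings. The strings have different lengths, so the tables will have different numbers of rows. I want to combine these tables at the end, so they all need to have the same number of rows. My plan was to pad them with NaNs, but that has not worked.

My attempt is below; the part I am struggling with is marked "Problem location". Code: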

 String = ["Random info in middle one, "+ ...
           "Random info still continues. ",
           "Random info in middle two. "+ ...
           "Random info still continues. ExtraWord1 ExtraWord2 ExtraWord3 "];  % String 2 has three more words than string 1
    
%%%%%% FOCUS AREA BEGINS %%%%%%%%
    for x=1:length(String)
        % Plan to add NaNs
        documents_Overall = tokenizedDocument(String(x,1));
        tdetails = tokenDetails(documents_Overall);
        StringTable = tdetails(:,{'Token','Type'});
        StringHeight(x) = height(StringTable);
    
    MaxHeight=max(StringHeight);
    StringTable(end+1:MaxHeight,1)=NaN; % Problem location.
    
    %Plan to Convert table back to string
    DataCell = table2cell(StringTable);
    String(x,1) = [DataCell{:}];
end

%%%%%% FOCUS AREA ENDS %%%%%%%%


%Plan to combine tables

    documents_Middle = tokenizedDocument(String);
    tdetails = tokenDetails(documents_Middle);
        
    t = table();d = tokenizedDocument(String);
    variableNames = [];variables = [];
    
    for n=1:length(d)
     variableNames = [variableNames {sprintf('Tokens for sentence %d',n)} {sprintf('Type for sentence %d',n)}];
     variables = [variables {d(n).tokenDetails.Token} {d(n).tokenDetails.Type}];
    end
    
    %Table = cell2table(variables);
    table(variables{:},'VariableNames',variableNames)

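This continues an earlier thread in which the numbers of rows were equal. For any number of strings, all of the other strings need to be padded to match the longest one. My plan was to use NaN for this, but I have not succeeded yet. The expected result for this example is a single table in which the columns of the shorter string are padded to the length of the longest one.

Any help is appreciated. Thanks.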

I built on the answer to your earlier question.

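The logic below first finds the size of the largest column (14 in this example). It then finds the indices of the columns that need padding; these columns come in Token/Type pairs, so only every other column has to be considered. Finally, it loops over the columns that need padding, filling each one with <missing> (the string equivalent of NaN) and filling the column next to it with letters.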
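
The padding step on its own can be checked in isolation. This is a minimal sketch; `tokens` and `types` are made-up stand-ins for the `Token` (string) and `Type` (categorical) columns returned by `tokenDetails`, not data from the question.

```matlab
% Hypothetical stand-ins for one sentence's Token and Type columns
tokens = ["Random"; "info"];
types  = categorical(["letters"; "letters"]);

% Concatenating a string column with NaN converts each NaN to <missing>
paddedTokens = [tokens; NaN(2, 1)];

% The Type column is padded with an explicit "letters" category instead
paddedTypes = [types; categorical(repmat("letters", 2, 1))];

% ismissing flags only the padded token rows
disp(ismissing(paddedTokens))
```

The full script follows.
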
s = ["Random info in middle one, "+ ...
           "Random info still continues. ",
           "Random info in middle two. "+ ...
           "Random info still continues. ExtraWord ExtraWord ExtraWord "];

t = table();
d = tokenizedDocument(s);

variableNames = [];
variables = [];
max_column_size = 1;

for n=1:length(d)
 variableNames = [variableNames {sprintf('Tokens for sentence %d',n)} {sprintf('Type for sentence %d',n)}];
 variables = [variables {d(n).tokenDetails.Token} {d(n).tokenDetails.Type}];
 column_size = size(d(n).tokenDetails.Token,1);
 if column_size > max_column_size
    max_column_size = column_size;
 end
end

% Setup anonymous function to determine size of column
f = @(x) size(x,1) < max_column_size;

% Find the columns that are shorter than the longest one
indeces_to_pad = find(cellfun(f,variables));
% Columns come in Token/Type pairs, so keep only the Token column of each pair
indeces_to_pad(2:2:end) = [];

% Loop over the columns to be padded and pad them
for n=1:length(indeces_to_pad)
    index_to_pad = indeces_to_pad(n);
    column_size_diff = max_column_size - length(variables{index_to_pad});
    variables{index_to_pad} = [variables{index_to_pad}; NaN((column_size_diff), 1)];
    variables{index_to_pad+1} = [variables{index_to_pad+1}; categorical(repmat("letters",(column_size_diff), 1))];
end


table(variables{:},'VariableNames',variableNames)

This results in the following table:

ans =

  14×4 table

    Tokens for sentence 1    Type for sentence 1    Tokens for sentence 2    Type for sentence 2
    _____________________    ___________________    _____________________    ___________________

         "Random"                letters                 "Random"                letters        
         "info"                  letters                 "info"                  letters        
         "in"                    letters                 "in"                    letters        
         "middle"                letters                 "middle"                letters        
         "one"                   letters                 "two"                   letters        
         ","                     punctuation             "."                     punctuation    
         "Random"                letters                 "Random"                letters        
         "info"                  letters                 "info"                  letters        
         "still"                 letters                 "still"                 letters        
         "continues"             letters                 "continues"             letters        
         "."                     punctuation             "."                     punctuation    
         <missing>               letters                 "ExtraWord"             letters        
         <missing>               letters                 "ExtraWord"             letters        
         <missing>               letters                 "ExtraWord"             letters
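
As a variation on the script above, the padding can be written without special-casing the Token/Type pairs. This sketch is not part of the original answer. It assumes Text Analytics Toolbox is installed and a MATLAB release in which `missing` can be assigned to both string and categorical arrays. The names `numDocs`, `maxHeight`, and `T` are introduced here for illustration only.

```matlab
s = ["Random info in middle one, " + ...
     "Random info still continues. ";
     "Random info in middle two. " + ...
     "Random info still continues. ExtraWord ExtraWord ExtraWord "];
d = tokenizedDocument(s);   % requires Text Analytics Toolbox

% Collect one Token column and one Type column per sentence
numDocs = numel(d);
variables = cell(1, 2*numDocs);
variableNames = cell(1, 2*numDocs);
for n = 1:numDocs
    details = tokenDetails(d(n));
    variables{2*n-1} = details.Token;
    variables{2*n}   = details.Type;
    variableNames{2*n-1} = sprintf('Tokens for sentence %d', n);
    variableNames{2*n}   = sprintf('Type for sentence %d', n);
end

% Height of the longest column
maxHeight = max(cellfun(@(c) size(c, 1), variables));

% Grow every shorter column by assigning missing to the new rows.
% Assumption: this yields <missing> for string columns and
% <undefined> for categorical columns.
for k = 1:numel(variables)
    if size(variables{k}, 1) < maxHeight
        variables{k}(end+1:maxHeight, 1) = missing;
    end
end

T = table(variables{:}, 'VariableNames', variableNames)
```

With this approach the padded rows carry no type at all, so they are not counted as `letters` tokens in later processing. It also applies the idea from the question (growing with `end+1:MaxHeight`) to each column rather than to the table as a whole.
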