Generalization of mat2str to cell arrays

Question

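I sometimes miss a function that produces a string representation of a (possibly nested) cell array. It would be a generalization of mat2str, which only works for non-cell arrays (of numeric, char or logical type).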

Given an array x, how to obtain a string representation y, such that evaluating this string produces x?

For example, the input

x = {[10 20], {'abc'; false; true;}};

should produce an output string like

y = '{[10 20], {''abc''; false; true}}';

(or some variation regarding the spacing of separators), such that

isequal(x, eval(y))

is true.
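
To make the round-trip requirement concrete, here is a minimal sketch of what mat2str already does for a non-cell array, and of the gap for cell arrays. The variable names are only illustrative, and the snippet assumes that mat2str raises an error for cell input, without relying on the exact error message.

```matlab
% Round trip with mat2str on a plain numeric matrix
a = [1 2; 3 4];
s = mat2str(a);               % s is the char vector [1 2;3 4]
assert(isequal(eval(s), a))   % evaluating the string gives back the input

% The same call on a cell array is expected to fail
failed = false;
try
    mat2str({1, 2});
catch
    failed = true;            % mat2str rejects cell input
end
disp(failed)
```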

Answer 1

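The following function works for arbitrary arrays, with any nesting structure and any array shape, as long as all arrays are 2D. Multidimensional arrays are not supported (same as mat2str).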

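The function also allows specifying arbitrary row and column separators for cell arrays (for example, to choose between comma and space), and optionally forcing those separators for non-cell arrays too (thus overriding mat2str's behaviour). The default separators in cell arrays are ' ' for columns and '; ' for rows.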

function y = array2str(x, col_sep, row_sep, sep_noncell)
% Converts a (possibly cell, nested) array to string representation
%
% Optional inputs col_sep and row_sep specify separators for the cell arrays.
% They can be arbitrary strings (but they should be chosen as per Matlab rules
% so that the output string evaluates to the input). Optional flag sep_noncell
% can be used to force those separators with non-cell arrays too, instead of
% the separators produced by mat2str (space and semicolon)

% Default values
if nargin<4
    sep_noncell = false;
end
if nargin<3
    row_sep = '; ';
end
if nargin<2
    col_sep = ' ';
end

x = {x}; % this is to initialize processing
y = {[]}; % [] indicates content unknown yet: we need to go on
done = false;
while ~done
    done = true; % tentatively
    for n = 1:numel(y);
        if isempty(y{n}) % we need to go deeper
            done = false;
            if ~iscell(x{1}) % we've reached ground
                s = mat2str(x{1}); % final content
                if sep_noncell % replace mat2str's separators if required
                    s = regexprep(s,'(?<=^[^'']*(''[^'']*'')*[^'']*) ', col_sep);
                    s = regexprep(s,'(?<=^[^'']*(''[^'']*'')*[^'']*);', row_sep);
                end
                y{n} = s; % put final content...
                x(1) = []; % ...and remove from x
            else % advance one level
                str = ['{' repmat([{[]}, col_sep], 1, numel(x{1})) '}'];
                ind_sep = find(cellfun(@(t) isequal(t, col_sep), str));
                if ~isempty(ind_sep)
                    str(ind_sep(end)) = []; % remove last column separator
                    ind_sep(end) = [];
                end
                step_sep = size(x{1}, 2);
                str(ind_sep(step_sep:step_sep:end)) = {row_sep};
                y = [y(1:n-1) str y(n+1:end)]; % mark for further processing...
                x = [reshape(x{1}.', 1, []) x(2:end)]; % ...and unbox x{1},
                    % transposed and linearized
            end
        end
    end
end
y = [y{:}]; % concatenate all strings

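The above function uses regular expressions to force the specified separators in non-cell arrays. This works in Matlab but not in Octave, due to limitations in the lookbehind patterns Octave supports. The following modified version avoids regular expressions, and thus works in both Matlab and Octave. Only the part between if sep_noncell and its matching end changes with respect to the first version.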

function y = array2str(x, col_sep, row_sep, sep_noncell)
% Converts a (possibly cell, nested) array to string representation.
% Octave-friendly version
%
% Optional inputs col_sep and row_sep specify separators for the cell arrays.
% They can be arbitrary strings (but they should be chosen as per Matlab rules
% so that the output string evaluates to the input). Optional flag sep_noncell
% can be used to force those separators with non-cell arrays too, instead of
% the separators produced by mat2str (space and semicolon)

% Default values
if nargin<4
    sep_noncell = false;
end
if nargin<3
    row_sep = '; ';
end
if nargin<2
    col_sep = ' ';
end

x = {x}; % this is to initialize processing
y = {[]}; % [] indicates content unknown yet: we need to go on
done = false;
while ~done
    done = true; % tentatively
    for n = 1:numel(y);
        if isempty(y{n}) % we need to go deeper
            done = false;
            if ~iscell(x{1}) % we've reached ground
                s = mat2str(x{1}); % final content
                if sep_noncell % replace mat2str's separators if required
                    for k = flip(find(~mod(cumsum(s==''''),2) & s==' ')) % process
                        % backwards, because indices to the right will become invalid
                        s = [s(1:k-1) col_sep s(k+1:end)];
                    end
                    for k = flip(find(~mod(cumsum(s==''''),2) & s==';'))
                        s = [s(1:k-1) row_sep s(k+1:end)];
                    end
                end
                y{n} = s; % put final content...
                x(1) = []; % ...and remove from x
            else % advance one level
                str = ['{' repmat([{[]}, col_sep], 1, numel(x{1})) '}'];
                ind_sep = find(cellfun(@(t) isequal(t, col_sep), str));
                if ~isempty(ind_sep)
                    str(ind_sep(end)) = []; % remove last column separator
                    ind_sep(end) = [];
                end
                step_sep = size(x{1}, 2);
                str(ind_sep(step_sep:step_sep:end)) = {row_sep};
                y = [y(1:n-1) str y(n+1:end)]; % mark for further processing...
                x = [reshape(x{1}.', 1, []) x(2:end)]; % ...and unbox x{1},
                    % transposed and linearized
            end
        end
    end
end
y = [y{:}]; % concatenate all strings
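
The regex-free replacement used in the Octave-friendly version can be tried in isolation. The following is a small sketch of that idea only: a running count of single quotes tells whether a character lies inside quoted text, and only separators outside quotes are replaced. The sample string and the variable names (s, outside, idx, t) are made up for illustration and are not actual mat2str output.

```matlab
% Replace spaces with ', ' only where they fall outside single-quoted text
s = '''a b'' 10 20';                    % the characters: 'a b' 10 20
outside = ~mod(cumsum(s == ''''), 2);   % true where an even number of quotes has been seen
idx = find(outside & s == ' ');         % spaces that act as separators
t = s;
for k = idx(end:-1:1)                   % right to left, so earlier indices stay valid
    t = [t(1:k-1) ', ' t(k+1:end)];
end
disp(t)                                 % 'a b', 10, 20
```

The space inside 'a b' is left alone because an odd number of quotes precedes it. Going from right to left matters because each replacement lengthens the string and would shift any indices further to the right.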

How it works

I chose a non-recursive approach because I'm usually more comfortable with iteration than with recursion.

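The output is built gradually by keeping substrings or empty arrays ([]) in a cell array (y). An empty array in a cell of y indicates "further processing is needed". Substrings define the "structure" or, eventually, the numeric, character or logical contents at the deepest level of cell nesting.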

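In each iteration, the first empty array found in y is replaced either by actual content, or by substrings and further empty arrays to be processed later. When y no longer contains any empty array the process ends, and all substrings of y are concatenated to obtain the final string output.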
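
As a rough illustration of a single "advance one level" step, the sketch below expands one 2x2 cell into braces, placeholders and separators, using the same statements as the function. The input c and the name pending are chosen only for this example.

```matlab
c = {1, 2; 3, 4};                 % example cell array to expand
col_sep = ' ';
row_sep = '; ';

% One placeholder ([]) per element, each followed by a column separator
str = ['{' repmat([{[]}, col_sep], 1, numel(c)) '}'];
ind_sep = find(cellfun(@(t) isequal(t, col_sep), str));
str(ind_sep(end)) = [];           % drop the separator after the last element
ind_sep(end) = [];

% Every nc-th remaining separator closes a row
nc = size(c, 2);
str(ind_sep(nc:nc:end)) = {row_sep};

% Contents are queued in row-major order, matching the placeholders
pending = reshape(c.', 1, []);
```

After this step str holds '{', [], ' ', [], '; ', [], ' ', [], '}' and pending holds {1, 2, 3, 4}, so each remaining placeholder lines up with the next queued element.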

For example, given the input x = {[10 20], {'abc'; false; true;}}; and calling y = array2str(x), the array y in each step is a cell array containing:

'{'   []   ' '   []   '}'

'{'   '[10 20]'   ' '   []   '}'

'{'   '[10 20]'   ' '   '{'   []   '; '   []   '; '   []   '}'   '}'

'{'   '[10 20]'   ' '   '{'   ''abc''   '; '   []   '; '   []   '}'   '}'

'{'   '[10 20]'   ' '   '{'   ''abc''   '; '   'false'   '; '   []   '}'   '}'

'{'   '[10 20]'   ' '   '{'   ''abc''   '; '   'false'   '; '   'true'   '}'   '}'

where the latter is finally concatenated into the string

'{[10 20] {''abc''; false; true}}'

As an example with custom separators, array2str(x, ', ', '; ', true) would give

'{[10, 20], {''abc''; false; true}}'
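
For comparison with the iterative design described above, a recursive formulation could look roughly like the sketch below. This is not the function from this answer: cell2str_rec is a hypothetical name, the separators are fixed to ' ' and '; ', and only 2D arrays are handled. It assumes the code is saved as cell2str_rec.m.

```matlab
function y = cell2str_rec(x)
% Hypothetical recursive counterpart of array2str, for comparison only.
% Handles 2-D arrays; separators are fixed to ' ' (columns) and '; ' (rows).
if ~iscell(x)
    y = mat2str(x);                     % base case: leave it to mat2str
    return
end
[nr, nc] = size(x);
y = '{';
for r = 1:nr
    for c = 1:nc
        y = [y cell2str_rec(x{r, c})];  % recurse into each element
        if c < nc
            y = [y ' '];                % column separator
        elseif r < nr
            y = [y '; '];               % row separator
        end
    end
end
y = [y '}'];
end
```

Calling cell2str_rec({[10 20], {true; false}}) should return '{[10 20] {true; false}}'. The recursion depth equals the nesting depth of the input, whereas the iterative version keeps its pending work in the cell arrays x and y instead.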

Answer 2

This process of transforming a data structure into a string that can later be evaluated is called serialization.

There is a serialize function for Octave that can be used for this purpose. It supports any core data type (not only cell arrays) with any number of dimensions (not only 2D).

Examples:

## Works for normal 2d numeric arrays
octave> a = magic (4);
octave> serialize (a)
ans = double([16 2 3 13;5 11 10 8;9 7 6 12;4 14 15 1])
octave> assert (eval (serialize (a)), a)

## Works for normal 3d numeric arrays with all precision
octave> a = rand (3, 3, 3);
octave> serialize (a)
ans = cat(3,double([0.53837757395682650507 0.41720691649633284692 0.66860079620859769189;0.018390655109800025518 0.56538265981533797344 0.20709955358395887304;0.86811365238275806089 0.18398187533949311723 0.20280927116918162634]),double([0.40869259684132724919 0.96877003954154328191 0.32138458265911834522;0.37357584261201565168 0.69925333907961184643 0.10937000120952171389;0.3804633375950405294 0.32942660641033155722 0.79302478034566603604]),double([0.44879474273802461015 0.78659287316710135851 0.49078191654039543534;0.66470978375890155121 0.87740365914996953922 0.77817214018098579409;0.51361398808500036139 0.75508941052835898411 0.70283088935085502591]))
octave> assert (eval (serialize (a)), a)

## Works for 3 dimensional cell arrays of strings
octave> a = reshape ({'foo', 'bar' 'qux', 'lol', 'baz', 'hello', 'there', 'octave'}, [2 2 2])
a = {2x2x2 Cell Array}
octave> serialize (a)
ans = cat(3,{["foo"],["qux"];["bar"],["lol"]},{["baz"],["there"];["hello"],["octave"]})
octave> assert (eval (serialize (a)), a)

However, a better question is: why do you want to do this in the first place? If the reason is to send variables between multiple instances of Octave, consider using the parallel and mpi packages, which have functions specially designed for this purpose.
