Formatting file input to a required output using shell script

Question

I need to format the given input into the output shown. How can I do this?

Input:

\n    \abc\:\abc_2\,\n    \rick\:\rick_1\,\n    \harry\:\harry_1\,\n    \Christine\:\Christine_2\,\n

Answer 1

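If the data arrives as a single line containing backslashes, then I think you can get sed to work with a little care.

You need to replace each \, sequence with a newline.
You need to replace each \n followed by zero or more spaces with nothing.
You need to replace each remaining backslash with nothing.
You need to remove the final newline (so that you do not end up with two).

That translates to: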

echo '\n    \abc\:\abc_2\,\n    \rick\:\rick_1\,\n    \harry\:\harry_1\,\n    \Christine\:\Christine_2\,\n' |
sed -e 's/\\,/\n/g' \
    -e 's/\\n *//g' \
    -e 's/\\//g' \
    -e 's/\n$//'
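The role of the last expression is easy to miss, so here is a minimal sketch that counts output lines with and without it. It assumes GNU sed (where \n in the replacement inserts a newline) and uses printf rather than echo so that the backslashes are passed through unchanged; the variable names are illustrative only.

```shell
#!/bin/sh
# Sample input from the question: one line containing literal backslash sequences.
input='\n    \abc\:\abc_2\,\n    \rick\:\rick_1\,\n    \harry\:\harry_1\,\n    \Christine\:\Christine_2\,\n'

# First three expressions only (assumes GNU sed: \n in the replacement is a newline).
without_last=$(printf '%s\n' "$input" |
    sed -e 's/\\,/\n/g' -e 's/\\n *//g' -e 's/\\//g' | wc -l)

# All four expressions, including the one that drops the trailing newline.
with_last=$(printf '%s\n' "$input" |
    sed -e 's/\\,/\n/g' -e 's/\\n *//g' -e 's/\\//g' -e 's/\n$//' | wc -l)

echo "lines without the last expression: $without_last"
echo "lines with the last expression:    $with_last"
```

With GNU sed, the first count should be 5 (four data lines plus one empty line) and the second should be 4, which is why the final expression is there.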

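This works fine for me with GNU sed. It does not produce the 'correct' output with BSD (Mac OS X) sed; no newlines are inserted in the output. That is because BSD sed adheres to the POSIX sed specification, which says: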

The escape sequence '\n' shall match a <newline> embedded in the pattern space. A literal <newline> shall not be used in the BRE of a context address or in the substitute function.

The man page for sed on Mac OS X says roughly the same:

The escape sequence \n matches a newline character embedded in the pattern space. You cannot, however, use a literal newline character in an address or in the substitute command.

How do you work around this? Painfully, is probably the answer. You can use the y command, because POSIX says:

[2addr]y/string1/string2/
Replace all occurrences of characters in string1 with the corresponding characters in string2. If a <backslash> followed by an 'n' appear [sic] in string1 or string2, the two characters shall be handled as a single <newline>. If the number of characters in string1 and string2 are not equal, or if any of the characters in string1 appear more than once, the results are undefined. Any character other than <backslash> or <newline> can be used instead of <slash> to delimit the strings. If the delimiter is not 'n', within string1 and string2, the delimiter itself can be used as a literal character if it is preceded by a <backslash>. If a <backslash> character is immediately followed by a <backslash> character in string1 or string2, the two <backslash> characters shall be counted as a single literal <backslash> character. The meaning of a <backslash> followed by any character that is not 'n', a <backslash>, or the delimiter character is undefined.

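The Mac OS X man page is less verbose and less precise, but says roughly the same thing. So I think the trick is to map \, to a character such as Control-A, and then use y/^A/\n/ to map Control-A to a newline.

That is: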

echo '\n    \abc\:\abc_2\,\n    \rick\:\rick_1\,\n    \harry\:\harry_1\,\n    \Christine\:\Christine_2\,\n' |
sed -e 's/\\,/^A/g' \
    -e 'y/^A/\n/' \
    -e 's/\\n *//g' \
    -e 's/\\//g' \
    -e 's/\n$//'

(What is shown as ^A is actually a Control-A character; I had to type Control-V Control-A in vim to enter it.) In any case, this works correctly with Mac OS X or BSD sed.
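If typing a literal Control-A is awkward, one possible variation is to let printf generate the byte and splice it into the sed expressions. This is only a sketch of the same y-command trick: the variable names ctrl_a, input and result are illustrative, and it assumes Control-A never occurs in the real data.

```shell
#!/bin/sh
# Sample input from the question: one line containing literal backslash sequences.
input='\n    \abc\:\abc_2\,\n    \rick\:\rick_1\,\n    \harry\:\harry_1\,\n    \Christine\:\Christine_2\,\n'

# Build a one-byte string holding Control-A (octal 001) so it never has to be typed.
ctrl_a=$(printf '\001')

# Same steps as before: \, -> Control-A -> newline, then strip \n, backslashes,
# and the trailing newline. The quotes are closed and reopened around $ctrl_a
# so that the rest of each expression stays single-quoted.
result=$(printf '%s\n' "$input" |
    sed -e 's/\\,/'"$ctrl_a"'/g' \
        -e 'y/'"$ctrl_a"'/\n/' \
        -e 's/\\n *//g' \
        -e 's/\\//g' \
        -e 's/\n$//')

printf '%s\n' "$result"
```

Because the newline is introduced by y rather than by the replacement text of s, this form should behave the same way under GNU sed and BSD sed.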

Answer 2

Using GNU awk for a multi-character RS:

$ awk -v RS='(\\\\,)?\\\\n[[:space:]]+' 'gsub(/\\/,"")' file
abc:abc_2
rick:rick_1
harry:harry_1
Christine:Christine_2

Tags: unix, bash, shell, awk, sh