Inserting spaces only between capital letters that are directly adjacent to each other

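My goal is to add spaces between the letters of various acronyms.

An acronym meets three conditions:

  1. It contains at least two letters.
  2. It is always uppercase.
  3. The special character "/" is treated as an uppercase letter.

Think of DNS, IP, TCP/IP, etc.

I want to process text so that they become: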

D N S

I P

T C P / I P

and so on. Suppose I have this sentence:

Because IP provides this basic routing function, the term “IP router,” is often used. Other, older terms for router are (IP gateway), [Internet gateway], and 'gateway'. TCP/IP 12345.

Running this command somewhat solves my problem, but the result is not perfect: sed -e "s/[a-z \, \. \' \“ \” \( \) 0-9]*/& /g" -e "s/ */ /g" test.txt

I get:

Because I P provides this basic routing function, the term “ I P router,” is often used. Other, older terms for router are ( I P gateway), [ Internet gateway ], and 'gateway'. T C P / I P 12345.

There is still a space between “ and I P.

There is a space between ( and I P.

There is still a space between [ and Internet.

Escaping [ and ] as well, with $ sed -e "s/[a-z \, \. \' \“ \” \( \) \[ \] 0-9]*/& /g" -e "s/ */ /g" test.txt, does not work, as shown below.

Because IP provides this basic routing function, the term “IP router,” is often used. Other, older terms for router are (IP gateway), [Internet gateway], and 'gateway'. TCP/IP 12345.

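The regular expression

/([A-Z])([A-Z])/

matches two capital letters next to each other. In the replacement, use the capture groups to emit the same two letters with a space between them:

/\1 \2/

A single pass only splits the first two capitals of each run, because a letter consumed by one match cannot start the next one. The output after the first iteration therefore looks like this: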

Think of D NS, I P, T CP/I P, etc.
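The single-pass behaviour described above can be reproduced with a short sketch. It assumes Python's standard `re` module and the sample sentence from this answer; the variable names are only illustrative.

```python
import re

text = 'Think of DNS, IP, TCP/IP, etc.'

# One call to re.sub scans left to right and never revisits consumed text,
# so in "DNS" the pair "DN" is replaced and "S" is left without a partner.
one_pass = re.sub(r'([A-Z])([A-Z])', r'\1 \2', text)
print(one_pass)  # Think of D NS, I P, T CP/I P, etc.
```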

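So you need to repeat the substitution until the regular expression no longer matches. In Python, that would be:

import re

the_string = 'Think of DNS, IP, TCP/IP, etc.'

# Keep substituting until no two capitals are adjacent.
while re.search(r'([A-Z])([A-Z])', the_string):
    the_string = re.sub(r'([A-Z])([A-Z])', r'\1 \2', the_string)

the_string now ends up as:

Think of D N S, I P, T C P/I P, etc.

The slash is left untouched because it is not in the character class. To treat "/" as a capital letter, as the question asks, use [A-Z/] in both groups.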
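As an alternative to looping, a zero-width match can insert every space in one pass. The following is a minimal sketch rather than part of the answer above: it swaps the capture groups for a lookbehind and a lookahead, includes "/" in the character class per condition 3 of the question, and uses a hypothetical helper name, `space_out_acronyms`.

```python
import re

def space_out_acronyms(text):
    """Insert a space at every position that sits between two
    characters from the set A-Z or "/" (hypothetical helper)."""
    # The lookarounds consume no characters, so overlapping pairs such as
    # "DN" and "NS" in "DNS" are all found in a single pass.
    return re.sub(r'(?<=[A-Z/])(?=[A-Z/])', ' ', text)

result = space_out_acronyms('Think of DNS, IP, TCP/IP, etc.')
print(result)  # Think of D N S, I P, T C P / I P, etc.
```

A lone capital followed by lowercase text, such as the "T" in "Think", is not affected, because the position after it is not followed by a character from the class.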

Using GNU sed and a conditional jump:

echo 'think of DNS, IP, TCP/IP, etc.' | sed -E ':x; s/([A-Z/])([A-Z/])/\1 \2/; tx'

Output:

think of D N S, I P, T C P / I P, etc.
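For comparison, the same loop-until-no-match idea can be sketched in Python with the "/" included in the character class. This is an approximation of the sed script, not a translation: `re.subn` replaces every non-overlapping pair per iteration, whereas the sed command replaces one pair per jump. The names `PAIR` and `space_out_loop` are made up for this example, and the input is a shortened piece of the question's sentence.

```python
import re

# Two adjacent characters, each either a capital letter or "/".
PAIR = re.compile(r'([A-Z/])([A-Z/])')

def space_out_loop(text):
    """Repeat the substitution until a pass changes nothing."""
    while True:
        text, count = PAIR.subn(r'\1 \2', text)
        if count == 0:
            return text

sample = "(IP gateway), [Internet gateway], TCP/IP 12345."
spaced = space_out_loop(sample)
print(spaced)  # (I P gateway), [Internet gateway], T C P / I P 12345.
```

Unlike the first sed attempt in the question, no space appears after "(" or "[", since only positions between two class members are touched.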