Inserting spaces only between capital letters that are directly adjacent to each other
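My goal is to add spaces between the letters of various abbreviations.
An abbreviation meets three conditions:
- It contains at least two letters.
- It is always uppercase.
- The special character "/" counts as an uppercase letter.
Think of DNS, IP, TCP/IP, etc.
I want to process the text so that they become: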
D N S
I P
T C P / I P
etc.
Suppose I have this sentence:
Because IP provides this basic routing function, the term “IP router,” is often used. Other, older terms for router are (IP gateway), [Internet gateway], and 'gateway'. TCP/IP 12345.
Running this command sort of solves my problem: sed -e "s/[a-z \, \. \' \“ \” \( \) 0-9]*/& /g" -e "s/ */ /g" test.txt
It is not quite perfect.
I get:
Because I P provides this basic routing function, the term “ I P router,” is often used. Other, older terms for router are ( I P gateway), [ Internet gateway ], and 'gateway'. T C P / I P 12345.
There is an extra space between “ and I P.
There is an extra space between ( and I P.
There is also an extra space between [ and Internet.
Using $ sed -e "s/[a-z \, \. \' \“ \” \( \) \[ \] 0-9]*/& /g" -e "s/ */ /g" test.txt
Escaping [ and ] does not work, as shown below:
Because IP provides this basic routing function, the term “IP router,” is often used. Other, older terms for router are (IP gateway), [Internet gateway], and 'gateway'. TCP/IP 12345.
The regular expression
/([A-Z])([A-Z])/
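matches any two capital letters that sit next to each other. You then use the capture groups in the replacement to get the same letters back with a space between them:
/\1 \2/
Because matches cannot overlap, a single pass handles only the first pair in a run such as DNS (the S has no partner left), so the output after the first iteration looks like this: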
Think of D NS, I P, T CP/I P, etc.
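So you need to repeat the substitution until the regular expression no longer matches. In Python, that would be:
import re

the_string = 'Think of DNS, IP, TCP/IP, etc.'
# Each pass spaces only non-overlapping pairs, so repeat until no pair is left.
while re.search(r'([A-Z])([A-Z])', the_string):
    the_string = re.sub(r'([A-Z])([A-Z])', r'\1 \2', the_string)
print(the_string)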
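It now ends up as:
Think of D N S, I P, T C P/I P, etc.
The "/" in TCP/IP gets no spaces around it because the pattern only covers A-Z; writing both character classes as [A-Z/] treats the slash like a capital letter.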
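As an alternative to looping, the same effect can probably be had in one pass with zero-width lookarounds, since they test the neighbouring characters without consuming them. This is only a sketch; the function name space_adjacent_capitals is made up for illustration and is not part of the answer above.

```python
import re

def space_adjacent_capitals(text: str) -> str:
    """Insert one space at every position that sits between two capital letters."""
    # (?<=[A-Z]) requires a capital just before the position and
    # (?=[A-Z]) requires a capital just after it. Neither consumes a character,
    # so overlapping pairs such as "DN" and "NS" are both handled in one pass.
    return re.sub(r'(?<=[A-Z])(?=[A-Z])', ' ', text)

print(space_adjacent_capitals('Think of DNS, IP, TCP/IP, etc.'))
```

For this input the call should print Think of D N S, I P, T C P/I P, etc., which is what repeating the two-group substitution until nothing matches arrives at. The "/" stays unspaced because the character class covers A-Z only.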
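Using GNU sed and a conditional jump (t branches back to the label x whenever the preceding s command made a substitution):
echo 'think of DNS, IP, TCP/IP, etc.' | sed -E ':x; s/([A-Z/])([A-Z/])/\1 \2/; tx'
Output: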
think of D N S, I P, T C P / I P, etc.
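For comparison, here is a Python sketch that follows the three conditions from the question more literally: it looks for whole runs of two or more characters drawn from A-Z and "/", and joins the characters of each run with spaces. The names ABBREVIATION and space_out_abbreviations are hypothetical, chosen only for this example.

```python
import re

# Assumption: any run of two or more characters from A-Z and "/" is an
# abbreviation, mirroring the three conditions stated in the question.
ABBREVIATION = re.compile(r'[A-Z/]{2,}')

def space_out_abbreviations(text: str) -> str:
    # ' '.join on a string puts a single space between each of its characters.
    return ABBREVIATION.sub(lambda m: ' '.join(m.group()), text)

sentence = ("Because IP provides this basic routing function, the term “IP router,” "
            "is often used. Other, older terms for router are (IP gateway), "
            "[Internet gateway], and 'gateway'. TCP/IP 12345.")
print(space_out_abbreviations(sentence))
```

On the sentence from the question this should print: Because I P provides this basic routing function, the term “I P router,” is often used. Other, older terms for router are (I P gateway), [Internet gateway], and 'gateway'. T C P / I P 12345. Like the sed loop, it will also split something like "A/b" into "A /b" or put a space inside "//", so it may need tightening if such text occurs.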