Preserve key:value values in text while regex replacing non-word characters in keys (Notepad++)

Question

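I'm trying to replace every non-word character (\W) with an underscore (_) in Notepad++, but only in the key part of each line, and I haven't had any success. (Each line is some kind of hierarchy described by leading spaces and ends with a key: value pair.) A Python solution would also work, since I'm trying to do other things with the text after reformatting. Example: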

This 100% isn't what I want   
  Yet, it's-what-I've got currently: D@rnit :(  
This_100_is_what_I_d_like: See?  
  Indentation_isn_t_necessary  
    _to_maintain_but_would_be_nice: :)<-preserved!
  I_m_Mr_Conformist_over_here: |Whereas, I'm like whatever's clever.| 
If_you_can_help: Thanks 100.1%!

Answer 1

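Having seen a better description of what you want to do, I don't think you can do this from inside Notepad++ with a single regex. However, you could write a Python script that walks through your document one line at a time and sanitizes everything to the left of the colon (if there is one).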

Here's a quick-and-dirty example (untested). It assumes doc is an open file object pointing to the file you want to clean up:

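import re

# Assumes `doc` is an open file object, e.g. doc = open("input.txt").
sanitized_lines = []
for line in doc:
    # Split the line into leading whitespace, the text before the first colon,
    # and everything from the colon on. re.DOTALL lets (.*) also capture the
    # trailing newline; without it the newline is lost and the lines run together.
    line_match = re.match(r"^(\s*)([^:\n]*)(.*)", line, re.DOTALL)
    indentation = line_match.group(1)
    left_of_colon = line_match.group(2)
    remainder = line_match.group(3)

    # Replace each non-word character in the key part with an underscore.
    left_of_colon = re.sub(r"\W", "_", left_of_colon)

    sanitized_lines.append("".join((indentation, left_of_colon, remainder)))

sanitized_doc = "".join(sanitized_lines)
print(sanitized_doc)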
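To try the idea without a real file, the sketch below wraps the same three-part split in a function and feeds it an io.StringIO object, which iterates line by line just like an open file. The function name sanitize_keys is hypothetical, and the sample lines are taken from the question. It keeps re.DOTALL so that (.*) captures each line's trailing newline; without it the newline would be dropped and the joined output would run together on one line.

```python
import io
import re

# Indentation, key (text up to the first colon), and the remainder.
# re.DOTALL lets (.*) include the trailing "\n" of each line.
LINE_RE = re.compile(r"^(\s*)([^:\n]*)(.*)", re.DOTALL)


def sanitize_keys(doc):
    """Hypothetical helper: sanitize the key part of every line in `doc`."""
    out = []
    for line in doc:
        indentation, key, remainder = LINE_RE.match(line).groups()
        # One underscore per non-word character in the key only.
        out.append(indentation + re.sub(r"\W", "_", key) + remainder)
    return "".join(out)


# io.StringIO stands in for an open file object.
sample = io.StringIO(
    "This 100% isn't what I want\n"
    "  Yet, it's-what-I've got currently: D@rnit :(\n"
)
print(sanitize_keys(sample), end="")
```

Because this replaces each \W individually, "100% isn't" becomes "100__isn_t"; switching to r"\W+" would collapse each run into a single underscore.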

Answer 2

I admit I'm answering an off-topic question; I just liked the problem. Press Ctrl+H, enable regular expression mode in N++, and search for:

(:[^\r\n]*|^\s+)|\W(?<![\r\n])

并替换为：

(?1$1:_)

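The regex has two main parts. The first branch of the outer alternation matches either a line's leading whitespace (indentation) or everything from the first colon to the end of the line; the second branch matches any non-word character other than a carriage return \r or line feed \n (excluded by the negative lookbehind), so that line breaks are preserved. The replacement string is a conditional: if the first capturing group matched, the match is replaced with itself; otherwise it is replaced with _.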
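Python's re.sub has no conditional replacement syntax like Notepad++'s (?1...:...), but passing a function as the replacement has the same effect. This is a sketch, not part of the original answer: it assumes re.MULTILINE so that ^ matches at the start of every line, as it does in Notepad++, and the names PATTERN and keep_or_underscore are illustrative.

```python
import re

# Branch 1 (group 1): a colon plus the rest of the line, or leading whitespace.
# Branch 2: any non-word character that is not \r or \n.
PATTERN = re.compile(r"(:[^\r\n]*|^\s+)|\W(?<![\r\n])", re.MULTILINE)


def keep_or_underscore(m):
    # Emulates the conditional replacement: keep group 1 if it matched, else "_".
    return m.group(1) if m.group(1) is not None else "_"


text = (
    "This 100% isn't what I want\n"
    "  Yet, it's-what-I've got currently: D@rnit :(\n"
    "If you can help: Thanks 100.1%!"
)
print(PATTERN.sub(keep_or_underscore, text))
```

As in Notepad++, each non-word character is replaced individually, so "100% isn't" becomes "100__isn_t", while the indentation and everything after the colon are left untouched.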

Answer 3

You can try this Python script:

ss="""This 100% isn't what I want   
  Yet, it's-what-I've got currently: D@rnit :(  
If you can help: Thanks 100.1%!"""

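import re
# Group 1: everything before the first colon (the key part).
# Group 2: the colon and the rest of the line, or empty if there is no colon.
splitcapture=re.compile(r'(?m)^([^:\n]+)(:[^\n]*|)$')
# Collapse each run of non-word characters into a single underscore.
subregx=re.compile(r'\W+')
# Sanitize only group 1, then append group 2 unchanged.
print(splitcapture.sub(lambda m: subregx.sub('_', m.group(1))+m.group(2), ss))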

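Here I first match each line and capture it in two parts: the part that contains no ':' character is captured into group 1, and the optional other part, which starts with ':' and runs to the end of the line, is captured into group 2. I then apply the substitution only to the string captured by group 1, and finally join the two parts: the replaced group 1 + group 2.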

The output is:

This_100_isn_t_what_I_want_
_Yet_it_s_what_I_ve_got_currently: D@rnit :(  
If_you_can_help: Thanks 100.1%!
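In that output the leading indentation of the second line has been turned into "_". Since the question says keeping the indentation would be nice, one possible tweak (a sketch, not part of the original answer) is to capture leading spaces and tabs in their own group and leave them untouched. It assumes "\n" line endings; the function name sanitize is hypothetical.

```python
import re

# Group 1: leading spaces/tabs, kept as-is.
# Group 2: the key, i.e. everything up to the first colon.
# Group 3: the colon and the rest of the line (empty if there is no colon).
LINE = re.compile(r"(?m)^([ \t]*)([^:\n]*)(.*)$")
NON_WORD = re.compile(r"\W+")


def sanitize(text):
    # Collapse each run of non-word characters in the key into one "_".
    return LINE.sub(
        lambda m: m.group(1) + NON_WORD.sub("_", m.group(2)) + m.group(3), text
    )


ss = (
    "This 100% isn't what I want\n"
    "  Yet, it's-what-I've got currently: D@rnit :(\n"
    "If you can help: Thanks 100.1%!"
)
print(sanitize(ss))
```

The second line now keeps its two leading spaces ("  Yet_it_s_what_I_ve_got_currently: D@rnit :("), while the values after the colon are still preserved.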

python

regex

pcre

notepad++

key-value