Regex: match the "_" char only if it isn't in a username
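Context and explanation
I'm building a Telegram bot, and I want to add the escape character "\" before every "_" character that isn't part of a username (a word starting with "@", such as "@username_"). This prevents Markdown errors: in Telegram, the "_" character is used to make text italic.
So, for example, given this string: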
"hello i like this char _ write me lol_ @myusername_"
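I want to match only the first two "_" characters, not the third.
Question
What is the correct way to do this with a regex pattern?
Expected conditions and matches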
Condition | Match |
---|---|
"_" alone: ("_") | YES |
"_" in a word without "@": ("lol_") | YES |
"_" in a word starting with "@": ("@username_") | NO |
"_" in a word containing "@", after the "@": ("lol@username_") | NO |
"_" in a word containing "@", before the "@": ("lol_@username") | YES |
"_" in a word like ("lol_@username_") | first: YES, second: NO |
What I tried
This is what I have so far, but it doesn't work properly:
"(?=[^@]+)(?:\s[^\s]*(_)[^\s]*\s)"
Edit
I also want the first "_" character in this string to be matched: "lol_@username_"
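To see where the attempted pattern falls short, it helps to run it on a few inputs. The sketch below is only an illustration, and the sample strings other than the one from the question are made up. The pattern requires whitespace on both sides of the word, so an underscore in the first or last word is missed. The lookahead (?=[^@]+) only checks that the next character is not "@", so it does not exclude usernames.

```python
import re

# The pattern from the question, unchanged.
attempt = re.compile(r"(?=[^@]+)(?:\s[^\s]*(_)[^\s]*\s)")

# Works by accident on the sample string: the username is the last word,
# so it has no trailing whitespace and cannot match.
sample = "hello i like this char _ write me lol_ @myusername_"
print(attempt.findall(sample))

# A word with no surrounding whitespace never matches.
print(attempt.findall("lol_"))

# A username in the middle of the text is matched, which is not wanted.
print(attempt.findall("x @user_ y"))
```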
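I'm assuming you only care about @ at the start of a word. You can use re.sub with a callback that calls replace, together with the pattern (?:\s|^)[^@]\S*\b, to match the words that fit your spec: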
import re
s = "hello i like this char _ write me lol_ @myusername_ asd@_a @_asdf"
s = re.sub(r"(?:\s|^)[^@]\S*\b", lambda x: x.group().replace("_", r"\_"), s)
print(s) # => hello i like this char \_ write me lol\_ @myusername_ asd@\_a @_asdf
If you care about @ appearing anywhere in the word, try (?:\s|^)[^@\s]+\b:
s = "he_llo i like this char _ write me lol_ @myusername_ asd@_a @_asdf"
s = re.sub(r"(?:\s|^)[^@\s]+\b", lambda x: x.group().replace("_", r"\_"), s)
print(s) # => he\_llo i like this char \_ write me lol\_ @myusername_ asd@_a @_asdf
Based on the OP's comments, it sounds like the latest spec is to escape _ everywhere in a word except after an @:
>>> s = "he_llo i lol_@username_ _ write me lol_ @myusername_ asd@_a @_asdf"
>>> re.sub(r"(?:\s|^)[^@]+@", lambda x: x.group().replace("_", r"\_"), s)
'he\_llo i lol\_@username_ \_ write me lol\_ @myusername_ asd@_a @_asdf'
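One caveat with this last pattern: it only matches text that is followed by an @, so underscores after the final @ word (or in a string with no @ at all) are left alone. A minimal sketch of a per-word alternative follows. It swaps the single pattern for \S+ plus str.partition, and the function name escape_before_at is made up for illustration.

```python
import re

def escape_before_at(text):
    """Escape every "_" that comes before the first "@" of its word."""
    def fix_word(match):
        # partition splits on the first "@"; with no "@", head is the whole word.
        head, at, tail = match.group().partition("@")
        return head.replace("_", r"\_") + at + tail
    # \S+ visits each whitespace-delimited word; the whitespace is untouched.
    return re.sub(r"\S+", fix_word, text)

s = "he_llo i lol_@username_ _ write me lol_ @myusername_ asd@_a @_asdf"
print(escape_before_at(s))
print(escape_before_at("no_at_sign here_"))
```

The second call is the case the single-pattern version leaves unchanged, because the string contains no @.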
Extracting with the PyPI regex library:
import regex
string = "hello i like this char _ write me lol_ @myusername_"
print(regex.findall(r'(?<!\S)@\w+(*SKIP)(*F)|_', string))
# ['_', '_']
Explanation
--------------------------------------------------------------------------------
(?<! look behind to see if there is not:
--------------------------------------------------------------------------------
\S non-whitespace (all but \n, \r, \t, \f,
and " ")
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
@ '@'
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount possible))
--------------------------------------------------------------------------------
(*SKIP)(*F) skip the match, search from the failure location
--------------------------------------------------------------------------------
| or
--------------------------------------------------------------------------------
_ a '_' char
Removing with re:
import re
string = "hello i like this char _ write me lol_ @myusername_"
print(re.sub(r'(?<!\S)(@\w+)|_', r'\1', string))
# hello i like this char  write me lol @myusername_
Replacing with re:
import re
string = "hello i like this char _ write me lol_ @myusername_"
print(re.sub(r'(?<!\S)(@\w+)|_', lambda x: x.group(1) or "-", string))
# hello i like this char - write me lol- @myusername_
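Applied to the original goal (escaping rather than removing or substituting "-"), the same callback idea can return r'\_' for a bare underscore. This is a sketch under the assumption that usernames are @\w+ at the start of a word, as in the pattern above; the helper name escape is made up. Because of that assumption it still escapes the underscore in "lol@username_", which the question's table marks as NO.

```python
import re

pattern = re.compile(r'(?<!\S)(@\w+)|_')

def escape(text):
    # Group 1 is set when a whole @username was matched: return it untouched.
    # Otherwise the match is a bare "_", which gets a backslash in front.
    return pattern.sub(lambda m: m.group(1) or r'\_', text)

print(escape("hello i like this char _ write me lol_ @myusername_"))
print(escape("lol@username_"))
```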
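You can match all non-whitespace characters that follow an @, and use an alternation to capture a _ in a group. In the re.sub callback, check whether group 1 exists. If it does, return the escaped underscore (or the escaped group 1 value, which is also an underscore); otherwise return the match unchanged.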
@\S+|(_)
import re
strings = [
    "_",
    "lol_",
    "@username_",
    "lol@username_",
    "lol_@username",
    "lol_@username_"
]
for s in strings:
    result = re.sub(
        r"@\S+|(_)",
        lambda x: x.group(1).replace("_", r"\_") if x.group(1) else x.group(),
        s
    )
    print(result)
Output
\_
lol\_
@username_
lol@username_
lol\_@username
lol\_@username_
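The same pattern can be checked against every row of the question's table in one go. The sketch below wraps it in a helper (the name escape_underscores is an assumption, not from the answer) and asserts the expected result for each case.

```python
import re

def escape_underscores(text):
    # "@\S+" consumes everything from "@" to the end of the word unchanged;
    # only a "_" outside such a run is captured in group 1.
    return re.sub(r"@\S+|(_)", lambda m: r"\_" if m.group(1) else m.group(), text)

# (input, expected) pairs taken from the question's table.
cases = [
    ("_", r"\_"),
    ("lol_", r"lol\_"),
    ("@username_", "@username_"),
    ("lol@username_", "lol@username_"),
    ("lol_@username", r"lol\_@username"),
    ("lol_@username_", r"lol\_@username_"),
]
for text, expected in cases:
    assert escape_underscores(text) == expected, text
print("all cases pass")
```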
Based on @OlvinRoght's comment, with a slight modification, this should do the trick:
Regex
((?:^|\s)(?:[^@\s]*?))(_)((?:[^@\s]*?))(?=@|\s|$)
Code example
import re
text = '_hi hello i like this char _ write me lol_ _word something_ @myusername_ something_@username_'
regex = r"((?:^|\s)(?:[^@\s]*?))(_)((?:[^@\s]*?))(?=@|\s|$)"
# Leave the first and last capturing group as-is and replace the underscore with '\_'
subst = r"\1\\_\3"
print( re.sub(regex, subst, text) )
Expected output:
\_hi hello i like this char \_ write me lol\_ \_word something\_ @myusername_ something\_@username_
Note:
Although this works, @TheFourthBird's answer is faster (and, I think, more elegant).
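One behavioural difference is worth checking before choosing between the two: the pattern above consumes the whole word up to the "@" in a single match, so only the first underscore of a word gets escaped. The sketch below compares it with the @\S+|(_) approach on a made-up input.

```python
import re

word_pattern = r"((?:^|\s)(?:[^@\s]*?))(_)((?:[^@\s]*?))(?=@|\s|$)"
text = "a_b_c @user_name"

# Whole-word pattern: one match per word, so only the first "_" is replaced.
whole_word = re.sub(word_pattern, r"\1\\_\3", text)
print(whole_word)

# Alternation pattern: every "_" outside an @... run is its own match.
alternation = re.sub(r"@\S+|(_)", lambda m: r"\_" if m.group(1) else m.group(), text)
print(alternation)
```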