Python RegEx Replace - Inverting the Search removing too much
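I'm working in Python 2.7. I am trying to remove everything from a string that is not a database and table name combination. I'm using a regex for this and am inadvertently removing all of the whitespace (which I need to keep, to separate the values).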
import re
s = "replace view dw1.tbl1_st as select dw2.tbl1_st.col1, dw2.tbl1_st.col2, "
s = s + "dw2.tbl1_st.col3, dw2.tbl1_st.col4 dw2.tbl1_st.col5, "
s = s + "dw2.tbl1_st.col6, dw2.tbl1_st.col7 dw2.tbl1_st.col15, dw2.tbl1_st.col8, "
s = s + "dw2.tbl1_st.col9, dw2.tbl1_st.col10, dw2.tbl1_st.col11, dw2.tbl1_st.col12, "
s = s + "dw2.tbl1_st.col13, dw2.tbl1_st.col14 from dw2.tbl1_st;"
replaced = re.sub(r'((?!\w+\.\w+).)', '', s)
The result is removing the "." between the database and table names, but I want the "." as well as the whitespace to be kept.
>> replaced
'dw1dw2tbl1_stdw2tbl1_stdw2tbl1_stdw2tbl1_stdw2tbl1_stdw2tbl1_stdw2tbl1_
stdw2tbl1_stdw2tbl1_stdw2tbl1_stdw2tbl1_stdw2tbl1_stdw2tbl1_
stdw2tbl1_stdw2tbl1_stdw2'
>> desired_results (Option 1)
'dw1.dw2.tbl1_st dw2.tbl1_st, dw2.tbl1_st, dw2.tbl1_st, dw2.tbl1_st,
dw2.tbl1_st, dw2.tbl1_st, dw2.tbl1_st, dw2.tbl1_st, dw2.tbl1_st,
dw2.tbl1_st, dw2.tbl1_st, dw2.tbl1_st, dw2.tbl1_st, dw2.tbl1_st, dw2.'
or, equally acceptable:
>> desired_results (Option 2)
'dw1 dw2tbl1_st dw2tbl1_st dw2tbl1_st dw2tbl1_st dw2tbl1_st
dw2tbl1_st dw2tbl1_st dw2tbl1_st dw2tbl1_st dw2tbl1_st
dw2tbl1_st dw2tbl1_st dw2tbl1_st dw2tbl1_st dw2tbl1_st dw2'
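A small sketch (not part of the original post) of why the "." disappears: the pattern deletes every character at which a `\w+\.\w+` match does not start, and no such match can start at a "." or a space. The shortened `sample` string and the names `pair` and `kept` are made up for illustration.

```python
import re

# Hypothetical shortened input, just to keep the trace readable.
sample = "view dw1.tbl1_st as"

# The body of the lookahead from the question.
pair = re.compile(r'\w+\.\w+')

# A character survives only if a word.word match starts at its position.
kept = [ch for i, ch in enumerate(sample) if pair.match(sample, i)]

# The "." and the spaces never start such a match, so they are dropped,
# exactly as the original substitution does.
result = re.sub(r'((?!\w+\.\w+).)', '', sample)
```

Here both `"".join(kept)` and `result` come out as `'dw1'`: the characters of `tbl1_st` are dropped too, because in this sample no "." follows them.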
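One option, which will work if you know the structure of your string and it is fairly regular: instead of using . to match everything, use a negated character class to match anything except a space or a comma: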
>>> replaced = re.sub(r'((?!\w+\.\w+)[^, ])', '', s)
>>> replaced
' dw1 dw2tbl1_st, dw2tbl1_st, dw2tbl1_st, dw2tbl1_st dw2tbl1_st,
dw2tbl1_st, dw2tbl1_st dw2tbl1_st, dw2tbl1_st, dw2tbl1_st, dw2tbl1_st,
dw2tbl1_st, dw2tbl1_st, dw2tbl1_st, dw2tbl1_st dw2'
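One thing to watch with this approach: every deleted word leaves its surrounding spaces behind, so the raw result contains runs of spaces. A minimal sketch of tidying that up, assuming single spaces are what you want (the shortened `s` and the name `tidy` are only for illustration):

```python
import re

# Shortened version of the string from the question.
s = ("replace view dw1.tbl1_st as select dw2.tbl1_st.col1, "
     "dw2.tbl1_st.col2 from dw2.tbl1_st;")

# Delete every character that does not start a word.word pair,
# but never delete a comma or a space.
replaced = re.sub(r'(?!\w+\.\w+)[^, ]', '', s)

# Collapse the leftover runs of spaces and trim the ends.
tidy = " ".join(replaced.split())
# tidy == 'dw1 dw2tbl1_st, dw2tbl1_st dw2'
```

The commas are kept as they were; only the spacing is normalized.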
Or better yet, use re.findall with a non-capturing group, and finally join the resulting list with a space or whatever separator you want:
>>> " ".join(re.findall(r'((?:\w+\.\w+))',s))
'dw1.tbl1_st dw2.tbl1_st dw2.tbl1_st dw2.tbl1_st dw2.tbl1_st
dw2.tbl1_st dw2.tbl1_st dw2.tbl1_st dw2.tbl1_st dw2.tbl1_st
dw2.tbl1_st dw2.tbl1_st dw2.tbl1_st dw2.tbl1_st dw2.tbl1_st
dw2.tbl1_st dw2.tbl1_st'
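As a side note, the grouping is not strictly required: when a pattern has no groups, re.findall returns the whole match. A hedged sketch of that simpler form, plus an optional de-duplication step that the question did not ask for (the shortened `s` and the names `names` and `unique` are illustrative):

```python
import re

# Shortened version of the string from the question.
s = ("replace view dw1.tbl1_st as select dw2.tbl1_st.col1, "
     "dw2.tbl1_st.col2 from dw2.tbl1_st;")

# With no groups in the pattern, findall returns each full match.
names = re.findall(r'\w+\.\w+', s)

# Optional: keep only the first occurrence of each name, in order.
unique = []
for name in names:
    if name not in unique:
        unique.append(name)

joined = " ".join(names)
# joined == 'dw1.tbl1_st dw2.tbl1_st dw2.tbl1_st dw2.tbl1_st'
# unique == ['dw1.tbl1_st', 'dw2.tbl1_st']
```

Note that `\w+\.\w+` would also pick up things like a decimal literal (for example 1.5) if the SQL contained one, so this assumes the input looks like the sample above.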