Python regex pattern to find whether a line of code ends with a space or tab character

Question

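Sorry for such a basic question, but I really did try to find an answer before coming here. I have a script that opens a .py file and reads its code line by line. The goal is to detect whether a line ends with a space or a tab character, as in this example: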

i = 5 
z = 25

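After the i variable there should be a trailing space (\s), and after the z variable a trailing tab (\t). (Hopefully the code formatting does not strip them.)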

def custom_checks(file, rule):
    """
    @param file: file: file in-which you search for a specific character
    @param rule: the specific character you search for
    @return: dict obj with the form { line number : character }
    """
    rule=re.escape(rule)
    logging.info(f"     File {os.path.abspath(file)} checked for {repr(rule)} inside it ")
    result_dict = {}

    file = fileinput.input([file])
    for idx, line in enumerate(file):
        if re.search(rule, line):
            result_dict[idx + 1] = str(rule)

    file.close()
    if not len(result_dict):
        logging.info("Zero non-compliance found based on the rule:2 consecutive empty rows")
    else:
        logging.warning(f'Found the next errors:{result_dict}')

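Afterwards, when I check the log output, I see: checked for '\+s\\s\$' inside it. I don't know why the \ is doubled. I also read all the regexes from config.json, which looks like this: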

{
  "ends with tab":"+\t$",
  "ends with space":"+s\s$"

}

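Can someone point me in the right direction? I know I could do it another way, for example reversing the line with [::-1], taking the first character and checking whether it is whitespace, but I really want to do it with a regex. Thanks!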

Answer 1

Try:

rules = {
  'ends with tab': re.compile(r'\t$'),
  'ends with space': re.compile(r' $'),
}
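
As for the doubled backslashes in the question's log: they most likely come from re.escape, which turns every regex metacharacter into a literal. A minimal sketch (the sample line is made up for illustration) of what that does to a pattern such as \t$:

```python
import re

pattern = r'\t$'               # regex: a tab right before the end of the line
escaped = re.escape(pattern)   # every metacharacter is now matched literally

sample_line = 'z = 25\t\n'     # hypothetical line ending with a tab

print(escaped)                                  # the backslash and the $ are now escaped
print(bool(re.search(pattern, sample_line)))    # True
print(bool(re.search(escaped, sample_line)))    # False: searches for the literal text \t$
```

In other words, a pattern that is already a regex should be compiled as it is; re.escape is only meant for matching arbitrary literal text.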

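Note: when iterating over a file, each line keeps its trailing newline ('\n'). In a regex, $ matches at the very end of the string and also just before a newline that ends the string, so with a regex there is no need to strip the newline explicitly.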

if rule.search(line):
    ...
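
Putting the pieces together, a scan over all lines could look roughly like the sketch below. The function name check_lines and the sample lines are assumptions made for illustration; the return shape follows the { line number : ... } idea from the question.

```python
import re

# Same rules as above; the names are only labels for the report.
RULES = {
    'ends with tab': re.compile(r'\t$'),
    'ends with space': re.compile(r' $'),
}

def check_lines(lines, rules=RULES):
    """Return {line number: [names of the rules that matched]}."""
    result = {}
    for lineno, line in enumerate(lines, start=1):
        matched = [name for name, rule in rules.items() if rule.search(line)]
        if matched:
            result[lineno] = matched
    return result

# Hypothetical input; with a real file, pass the open file object instead.
sample = ['i = 5 \n', 'z = 25\t\n', 'ok = True\n']
print(check_lines(sample))  # {1: ['ends with space'], 2: ['ends with tab']}
```

Because an open text file is itself an iterable of lines, check_lines(open(path)) works the same way without fileinput.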

Personally, though, I would use line.rstrip() != line.rstrip('\n') to flag any kind of trailing whitespace in one go.
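
A small sketch of that comparison, wrapped in a helper whose name is made up here, to show which lines it flags:

```python
def has_trailing_whitespace(line):
    """True if there is whitespace between the content and the newline."""
    # rstrip() removes all trailing whitespace, rstrip('\n') only the newline;
    # the two results differ exactly when something else was stripped too.
    return line.rstrip() != line.rstrip('\n')

for line in ['i = 5 \n', 'z = 25\t\n', 'ok = True\n', '\n']:
    print(repr(line), has_trailing_whitespace(line))
```

An empty line ('\n') is not flagged, but a line that contains only spaces or tabs is.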

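If you want to check the last character of the line directly, you need to strip the trailing newline first and make sure the line is not empty. For example: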

char = '\t'
s = line.strip('\n')

if s and s[-1] == char:
    ...

Addendum 1: reading the rules from a JSON config

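import json
import re

# here from a string, but could be in a file, of course
# the backslash is doubled so that the JSON text contains \t
# (a raw tab character is not valid inside a JSON string)
json_config = """
{
    "ends with tab": "\\t$",
    "ends with space": " $"
}
"""

rules = {k: re.compile(v) for k, v in json.loads(json_config).items()}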
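
When the rules live in an actual config.json, only JSON's own escaping applies. Here is a hedged sketch, assuming a file laid out like the one in the question; the helper name load_rules, the temporary file, and the third rule are illustrative and not part of the original:

```python
import json
import os
import re
import tempfile

# Text exactly as it would appear in the file (raw string, so Python does
# not interpret anything). JSON itself decodes \t to a tab character; a
# backslash that should reach the regex engine must be doubled, as in \\t.
CONFIG_TEXT = r'''
{
    "ends with tab": "\t$",
    "ends with space": " $",
    "ends with space or tab": "[ \\t]$"
}
'''

def load_rules(path):
    """Read {name: pattern} from a JSON file and compile each pattern."""
    with open(path, encoding='utf-8') as fh:
        return {name: re.compile(pat) for name, pat in json.load(fh).items()}

# Write the hypothetical config to a temporary file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, 'config.json')
    with open(config_path, 'w', encoding='utf-8') as fh:
        fh.write(CONFIG_TEXT)
    rules = load_rules(config_path)

print(sorted(rules))
```

A pattern such as \s$ would not be a good substitute here: \s also matches the newline itself, so every line that ends with a newline would be flagged.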

Addendum 2: comments

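The following shows how to comment out a rule, and how to add a rule that detects comments in the file being checked. Since JSON does not support comments, we can consider YAML instead: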

import re

import yaml

# backslashes are doubled because this is a regular (non-raw) Python string
yaml_config = """
ends with space: ' $'
ends with tab: \\t$
is comment: ^\\s*#
# ignore: 'foo'
"""

rules = {k: re.compile(v) for k, v in yaml.safe_load(yaml_config).items()}

Note: 'is comment' is simple. A hypothetical 'has comment' is harder to define. Why? I will leave that as an exercise for the reader ;-)
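
One way to see the difficulty: a # can also sit inside a string literal, where it does not start a comment, so a plain per-line regex is not enough. A possible approach, shown only as a sketch and not as part of the rules above, is to let the standard tokenize module find the real comment tokens. The function name and the sample source are made up, and the input is assumed to be valid Python:

```python
import io
import tokenize

def lines_with_comments(source):
    """Return the 1-based line numbers that contain a real comment token."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sorted({tok.start[0] for tok in tokens if tok.type == tokenize.COMMENT})

# Line 1 has a '#' inside a string, line 2 has an actual comment.
source = 'x = "# not a comment"\ny = 1  # a real comment\n'
print(lines_with_comments(source))  # [2]
```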

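Note 2: in a file, the YAML config is written with single backslashes (doubling is only needed inside a Python string literal), e.g.: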

cat > config.yml << EOF
ends with space: ' $'
ends with tab: \t$
is comment: ^\s*#
# ignore: 'foo'
EOF

Additional thoughts

You may want to give autopep8 a try.

Example:

cat > foo.py << EOF
# this is a comment   

text = """
# xyz  
bar  
"""
def foo(): 
    # to be continued  
    pass 

def bar():
  pass     

 
  
EOF

Note: to make the extra whitespace visible:

cat foo.py | perl -pe 's/$/|/'
# this is a comment   |
|
text = """|
# xyz  |
bar  |
"""|
def foo(): |
    # to be continued  |
    pass |
|
def bar():|
  pass     |
|
 |
  |

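The file above has several PEP 8 issues (trailing whitespace, only one blank line between functions, etc.). autopep8 fixes all of them (while correctly leaving the text variable untouched):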

autopep8 foo.py | perl -pe 's/$/|/'
# this is a comment|
|
text = """|
# xyz  |
bar  |
"""|
|
|
def foo():|
    # to be continued|
    pass|
|
|
def bar():|
    pass|
