Find keywords in file, parse lines they are on, return dict

Question

I need to find information in a file. The file has many lines, but the part I am looking for looks like this:

 Initial command:
 /opt/user/program/pg.c01/l1.exe "/scratch/user/pg-18930.inp" -scrdir="/scratch/user/"
 Entering Link 1 = /opt/software/program/pg.c01/l1.exe PID=     18941.
  
 Copyright (c) 1950-2050, program, Inc.  All Rights Reserved.

I need to parse the file for -scrdir="/scratch/user/" and PID= 18941.

I want to return a dictionary like this:

dict = {"-scrdir=":"/scratch/user/", "PID":18941}

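This should be general, because I want to pass in a set of things to search for, i.e. -scrdir= and/or PID and/or others, and get back whatever follows those keywords in the file (if they exist).

What I have so far seems to work, but the logic feels heavy. As an MWE, I store the information in a list rather than a file, and have the following: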

log = ["this is a line Initial",
   '/opt/user/program/pg.c01/l1.exe "/scratch/user/pg-18930.inp" -scrdir="/scratch/user/"',
   "Entering Link 1 = /opt/software/program/pg.c01/l1.exe PID=     18941.",
   "  ",
   "Copyright (c) 1950-2050, program, Inc.  All Rights Reserved."]
dicti = {}
phrases = ["-scrdir", "PID"]
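# In the real situation the loop would sit inside: with open(file, 'r') as log:
for line in log:
    if any(word in line for word in phrases):
        for phrase in phrases:
            try:
                # keep whatever follows "<phrase>=" on this line
                dicti[phrase] = line.split(phrase + "=")[1]
            except IndexError:
                # this phrase does not occur on this line
                pass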

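Is there a more concise way to write this?

One last note: the file is usually much smaller than 1 MB, and speed is not a priority. It doesn't need to be fast or efficient... I just want it to be elegant.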

Answer 1

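You can write out all the specific regular expressions you want to search for in your text, and then combine them with the | alternation operator (equivalent to an OR operator):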

import re

# Raw strings keep \w, \s and \d from being treated as string escapes
REGEXES = (
    r'(-scrdir)="([/\w]+)"',
    r'(PID)=\s*(\d+)',
)

dicti = dict(
    [z for z in w if z != '']  # filter all empty strings in matches
    for y in filter(lambda x: x, map(re.compile("|".join(REGEXES)).findall, log))  # get all matches in a row
    for w in y  # loop over all row matches
)

dicti is:

{'-scrdir': '/scratch/user/', 'PID': '18941'}
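
The question asked for the keywords to be passed in as a list and for the PID as a number, while the result above keeps it as a string. A minimal sketch of that variant follows. `find_keywords` is a hypothetical helper name, and the rule that a value is either double-quoted or a bare run of word characters and slashes is an assumption, not something the log format guarantees.

```python
import re


def find_keywords(lines, keywords):
    """Return {keyword: value} for every 'keyword=value' found in lines."""
    # One alternation built from the keywords; re.escape guards characters such as '-'.
    names = "|".join(re.escape(k) for k in keywords)
    # Assumed value shape: a double-quoted string, or a bare run of word chars and '/'.
    pattern = re.compile(rf'({names})=\s*(?:"([^"]*)"|([\w/]+))')
    found = {}
    for line in lines:
        for match in pattern.finditer(line):
            key, quoted, bare = match.groups()
            value = quoted if quoted is not None else bare
            # Convert all-digit values (e.g. the PID) to int, keep the rest as str.
            found[key] = int(value) if value.isdigit() else value
    return found


log = ["this is a line Initial",
       '/opt/user/program/pg.c01/l1.exe "/scratch/user/pg-18930.inp" -scrdir="/scratch/user/"',
       "Entering Link 1 = /opt/software/program/pg.c01/l1.exe PID=     18941.",
       "  ",
       "Copyright (c) 1950-2050, program, Inc.  All Rights Reserved."]

result = find_keywords(log, ["-scrdir", "PID"])
print(result)
```

Keywords that never occur are simply absent from the returned dict, so the caller can use `result.get("PID")` to test for them.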

The dict comprehension also works when a single line contains multiple matches. For example, if you have:

log = ["this is a line Initial",
   '/opt/user/program/pg.c01/l1.exe "/scratch/user/pg-18930.inp" -scrdir="/scratch/user/" Entering Link 1 = /opt/software/program/pg.c01/l1.exe PID=     18941.',
   "  ",
   "Copyright (c) 1950-2050, program, Inc.  All Rights Reserved."]

The output will be:

{'-scrdir': '/scratch/user/', 'PID': '18941'}

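Because the lines are matched one at a time, this does not work if the text you are looking for is spread across multiple lines.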
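
One possible workaround for the multi-line case, sketched under assumptions: search the whole text at once instead of line by line. `wrapped_log` is a made-up example in which the PID value has wrapped onto the next line. With a real file, `text` would come from `f.read()`. This only helps where the pattern itself tolerates the line break (here `\s*` also matches a newline); a value split in the middle would still be missed.

```python
import re

# Hypothetical log: the number after "PID=" sits on the following line.
wrapped_log = [
    "Entering Link 1 = /opt/software/program/pg.c01/l1.exe PID=",
    "     18941.",
]

pattern = re.compile(r'(-scrdir)="([/\w]+)"|(PID)=\s*(\d+)')

# Join the lines and search the single string; \s* then spans the newline.
text = "\n".join(wrapped_log)
wrapped = {}
for match in pattern.finditer(text):
    # Groups of the alternative that did not match are None; keep the other two.
    key, value = (group for group in match.groups() if group is not None)
    wrapped[key] = value

print(wrapped)
```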

io

parsing

python-3.x