Find keywords in a file, parse the lines they are on, return a dict

I need to find information in a file. The file has many lines, but the ones I am looking for look like this:

 Initial command:
 /opt/user/program/pg.c01/l1.exe "/scratch/user/pg-18930.inp" -scrdir="/scratch/user/"
 Entering Link 1 = /opt/software/program/pg.c01/l1.exe PID=     18941.
  
 Copyright (c) 1950-2050, program, Inc.  All Rights Reserved.

I need to parse the file for -scrdir="/scratch/user/" and PID= 18941.

I want to return a dictionary like this:

dict = {"-scrdir=":"/scratch/user/", "PID":18941}

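This should be general: I want to pass in a list of things to search for (-scrdir= and/or PID and/or others) and get back whatever follows each of those keywords in the file, if the keyword is present.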

What I have so far seems to work, but the logic feels heavy. As an MWE, I store the information in a list rather than a file, and have the following:

log = ["this is a line Initial",
   '/opt/user/program/pg.c01/l1.exe "/scratch/user/pg-18930.inp" -scrdir="/scratch/user/"',
   "Entering Link 1 = /opt/software/program/pg.c01/l1.exe PID=     18941.",
   "  ",
   "Copyright (c) 1950-2050, program, Inc.  All Rights Reserved."]
dicti = {}
phrases = ["-scrdir", "PID"]
# with open(file, 'r') as log:  # would be used in the real situation
for line in log:
    if any(word in line for word in phrases):
        for phrase in phrases:
            try:
                dicti[phrase] = line.split(phrase + "=")[1]
            except IndexError:  # this phrase is not on this line
                pass

Is there a more concise way to write this?

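One last note: the files are usually much smaller than 1 MB, and speed is not a priority. It does not need to be fast or efficient... just elegant, I suppose.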

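You can write out every specific regular expression you want to search for in your text, then combine them with the | alternation operator (the regex equivalent of OR):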

import re

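# One pattern per keyword: the first group is the keyword, the second its value.
# Raw strings keep \w, \s and \d from being read as string escapes.
REGEXES = (
    r'(-scrdir)="([/\w]+)"',
    r'(PID)=\s*(\d+)',
)
PATTERN = re.compile("|".join(REGEXES))

dicti = dict(
    [group for group in match if group != '']  # drop the empty groups of the alternatives that did not match
    for matches in map(PATTERN.findall, log)  # all matches on each line (empty list if none)
    for match in matches  # loop over the matches on that line
)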

dicti is:

{'-scrdir': '/scratch/user/', 'PID': '18941'}
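
Note that the values come back as strings, whereas the question asked for an integer PID. The following is a minimal sketch of one way to get typed values, assuming a per-keyword table that pairs a value pattern with a converter. The names `SPECS` and `parse_keywords` are hypothetical and introduced only for illustration; `log` is the sample list from the question.

```python
import re

# Hypothetical table: keyword -> (regex capturing its value, converter for the captured text)
SPECS = {
    "-scrdir": (r'"([^"]*)"', str),   # anything between double quotes
    "PID": (r"\s*(\d+)", int),        # optional whitespace, then digits
}


def parse_keywords(lines, specs=SPECS):
    """Return {keyword: converted value} for every keyword found in lines."""
    found = {}
    for line in lines:
        for key, (value_re, convert) in specs.items():
            # re.escape makes the keyword safe to embed in a pattern
            m = re.search(re.escape(key) + "=" + value_re, line)
            if m:
                found[key] = convert(m.group(1))
    return found


log = ["this is a line Initial",
       '/opt/user/program/pg.c01/l1.exe "/scratch/user/pg-18930.inp" -scrdir="/scratch/user/"',
       "Entering Link 1 = /opt/software/program/pg.c01/l1.exe PID=     18941.",
       "  ",
       "Copyright (c) 1950-2050, program, Inc.  All Rights Reserved."]

print(parse_keywords(log))  # {'-scrdir': '/scratch/user/', 'PID': 18941}
```

Adding another keyword then means adding one entry to the table, at the cost of one regex search per keyword per line, which should not matter for files of this size.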

The combined regex also works when a single line contains several matches. For example, if you have:

log = ["this is a line Initial",
   '/opt/user/program/pg.c01/l1.exe "/scratch/user/pg-18930.inp" -scrdir="/scratch/user/" Entering Link 1 = /opt/software/program/pg.c01/l1.exe PID=     18941.',
   "  ",
   "Copyright (c) 1950-2050, program, Inc.  All Rights Reserved."]

The output will be:

{'-scrdir': '/scratch/user/', 'PID': '18941'}

It does not work if the text you are looking for is spread across multiple lines.
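
One possible way around the multi-line limitation, given that the files are small, is to search the whole text at once instead of line by line. The sketch below assumes the line break falls where the pattern allows whitespace (here, right after the `=`); a value that is itself split in the middle would still not be matched. `parse_text` is a hypothetical helper name, and for a real file the text would be read in one go rather than taken from the sample string.

```python
import re

# Same two alternatives as above; \s* also matches newlines,
# so a break right after "=" is tolerated.
PATTERN = re.compile(r'(-scrdir)=\s*"([/\w]+)"|(PID)=\s*(\d+)')


def parse_text(text):
    """Return {keyword: value} for every match found anywhere in text."""
    result = {}
    for m in PATTERN.finditer(text):
        # Groups belonging to the alternative that did not match are None
        key, value = [g for g in m.groups() if g is not None]
        result[key] = value
    return result


# Sample text with the values pushed onto the following line
text = 'l1.exe -scrdir=\n"/scratch/user/"\nEntering Link 1 PID=\n     18941.'
print(parse_text(text))  # {'-scrdir': '/scratch/user/', 'PID': '18941'}
```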