Locate data from parameter-string in dataframe

Question

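I have a large csv file (about 2000 entries) containing a list of files (column 0), each described by several parameters (the remaining columns). It looks like this (the first column is only there for readability; it is not actually part of the csv file):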

(i) Filename; File extension; Month created; Year created; Author; Notes;
0   file1; txt; 07; 2015; AB; NaN;
1   file2; txt; 07; 2015; AB; NaN;
2   file2b; txt; 07; 2015; AB; some notes;
3   file3; txt; 06; 2013; CD; some text;
4   file4; txt; 06; 2012; EF; other text;
5   file5; txt; 05; 2011; EF; NaN;
...

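I have read the whole file into a dataframe (called files_df) with pandas.read_csv(). What I want to do now is retrieve all files that match certain criteria. For example, asking for all files created by author AB in July 2015 without any notes should match rows 0 and 1, but none of the other rows.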

I can already retrieve the files with:

files_df.loc[(files_df['Month created'] == '07') &
             (files_df['Year created'] == '2015') &
             (files_df['Author'] == 'AB') &
             (files_df['Notes'].isnull())]
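A self-contained version of the filter above, as a sketch. It assumes the file is semicolon-separated with a space after each separator (the trailing semicolons from the sample are dropped here), and it reads every column as text with dtype=str. Without that, read_csv would parse '07' as the integer 7 and the string comparisons would match nothing. The literal "NaN" entries are treated as missing values by read_csv's defaults, which is what makes isnull() work. The in-memory CSV_TEXT is a stand-in for the real file.

```python
import io

import pandas as pd

# Stand-in for the real file: the sample rows from the question (assumed layout)
CSV_TEXT = """Filename; File extension; Month created; Year created; Author; Notes
file1; txt; 07; 2015; AB; NaN
file2; txt; 07; 2015; AB; NaN
file2b; txt; 07; 2015; AB; some notes
file3; txt; 06; 2013; CD; some text
file4; txt; 06; 2012; EF; other text
file5; txt; 05; 2011; EF; NaN
"""

# sep=';' splits on semicolons, skipinitialspace drops the space after each one,
# dtype=str keeps '07' and '2015' as strings instead of converting them to ints
files_df = pd.read_csv(io.StringIO(CSV_TEXT), sep=';',
                       skipinitialspace=True, dtype=str)

matches = files_df.loc[(files_df['Month created'] == '07') &
                       (files_df['Year created'] == '2015') &
                       (files_df['Author'] == 'AB') &
                       (files_df['Notes'].isnull())]
print(matches['Filename'].tolist())  # ['file1', 'file2']
```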

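But how can I fill in these conditions automatically in Python? I have stored a number of key/value combinations for filtering in a dictionary variable, but I can't think of a way to insert them into the expression automatically. Can anyone point me in the right direction?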

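(I don't work much with Python, and a dictionary was just the first type that came to mind; if other types are better suited for this, I don't have to use one.)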

[Edit for clarification:]

A typical input looks like this:

parameters = {'Month created': {'07'},
              'Year created': {'2015'},
              'Author': {'AB'},
              'Notes': {}}

What I want to do is write something like this:

def read_files(parameters):
    files = files_df.loc[
           # how to fill parameter keys & values here???
           ]
    return files
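One way to fill in that skeleton, sketched under an assumed reading of the input format (my interpretation, not something the question states): a non-empty set lists the accepted values for a column, and an empty container such as the {} given for 'Notes' means "this column must be empty". Each entry becomes one boolean mask (Series.isin for sets, isnull for empty entries), and functools.reduce combines them with element-wise AND. The small files_df built here is illustrative only.

```python
import functools
import operator

import pandas as pd


def read_files(files_df, parameters):
    """Return the rows of files_df that satisfy every entry in parameters.

    Assumed convention (hypothetical): a non-empty set lists accepted values,
    an empty container ({} or set()) means the column must be missing/NaN.
    """
    conditions = []
    for column, allowed in parameters.items():
        if len(allowed) == 0:
            conditions.append(files_df[column].isnull())
        else:
            conditions.append(files_df[column].isin(allowed))
    # AND all masks together; starting from all-True keeps every row for {}
    mask = functools.reduce(operator.and_, conditions,
                            pd.Series(True, index=files_df.index))
    return files_df.loc[mask]


# Illustrative data with the question's column names
files_df = pd.DataFrame({
    'Filename': ['file1', 'file2', 'file2b', 'file3'],
    'Month created': ['07', '07', '07', '06'],
    'Year created': ['2015', '2015', '2015', '2013'],
    'Author': ['AB', 'AB', 'AB', 'CD'],
    'Notes': [None, None, 'some notes', 'some text'],
})
parameters = {'Month created': {'07'},
              'Year created': {'2015'},
              'Author': {'AB'},
              'Notes': {}}

result = read_files(files_df, parameters)
print(result['Filename'].tolist())  # ['file1', 'file2']
```

Using sets means several accepted values per column come for free, e.g. {'AB', 'CD'} for 'Author'.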

Answer 1

After some experimenting I found this solution. It looks like an ugly hack, but...

def read_files(files_df, parameters):
    # parameters maps a column name to a single value, e.g. {'Author': 'AB'}
    # Start from the fixed condition "no notes" and AND one equality
    # condition per parameter onto it
    idx = files_df['Notes'].isnull()
    for key, value in parameters.items():
        idx = idx & (files_df[key] == value)
    files = files_df.loc[idx]

    return files
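For plain scalar values, the loop can also be replaced by one comparison between the selected columns and a Series built from the dictionary: pandas aligns the Series' index with the column labels, and .all(axis=1) keeps rows where every comparison is True. A sketch, assuming parameters maps column names to single values and keeping the answer's extra "no notes" condition; the sample data is illustrative.

```python
import pandas as pd


def read_files(files_df, parameters):
    # One row of expected values, labelled by column name
    expected = pd.Series(parameters)
    # Compare each selected column with its expected value, then require
    # that every comparison in a row is True
    mask = (files_df[list(parameters)] == expected).all(axis=1)
    # Same extra condition as in the answer: no notes
    mask &= files_df['Notes'].isnull()
    return files_df.loc[mask]


files_df = pd.DataFrame({
    'Filename': ['file1', 'file2', 'file2b', 'file3'],
    'Month created': ['07', '07', '07', '06'],
    'Year created': ['2015', '2015', '2015', '2013'],
    'Author': ['AB', 'AB', 'AB', 'CD'],
    'Notes': [None, None, 'some notes', 'some text'],
})
parameters = {'Month created': '07', 'Year created': '2015', 'Author': 'AB'}

result = read_files(files_df, parameters)
print(result['Filename'].tolist())  # ['file1', 'file2']
```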

syntax

python-3.x

pandas