Python: iterating through each row of a CSV to find partial string matches from a list
I'm trying to do something that seems simple, but it is giving me no end of trouble.
What I want to do:
1 for i in nameList
2 Iterate through each row of aggregatedCSV
3 If i is a partial match in current row, append that entire row to a new name-specific CSV
(repeat steps 2 and 3 for remaining i in nameList)
nameList = ['Jon', 'Bob', 'Tim']
aggregatedCSV = [
    [1, '3', 'Bob85'],
    [2, 'Jon52', '8'],
    ['Bob1', '14', 3],
    ['Tim95', 8, '6'],
    ['8', 11, 'Tim48'],
    [10, 'Jon11', '44'],
    [26, '21', 'Jon90'],
    [99, '23', 'Bob19'],
    [7, '24', 'Tim82']
]
The desired output will eventually be three new CSV files, but to keep things simple here, I'm trying to get something like this:
JonList = [[2, 'Jon52', '8'], [10, 'Jon11', '44'], [26, '21', 'Jon90']]
BobList = [[1, '3', 'Bob85'], ['Bob1', '14', 3], [99, '23', 'Bob19']]
TimList = [['Tim95', 8, '6'], ['8', 11, 'Tim48'], [7, '24', 'Tim82']]
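Although I created nameList by hand for this example, I will be reading from CSV files with an unknown number of rows and an unknown number of values per row.
Any help is appreciated.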
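I don't know Python well, so there is surely a faster, more efficient way, but this is what I came up with:
from collections import defaultdict

nameSpecificData = defaultdict(list)
for name in nameList:
    for row in aggregatedCSV:
        for item in row:
            # Items may be ints, so convert to str before the substring test
            if name in str(item):
                nameSpecificData[name].append(row)
                break  # stop at the first match so the row is added only once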
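This stores the results in a dictionary keyed by name, so you don't need to know the contents of the name list in advance to create the output variables.
Running it on the sample data leaves nameSpecificData holding: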
{
'Jon': [[2, 'Jon52', '8'], [10, 'Jon11', '44'], [26, '21', 'Jon90']],
'Bob': [[1, '3', 'Bob85'], ['Bob1', '14', 3], [99, '23', 'Bob19']],
'Tim': [['Tim95', 8, '6'], ['8', 11, 'Tim48'], [7, '24', 'Tim82']]
}
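Since the end goal is one CSV file per name, the dictionary can be written out directly. The following is only a minimal sketch: the helper name write_name_csvs and the `<name>.csv` file naming are assumptions for illustration, not something from the question.

```python
import csv
from pathlib import Path


def write_name_csvs(name_specific_data, out_dir="."):
    """Write one CSV per name. The 'Jon.csv'-style file names are an assumption."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, rows in name_specific_data.items():
        path = out_dir / f"{name}.csv"
        # newline="" is the setting the csv module docs recommend for writer files
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        written.append(path)
    return written
```

Calling write_name_csvs(nameSpecificData) would then create Jon.csv, Bob.csv and Tim.csv in the current directory. Note that csv.writer converts non-string values such as 3 to text on the way out.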
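If you really, really want separate name-specific variables, this will work:
JonList = []
BobList = []
TimList = []
for name in nameList:
    for row in aggregatedCSV:
        for item in row:
            if name in str(item):
                # Look up the module-level list called e.g. 'JonList' by name
                globals()[name+'List'].append(row)
                break  # stop at the first match so the row is added only once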
It produces the output you want:
>>> print(JonList)
[[2, 'Jon52', '8'], [10, 'Jon11', '44'], [26, '21', 'Jon90']]
>>> print(BobList)
[[1, '3', 'Bob85'], ['Bob1', '14', 3], [99, '23', 'Bob19']]
>>> print(TimList)
[['Tim95', 8, '6'], ['8', 11, 'Tim48'], [7, '24', 'Tim82']]
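Because the real data will come from CSV files with an unknown number of rows and columns, the same matching can be done while reading the file with the csv module. This is a sketch under assumptions: the function name group_rows_by_name is hypothetical, and the file path is whatever the caller supplies. csv.reader yields every cell as a string, so the str() conversion is not needed here.

```python
import csv
from collections import defaultdict


def group_rows_by_name(csv_path, names):
    """Group the rows of a CSV file by which names appear in any cell."""
    grouped = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            for name in names:
                # any() stops at the first matching cell, so a row is
                # added at most once per name
                if any(name in cell for cell in row):
                    grouped[name].append(row)
    return grouped
```

The match is a case-sensitive substring test, so 'Jon' would also match a cell such as 'Jonas1'. If that is too loose, a stricter test (for example a regular expression) could replace the `name in cell` check.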