How to traverse lists containing a dictionary?
I am trying to iterate through JSON data that I have loaded into a DataFrame.
Here is the code used to import the data:
df = json_normalize(data['PatentBulkData'])
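Each entry in this column (Series) of the DataFrame is a list, and each list contains dictionaries, as shown below.
For example, this is the list of dictionaries returned when I enter df['prosecutionHistoryDataBag.prosecutionHistoryData'][i]: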
[{'eventCode': 'PG-ISSUE',
'eventDate': '2020-04-23',
'eventDescriptionText': 'PG-Pub Issue Notification'},
{'eventCode': 'RQPR',
'eventDate': '2020-01-02',
'eventDescriptionText': 'Request for Foreign Priority (Priority Papers May Be Included)'},
{'eventCode': 'M844',
'eventDate': '2020-01-03',
'eventDescriptionText': 'Information Disclosure Statement (IDS) Filed'},
{'eventCode': 'M844',
'eventDate': '2020-01-02',
'eventDescriptionText': 'Information Disclosure Statement (IDS) Filed'},
{'eventCode': 'COMP',
'eventDate': '2020-02-04',
'eventDescriptionText': 'Application Is Now Complete'}]
Then df['prosecutionHistoryDataBag.prosecutionHistoryData'][i][j] returns a dictionary:
{'eventCode': 'PG-ISSUE',
'eventDate': '2020-04-23',
'eventDescriptionText': 'PG-Pub Issue Notification'}
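I would like to iterate through every entry in df['prosecutionHistoryDataBag.prosecutionHistoryData'] to identify the rows whose 'eventDescriptionText' contains a specific string.
In the example above, df['prosecutionHistoryDataBag.prosecutionHistoryData'] is a Series, df['prosecutionHistoryDataBag.prosecutionHistoryData'][i] is a list, and df['prosecutionHistoryDataBag.prosecutionHistoryData'][i][j] is a dictionary.
I want to iterate over the lists first and then, for each list, iterate over its dictionaries to check whether 'eventDescriptionText' contains the specific string.
Thanks!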
Try the following code.
for lst in df['prosecutionHistoryDataBag.prosecutionHistoryData']:
    for item in lst:
        # Default to "" so a missing key does not call a string method on None.
        if your_string in item.get("eventDescriptionText", ""):
            # do something
            pass
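To make the loop above concrete, here is a minimal sketch that collects the positions of the rows that match. It is only an illustration under stated assumptions: `history_column` is a hypothetical plain list standing in for the DataFrame column (iterating over the Series yields the same inner lists), `rows_mentioning` is a made-up helper name, and the NaN entry assumes that some records may have no history at all.

```python
# Hypothetical sample standing in for
# df['prosecutionHistoryDataBag.prosecutionHistoryData'].
history_column = [
    [
        {"eventCode": "PG-ISSUE", "eventDate": "2020-04-23",
         "eventDescriptionText": "PG-Pub Issue Notification"},
        {"eventCode": "COMP", "eventDate": "2020-02-04",
         "eventDescriptionText": "Application Is Now Complete"},
    ],
    [
        {"eventCode": "M844", "eventDate": "2020-01-03",
         "eventDescriptionText": "Information Disclosure Statement (IDS) Filed"},
    ],
    float("nan"),  # assumption: a record without any history may show up as NaN
]


def rows_mentioning(column, needle):
    """Return positions of rows where any event description contains needle."""
    matches = []
    for position, events in enumerate(column):
        if not isinstance(events, list):
            continue  # skip missing cells such as NaN
        if any(needle in event.get("eventDescriptionText", "") for event in events):
            matches.append(position)
    return matches


matching_rows = rows_mentioning(history_column, "Disclosure")
```

With a real DataFrame, looping over `df[...].items()` instead of `enumerate(...)` would give index labels rather than positions.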
If I understand correctly, df['prosecutionHistoryDataBag.prosecutionHistoryData'] can be treated as a list whose elements are lists of dictionaries (see also above). If so, the boring way is:
lst = df['prosecutionHistoryDataBag.prosecutionHistoryData']
for dicts in lst:
    for d in dicts:
        if d['eventDescriptionText'] == 'SOME TEXT YOU SEARCH FOR':
            code = d['eventCode']
            date = d['eventDate']
            # Do something with code and date.
Now, you could flatten that list of lists and use a generator:
lst = df['prosecutionHistoryDataBag.prosecutionHistoryData']
for d in (d for dicts in lst for d in dicts):
    if d['eventDescriptionText'] == 'SOME TEXT YOU SEARCH FOR':
        code = d['eventCode']
        date = d['eventDate']
        # Do something with code and date.
Next, squeeze the test into the list-flattening generator as well, at the expense of readability:
lst = df['prosecutionHistoryDataBag.prosecutionHistoryData']
for code, date in ((d['eventCode'], d['eventDate']) for dicts in lst for d in dicts if d['eventDescriptionText'] == 'SOME TEXT YOU SEARCH FOR'):
    # Do something with code and date.
    pass
The filter() function doesn't do much for readability here either:
for code, date in ((d['eventCode'], d['eventDate']) for d in filter(lambda d: d['eventDescriptionText'] == 'SOME TEXT YOU SEARCH FOR', (d for dicts in lst for d in dicts))):
    # Do something with code and date.
    pass
But other tools from itertools or more-itertools may be of use (e.g. the flatten() function in more-itertools).
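As one possible illustration using only the standard library, the flattening step could be handed to itertools.chain.from_iterable. This is a sketch rather than the definitive way to do it: `lst` here is a hypothetical plain list standing in for the DataFrame column, and `target` and `hits` are names made up for the example.

```python
from itertools import chain

# Hypothetical sample standing in for the column of lists of dictionaries.
lst = [
    [
        {"eventCode": "PG-ISSUE", "eventDate": "2020-04-23",
         "eventDescriptionText": "PG-Pub Issue Notification"},
        {"eventCode": "COMP", "eventDate": "2020-02-04",
         "eventDescriptionText": "Application Is Now Complete"},
    ],
    [
        {"eventCode": "M844", "eventDate": "2020-01-03",
         "eventDescriptionText": "Information Disclosure Statement (IDS) Filed"},
    ],
]

target = "Application Is Now Complete"

# chain.from_iterable lazily yields each dictionary from each inner list in turn,
# so the comprehension only has to express the test and the projection.
hits = [
    (d["eventCode"], d["eventDate"])
    for d in chain.from_iterable(lst)
    if d["eventDescriptionText"] == target
]
```

Compared with the nested generator, this keeps the flattening and the test visibly separate, which may read a little easier.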