Recursive function to find the correct match in JSON
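I have a web scraping application that loops over search_ids, and from time to time it finds duplicate searches that differ in a key field called tree_id. I am struggling to work out how to find the correct match with a recursive function. In most cases the JSON contains two or three tree_ids, and the code needs to pick the correct match from searches in different formats.
Below is some sample code, with my comments, that highlights the problem: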
# Original JSON from the web scraping application for a single example
json = {'status': 'multiple',
        'searchResult': None,
        'spellingResult': None,
        'relatedTree': {'paths': [
            {'treeid': 'C0.A.01', 'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS'},
            {'treeid': 'C0.A.01.A', 'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS|STOMATOLOGICAL PREPARATIONS'}
        ]},
        'tableResult': None, 'synResult': None}

trees = json['relatedTree']['paths']  # adding .replace(".", "") here raises an error because a list has no replace method
tree_id0 = json['relatedTree']['paths'][0]['treeid'].replace(".", "")  # removes all periods from the treeid at index 0
print(tree_id0)
tree_id1 = json['relatedTree']['paths'][1]['treeid'].replace(".", "")  # removes all periods from the treeid at index 1
print(tree_id1)

search = 'A01A'  # the goal is to search 'A01A' and then also 'A01' and have it pick the correct tree_id each time
search1 = 'A01'

if tree_id0.find(search) != -1:  # correct while using 'A01' and works with 'A01A'
    print("Found!")
else:
    print("Not found!")

if tree_id1.find(search) != -1:  # incorrect while using 'A01' but works with 'A01A'; I need it to match the exact string with nothing to the right of the last letter of search
    print("Found!")
else:
    print("Not found!")

# My attempt at a recursive function to solve the problem. It fails with
# "string indices must be integers", and in its current form I'm not sure
# whether I'm going about the problem the wrong way.
def search_multi(trees: list, search: str) -> dict:
    for tree in trees:
        if tree['treeid'].replace(".", "") == search:
            print(tree['treeid'].replace(".", ""))
            return tree
        if tree['treeid'].replace(".", ""):
            response = search_multi(tree['treeid'].replace(".", ""), search)
            if response:
                return response

searched_multis = search_multi(trees, search)
print(searched_multis)
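If the search is 'A01A', the result I want is tree_id C0.A.01.A; if the search is 'A01', it should select tree_id C0.A.01 from the JSON.
The if/else statements show how it should work, but they do not give the correct result for 'A01', because the substring test still matches when another letter follows the search string.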
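To make the difference concrete, here is a minimal sketch comparing the substring test with an exact comparison. The helper name `normalize` is hypothetical, and it assumes that the first dot-separated segment of every treeid (here 'C0') is a prefix that is never part of the search string.

```python
def normalize(treeid: str) -> str:
    """Drop the leading segment (assumed to be a prefix such as 'C0')
    and join the remaining segments without dots."""
    return "".join(treeid.split(".")[1:])


tree_ids = ["C0.A.01", "C0.A.01.A"]

# Substring test, as in the if/else statements: 'A01' occurs inside both
# 'C0A01' and 'C0A01A', so both ids are reported as matches.
substring_hits = [t for t in tree_ids if t.replace(".", "").find("A01") != -1]

# Exact test on the normalized id: each search selects exactly one id.
exact_hits_a01 = [t for t in tree_ids if normalize(t) == "A01"]
exact_hits_a01a = [t for t in tree_ids if normalize(t) == "A01A"]

print(substring_hits)
print(exact_hits_a01)
print(exact_hits_a01a)
```

Note that comparing `treeid.replace(".", "")` directly with the search string would never succeed here, since the stripped id still starts with 'C0'.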
Here is one approach. It returns the dictionary whose 'treeid' matches the search:
def get_id(d, search):
    if isinstance(d, dict):
        for k, v in d.items():
            if k == 'treeid' and ''.join(v.split('.')[1:]) == search:
                yield d
            else:
                yield from get_id(v, search)
    elif isinstance(d, list):
        for i in d:
            yield from get_id(i, search)

out = next(get_id(json, 'A01'))
Output:
{'treeid': 'C0.A.01',
'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS'}
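One caveat: calling `next` on the generator raises `StopIteration` when nothing matches. The following is a sketch of one way to handle that, not part of the approach above. The wrapper `find_tree` and the variable name `data` are hypothetical, and the `isinstance(v, str)` guard is an added assumption to skip a 'treeid' whose value is not a string.

```python
from typing import Any, Iterator, Optional


def get_id(d: Any, search: str) -> Iterator[dict]:
    """Recursively yield every dict whose 'treeid', minus its first
    dot-separated segment and with the dots removed, equals `search`."""
    if isinstance(d, dict):
        for k, v in d.items():
            if k == "treeid" and isinstance(v, str) and "".join(v.split(".")[1:]) == search:
                yield d
            else:
                yield from get_id(v, search)
    elif isinstance(d, list):
        for item in d:
            yield from get_id(item, search)


def find_tree(data: Any, search: str) -> Optional[dict]:
    """Return the first matching dict, or None when there is no match."""
    return next(get_id(data, search), None)


data = {
    "status": "multiple",
    "relatedTree": {"paths": [
        {"treeid": "C0.A.01", "path": "ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS"},
        {"treeid": "C0.A.01.A", "path": "ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS|STOMATOLOGICAL PREPARATIONS"},
    ]},
    "tableResult": None,
}

for term in ("A01", "A01A", "B02"):
    match = find_tree(data, term)
    print(term, "->", match["treeid"] if match else "no match")
```

Passing a default as the second argument to `next` keeps the scraping loop running when a search has no corresponding treeid.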