Recursive function to find the correct match in JSON

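I have a web-scraping application that loops over search IDs, and from time to time it finds duplicate search results that differ in a key field called tree_id. I'm struggling to work out how to find the correct match using a recursive function. In most cases the JSON contains two or three tree_ids, and the code needs to pick the correct match for searches in different formats.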

Below is some sample code, with my comments, that highlights the problem:

#original json from the web scraping application for a single example
json = {'status': 'multiple', 
        'searchResult': None, 
        'spellingResult': None, 
        'relatedTree': {'paths': [
            {'treeid': 'C0.A.01', 'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS'}, 
            {'treeid': 'C0.A.01.A', 'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS|STOMATOLOGICAL PREPARATIONS'}
        ]}, 
        'tableResult': None, 'synResult': None}

trees = json['relatedTree']['paths']  # appending .replace(".","") here raises an error because a list has no replace method
tree_id0 = json['relatedTree']['paths'][0]['treeid'].replace(".","")  #replaces the string in treeid index position 0 to remove all periods.
print(tree_id0)  
tree_id1 = json['relatedTree']['paths'][1]['treeid'].replace(".","") #replaces the string in treeid index position 1 to remove all periods.
print(tree_id1)


search = 'A01A'        # example would be to search 'A01A' and then also 'A01' and have it pick the correct substring
search1 = 'A01'
if tree_id0.find(search) != -1:  # correct while using 'A01' and works with 'A01A'.  
    print("Found!")
else:
    print("Not found!")

if tree_id1.find(search) != -1:  # incorrect while using 'A01' but works with 'A01A'.  I need it to find the exact string and nothing to the right of the last letter of search
    print("Found!")
else:
    print("Not found!")


# my attempt at a recursive function to solve the problem, but I get "string indices must be integers", and in its current form I'm not sure whether I'm going about the problem the wrong way.

def search_multi(trees: list, search: str) -> dict:
    for tree in trees:
        if tree['treeid'].replace(".","") == search:
            print(tree['treeid'].replace(".",""))
            return tree
        if tree['treeid'].replace(".",""):
            response = search_multi(tree['treeid'].replace(".",""), search)
            if response:
                return response

searched_multis = search_multi(trees, search)
print(searched_multis)

The result I want: if the search is 'A01A', it should select tree_id C0.A.01.A, and if the search is 'A01', it should select tree_id C0.A.01 from the JSON.

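The if/else statements show how it should work, but they don't give the correct result for 'A01': find() is a substring test, so 'A01' is also found inside 'C0A01A' and the extra trailing letter is ignored.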
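To make the difference concrete, here is a minimal sketch comparing the substring test with a suffix test. The helper names `substring_match` and `suffix_match` are hypothetical, and the sketch assumes the search term always lines up with the end of the dot-stripped id; an id with a different middle section that happens to end in the same characters would also match.

```python
# Dot-stripped ids taken from the example JSON above.
tree_ids = ["C0A01", "C0A01A"]

def substring_match(tree_id: str, search: str) -> bool:
    # Same test as the if/else statements: true if search occurs anywhere.
    return tree_id.find(search) != -1

def suffix_match(tree_id: str, search: str) -> bool:
    # Hypothetical helper: true only if nothing follows the search term.
    return tree_id.endswith(search)

for search in ("A01", "A01A"):
    by_substring = [t for t in tree_ids if substring_match(t, search)]
    by_suffix = [t for t in tree_ids if suffix_match(t, search)]
    print(search, by_substring, by_suffix)
```

With the substring test, 'A01' matches both ids; with the suffix test, each search term matches exactly one id.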

Here is one approach. It returns the dictionary whose 'treeid' matches the search:

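def get_id(d, search):
    # Recursively walk nested dicts and lists, yielding every dict whose
    # 'treeid' equals search once the first dotted segment is dropped.
    if isinstance(d, dict):
        for k, v in d.items():
            # 'C0.A.01' -> ['C0', 'A', '01'] -> 'A01'
            if k == 'treeid' and ''.join(v.split('.')[1:]) == search:
                yield d
            else:
                yield from get_id(v, search)
    elif isinstance(d, list):
        for i in d:
            yield from get_id(i, search)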


out = next(get_id(json, 'A01'))

Output:

{'treeid': 'C0.A.01',
 'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS'}
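
As a side note, the "string indices must be integers" error in the question most likely comes from the recursive call: it passes the treeid string back into search_multi, so the loop iterates over single characters and tree['treeid'] then indexes a str. Since 'paths' is a flat list here, recursion may not be needed at all. The following is a minimal sketch under that assumption; `normalize`, `search_paths`, and `data` are hypothetical names, and the first dotted segment ('C0') is assumed to be a prefix that never appears in the search term.

```python
from typing import Optional

# Sample data copied from the question; named `data` here to avoid
# shadowing the standard-library json module.
data = {
    'status': 'multiple',
    'relatedTree': {'paths': [
        {'treeid': 'C0.A.01',
         'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS'},
        {'treeid': 'C0.A.01.A',
         'path': 'ALIMENTARY TRACT AND METABOLISM|STOMATOLOGICAL PREPARATIONS|STOMATOLOGICAL PREPARATIONS'},
    ]},
}

def normalize(treeid: str) -> str:
    # Drop the first dotted segment and join the rest: 'C0.A.01' -> 'A01'.
    return ''.join(treeid.split('.')[1:])

def search_paths(paths: list, search: str) -> Optional[dict]:
    # Return the first entry whose normalized treeid equals search exactly.
    for entry in paths:
        if normalize(entry['treeid']) == search:
            return entry
    return None  # no match found

match = search_paths(data['relatedTree']['paths'], 'A01A')
print(match['treeid'] if match else 'Not found!')
```

Returning None on a miss avoids an exception when a search has no match. The generator version can be made similarly tolerant with next(get_id(json, 'A01'), None), since next() without a default raises StopIteration when nothing is yielded.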