How to search a dictionary with a named tuple as key
I want a data structure of the form {(Document_name, term): (term count within document)}, so I created a dictionary keyed by a namedtuple:
from collections import namedtuple
Doc_term = namedtuple("Doc_term", ["Doc", "term"])
Doc_term_count = {}
...
Doc_term_count[k] = {Doc_term(Doc_names[start_index + i], vocab[j]): row[j]}
k = k + 1
print Doc_term_count
This gives me the following data structure:
{0: {Doc_term(Doc='book1.txt', term='be'): 1},
1: {Doc_term(Doc='book1.txt', term='script'): 1},
2: {Doc_term(Doc='book1.txt', term='this'): 1},
3: {Doc_term(Doc='book1.txt', term='is'): 1},
4: {Doc_term(Doc='book1.txt', term='there'): 1},
5: {Doc_term(Doc='book1.txt', term='wordcount'): 1},
6: {Doc_term(Doc='book2.txt', term='hello'): 2},
7: {Doc_term(Doc='book2.txt', term='to'): 1},
8: {Doc_term(Doc='book2.txt', term='book'): 1},
9: {Doc_term(Doc='book3.txt', term='read'): 1},
10: {Doc_term(Doc='book3.txt', term='by'): 1},
11: {Doc_term(Doc='book3.txt', term='first'): 1}}
I want to find how many documents contain a given term, using a filter/search along these lines:
Dtn = filter( lambda ndoc: Doc_term.term=='be', Doc_term_count)
print Dtn
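It returns an empty list. Please tell me where I am going wrong. As I understand it, I am building an indexed array, and filter with a lambda expects a list, but when I try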
Doc_term_count[(booknames[start_index + i], vocab[j])].append(row[j])
it raises the error KeyError: ('book1.txt', 'be'). I think it does not accept a tuple as a key.
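I don't think you are building Doc_term_count correctly: you just want to map each namedtuple to its count. Without going into how you compute Doc_names and the row indices, I think what you are trying to do is: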
Doc_term_count[Doc_term(Doc_names[start_index + i], vocab[j])] = row[j]
instead of
Doc_term_count[k] = {Doc_term(Doc_names[start_index + i], vocab[j]): row[j]}
The first approach should produce a dictionary like this:
Doc_term_count = {
Doc_term(Doc='book1.txt', term='be'): 1,
Doc_term(Doc='book1.txt', term='script'): 1,
Doc_term(Doc='book1.txt', term='this'): 1,
Doc_term(Doc='book1.txt', term='is'): 1,
Doc_term(Doc='book1.txt', term='there'): 1,
Doc_term(Doc='book1.txt', term='wordcount'): 1,
Doc_term(Doc='book2.txt', term='hello'): 2,
Doc_term(Doc='book2.txt', term='to'): 1,
Doc_term(Doc='book2.txt', term='book'): 1,
Doc_term(Doc='book3.txt', term='read'): 1,
Doc_term(Doc='book3.txt', term='by'): 1,
Doc_term(Doc='book3.txt', term='first'): 1
}
You can then look up a value with a plain tuple:
print Doc_term_count[('book1.txt', 'be')] # prints 1
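As for the search itself: iterating over a dict yields its keys, and the lambda in the question ignores its argument and compares Doc_term.term (an attribute of the class, not a field of any key) against 'be', which is never true, so nothing passes the filter. The KeyError comes from indexing a key that is not in the dict yet; tuples are perfectly valid keys. Below is a minimal sketch of a term search over the flat dict, assuming that structure. The sample counts and the helper name docs_containing are made up for illustration and are not from the question. The single-argument print(...) form is used so the snippet runs on both Python 2.7 and Python 3.

```python
from collections import namedtuple

Doc_term = namedtuple("Doc_term", ["Doc", "term"])

# Hypothetical sample data standing in for the counts computed in the question.
Doc_term_count = {
    Doc_term("book1.txt", "be"): 1,
    Doc_term("book1.txt", "script"): 1,
    Doc_term("book2.txt", "hello"): 2,
    Doc_term("book2.txt", "be"): 3,
    Doc_term("book3.txt", "read"): 1,
}


def docs_containing(term, counts):
    """Return the sorted names of documents whose count for `term` is positive."""
    return sorted({key.Doc for key, count in counts.items()
                   if key.term == term and count > 0})


matches = docs_containing("be", Doc_term_count)
print(matches)       # ['book1.txt', 'book2.txt']
print(len(matches))  # 2

# A plain tuple with the same values hashes and compares equal to the
# namedtuple, so it finds the same entry.
print(Doc_term_count[("book1.txt", "be")])  # 1

# .get() returns a default instead of raising KeyError for a missing key.
print(Doc_term_count.get(("book9.txt", "be"), 0))  # 0
```

This scans every key, which is fine for small vocabularies. If you search by term often, a second dict mapping each term to its documents would avoid the full scan.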