Python 3 nested comprehension
Is there a clever list/dictionary comprehension that produces the expected output below, given the following?
import numpy as np
freq_mat = np.random.randint(2, size=(4, 5))
tokens = ['a', 'b', 'c', 'd', 'e']
labels = ['X', 'S', 'Y', 'S']
For example, if freq_mat is:
array([[1, 0, 0, 1, 1],
       [0, 0, 0, 0, 1],
       [1, 0, 1, 1, 0],
       [0, 1, 0, 0, 0]])
the expected output should look like this:
[({'a': True, 'b': False, 'c': False, 'd': True, 'e': True}, 'X'),
({'a': False, 'b': False, 'c': False, 'd': False, 'e': True}, 'S'),
({'a': True, 'b': False, 'c': True, 'd': True, 'e': False}, 'Y'),
({'a': False, 'b': True, 'c': False, 'd': False, 'e': False}, 'S')]
You can collapse that code to:
Code:
featureset = [
    ({key: val > 0 for val in row for key in tokens}, label)
    for row, label in zip(freq_mat, labels)]
Test code:
import numpy as np

freq_mat = np.random.randint(2, size=(4, 5))
tokens = ['a', 'b', 'c', 'd', 'e']
labels = ['X', 'S', 'Y', 'S']

# Original nested-loop version, for comparison.
# Note: each key is overwritten for every val, so it keeps the row's last value.
featureset2 = []
for row, label in zip(freq_mat, labels):
    d = dict()
    for key in tokens:
        for val in row:
            d[key] = val > 0
    featureset2.append((d, label))

# Collapsed version: same behavior as the loops above
featureset = [
    ({key: val > 0 for val in row for key in tokens}, label)
    for row, label in zip(freq_mat, labels)]

assert featureset == featureset2
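As you noted in your updated post, your original code doesn't work as intended: it assigns the same value to every key in a given row, either all True or all False. The simplest correction to your original code is this: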
featureset = []
for row, label in zip(freq_mat, labels):
    d = dict()
    for key, val in zip(tokens, row):  # The critical bit
        d[key] = val > 0
    featureset.append((d, label))
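To see why the zip pairing is the critical bit, here is a minimal sketch on a single hand-written row. The names `row`, `nested`, and `paired` are illustrative assumptions, not taken from the original post. With nested loops every key is overwritten once per value, so only the last value in the row survives.

```python
tokens = ['a', 'b', 'c']
row = [1, 0, 0]  # hypothetical single row, chosen for illustration

# Nested loops: each key is reassigned for every val, so the last val wins.
nested = {key: val > 0 for val in row for key in tokens}

# zip pairs each token with the value at the same position.
paired = {key: val > 0 for key, val in zip(tokens, row)}

print(nested)  # {'a': False, 'b': False, 'c': False}
print(paired)  # {'a': True, 'b': False, 'c': False}
```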
A more condensed version, though I think it is still more readable than the single-comprehension approach:
featureset = []
for row, label in zip(freq_mat, labels):
    d = {key: val > 0 for key, val in zip(tokens, row)}
    featureset.append((d, label))
Or as a one-liner:
featureset = [({key: val > 0 for key, val in zip(tokens, row)}, label)
              for row, label in zip(freq_mat, labels)]
Personally, I would probably go with the second approach, as a compromise between brevity and readability. But that is up to you, of course!
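As a quick sanity check, the one-liner can be run against the fixed matrix shown in the question rather than a random one. This is only a sketch: the `expected` variable and the `bool(...)` wrapper are my additions. The wrapper converts NumPy's `numpy.bool_` values to plain Python bools so the printed dicts match the question's output exactly.

```python
import numpy as np

# Fixed matrix copied from the question, instead of a random draw
freq_mat = np.array([[1, 0, 0, 1, 1],
                     [0, 0, 0, 0, 1],
                     [1, 0, 1, 1, 0],
                     [0, 1, 0, 0, 0]])
tokens = ['a', 'b', 'c', 'd', 'e']
labels = ['X', 'S', 'Y', 'S']

# bool(...) is an added assumption: it turns numpy.bool_ into a plain bool
featureset = [({key: bool(val > 0) for key, val in zip(tokens, row)}, label)
              for row, label in zip(freq_mat, labels)]

# Expected output as listed in the question
expected = [
    ({'a': True, 'b': False, 'c': False, 'd': True, 'e': True}, 'X'),
    ({'a': False, 'b': False, 'c': False, 'd': False, 'e': True}, 'S'),
    ({'a': True, 'b': False, 'c': True, 'd': True, 'e': False}, 'Y'),
    ({'a': False, 'b': True, 'c': False, 'd': False, 'e': False}, 'S'),
]
assert featureset == expected
```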