Appending a single character to every key in a Counter
Given a Counter object whose keys are of type str, I can append a character to every key like this:
>>> vocab_counter = Counter("the lazy fox jumps over the brown dog".split())
>>> vocab_counter = Counter({k+u"\uE000":v for k,v in vocab_counter.items()})
>>> vocab_counter
Counter({'brown\ue000': 1,
         'dog\ue000': 1,
         'fox\ue000': 1,
         'jumps\ue000': 1,
         'lazy\ue000': 1,
         'over\ue000': 1,
         'the\ue000': 2})
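What is a fast and/or Pythonic way to add a character to all the keys?
Is the approach above the only way to end up with a counter in which every key carries the extra character, or are there other ways to reach the same result?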
A better approach is to add the character before the Counter object is created. You can do that with a generator expression passed to Counter:
In [15]: vocab_counter = Counter(w + u"\uE000" for w in "the lazy fox jumps over the brown dog".split())
In [16]: vocab_counter
Out[16]: Counter({'the\ue000': 2, 'fox\ue000': 1, 'dog\ue000': 1, 'jumps\ue000': 1, 'lazy\ue000': 1, 'over\ue000': 1, 'brown\ue000': 1})
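For the "fast" part of the question, a rough way to compare the two approaches is to time them with timeit. This is only a sketch: the function names and the repeated-sentence workload are made up for illustration, and the numbers will vary with the machine and with how many repeated words the input has. Suffixing before counting concatenates once per word, while suffixing afterwards concatenates once per distinct key.

```python
import timeit
from collections import Counter

SUFFIX = u"\uE000"
# Hypothetical workload: the example sentence repeated many times.
words = "the lazy fox jumps over the brown dog".split() * 1000


def suffix_before_counting():
    # Append the suffix to every word, then count.
    return Counter(w + SUFFIX for w in words)


def suffix_after_counting():
    # Count first, then rebuild the counter with suffixed keys.
    counts = Counter(words)
    return Counter({k + SUFFIX: v for k, v in counts.items()})


# Both approaches should produce the same counter.
assert suffix_before_counting() == suffix_after_counting()

for fn in (suffix_before_counting, suffix_after_counting):
    seconds = timeit.timeit(fn, number=20)
    print(fn.__name__, round(seconds, 4))
```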
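If you cannot modify the words before the counter is created, you can subclass Counter and add the special character while the values for the keys are being set.
The only other optimization I can think of is a subclass of Counter that appends the character when a key is inserted: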
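from collections import Counter

class CustomCounter(Counter):
    def __setitem__(self, key, value):
        if len(key) > 1 and not key.endswith(u"\uE000"):
            key += u"\uE000"
            # update() looked up the unsuffixed key, which is never stored,
            # so add the count already held under the suffixed key.
            value += self.get(key, 0)
        super(CustomCounter, self).__setitem__(key, value)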
Demo:
>>> CustomCounter("the lazy fox jumps over the brown dog".split())
CustomCounter({u'the\ue000': 2, u'fox\ue000': 1, u'brown\ue000': 1, u'jumps\ue000': 1, u'dog\ue000': 1, u'over\ue000': 1, u'lazy\ue000': 1})
# With both args and kwargs
>>> CustomCounter("the lazy fox jumps over the brown dog".split(), **{'the': 1, 'fox': 3})
CustomCounter({u'fox\ue000': 4, u'the\ue000': 3, u'brown\ue000': 1, u'jumps\ue000': 1, u'dog\ue000': 1, u'over\ue000': 1, u'lazy\ue000': 1})
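Two side effects of this kind of subclass are worth checking before relying on it. The sketch below restates a version of the subclass so that it runs on its own (the SUFFIX constant and the variable names are just for illustration). Looking up the original, unsuffixed word finds nothing, because only the suffixed key is stored, and a plain assignment to an existing word adds to its count instead of replacing it.

```python
from collections import Counter

SUFFIX = u"\uE000"


class CustomCounter(Counter):
    def __setitem__(self, key, value):
        if len(key) > 1 and not key.endswith(SUFFIX):
            key += SUFFIX
            # Accumulate onto whatever is already stored under the suffixed key.
            value += self.get(key, 0)
        super(CustomCounter, self).__setitem__(key, value)


counts = CustomCounter("the lazy fox jumps over the brown dog".split())

# The unsuffixed key was never stored, so Counter reports a count of 0 for it.
plain_lookup = counts["the"]
suffixed_lookup = counts["the" + SUFFIX]

# Assignment goes through __setitem__, so this adds 10 to the existing count.
counts["dog"] = 10
dog_count = counts["dog" + SUFFIX]

print(plain_lookup, suffixed_lookup, dog_count)
```

If callers need to look words up by their original spelling, rebuilding a plain Counter with suffixed keys may be easier to reason about than the subclass.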
The shortest way I have used is:
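from collections import Counter
vocab_counter = Counter("the lazy fox jumps over the brown dog".split())
# Iterate over a snapshot of the keys, because the loop renames them in place.
for key in list(vocab_counter):
    vocab_counter[key + u"\uE000"] = vocab_counter.pop(key)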
You can do it with string manipulation:
text = 'the lazy fox jumps over the brown dog'
Counter((text + ' ').replace(' ', '_abc ').strip().split())
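One thing to watch with the replace-based version is that it treats every single space as a word boundary, so consecutive spaces produce a stray token that consists only of the suffix. A minimal sketch of a variant that splits first and then appends the suffix is shown below; count_with_suffix is a hypothetical helper name, not something from the answers.

```python
from collections import Counter


def count_with_suffix(text, suffix="_abc"):
    # split() with no arguments collapses any run of whitespace,
    # so no empty words reach the counter.
    return Counter(word + suffix for word in text.split())


text = "the lazy  fox jumps over the brown dog"  # note the double space

# The replace-based approach turns the extra space into a bare "_abc" token.
via_replace = Counter((text + " ").replace(" ", "_abc ").strip().split())
via_split = count_with_suffix(text)

print(via_replace["_abc"], via_split["_abc"])
```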