List elements' counter
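Python newbie here.
I'm looking for a simple way to create a list (Output) that returns the count of the elements of another target list (MyList) while preserving the index(?).
This is what I would like to get: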
MyList = ["a", "b", "c", "c", "a", "c"]
Output = [ 2 , 1 , 3 , 3 , 2 , 3 ]
I found solutions to a similar problem: counting the number of occurrences of each element in a list.
In : Counter(MyList)
Out : Counter({'a': 2, 'b': 1, 'c': 3})
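However, this returns a Counter object, which doesn't preserve the index.
I assume that, given the keys in the Counter, I could construct my desired output, but I'm not sure how to proceed.
Extra info: I have pandas imported in my script, and MyList is actually a column of a pandas DataFrame.
You can use the list.count method, which counts how many times each string occurs in MyList. You can generate a new list with the counts by using a list comprehension: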
MyList = ["a", "b", "c", "c", "a", "c"]
[MyList.count(i) for i in MyList]
# [2, 1, 3, 3, 2, 3]
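You just need to implement the following piece of code:
from collections import Counter
c = Counter(MyList)
lout = [c[i] for i in MyList]
The list lout is now your desired output.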
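For reference, the Counter lookup above can be wrapped in a small reusable function. This is only a minimal sketch: the name `counts_in_order` is made up for illustration and does not come from the answer.

```python
from collections import Counter

def counts_in_order(items):
    """Return, for each position in items, how often that element occurs overall."""
    tally = Counter(items)             # one pass to count every distinct element
    return [tally[x] for x in items]   # one pass to look each element up

MyList = ["a", "b", "c", "c", "a", "c"]
print(counts_in_order(MyList))  # [2, 1, 3, 3, 2, 3]
```

Because the counting happens once up front, each lookup is a dictionary access rather than a rescan of the whole list, which is what list.count does.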
You can use the function itemgetter instead of the list comprehension from the other solution:
from collections import Counter
from operator import itemgetter
lst = ["a", "b", "c", "c", "a", "c"]
c = Counter(lst)
itemgetter(*lst)(c)
# (2, 1, 3, 3, 2, 3)
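Update: as @ALollz mentioned in the comments, this appears to be the fastest solution. If the OP needs a list rather than a tuple, the result has to be converted with list.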
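One thing to watch with the itemgetter approach is its behavior on very short inputs: with a single key it returns a bare value rather than a tuple, and with no keys at all it raises a TypeError. The sketch below guards against both; the helper name `counts_via_itemgetter` is hypothetical and not part of the answer.

```python
from collections import Counter
from operator import itemgetter

def counts_via_itemgetter(items):
    """Hypothetical helper: itemgetter-based counts that always come back as a list."""
    if not items:
        return []                  # itemgetter() with no arguments raises TypeError
    c = Counter(items)
    result = itemgetter(*items)(c)
    if len(items) == 1:
        return [result]            # a single key yields a bare value, not a tuple
    return list(result)

print(counts_via_itemgetter(["a", "b", "c", "c", "a", "c"]))  # [2, 1, 3, 3, 2, 3]
print(counts_via_itemgetter(["a"]))                           # [1]
print(counts_via_itemgetter([]))                              # []
```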
This is one of Hettinger's classic snippets :)
from collections import Counter, OrderedDict
class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first seen'
    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__,
                           OrderedDict(self))
    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)
x = ["a", "b", "c", "c", "a", "c"]
oc = OrderedCounter(x)
>>> oc
OrderedCounter(OrderedDict([('a', 2), ('b', 1), ('c', 3)]))
>>> oc['a']
2
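An ordered counter keeps its keys in first-seen order, but that still gives one count per distinct key, not one count per position. As a rough sketch, and assuming Python 3.7 or later (where a plain Counter, being a dict subclass, already preserves insertion order), the per-position output from the question still needs a lookup for each element:

```python
from collections import Counter

x = ["a", "b", "c", "c", "a", "c"]
c = Counter(x)

# On Python 3.7+, dicts (and therefore Counter) keep insertion order,
# so the keys come out in first-seen order without OrderedDict.
print(list(c.items()))      # [('a', 2), ('b', 1), ('c', 3)]

# Key order alone does not give one count per position; a lookup per element does.
print([c[k] for k in x])    # [2, 1, 3, 3, 2, 3]
```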
A pandas solution looks like this:
import pandas as pd
df = pd.DataFrame(data=["a", "b", "c", "c", "a", "c"], columns=['MyList'])
df['Count'] = df.groupby('MyList')['MyList'].transform(len)
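Edit: you should not use pandas if this is the only thing you want to do. I only answered this question because of the pandas tag.
Performance depends on the number of groups: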
MyList = np.random.randint(1, 10, 10000).tolist()
df = pd.DataFrame(MyList)
%timeit [MyList.count(i) for i in MyList]
# 1.32 s ± 15.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df.groupby(0)[0].transform(len)
# 3.89 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
MyList = np.random.randint(1, 9000, 10000).tolist()
df = pd.DataFrame(MyList)
%timeit [MyList.count(i) for i in MyList]
# 1.36 s ± 11.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df.groupby(0)[0].transform(len)
# 1.33 s ± 19.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Use np.unique to create a dictionary of value counts and map the values. This will be fast, but not as fast as the Counter methods:
import numpy as np
list(map(dict(zip(*np.unique(MyList, return_counts=True))).get, MyList))
#[2, 1, 3, 3, 2, 3]
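The one-liner above packs two steps together: build a value-to-count dictionary from the unique values, then map each original element through it. A pure-Python analogue of those two steps, assuming the elements are sortable, might look like the sketch below. It is only an analogy for the idea and makes no claim about how NumPy implements np.unique.

```python
from itertools import groupby

MyList = ["a", "b", "c", "c", "a", "c"]

# Sort so equal elements are adjacent, then measure the length of each run.
value_counts = {key: sum(1 for _ in run) for key, run in groupby(sorted(MyList))}
print(value_counts)                          # {'a': 2, 'b': 1, 'c': 3}

# Map every original element to its count, preserving the input order.
print(list(map(value_counts.get, MyList)))   # [2, 1, 3, 3, 2, 3]
```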
Some timings for a moderately sized list:
MyList = np.random.randint(1, 2000, 5000).tolist()
%timeit [MyList.count(i) for i in MyList]
#413 ms ± 165 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit list(map(dict(zip(*np.unique(MyList, return_counts=True))).get, MyList))
#1.89 ms ± 1.73 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit pd.DataFrame(MyList).groupby(MyList).transform(len)[0].tolist()
#2.18 s ± 12.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
c=Counter(MyList)
%timeit lout=[c[i] for i in MyList]
#679 µs ± 2.33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
c = Counter(MyList)
%timeit list(itemgetter(*MyList)(c))
#503 µs ± 162 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
A larger list:
MyList = np.random.randint(1, 2000, 50000).tolist()
%timeit [MyList.count(i) for i in MyList]
#41.2 s ± 5.27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit list(map(dict(zip(*np.unique(MyList, return_counts=True))).get, MyList))
#18 ms ± 56.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pd.DataFrame(MyList).groupby(MyList).transform(len)[0].tolist()
#2.44 s ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
c=Counter(MyList)
%timeit lout=[c[i] for i in MyList]
#6.89 ms ± 22.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
c = Counter(MyList)
%timeit list(itemgetter(*MyList)(c))
#5.27 ms ± 10.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
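The timings above use IPython's %timeit. For anyone who wants to reproduce the comparison in a plain script, here is a hedged sketch using only the standard library. The function names and the list size are assumptions chosen to keep the run short, and unlike the measurements above, the Counter construction is included in the timed call, so the absolute numbers will differ.

```python
import random
import timeit
from collections import Counter
from operator import itemgetter

random.seed(0)
data = [random.randint(1, 1999) for _ in range(2000)]  # smaller than the lists timed above

def with_list_count(items):
    return [items.count(i) for i in items]   # rescans the list for every element

def with_counter(items):
    c = Counter(items)
    return [c[i] for i in items]

def with_itemgetter(items):
    c = Counter(items)
    return list(itemgetter(*items)(c))

# All three approaches should agree before their timings are compared.
assert with_list_count(data) == with_counter(data) == with_itemgetter(data)

for fn in (with_list_count, with_counter, with_itemgetter):
    seconds = timeit.timeit(lambda: fn(data), number=3)
    print(f"{fn.__name__}: {seconds / 3:.6f} s per call")
```

The list.count version does work proportional to the square of the list length, which matches the jump from 413 ms to 41.2 s when the list grows tenfold in the measurements above.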
Note that @Gio pointed out that the list is a pandas Series object. In that case, you can convert the Series object to a list:
import pandas as pd
l = ["a", "b", "c", "c", "a", "c"]
ds = pd.Series(l)
l=ds.tolist()
[l.count(i) for i in ds]
# [2, 1, 3, 3, 2, 3]
However, once you have a Series, you can count the elements with value_counts.
l = ["a", "b", "c", "c", "a", "c"]
s = pd.Series(l) #Series object
c=s.value_counts() #c is Series again
[c[i] for i in s]
# [2, 1, 3, 3, 2, 3]