Average values in nested dictionary
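I want to create a new list of values, my_qty, where each item is the average of all d[key]['qty'] values whose d[key]['start date'] matches the corresponding entry in my_dates. I think I'm close, but I'm getting hung up on the nested part.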
import datetime
import numpy as np
my_dates = [datetime.datetime(2014, 10, 12, 0, 0), datetime.datetime(2014, 10, 13, 0, 0), datetime.datetime(2014, 10, 14, 0, 0)]
d = {
    'ID1': {'start date': datetime.datetime(2014, 10, 12, 0, 0), 'qty': 12},
    'ID2': {'start date': datetime.datetime(2014, 10, 13, 0, 0), 'qty': 34},
    'ID3': {'start date': datetime.datetime(2014, 10, 12, 0, 0), 'qty': 35},
    'ID4': {'start date': datetime.datetime(2014, 10, 11, 0, 0), 'qty': 40},
}
my_qty = []
for item in my_dates:
    my_qty.append([np.mean(x for x in d[key]['qty']) if d[key]['start date'] == my_dates[item]])
print my_qty
Desired output:
[23.5, 34, 0]
To clarify the requested output, per a comment:
[average of d[key]['qty'] where d[key]['start date'] == my_dates[0],
 average of d[key]['qty'] where d[key]['start date'] == my_dates[1],
 average of d[key]['qty'] where d[key]['start date'] == my_dates[2]]
Pure Python
The simple approach is to group the quantities by date in a dictionary:
import collections
# Map each start date to the list of quantities recorded for it.
quantities = collections.defaultdict(list)
for k, v in d.items():
    quantities[v["start date"]].append(v["qty"])
Then compute the mean of each list in that dictionary:
means = {k: float(sum(q)) / len(q) for k, q in quantities.items()}
which gives:
>>> means
{datetime.datetime(2014, 10, 11, 0, 0): 40.0,
datetime.datetime(2014, 10, 12, 0, 0): 23.5,
datetime.datetime(2014, 10, 13, 0, 0): 34.0}
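The dictionary above is keyed by date, while the question asks for a list in the order of my_dates, with 0 for dates that have no entries. A minimal sketch of that last step, written for Python 3 and assuming the same d and my_dates as in the question (the default of 0 is the asker's stated preference, not something the grouping provides by itself):

```python
import collections
import datetime

my_dates = [
    datetime.datetime(2014, 10, 12),
    datetime.datetime(2014, 10, 13),
    datetime.datetime(2014, 10, 14),
]
d = {
    "ID1": {"start date": datetime.datetime(2014, 10, 12), "qty": 12},
    "ID2": {"start date": datetime.datetime(2014, 10, 13), "qty": 34},
    "ID3": {"start date": datetime.datetime(2014, 10, 12), "qty": 35},
    "ID4": {"start date": datetime.datetime(2014, 10, 11), "qty": 40},
}

# Group the quantities by start date.
quantities = collections.defaultdict(list)
for v in d.values():
    quantities[v["start date"]].append(v["qty"])

# One mean per date that actually occurs in d.
means = {k: sum(q) / len(q) for k, q in quantities.items()}

# Look up each requested date, falling back to 0 when it is absent.
my_qty = [means.get(day, 0) for day in my_dates]
print(my_qty)  # [23.5, 34.0, 0]
```

Using means.get(day, 0) rather than means[day] avoids a KeyError for 2014-10-14, which has no entries in d.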
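If you want to be clever, you can compute the means in a single pass by keeping the current mean and the count of values seen so far. You can even abstract this into a class:
class RunningMean(object):
    def __init__(self, mean=None, n=0):
        self.mean = mean
        self.n = n
    def insert(self, other):
        if self.mean is None:
            self.mean = 0.0
        # Rebuild the old sum from mean * n, add the new value, divide by the new count.
        self.mean = (self.mean * self.n + other) / (self.n + 1)
        self.n += 1
    def __repr__(self):
        args = (self.__class__.__name__, self.mean, self.n)
        return "{}(mean={}, n={})".format(*args)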
A single pass over your data then gives you the answer:
import collections
means = collections.defaultdict(RunningMean)
for k, v in d.items():
    means[v["start date"]].insert(v["qty"])
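The insert method multiplies the old mean back out by n on every call. An algebraically equivalent update, mean += (x - mean) / n, skips that step. The sketch below is not part of the answer above; the function name running_mean is hypothetical, and it is written for Python 3:

```python
def running_mean(values):
    """Return the mean of values in one pass, or None if values is empty."""
    mean = None
    n = 0
    for x in values:
        n += 1
        if mean is None:
            # First value: the mean is the value itself.
            mean = float(x)
        else:
            # (mean * (n - 1) + x) / n rewritten as an in-place correction.
            mean += (x - mean) / n
    return mean

print(running_mean([12, 35]))  # 23.5
```

With the two quantities recorded for 2014-10-12 this gives the same 23.5 as the sum-and-divide version.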
With pandas
The really simple approach is to use the pandas library, since it was made for this sort of thing. Here is some code:
import pandas as pd
df = pd.DataFrame.from_dict(d, orient="index")
means = df.groupby("start date").aggregate(np.mean)
which gives:
>>> means
             qty
start date
2014-10-11  40.0
2014-10-12  23.5
2014-10-13  34.0
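Here is some working code that may help you:
for item in my_dates:
    # Collect the quantities whose start date matches this date.
    nums = [d[key]['qty'] for key in d if d[key]['start date'] == item]
    if len(nums):
        avg = np.mean(nums)
    else:
        avg = 0
    print(item, nums, avg)
Note that np.mean does not give a usable result for an empty list (it returns nan and emits a warning), so you have to check the length of the list you are averaging.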
A one-line answer:
mean_qty = [np.mean([i['qty'] for i in d.values()
                     if i.get('start date') == day] or 0) for day in my_dates]
In [12]: mean_qty
Out[12]: [23.5, 34.0, 0.0]
The purpose of or 0 is to return 0, as the OP wants, when there is no qty for a date, because np.mean returns nan for an empty list by default.
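If you need speed, then building on the second part of jme's excellent answer, you can do this (I cut his time by a factor of 3 by not recomputing the mean until it is needed):
from collections import defaultdict
class RunningMean(object):
    def __init__(self, total=0.0, n=0):
        self.total = total
        self.n = n
    def __iadd__(self, other):
        # Accumulate only the sum and the count; the division is deferred to mean().
        self.total += other
        self.n += 1
        return self
    def mean(self):
        return self.total / self.n if self.n else 0
    def __repr__(self):
        # Show the mean so the repr matches the output listed below.
        return "RunningMean(mean= %f)" % self.mean()
means = defaultdict(RunningMean)
for v in d.values():
    means[v["start date"]] += v["qty"]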
Out[351]:
[RunningMean(mean= 40.000000),
RunningMean(mean= 34.000000),
RunningMean(mean= 23.500000)]
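To get from this mapping back to the list the question asks for, each date in my_dates still has to be looked up. A sketch of that step, assuming Python 3 and the same data as the question; the class is a trimmed copy of the one above so the block runs unaided:

```python
from collections import defaultdict
import datetime

class RunningMean:
    def __init__(self, total=0.0, n=0):
        self.total = total
        self.n = n
    def __iadd__(self, other):
        # Accumulate the sum and count; division happens only in mean().
        self.total += other
        self.n += 1
        return self
    def mean(self):
        return self.total / self.n if self.n else 0

my_dates = [
    datetime.datetime(2014, 10, 12),
    datetime.datetime(2014, 10, 13),
    datetime.datetime(2014, 10, 14),
]
d = {
    "ID1": {"start date": datetime.datetime(2014, 10, 12), "qty": 12},
    "ID2": {"start date": datetime.datetime(2014, 10, 13), "qty": 34},
    "ID3": {"start date": datetime.datetime(2014, 10, 12), "qty": 35},
    "ID4": {"start date": datetime.datetime(2014, 10, 11), "qty": 40},
}

means = defaultdict(RunningMean)
for v in d.values():
    means[v["start date"]] += v["qty"]

# Indexing a missing date creates an empty RunningMean, whose mean() is 0.
my_qty = [means[day].mean() for day in my_dates]
print(my_qty)  # [23.5, 34.0, 0]
```

One side effect to be aware of: looking up 2014-10-14 adds an empty entry for that date to means. If that matters, checking `day in means` before indexing avoids it.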