In Python, what would be a clear, efficient way to count things in regions?
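I am looping over objects called events. Each event holds a particular object. I am computing the fraction of objects that have a particular characteristic. Picture the approach as something like this: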
countCars = 0
countCarsBlue = 0
for event in events:
    countCars += 1  # "+=" increments; "=+ 1" would just assign +1
    if event.car.isBlue():
        countCarsBlue += 1
print("fraction of cars that are blue: {fraction}".format(
    fraction=float(countCarsBlue) / countCars))
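Now, suppose I want to compute the fraction of objects with a particular characteristic within regions of another characteristic of the object. In my example, I am counting the fraction of cars that are blue. Now I want the fraction of blue cars among cars with a length from 0 m to 1 m, the fraction among cars from 1 m to 2 m, from 2 m to 3 m, from 3 m to 4 m, and so on.
Given that I am dealing with a lot of statistics and many more bins than the 4 in my simple example, what would be a good way to structure the code for this kind of calculation, assuming a constant bin width?
(Is there a sensible way to do this for variable bin widths?)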
If you are working with Python 3.4+, enums are actually quite useful for this. Here are a few examples of what you could do:
import random
from enum import Enum
from collections import namedtuple

class Color(Enum):
    blue = 'blue'
    red = 'red'
    green = 'green'

Car = namedtuple('Car', ('color', 'length'))

cars = [Car(Color.blue, 10),
        Car(Color.blue, 3),
        Car(Color.blue, 9),
        Car(Color.red, 9),
        Car(Color.red, 7),
        Car(Color.red, 8),
        Car(Color.green, 3),
        Car(Color.green, 7),
        Car(Color.green, 2),
        Car(Color.green, 8),
        ]

print('# of blue cars:', sum(1 for car in cars if car.color == Color.blue))
print('# of cars with length between 3 and 7:',
      sum(1 for car in cars if 3 <= car.length <= 7))

random_color = random.choice(tuple(Color))
lower_limit = random.randint(1, 10)
upper_limit = random.randint(lower_limit, 10)
print('# of {} cars with length {} to {} (inclusive):'.format(random_color.name,
                                                              lower_limit,
                                                              upper_limit),
      sum(1 for car in cars if car.color == random_color
          and lower_limit <= car.length <= upper_limit))

important_colors = (Color.blue, Color.green)
important_lengths = (1, 2, 3, 5, 7)
print('Number of cars that match some contrived criteria:',
      sum(1 for car in cars if car.color in important_colors
          and car.length in important_lengths))
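If you are talking about continuous ranges, lower < value < upper is a good way to check. If you have discrete values (like colors), you can create a collection of the colors you are interested in and check for membership in that collection. Also note that you can easily use variable bin sizes.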
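For variable bin widths, one option is to keep a sorted list of bin edges and let the standard-library bisect module find the bin for each value. The following is only a minimal sketch and not part of the original answer: the edges list, the bin_index helper, and the sample (is_blue, length) pairs are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges in metres; the widths do not need to be equal.
edges = [0, 1, 2.5, 4, 10]

def bin_index(value, edges):
    """Return i such that edges[i] <= value < edges[i + 1], or None if out of range."""
    if not edges[0] <= value < edges[-1]:
        return None
    return bisect_right(edges, value) - 1

# Hypothetical sample data: (is_blue, length) pairs.
cars = [(True, 0.5), (False, 0.7), (True, 2.1),
        (True, 3.0), (False, 3.9), (False, 12.0)]

total = Counter()  # bin index -> number of cars in that bin
blue = Counter()   # bin index -> number of blue cars in that bin
for is_blue, length in cars:
    i = bin_index(length, edges)
    if i is None:
        continue  # skip lengths that fall outside every bin
    total[i] += 1
    if is_blue:
        blue[i] += 1

for i in sorted(total):
    print('{} to {} m: {} of {} cars are blue'.format(
        edges[i], edges[i + 1], blue[i], total[i]))
```

Each lookup is a binary search, so the cost per car grows only logarithmically with the number of bins, and the data is traversed once regardless of how many bins there are.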
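If you are interested in more than simple counts, you could also use itertools.groupby. Note that if your items are reference objects, changing something in one collection will also change it in the other: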
In [15]: class Simple:
   ....:     def __init__(self, name):
   ....:         self.name = name
   ....:     def __repr__(self):
   ....:         return 'Simple(name={!r})'.format(self.name)
   ....:

In [16]: values = [Simple('one'), Simple('two'), Simple('three')]

In [17]: one = (values[0], values[-1])

In [18]: two = tuple(values[:2])

In [19]: one
Out[19]: (Simple(name='one'), Simple(name='three'))

In [20]: two
Out[20]: (Simple(name='one'), Simple(name='two'))

In [21]: one[0].name = '**changed**'

In [22]: one
Out[22]: (Simple(name='**changed**'), Simple(name='three'))

In [23]: two
Out[23]: (Simple(name='**changed**'), Simple(name='two'))
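As a rough illustration of the itertools.groupby suggestion, the sketch below groups cars into 1 m wide length bins and keeps the grouped objects, so more than a count is available per bin. It is an assumption-laden example rather than the answerer's code: the length_bin key function and the sample cars are hypothetical, and plain strings stand in for the Color enum to keep it short.

```python
from itertools import groupby
from collections import namedtuple

Car = namedtuple('Car', ('color', 'length'))

# Hypothetical sample data with non-negative lengths in metres.
cars = [Car('blue', 0.5), Car('red', 2.3), Car('blue', 2.1),
        Car('green', 0.8), Car('blue', 2.9)]

def length_bin(car):
    """Key function: lower edge of the 1 m wide bin containing the car's length."""
    return int(car.length)

# groupby only merges adjacent items, so sort by the same key first.
grouped = {}
for key, group in groupby(sorted(cars, key=length_bin), key=length_bin):
    grouped[key] = list(group)

for key in sorted(grouped):
    members = grouped[key]
    n_blue = sum(1 for car in members if car.color == 'blue')
    print('{} to {} m: {} of {} cars are blue'.format(
        key, key + 1, n_blue, len(members)))
```

Because the lists in grouped hold references to the same Car objects as cars, the caveat shown in the session above applies to mutable items here as well.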
First, some code to recreate your example:
import random

class Event(object):
    def __init__(self):
        self.car = None

class Car(object):
    def __init__(self, isBlue, length):
        self._isBlue = isBlue
        self._length = length

    def isBlue(self):
        return self._isBlue

    def length(self):
        return self._length

    def __str__(self):
        return '{} car of {} m long.'.format('blue' if self.isBlue() else 'non-blue', self.length())
OK, now I randomly create ten car objects and add each one to an event:
totalNumberOfCars = 10
events = []
for _ in range(totalNumberOfCars):
    # Random colour flag and a random length between 0.5 m and 3.9 m.
    car = Car(random.choice([True, False]), random.randrange(5, 40) / 10.)
    print(car)
    event = Event()
    event.car = car
    events.append(event)
For me, the output was as follows (yours will of course differ):
blue car of 0.5 m long.
non-blue car of 2.3 m long.
non-blue car of 3.8 m long.
blue car of 2.1 m long.
non-blue car of 0.6 m long.
blue car of 0.8 m long.
blue car of 0.5 m long.
blue car of 2.3 m long.
blue car of 3.3 m long.
blue car of 2.1 m long.
Now, if we want to count our events by region, you can do it as follows:
allBlueCars = sum(1 for event in events if event.car.isBlue())
print("Number of blue cars: {}".format(allBlueCars))
maxCarLen = 4
# Pair consecutive integers to get the regions (0, 1), (1, 2), (2, 3), (3, 4).
for minlen, maxlen in zip(range(maxCarLen), range(1, maxCarLen + 1)):
    print("Cars between {} and {} m that are blue:".format(minlen, maxlen))
    blueCarsInRegion = [str(event.car) for event in events
                        if event.car.isBlue() and minlen <= event.car.length() < maxlen]
    if blueCarsInRegion:
        print('\n'.join('\t{}'.format(car) for car in blueCarsInRegion))
    else:
        print('no blue cars in this region')
    # The denominator is the total number of blue cars, so this is the
    # share of all blue cars that fall into this region.
    fraction = float(len(blueCarsInRegion)) / allBlueCars
    print("fraction of cars that are blue and between {} and {} m long: {}".format(minlen, maxlen, fraction))
    print('')
For the sample data above, this prints:
Number of blue cars: 7
Cars between 0 and 1 m that are blue:
blue car of 0.5 m long.
blue car of 0.8 m long.
blue car of 0.5 m long.
fraction of cars that are blue and between 0 and 1 m long: 0.428571428571
Cars between 1 and 2 m that are blue:
no blue cars in this region
fraction of cars that are blue and between 1 and 2 m long: 0.0
Cars between 2 and 3 m that are blue:
blue car of 2.1 m long.
blue car of 2.3 m long.
blue car of 2.1 m long.
fraction of cars that are blue and between 2 and 3 m long: 0.428571428571
Cars between 3 and 4 m that are blue:
blue car of 3.3 m long.
fraction of cars that are blue and between 3 and 4 m long: 0.142857142857
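The loop above rescans all events once per region and divides by the total number of blue cars. If what is wanted is the fraction of cars within each region that are blue, as the question describes, a single pass that computes a bin index per car may scale better to many bins. The following is a hedged sketch under stated assumptions: a constant bin_width, and hypothetical (is_blue, length) pairs that copy the sample cars listed earlier instead of using the Event and Car classes.

```python
from collections import defaultdict

bin_width = 1.0  # assumed constant bin width in metres

# Hypothetical (is_blue, length) pairs standing in for
# event.car.isBlue() and event.car.length().
cars = [(True, 0.5), (False, 2.3), (False, 3.8), (True, 2.1), (False, 0.6),
        (True, 0.8), (True, 0.5), (True, 2.3), (True, 3.3), (True, 2.1)]

counts = defaultdict(lambda: [0, 0])  # bin index -> [all cars, blue cars]
for is_blue, length in cars:
    index = int(length // bin_width)  # bin covering [index * w, (index + 1) * w)
    counts[index][0] += 1
    if is_blue:
        counts[index][1] += 1

fractions = {}
for index in sorted(counts):
    n_all, n_blue = counts[index]
    fractions[index] = n_blue / float(n_all)
    print('{} to {} m: {} of {} cars are blue ({:.2f})'.format(
        index * bin_width, (index + 1) * bin_width, n_blue, n_all,
        fractions[index]))
```

Bins that contain no cars never appear in counts, so no division by zero can occur; if empty bins should be reported too, they would need to be added explicitly.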