Best way to re/use redis connections for prometheus django exporter
I am getting this error:
redis.exceptions.ConnectionError: Error 24 connecting to redis-service:6379. Too many open files.
...
OSError: [Errno 24] Too many open files
I know this can be fixed by raising the ulimit, but I don't think that is the real problem here, and the service runs in a container. The application starts up fine and runs normally for about 48 hours before hitting the error above, which means connections keep growing over time.
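For reference, the descriptor limit the process is running against can be checked from a shell inside the container (standard shell builtin, no extra tooling assumed):

```shell
# Soft limit on open files for the current shell; the Errno 24 above
# means the process exhausted this number of descriptors.
ulimit -n
```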
What my application is basically doing:
- background_task (run using Celery) -> collects data from Postgres and sets it on Redis
- Prometheus reaches the app at '/metrics', which is a Django view -> collects data from Redis and serves it using the Django Prometheus exporter
The code looks like this:
views.py
from prometheus_client.core import GaugeMetricFamily, REGISTRY

from my_awesome_app.taskbroker.celery import app


class SomeMetricCollector:

    def get_sample_metrics(self):
        with app.connection_or_acquire() as conn:
            client = conn.channel().client
            result = client.get('some_metric_key')
        return {'some_metric_key': result}

    def collect(self):
        sample_metrics = self.get_sample_metrics()
        for key, value in sample_metrics.items():
            yield GaugeMetricFamily(key, 'This is a custom metric', value=value)


REGISTRY.register(SomeMetricCollector())
tasks.py
# This is my boilerplate taskbroker app
from my_awesome_app.taskbroker.celery import app
# How it's collecting data from postgres is trivial to this issue.
from my_awesome_app.utility_app.utility import some_value_calculated_from_query


@app.task()
def app_metrics_sync_periodic():
    with app.connection_or_acquire() as conn:
        client = conn.channel().client
        client.set('some_metric_key', some_value_calculated_from_query(), ex=21600)
    return True
I don't think the background data collection in tasks.py is what is making the Redis connections grow; it is the Django '/metrics' view in views.py. Can you tell me what I'm doing wrong, or whether there is a better way to read from Redis in a Django view? The Prometheus instance scrapes the Django app every 5s.
This answer is based on my own use case and research.

The problem here, as I see it, is that every request to /metrics spins up a new thread in which views.py creates a new connection in the Celery broker's connection pool.

The fix is to let Django manage its own Redis connection pool through the cache backend, let Celery manage its own Redis connection pool, and never use each other's pools from their respective threads.
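The idea behind the fix can be illustrated with a minimal, self-contained pool sketch (this is not django-redis's or kombu's actual implementation, just the concept): connections are handed out from a fixed-size pool and returned after use, so repeated requests reuse descriptors instead of opening new ones.

```python
import queue


class TinyPool:
    """Minimal connection-pool sketch: hand out existing connections, never open extras."""

    def __init__(self, factory, size=1):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())  # open all connections up front

    def acquire(self):
        return self._pool.get()  # blocks until a connection is free; never opens a new one

    def release(self, conn):
        self._pool.put(conn)


# Stand-in "connections" so the sketch runs without a Redis server.
made = []
pool = TinyPool(lambda: made.append(object()) or made[-1], size=1)

c1 = pool.acquire()
pool.release(c1)
c2 = pool.acquire()
assert c2 is c1        # the same connection is reused across "requests"
assert len(made) == 1  # no new connections were opened
```

As long as every acquirer releases its connection, the descriptor count stays bounded by the pool size; the leak in the question comes from each scrape thread bypassing this and creating a fresh connection instead.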
Django side
config.py
# CACHES
# ------------------------------------------------------------------------------
# For more details on options for your cache backend please refer
# https://docs.djangoproject.com/en/3.1/ref/settings/#backend
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://localhost:6379/0",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        },
    }
}
views.py
from prometheus_client.core import GaugeMetricFamily, REGISTRY

# *: Replacing celery app with Django cache backend
from django.core.cache import cache


class SomeMetricCollector:

    def get_sample_metrics(self):
        # *: This is how you will get the new client, which is still context managed.
        with cache.client.get_client() as client:
            result = client.get('some_metric_key')
        return {'some_metric_key': result}

    def collect(self):
        sample_metrics = self.get_sample_metrics()
        for key, value in sample_metrics.items():
            yield GaugeMetricFamily(key, 'This is a custom metric', value=value)


REGISTRY.register(SomeMetricCollector())
This will ensure that Django maintains its own Redis connection pool and does not spin up new connections unnecessarily.
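If you want to be defensive about leaks, django-redis also lets you cap the underlying redis-py pool via CONNECTION_POOL_KWARGS, so exhaustion surfaces as a clear pool error instead of EMFILE at the OS level. A sketch (the max_connections value of 50 is an arbitrary assumption; tune it to your workload):

```python
# config.py (optional hardening on top of the CACHES block above)
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://localhost:6379/0",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
            # Cap the redis-py connection pool; 50 is an assumed ceiling.
            "CONNECTION_POOL_KWARGS": {"max_connections": 50},
        },
    }
}
```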
Celery side
tasks.py
# This is my boilerplate taskbroker app
from my_awesome_app.taskbroker.celery import app
# How it's collecting data from postgres is trivial to this issue.
from my_awesome_app.utility_app.utility import some_value_calculated_from_query


@app.task()
def app_metrics_sync_periodic():
    with app.connection_or_acquire() as conn:
        # *: This forces Celery to always reuse a connection from the existing pool.
        client = conn.default_channel.client
        client.set('some_metric_key', some_value_calculated_from_query(), ex=21600)
    return True
How do I monitor connections?
- There is a nice prometheus celery exporter that will help you monitor your Celery task activity; I am not sure how to add connection-pool and connection monitoring to it, though.
- The easiest way to manually verify that connections are growing every time /metrics is hit on the web app is:
$ redis-cli
127.0.0.1:6379> CLIENT LIST
...
- The CLIENT LIST command will let you see whether the number of connections is growing.
- I don't use queues, unfortunately, but I would recommend using them. This is how my worker runs:
$ celery -A my_awesome_app.taskbroker worker --concurrency=20 -l ERROR -E
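If you would rather watch the symptom from inside the process itself, a small stdlib-only helper (my own addition, Linux-only since it reads /proc) can count the app's open file descriptors; you could log this periodically or expose it as an extra gauge to catch a leak long before Errno 24:

```python
import os


def open_fd_count(pid="self"):
    """Count open file descriptors for a process by listing /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


print(open_fd_count())  # current process's open descriptor count
```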