Unable to start RabbitMQ image in AKS Cluster

I am trying to start a RabbitMQ image in my AKS cluster. The VMs that make up the cluster are on a private VNET and have firewall rules in place.

It is not clear what needs to be allowed through the firewall (or whether the firewall is even the problem).

Here is the output when the pod starts:

BOOT FAILED

Config file generation failed: Failed to create dirty io scheduler thread 6, error = 11

Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...Segmentation fault (core dumped)

{"init terminating in do_boot",generate_config_file} init terminating in do_boot (generate_config_file)

Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done

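I have attached persistent volumes to /var/log and /var/lib/rabbitmq, but there are no log files or anything else that would help debug this issue. The schema, lost+found, and other rabbitmq folders and files are created, so reading and writing work fine.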

Here is the YAML I am using to create the pod:

   apiVersion: extensions/v1beta1
   kind: Deployment
   metadata:
     name: mayan-broker
   spec:
     replicas: 1
     template:
       metadata:
         labels:
           app: mayan-broker
       spec:
         containers:
           - name: mayan-broker
             image: rabbitmq:3
             volumeMounts:
               - name: broker-storage
                 mountPath: /var/lib/rabbitmq
               - name: broker-logging
                 mountPath: /var/log/rabbitmq
             ports:
               - containerPort: 5672
             env:
               - name: RABBITMQ_DEFAULT_USER
                 value: mayan
               - name: RABBITMQ_DEFAULT_PASS
                 value: mayan
               - name: RABBITMQ_DEFAULT_VHOST
                 value: mayan
         volumes:
           - name: broker-storage
             persistentVolumeClaim:
               claimName: rabbit-claim
           - name: broker-logging
             persistentVolumeClaim:
               claimName: logging-claim

As requested, here is the YAML without the volumes and mounts. It produces the same result:

   apiVersion: extensions/v1beta1
   kind: Deployment
   metadata:
     name: mayan-broker
   spec:
     replicas: 1
     template:
       metadata:
         labels:
           app: mayan-broker
       spec:
         containers:
           - name: mayan-broker
             image: rabbitmq:3
             ports:
               - containerPort: 5672
             env:
               - name: RABBITMQ_DEFAULT_USER
                 value: mayan
               - name: RABBITMQ_DEFAULT_PASS
                 value: mayan
               - name: RABBITMQ_DEFAULT_VHOST
                 value: MAYAN

I ran into the same problem on AKS (I am starting to think it is an AKS issue).

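Basically, AKS limits the number of threads a pod can use, and RabbitMQ (like everything Erlang in general) uses a lot of threads. The "error = 11" in the boot output is consistent with this: on Linux, errno 11 is EAGAIN, which is what thread creation returns when a resource limit is reached.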

You can reduce the number of threads with an environment variable in your YAML, as in my configuration:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
spec:
  serviceName: "rabbitmq"
  replicas: 1
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      containers:
      - name: rabbitmq
        image: rabbitmq:3.7-management
        env:
        # This needs to be here because AKS (as of 1.14.3)
        # limits the number of threads a pod can use.
        - name: RABBITMQ_IO_THREAD_POOL_SIZE
          value: "30"
        ports:
        - containerPort: 5672
          name: amqp
        resources:
          limits:
            memory: 4Gi
          requests:
            cpu: "1"
            memory: 1Gi

I am using a StatefulSet, but the fix is the same for a Deployment.
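For illustration, this is roughly how the same variable might be added to the container section of the Deployment from the question. It is a sketch, not a tested manifest: it shows only the containers fragment that sits under spec.template.spec, the value "30" is simply copied from my StatefulSet rather than tuned for this workload, and everything else is taken unchanged from the question.

```yaml
# Fragment only: belongs under spec.template.spec of the question's Deployment.
containers:
  - name: mayan-broker
    image: rabbitmq:3
    ports:
      - containerPort: 5672
    env:
      # Assumption: 30 is carried over from the StatefulSet config, not a
      # measured value. Lower it further if the pod still fails to boot.
      - name: RABBITMQ_IO_THREAD_POOL_SIZE
        value: "30"
      - name: RABBITMQ_DEFAULT_USER
        value: mayan
      - name: RABBITMQ_DEFAULT_PASS
        value: mayan
      - name: RABBITMQ_DEFAULT_VHOST
        value: mayan
```

The value is quoted because Kubernetes requires environment variable values to be strings.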