How to free TF/Keras memory in Python after a model has been deleted, while other models are still in memory and in use?

Question

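I have a Python server application that serves TensorFlow / Keras model inference. Several different models can be loaded and used at the same time for several different clients. A client can request to load another model, but this has no effect on the other clients (their models stay in memory and in use as they are, so each client can request another model regardless of the state of any other client).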

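The logic and implementation work; however, I am not sure how to free memory correctly in this setup. When a client requests a new model, the previously loaded model is simply deleted from memory (via the Python del statement), and the new model is then loaded.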

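From what I have read in the Keras documentation, one might want to clear the Keras session to free memory by calling tf.keras.backend.clear_session(). However, that seems to release all TF memory, which is a problem in my case, since the other clients' Keras models are still in use at the same time, as described above.

Moreover, it seems I cannot put each model into its own process, since I cannot access the single GPU from different running processes in parallel (or at all).

So in other words: when loading a new TensorFlow/Keras model while other models are also in memory and in use, how can I free the TF memory held by the previously loaded model without interfering with the other currently loaded models?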

Answer 1

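Each client can fork a new process. Each process then runs its operations in an environment isolated from the others, and everything it allocated is released when the process ends. This is the safer, more isolated approach.

I created a basic scenario with two parts. The main part is responsible for starting, using, and terminating the processes. The client part carries out operations on the server's command; each client waits for its orders as HTTP requests.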

main.py

import subprocess
import sys
import time
import requests

class ClientOperator:
    def __init__(self, name, port, model):
        self.name = name
        self.port = port
        # start one client process per customer
        self.proc = subprocess.Popen([sys.executable, 'client.py',
                                f'--port={port}', f'--model={model}'])

    def process(self, a, b):
        response = requests.get(f'http://localhost:{self.port}/process',
                                params={'a': a, 'b': b}).json()

        print(f'{self.name} process {a} + {b} = {response}')

    def close(self):
        print(f'{self.name} is closing')
        self.proc.terminate()
        self.proc.wait()  # reap the process so its memory is fully released


customer1 = ClientOperator('John', 20001, 'model1.hdf5')
customer2 = ClientOperator('Oscar', 20002, 'model2.hdf5')

# crude wait so both Flask servers are listening before the first request
time.sleep(2)

customer1.process(5, 10)
customer2.process(4, 6)

# stop customer1
customer1.close()

client.py

import argparse
from flask import Flask, request, jsonify

# parse arguments
parser = argparse.ArgumentParser()
parser.add_argument('--port', '-p', type=int)
parser.add_argument('--model', '-m', type=str)
args = parser.parse_args()

model = args.model

app = Flask(__name__)

@app.route('/process', methods=['GET'])
def process():
    result = int(request.args['a']) + int(request.args['b'])
    return jsonify({'result': result, 'model': model})


if __name__ == '__main__':
    app.run(host="localhost", port=args.port)

Output:

$ python main.py

 * Serving Flask app "client" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://localhost:20002/ (Press CTRL+C to quit)
 * Serving Flask app "client" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://localhost:20001/ (Press CTRL+C to quit)


127.0.0.1 - - [22/Jan/2021 16:31:26] "GET /process?a=5&b=10 HTTP/1.1" 200 -
John process 5 + 10 = {'model': 'model1.hdf5', 'result': 15}

127.0.0.1 - - [22/Jan/2021 16:31:27] "GET /process?a=4&b=6 HTTP/1.1" 200 -
Oscar process 4 + 6 = {'model': 'model2.hdf5', 'result': 10}

John is closing
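
The scripts above only pass the model file name around; nothing is actually loaded. As a rough sketch of how a model swap could look under this one-process-per-client design, the snippet below replaces the whole worker process whenever a client asks for another model. The ModelWorker class and the inline worker script are hypothetical stand-ins (a real worker would load the Keras model where the comment indicates); the point is only that terminating the process returns all of its memory, while the other clients' processes are untouched.

```python
import subprocess
import sys

# Hypothetical stand-in for a real worker script such as client.py: it "loads"
# the model named on its command line, reports readiness, then waits.
WORKER_SOURCE = """
import sys
model_path = sys.argv[1]          # a real worker would load the Keras model here
print(f"ready {model_path}", flush=True)
sys.stdin.read()                  # keep the process (and its memory) alive
"""


class ModelWorker:
    """Owns one subprocess per client; swapping models replaces the process."""

    def __init__(self, model_path):
        self.proc = None
        self.status = None
        self.load(model_path)

    def load(self, model_path):
        # Stop the old process first so everything it allocated is released.
        self.close()
        self.proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE, model_path],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
        # Block until the worker reports that its model is in place.
        self.status = self.proc.stdout.readline().strip()

    def close(self):
        if self.proc is not None:
            if self.proc.poll() is None:
                self.proc.terminate()
                self.proc.wait()  # reap the process before starting another
            self.proc.stdin.close()
            self.proc.stdout.close()
        self.proc = None


worker = ModelWorker("model1.hdf5")
first_pid = worker.proc.pid
worker.load("model2.hdf5")        # the old process is gone; a new one holds model2
print(worker.status, worker.proc.pid != first_pid)
worker.close()
```

Compared with del plus clear_session() inside a shared process, this trades start-up time on every swap for a clean release that cannot touch another client's model.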

Answer 2

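When a TensorFlow session starts, it tries to allocate all available GPU memory. That is what prevents multiple processes from running sessions at the same time. The ideal way to avoid this is to make sure each TF session allocates only part of the memory. According to the docs, there are two ways to do this (depending on your TF version).

The simple way (tf 2.2+):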

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
  tf.config.experimental.set_memory_growth(gpu, True)
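
If the models run in separate worker processes, the same setting can probably be applied without touching the worker code: TensorFlow's GPU guide describes the environment variable TF_FORCE_GPU_ALLOW_GROWTH as an alternative way to enable memory growth. A minimal sketch, assuming a hypothetical worker_env helper and a placeholder child process that only echoes the variable instead of importing TensorFlow:

```python
import os
import subprocess
import sys


def worker_env(allow_growth=True):
    """Return a copy of the current environment for a TensorFlow worker process."""
    env = os.environ.copy()
    # TensorFlow reads this variable when it initializes the GPU.
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true" if allow_growth else "false"
    return env


# Placeholder child: a real worker would import tensorflow and load a model.
child = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ['TF_FORCE_GPU_ALLOW_GROWTH'])"],
    env=worker_env(), capture_output=True, text=True, check=True)
print(child.stdout.strip())  # true
```

In the main.py example this would mean passing env=worker_env() to subprocess.Popen.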

tf 2.0/2.1

import tensorflow as tf
tf.config.gpu.set_per_process_memory_growth(True)

For tf 1.* (allocate about a third of the GPU memory to each process):

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)

sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

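The other approach is, IMHO, more controlled and scales better. It requires you to create logical devices and to control placement on each of them manually.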

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Split the first GPU into two logical devices, each limited to 1GB of memory
    try:
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024),
             tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)

Now you have to control placement manually using with tf.device(...):

gpus = tf.config.experimental.list_logical_devices('GPU')
if gpus:
  # Replicate your computation on multiple GPUs
  c = []
  for gpu in gpus:
    with tf.device(gpu.name):
      a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
      b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
      c.append(tf.matmul(a, b))

  with tf.device('/CPU:0'):
    matmul_sum = tf.add_n(c)

  print(matmul_sum)

With this you should not run out of memory, and you can run multiple processes at once.
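
The limits above are hard-coded. As a sketch of how a per-process limit might be derived instead, the hypothetical helper per_process_limit_mib below splits a GPU's memory evenly between a known number of worker processes. The 512 MiB reserve is an arbitrary assumption, not a measured figure. limit_gpu_memory is also hypothetical: it wraps the same tf.config.experimental calls used above and imports TensorFlow lazily, so the arithmetic can be tried without a GPU.

```python
def per_process_limit_mib(total_mib, n_processes, reserve_mib=512):
    """Split GPU memory evenly, holding back reserve_mib as headroom (assumed value)."""
    if n_processes < 1:
        raise ValueError("n_processes must be at least 1")
    usable = total_mib - reserve_mib
    if usable <= 0:
        raise ValueError("reserve_mib leaves no usable memory")
    return usable // n_processes


def limit_gpu_memory(limit_mib):
    """Cap this process at limit_mib on every visible GPU.

    Must be called before TensorFlow initializes the GPUs.
    """
    import tensorflow as tf  # imported lazily; only needed in real workers

    for gpu in tf.config.experimental.list_physical_devices('GPU'):
        tf.config.experimental.set_virtual_device_configuration(
            gpu,
            [tf.config.experimental.VirtualDeviceConfiguration(
                memory_limit=limit_mib)])


# Example: an 8 GiB card shared by three worker processes.
limit = per_process_limit_mib(total_mib=8192, n_processes=3)
print(limit)  # 2560
```

Each worker process would then call limit_gpu_memory(limit) once at start-up, before loading its model.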

python

memory-management

keras

tensorflow