Does TensorFlow by default use all available GPUs in the machine?
I have 3 GTX Titan GPUs in my machine. I ran the cifar10_train.py example provided with the CIFAR-10 tutorial and got the following output:
I tensorflow/core/common_runtime/gpu/gpu_init.cc:60] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:60] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:127] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 0: Y N
I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 1: N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:694] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:694] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN, pci bus id: 0000:84:00.0)
It looks to me like TensorFlow is trying to initialize itself on both devices (gpu0 and gpu1).
My question is: why does it do this on both devices, and is there any way to prevent it? (I want to run as if there were only a single GPU.)
See: Using GPUs
Manual device placement
If you would like a particular operation to run on a device of your choice instead of the one automatically selected for you, you can use with tf.device to create a device context, so that all operations within that context get the same device assignment.
# Creates a graph.
with tf.device('/cpu:0'):
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
You will see that a and b are now assigned to cpu:0. Since no device was explicitly specified for the MatMul operation, the TensorFlow runtime will choose one based on the operation and the available devices (gpu:0 in this example), and will automatically copy tensors between devices if required.
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22. 28.]
[ 49. 64.]]
Earlier answer 2.
See: Using GPUs
Using a single GPU on a multi-GPU system
If you have more than one GPU in your system, the GPU with the lowest ID will be selected by default. If you would like to run on a different GPU, you need to specify the preference explicitly:
# Creates a graph.
with tf.device('/gpu:2'):
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
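The Using GPUs guide also notes that if the device you specify does not exist (for example, /gpu:2 on a machine where TensorFlow only created /gpu:0 and /gpu:1, as in the question's log), running the graph raises InvalidArgumentError. A minimal sketch of the documented workaround, allow_soft_placement, which lets TensorFlow fall back to an existing device:
import tensorflow as tf

# Pin the graph to a GPU that may not exist on this machine.
with tf.device('/gpu:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# allow_soft_placement lets TensorFlow choose an existing, supported
# device instead of failing when /gpu:2 is unavailable.
sess = tf.Session(config=tf.ConfigProto(
    allow_soft_placement=True,
    log_device_placement=True))
print(sess.run(c))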
Earlier answer 1.
From CUDA_VISIBLE_DEVICES – Masking GPUs:
Does your CUDA application need to target a specific GPU? If you are
writing GPU enabled code, you would typically use a device query to
select the desired GPUs. However, a quick and easy solution for
testing is to use the environment variable CUDA_VISIBLE_DEVICES to
restrict the devices that your CUDA application sees. This can be
useful if you are attempting to share resources on a node or you want
your GPU enabled executable to target a specific GPU.
Environment Variable Syntax      Results
CUDA_VISIBLE_DEVICES=1           Only device 1 will be seen
CUDA_VISIBLE_DEVICES=0,1         Devices 0 and 1 will be visible
CUDA_VISIBLE_DEVICES="0,1"       Same as above, quotation marks are optional
CUDA_VISIBLE_DEVICES=0,2,3       Devices 0, 2, 3 will be visible; device 1 is masked
CUDA will enumerate the visible devices starting at zero. In the last
case, devices 0, 2, 3 will appear as devices 0, 1, 2. If you change
the order of the string to “2,3,0”, devices 2,3,0 will be enumerated
as 0,1,2 respectively. If CUDA_VISIBLE_DEVICES is set to a device that
does not exist, all devices will be masked. You can specify a mix of
valid and invalid device numbers. All devices before the invalid value
will be enumerated, while all devices after the invalid value will be
masked.
To determine the device ID for the available hardware in your system,
you can run NVIDIA’s deviceQuery executable included in the CUDA SDK.
Happy programming!
Chris Mason
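Applied to the question: to make cifar10_train.py run as if the machine had only one GPU, mask the other devices before TensorFlow initializes, e.g. CUDA_VISIBLE_DEVICES=0 python cifar10_train.py from the shell. A minimal sketch of the same thing done in Python (the choice of device 0 is arbitrary; the variable must be set before TensorFlow touches the GPUs):
import os

# Expose only device 0 to CUDA; this must happen before TensorFlow
# initializes its GPU devices.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import tensorflow as tf  # now sees a single GPU, enumerated as /gpu:0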