TensorFlow Single and Multiple GPU

17 Mar 2025 | 4 min read

A typical system can contain several devices for computation. TensorFlow supports both CPUs and GPUs, and it identifies each device with a string.

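For example:

If we have a CPU, it can be addressed as "/cpu:0".
TensorFlow GPU strings are indexed from zero, so the first GPU is "/device:GPU:0".
Similarly, the second GPU is "/device:GPU:1".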
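
To make the naming scheme concrete, here is a minimal sketch that splits a device string into its type and zero-based index. The helper parse_device is hypothetical and written only for illustration; it is not part of TensorFlow and it handles only the two short forms shown above.

```python
import re

# Hypothetical helper for illustration; not part of TensorFlow.
# Accepts the short forms "/cpu:0" and "/device:GPU:1" used in this tutorial.
_DEVICE_RE = re.compile(r"^/(?:device:)?(cpu|gpu):(\d+)$", re.IGNORECASE)


def parse_device(name):
    """Split a device string such as '/device:GPU:1' into (type, index)."""
    match = _DEVICE_RE.match(name)
    if match is None:
        raise ValueError("unrecognized device string: %r" % name)
    return match.group(1).upper(), int(match.group(2))


print(parse_device("/cpu:0"))         # ('CPU', 0)
print(parse_device("/device:GPU:1"))  # ('GPU', 1), the second GPU
```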

Device Placement Logging

To find out which device handles each operation, we can create a session with the log_device_placement configuration option set to True.

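import tensorflow as tf  # TensorFlow 1.x API; on 2.x, tf.compat.v1 provides Session and ConfigProto

# Graph creation.
x = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='x')
y = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='y')
z = tf.matmul(x, y)
# Create a session that logs the device each operation is placed on.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Running the operation.
print(sess.run(z))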

The device placement log output looks like this:

/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
y: /job:localhost/replica:0/task:0/device:GPU:0
x: /job:localhost/replica:0/task:0/device:GPU:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]
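
The printed matrix can be checked without a GPU. The sketch below reproduces the same 2x3 by 3x2 product in plain Python; the matmul function here is a stand-in written for this example and says nothing about how TensorFlow computes the product internally.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists (rows of a by columns of b)."""
    return [[sum(a_ik * b_kj for a_ik, b_kj in zip(row, col))
             for col in zip(*b)]
            for row in a]


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [values[0:3], values[3:6]]               # same data as x, shape [2, 3]
y = [values[0:2], values[2:4], values[4:6]]  # same data as y, shape [3, 2]
z = matmul(x, y)
print(z)  # [[22.0, 28.0], [49.0, 64.0]]
```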

Manual Device Placement

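Sometimes we want to decide which device an operation runs on. We can do this by creating a context with tf.device and naming the device, such as a CPU or a GPU, that should perform the computation, as shown below: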

import tensorflow as tf

# Graph creation.
# The constants are pinned to the CPU; the matmul is left unpinned.
with tf.device('/cpu:0'):
    x = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='x')
    y = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='y')
z = tf.matmul(x, y)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Running the operation.
print(sess.run(z))

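The code above assigns the constants x and y to cpu:0. The matrix multiplication in the second part of the code has no explicit device, so TensorFlow places it on the GPU by default (if one is available) and copies the multidimensional arrays between devices as needed.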

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
y: /job:localhost/replica:0/task:0/cpu:0
x: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]
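
When a graph has many operations, it can help to summarize the placement log by device. The following sketch groups the operation names from the log above; parse_placements is a hypothetical helper, and the regular expression assumes the "name: /job:.../task:N/device" line layout seen in this tutorial's output.

```python
import re
from collections import defaultdict

# Placement lines copied from the log above.
LOG = """\
y: /job:localhost/replica:0/task:0/cpu:0
x: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
"""

# Assumed line layout: "<op name>: /job:<...>/task:<n>/<device>".
_PLACEMENT_RE = re.compile(r"^(\S+): /job:\S+?/task:\d+/(\S+)$")


def parse_placements(log_text):
    """Group operation names by the device named at the end of each log line."""
    by_device = defaultdict(list)
    for line in log_text.splitlines():
        match = _PLACEMENT_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a placement line
        op_name, device = match.groups()
        by_device[device].append(op_name)
    return dict(by_device)


placements = parse_placements(LOG)
print(placements)  # {'cpu:0': ['y', 'x'], 'device:GPU:0': ['MatMul']}
```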

Optimizing TensorFlow GPU Memory

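By default, TensorFlow maps nearly all of the GPU memory visible to the process. It does this to reduce memory fragmentation and make efficient use of GPU memory. TensorFlow provides two configuration options that limit how much memory the process allocates, described below.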

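The first option is allow_growth, set through ConfigProto. With it, TensorFlow starts with a small allocation and grows the GPU memory region only as the session needs more: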

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

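The second option is per_process_gpu_memory_fraction, which sets the fraction of total memory to allocate on each GPU in use. The example below tells TensorFlow to allocate 40% of each GPU's memory: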

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

Use this option only when the computation is already fixed and you are sure it will not change during processing.
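
To see what a fraction of 0.4 means in bytes, the sketch below works through the arithmetic for one GPU. The 12 GiB card size is an assumption chosen for illustration, and gpu_memory_budget is a hypothetical helper, not a TensorFlow function.

```python
GIB = 1024 ** 3


def gpu_memory_budget(total_bytes, fraction):
    """Return the number of bytes a per-process memory fraction allows on one GPU."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_bytes * fraction)


total = 12 * GIB  # assumed card size, for illustration only
budget = gpu_memory_budget(total, 0.4)
print("budget: %.1f GiB of %.1f GiB" % (budget / GIB, total / GIB))
```

Under that assumption the process would be capped at roughly 4.8 GiB on each GPU it uses.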

Single GPU in a Multi-GPU System

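In a system with multiple GPUs, TensorFlow selects the GPU with the lowest ID by default. To run on a different GPU, we have to specify it explicitly: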

import tensorflow as tf

# Creates a graph.
with tf.device('/device:GPU:2'):
    x = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='x')
    y = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='y')
    z = tf.matmul(x, y)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(z))

If the GPU specified by the user does not exist, an InvalidArgumentError is raised, as shown below:

InvalidArgumentError: Invalid argument: Cannot assign a device to node 'y':
Could not satisfy explicit device specification '/device:GPU:2'
[[Node: y = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2]
values: 1 2 3...>, _device="/device:GPU:2"]()]]

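If we want TensorFlow to automatically choose an existing, supported device when the specified one does not exist, we can set allow_soft_placement to True in the configuration when creating the session.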
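
Here is a minimal sketch of the allow_soft_placement session option in use. It assumes TensorFlow 2.x is installed and reaches the 1.x-style session API through tf.compat.v1; the alias tf1 and the explicit tf.Graph are choices made for this example. On a machine without a third GPU, the operations should fall back to an available device instead of raising InvalidArgumentError.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # assumption: TensorFlow 2.x with the 1.x-style session API

graph = tf.Graph()
with graph.as_default():
    # Request a GPU that may not exist on this machine.
    with tf.device('/device:GPU:2'):
        x = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='x')
        y = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='y')
        z = tf.matmul(x, y)

# allow_soft_placement lets TensorFlow pick another device if GPU:2 is missing;
# log_device_placement reports where each operation actually ran.
config = tf1.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf1.Session(graph=graph, config=config) as sess:
    result = sess.run(z)
print(result)
```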

Using Multiple GPUs in TensorFlow

We have already met towers in TensorFlow. Each tower can be assigned to its own GPU, which gives a multi-tower model that uses multiple GPUs.

import tensorflow as tf

# Build one tower per GPU and collect the results.
z = []
for d in ['/device:GPU:2', '/device:GPU:3']:
    with tf.device(d):
        x = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        y = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        z.append(tf.matmul(x, y))
# Sum the tower outputs on the CPU.
with tf.device('/cpu:0'):
    total = tf.add_n(z)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Running the operations.
print(sess.run(total))

The output is as follows:

/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus id: 0000:02:00.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla K20m, pci bus id: 0000:03:00.0
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla K20m, pci bus id: 0000:83:00.0
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla K20m, pci bus id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/device:GPU:3
Const_2: /job:localhost/replica:0/task:0/device:GPU:3
MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3
Const_1: /job:localhost/replica:0/task:0/device:GPU:2
Const: /job:localhost/replica:0/task:0/device:GPU:2
MatMul: /job:localhost/replica:0/task:0/device:GPU:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[  44.   56.]
 [  98.  128.]]
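
The final matrix is the element-wise sum of the two tower outputs, each of which is the same 2x2 product. The sketch below reproduces that sum in plain Python; add_n_lists is a hypothetical stand-in for the summing step, written for nested lists, and is not TensorFlow's implementation.

```python
def add_n_lists(matrices):
    """Element-wise sum of equally shaped matrices given as nested lists."""
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*matrices)]


tower_output = [[22.0, 28.0], [49.0, 64.0]]  # product computed by each tower
towers = [tower_output, tower_output]        # one entry per GPU
total = add_n_lists(towers)
print(total)  # [[44.0, 56.0], [98.0, 128.0]]
```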

We can test this multi-GPU model on a simple dataset such as CIFAR10 to experiment and learn how to work with GPUs.
