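Keras Models

March 17, 2025 | 12 min read

Keras provides two ways to build models: the Sequential model and the Model class used with the functional API. The Sequential model is the simpler of the two, since it is just a linear stack of layers, while the functional API can describe arbitrary network architectures.

Keras Sequential Model

The layers in a Sequential model are arranged in order, which is where the Sequential API gets its name. In most artificial neural networks the layers are arranged this way, and data flows from one layer to the next in that order until it reaches the output layer.

Getting Started with the Keras Sequential Model

A Sequential model can be created by passing a list of layer instances to the constructor: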

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(32, input_shape=(784,)),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])

Layers can also be added one at a time with the .add() method:

model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))

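Specifying the Input Shape

The model needs to know what input shape to expect, so the first layer in a Sequential model must be given its input shape; the remaining layers can infer their shapes automatically. There are several ways to do this:

Pass an input_shape argument to the first layer. It is a shape tuple (a tuple of integers or None entries, where None means that any positive integer may be expected). It does not include the batch dimension.
Some 2D layers, such as Dense, support specifying the input shape through the input_dim argument, and some 3D temporal layers support input_dim and input_length.
To fix the batch size of the inputs, pass a batch_size argument to the layer. If you pass both batch_size=32 and input_shape=(6, 8) to a layer, it will expect every batch of inputs to have the batch shape (32, 6, 8).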
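
The relationship between input_shape, batch_size, and the full batch shape can be sketched in plain Python. This is only an illustration of the rule described above; full_batch_shape is a hypothetical helper, not a Keras function.

```python
def full_batch_shape(input_shape, batch_size=None):
    """Prepend the batch dimension to a per-sample shape tuple.

    batch_size=None mirrors the case where the batch size is left open.
    """
    return (batch_size,) + tuple(input_shape)


print(full_batch_shape((784,)))       # (None, 784)
print(full_batch_shape((6, 8), 32))   # (32, 6, 8)
```

The per-sample shape never contains the batch dimension; it is always added in front.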

The following snippets are strictly equivalent:

model = Sequential()
model.add(Dense(32, input_shape=(784,)))

model = Sequential()
model.add(Dense(32, input_dim=784))

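Compilation

Before training a model, you configure the learning process with the compile method. It takes three arguments:

An optimizer: either the string identifier of an existing optimizer (such as rmsprop or adagrad) or an instance of the Optimizer class.
A loss function: the objective that the model tries to minimize, also known as the objective function. Examples are categorical_crossentropy and mse.
A list of metrics: each metric is either the string identifier of an existing metric or a custom metric function. For any classification problem, metrics=['accuracy'] is the recommended setting.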
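
To make the loss identifiers above more concrete, here is a simplified, per-sample sketch of what categorical_crossentropy and mse compute. It is written in plain Python under the assumption of a one-hot target; it is not the Keras implementation, which works on batched tensors and handles clipping differently.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and a predicted distribution."""
    # eps guards against log(0); the exact clipping used by Keras may differ
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# The true class is index 1 and the model assigns it probability 0.8
print(categorical_crossentropy([0, 1, 0], [0.1, 0.8, 0.1]))
print(mse([1.0, 2.0], [1.0, 4.0]))  # 2.0
```

The cross-entropy depends only on the probability given to the true class, so it shrinks toward zero as that probability approaches 1.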

# For a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# For a binary classification problem
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# For a mean squared error regression problem
model.compile(optimizer='rmsprop',
              loss='mse')

# For custom metrics
import keras.backend as K

def mean_pred(y_true, y_pred):
    # Custom metric: the mean of the predicted values
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])

Training

Keras models are trained on NumPy arrays of input data and labels, typically with the fit function.
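
As a rough illustration of what "iterating on the data in batches" means, the sketch below splits 1000 samples into batches of 32, the sizes used in the listings in this section. iter_batches is a hypothetical helper; the real fit function also shuffles the data by default and does considerably more.

```python
def iter_batches(num_samples, batch_size):
    """Yield (start, stop) index pairs that cover the data in order."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)


batches = list(iter_batches(1000, 32))
print(len(batches))   # 32 batches per epoch
print(batches[-1])    # (992, 1000): the final batch holds only 8 samples
```

One pass over all of these batches is one epoch, so epochs=10 repeats the loop ten times.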

import keras
import numpy as np

# For a single-input model with 2 classes (binary classification)
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)

# For a single-input model with 10 classes (categorical classification)
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
data = np.random.random((1000, 100))
labels = np.random.randint(10, size=(1000, 1))

# Convert labels to categorical one-hot encoding
one_hot_labels = keras.utils.to_categorical(labels, num_classes=10)

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, one_hot_labels, epochs=10, batch_size=32)
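
The one-hot conversion used for the 10-class case can be sketched in plain Python. to_one_hot below is a hypothetical stand-in that only illustrates the encoding; it is not how keras.utils.to_categorical is implemented.

```python
def to_one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows of length num_classes."""
    return [
        [1.0 if index == label else 0.0 for index in range(num_classes)]
        for label in labels
    ]


print(to_one_hot([0, 2], num_classes=3))  # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```

Each row has a single 1.0 at the position of its class, which is the target format that categorical_crossentropy expects.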

Example: Training a Simple Deep Neural Network on the MNIST Dataset

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop

batch_size = 128
num_classes = 10
epochs = 20

# split the data between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))

model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

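Stacked LSTM for Sequence Classification

In this model, three LSTM layers are stacked on top of one another so that the model can learn higher-level temporal representations.

The first two LSTM layers return their full output sequences, while the third returns only the last step of its output sequence. This drops the temporal dimension, converting the input sequence into a single vector.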
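
The effect of return_sequences on the output shape can be traced with a small shape calculator. This is a sketch that assumes inputs of 8 timesteps with 16 features each and 32 units per layer; lstm_output_shape is a hypothetical helper, not part of Keras.

```python
def lstm_output_shape(input_shape, units, return_sequences=False):
    """Output shape of an LSTM layer for a (batch, timesteps, features) input."""
    batch, timesteps, _features = input_shape
    if return_sequences:
        return (batch, timesteps, units)  # one vector per timestep
    return (batch, units)                 # only the last step is kept


shape = (None, 8, 16)
shape = lstm_output_shape(shape, 32, return_sequences=True)
print(shape)  # (None, 8, 32)
shape = lstm_output_shape(shape, 32, return_sequences=True)
print(shape)  # (None, 8, 32)
shape = lstm_output_shape(shape, 32)
print(shape)  # (None, 32)
```

Only the last layer drops the time axis, which is why the first two layers need return_sequences=True: an LSTM layer requires a 3D input.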

from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np

data_dim = 16
timesteps = 8
num_classes = 10

# Expected input data shape: (batch_size, timesteps, data_dim)
model = Sequential()
model.add(LSTM(32, return_sequences=True,
               input_shape=(timesteps, data_dim)))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32))  # returns a single vector of dimension 32
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# Generate dummy training data
x_train = np.random.random((1000, timesteps, data_dim))
y_train = np.random.random((1000, num_classes))

# Generate dummy validation data
x_val = np.random.random((100, timesteps, data_dim))
y_val = np.random.random((100, num_classes))

model.fit(x_train, y_train,
          batch_size=64, epochs=5,
          validation_data=(x_val, y_val))

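The Same Stacked LSTM Model, Rendered "Stateful"

A stateful recurrent model is one in which the internal states obtained after processing a batch of samples are reused as the initial states for the samples of the next batch. This allows longer sequences to be processed while keeping the computational complexity manageable.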
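
The idea of carrying state across batches can be illustrated with a toy recurrence in plain Python. This is a deliberately simplified sketch: run_cell is a hypothetical stand-in with the update rule state = decay * state + x, not an LSTM.

```python
def run_cell(sequence, state=0.0, decay=0.5):
    """Toy recurrence: state = decay * state + x. Returns the final state."""
    for x in sequence:
        state = decay * state + x
    return state


long_sequence = [1.0, 2.0, 3.0, 4.0]

# Process the whole sequence in one go
full = run_cell(long_sequence)

# Stateful: the second chunk starts from the state left by the first chunk
first_chunk_state = run_cell(long_sequence[:2])
stateful = run_cell(long_sequence[2:], state=first_chunk_state)

# Stateless: the second chunk starts again from a zero state
stateless = run_cell(long_sequence[2:])

print(full, stateful, stateless)  # 6.125 6.125 5.5
```

With the state carried over, two short chunks give the same result as one long sequence, which is what makes it possible to feed a long sequence to the model piece by piece.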

from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np
data_dim = 16
timesteps = 8
num_classes = 10
batch_size = 32
# Expected input batch shape: (batch_size, timesteps, data_dim)
# Note that we have to provide the full batch_input_shape since the network is stateful.
# the sample of index i in batch k is the follow-up for the sample i in batch k-1.
model = Sequential()
model.add(LSTM(32, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, data_dim)))
model.add(LSTM(32, return_sequences=True, stateful=True))
model.add(LSTM(32, stateful=True))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# Generate dummy training data
x_train = np.random.random((batch_size * 10, timesteps, data_dim))
y_train = np.random.random((batch_size * 10, num_classes))

# Generate dummy validation data
x_val = np.random.random((batch_size * 3, timesteps, data_dim))
y_val = np.random.random((batch_size * 3, num_classes))

model.fit(x_train, y_train,
          batch_size=batch_size, epochs=5, shuffle=False,
          validation_data=(x_val, y_val))

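Keras Functional API

The Keras functional API is used to define complex models, such as multi-output models, directed acyclic graphs, or models with shared layers. In other words, it lets you describe how inputs and outputs are shared between layers.

First Example: A Densely Connected Network

The Sequential model is probably the better choice for a densely connected network, but building one with the functional API is still a useful first exercise.

Working with the functional API is similar to working with the Sequential model:

A layer instance is callable on a tensor and returns a tensor.
Input tensors and output tensors are then used together to define a model.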

from keras.layers import Input, Dense
from keras.models import Model
# Returns a Tensor
inputs = Input(shape=(784,))
# An instance layer is callable on a tensor and returns a tensor
output_1 = Dense(64, activation='relu')(inputs)
output_2 = Dense(64, activation='relu')(output_1)
predictions = Dense(10, activation='softmax')(output_2)
# Creates a model that includes the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
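# data and labels are placeholders for your own arrays: inputs of shape (n, 784)
# and one-hot targets of shape (n, 10)
model.fit(data, labels)  # start training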

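All Models Are Callable, Just Like Layers

With the functional API it is easy to reuse a trained model: you can treat any model as if it were a layer by calling it on a tensor.

Note that calling a model on a tensor reuses not only its architecture but also its weights.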

x = Input(shape=(784,))
# It works, and returns the 10-way softmax we defined above.
y = model(x)

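This makes it possible, for example, to quickly build models that process sequences of inputs. An image classification model can be turned into a video classification model with a single line of code.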

from keras.layers import TimeDistributed

# Input tensor for sequences of 20 timesteps, such that each contains a 784-dimensional vector
input_sequences = Input(shape=(20, 784))

# It applies our previous model to every timestep in the input sequences. The output of the previous model was a 10-way softmax, so the output of the layer given below will be a sequence of 20 vectors of size 10.
processed_sequences = TimeDistributed(model)(input_sequences)
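
Conceptually, wrapping a model in TimeDistributed applies the same model to every timestep independently. The sketch below mimics that behavior in plain Python; time_distributed and toy_model are hypothetical stand-ins chosen for illustration, not Keras objects.

```python
def time_distributed(model_fn, sequence):
    """Apply model_fn independently to every timestep of a sequence."""
    return [model_fn(step) for step in sequence]


def toy_model(frame):
    # Hypothetical per-frame model: maps a 4-value frame to a 2-value output
    return [sum(frame), max(frame)]


frames = [[1, 2, 3, 4], [0, 0, 0, 1], [5, 5, 5, 5]]
outputs = time_distributed(toy_model, frames)
print(outputs)  # [[10, 4], [1, 1], [20, 5]]
```

Three input frames produce three outputs, one per frame, just as 20 timesteps of 784-dimensional vectors produce a sequence of 20 vectors of size 10 in the listing above.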

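Multi-Input and Multi-Output Models

The functional API is a good fit for multi-input and multi-output models, since it makes it easy to manipulate a large number of intertwined data streams. Consider the following example: we want to predict how many retweets and likes a news headline will receive on a social network such as Twitter.

The main input to the model is the headline itself, as a sequence of words. The model also receives an auxiliary input with additional data, such as the time or date the headline was posted. The model is supervised through two loss functions; using the main loss function early in a model is a good regularization mechanism for deep models.

Here main_input receives the headline as a sequence of integers, where each integer encodes a word. The integers range from 1 to 10,000, and each sequence is 100 words long.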

from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model
import numpy as np
np.random.seed(0)  # Sets a random seed for reproducibility.

# Headline input receive sequences of 100 integers in between 1 and 10000.
# Here we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')

# The embedding layer encodes the input sequence into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)

# The LSTM transforms the vector sequence into a single vector that contains the information about an entire sequence.
lstm_out = LSTM(32)(x)

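Next we insert the auxiliary loss, which allows the LSTM and Embedding layers to be trained smoothly even though the main loss sits much higher up in the model.

# Auxiliary output attached directly to the LSTM output; it is named 'aux_output'
# so that a loss can be assigned to it by name
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)

We then feed aux_input into the model by concatenating it with the LSTM output.

from keras.layers import concatenate

auxiliary_input = Input(shape=(5,), name='aux_input')
x = concatenate([lstm_out, auxiliary_input])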

# Stacks a densely-connected deep network on the top.
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)

# Then add the main logistic regression layer.
main_output = Dense(1, activation='sigmoid', name='main_output')(x)

# Defines a model with two inputs and outputs
model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])

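We then compile the model and assign a weight of 0.2 to the auxiliary loss. To specify a different loss or loss_weights value for each output, use a list or a dictionary. Here a single loss is passed as the loss argument, so the same loss is used on every output.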

model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              loss_weights=[1., 0.2])
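
The effect of loss_weights can be sketched as a weighted sum of the per-output losses. The loss values below are made up for illustration, and total_loss is a hypothetical helper rather than a Keras function.

```python
def total_loss(losses, loss_weights):
    """Combine per-output losses into the single value that is minimized."""
    return sum(loss_weights[name] * value for name, value in losses.items())


# Made-up loss values for one training step
losses = {'main_output': 0.7, 'aux_output': 0.5}
weights = {'main_output': 1.0, 'aux_output': 0.2}

combined = total_loss(losses, weights)
print(round(combined, 4))  # 0.8
```

With a weight of 0.2, the auxiliary output contributes a fifth of its loss to the total, so it guides the lower layers without dominating training.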

We train the model by passing lists of input arrays and target arrays:

headline_data = np.round(np.abs(np.random.rand(12, 100) * 100))
additional_data = np.random.randn(12, 5)
# binary_crossentropy expects targets in [0, 1], so the dummy labels are random 0/1 values
headline_labels = np.random.randint(2, size=(12, 1))
additional_labels = np.random.randint(2, size=(12, 1))
model.fit([headline_data, additional_data], [headline_labels, additional_labels],
          epochs=50, batch_size=32)

Since the inputs and outputs are named (through the "name" argument), the model can also be compiled as follows:

model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.2})

# And train it through:
model.fit({'main_input': headline_data, 'aux_input': additional_data},
          {'main_output': headline_labels, 'aux_output': additional_labels},
          epochs=50, batch_size=32)

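To run inference, call model.predict with the inputs. They can be passed as a dictionary keyed by input name, as in model.predict({'main_input': headline_data, 'aux_input': additional_data}), or as a list in the order in which the inputs were declared, as in model.predict([headline_data, additional_data]).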

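Shared Layers

Another good use case for the functional API is models that use shared layers. Consider a dataset of tweets. We want to build a model that can tell whether two tweets were written by the same person; this would make it possible, for example, to compare users by the similarity of their tweets.

One way to do this is to build a model that encodes the two tweets into two vectors, concatenates the vectors, and then adds a logistic regression on top. The model outputs the probability that the two tweets were written by the same person. It is then trained on positive and negative tweet pairs.

Because the problem is symmetric, the mechanism that encodes the first tweet should be reused, weights included, to encode the second tweet. Here we use a shared LSTM layer for this.

To build this with the functional API, each tweet is given as a binary matrix of shape (280, 256): a sequence of 280 vectors of size 256, where each dimension of a 256-dimensional vector encodes the presence or absence of one character.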
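
One possible way to produce such a (280, 256) binary matrix is sketched below. The text does not say how characters are mapped to the 256 positions, so indexing by code point is purely an assumption made for illustration, and encode_tweet is a hypothetical helper.

```python
def encode_tweet(text, max_len=280, alphabet_size=256):
    """Encode text as a max_len x alphabet_size binary matrix (list of rows)."""
    matrix = [[0] * alphabet_size for _ in range(max_len)]
    for position, char in enumerate(text[:max_len]):
        code = ord(char)  # assumption: the code point is the column index
        if code < alphabet_size:
            matrix[position][code] = 1
        # characters outside the toy alphabet are left as all-zero rows
    return matrix


encoded = encode_tweet("hi")
print(len(encoded), len(encoded[0]))  # 280 256
print(encoded[0][ord('h')])           # 1
```

Rows beyond the end of the tweet stay all zero, so every tweet yields a matrix of the same shape regardless of its length.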

import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model

tweet_a = Input(shape=(280, 256))
tweet_b = Input(shape=(280, 256))

To share a layer across different inputs, instantiate the layer once and then call it on as many inputs as needed:

# The layer takes an input as a matrix and will return a vector of size 64
shared_lstm = LSTM(64)

# When we reuse the same layer instance multiple times, the weights of the layer will also be reused (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)

# Next we will concatenate the two vectors:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)

# And then we will add logistic regression on top
predictions = Dense(1, activation='sigmoid')(merged_vector)

# After that we will define a trainable model by linking the tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
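# data_a, data_b and labels are placeholders for your own arrays of encoded tweet pairs and 0/1 labels
model.fit([data_a, data_b], labels, epochs=10)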

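To see how to read a shared layer's output or output shape, we need to take a brief look at the concept of layer "nodes".

Whenever you call a layer on some input, you create a new tensor (the output of the layer) and add a node to the layer, linking the input tensor to the output tensor. When the same layer is called multiple times, it owns multiple nodes, indexed 0, 1, 2, and so on.

In older versions of Keras, the output tensor of a layer instance was obtained with layer.get_output() and its output shape with layer.output_shape. get_output() has since been replaced by the output property.

As long as a layer is connected to only one input, layer.output returns the layer's single output: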

a = Input(shape=(280, 256))

lstm = LSTM(32)
encoded_a = lstm(a)

assert lstm.output == encoded_a

If the layer has multiple inputs:

a = Input(shape=(280, 256))
b = Input(shape=(280, 256))

lstm = LSTM(32)
encoded_a = lstm(a)
encoded_b = lstm(b)

lstm.output

Output

>> AttributeError: Layer lstm_1 has multiple inbound nodes,
hence the notion of "layer output" is ill-defined.
Use `get_output_at(node_index)` instead.

The following works instead:

assert lstm.get_output_at(0) == encoded_a
assert lstm.get_output_at(1) == encoded_b

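The same is true for the properties input_shape and output_shape. As long as the layer has only one node, or all of its nodes have the same input/output shape, the notion of "layer input/output shape" is well defined, and that one shape is returned by layer.output_shape / layer.input_shape.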
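
The node bookkeeping described above can be mimicked with a small toy class. ToyLayer is a hypothetical stand-in that only tracks calls and uses strings in place of tensors; it is not how Keras layers are implemented.

```python
class ToyLayer:
    """Hypothetical stand-in for a layer that only keeps track of its nodes."""

    def __init__(self, name):
        self.name = name
        self.nodes = []  # one (input, output) pair per call

    def __call__(self, input_name):
        output_name = f"{self.name}({input_name})"
        self.nodes.append((input_name, output_name))
        return output_name

    def get_output_at(self, node_index):
        return self.nodes[node_index][1]

    @property
    def output(self):
        # Only well defined when the layer has exactly one node
        if len(self.nodes) != 1:
            raise AttributeError(
                f"Layer {self.name} has {len(self.nodes)} nodes; "
                "use get_output_at(node_index) instead"
            )
        return self.nodes[0][1]


layer = ToyLayer("lstm")
encoded_a = layer("a")
print(layer.output == encoded_a)            # True: a single node so far
encoded_b = layer("b")
print(layer.get_output_at(1) == encoded_b)  # True: the second call is node 1
```

After the second call, reading layer.output raises AttributeError, mirroring the error message shown earlier for a layer with multiple inbound nodes.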

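If, however, you apply the same Conv2D layer to an input of shape (32, 32, 3) and then to an input of shape (64, 64, 3), the layer will have multiple input/output shapes, and you have to fetch them by specifying the index of the node they belong to: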

from keras.layers import Input, Conv2D

a = Input(shape=(32, 32, 3))
b = Input(shape=(64, 64, 3))

conv = Conv2D(16, (3, 3), padding='same')
conved_a = conv(a)

# Only one input so far, the following will work:
assert conv.input_shape == (None, 32, 32, 3)

conved_b = conv(b)
# Now the "input_shape" property wouldn't work, but this does:
assert conv.get_input_shape_at(0) == (None, 32, 32, 3)
assert conv.get_input_shape_at(1) == (None, 64, 64, 3)
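
The output shapes that go with these two input shapes can be worked out by hand. The sketch below assumes channels-last inputs and padding='same', where the spatial size is the input size divided by the stride, rounded up; conv2d_same_output_shape is a hypothetical helper.

```python
import math


def conv2d_same_output_shape(input_shape, filters, stride=1):
    """Output shape of a 'same'-padded 2D convolution on channels-last input."""
    batch, height, width, _channels = input_shape
    return (batch, math.ceil(height / stride), math.ceil(width / stride), filters)


print(conv2d_same_output_shape((None, 32, 32, 3), filters=16))  # (None, 32, 32, 16)
print(conv2d_same_output_shape((None, 64, 64, 3), filters=16))  # (None, 64, 64, 16)
```

The two nodes therefore have different output shapes as well, which is why they must be read per node index rather than through a single output_shape property.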
