Autoencoders

March 17, 2025 | 16 min read

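An autoencoder is a neural network that learns a compact representation of its input data. Once trained on enough data, it can produce compressed copies of input data points that retain most of the information (features) of the input while using far fewer bits.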

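An autoencoder has three components: an encoding function (which maps the data into a simpler space, for example a hidden layer with fewer nodes than the input layer), a decoding function (which reverses the process, for example an output layer with the same number of nodes as the input layer), and a distance metric (which measures the loss as the distance between the original input and its reconstruction). Restricted Boltzmann machines are often described as the earliest autoencoder-like models, and they were important enough in the history of deep learning to keep their own name.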
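The three components can be sketched in a few lines of NumPy. This is only an illustration with untrained random weights: the sizes (8 inputs, a 3-dimensional code), the tanh activation, and the mean squared error are assumptions chosen for brevity, not the models trained later in this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 8 input features compressed to a 3-dimensional code.
n_inputs, n_code = 8, 3
W_enc = rng.normal(scale=0.1, size=(n_inputs, n_code))  # encoder weights
W_dec = rng.normal(scale=0.1, size=(n_code, n_inputs))  # decoder weights

def encode(x):
    """Encoding function: map inputs to the smaller code space."""
    return np.tanh(x @ W_enc)

def decode(code):
    """Decoding function: map codes back to the input space."""
    return code @ W_dec

def reconstruction_loss(x, x_hat):
    """Distance metric: mean squared error between input and reconstruction."""
    return float(np.mean((x - x_hat) ** 2))

x = rng.normal(size=(5, n_inputs))   # a batch of 5 samples
code = encode(x)
x_hat = decode(code)
loss = reconstruction_loss(x, x_hat)
print(code.shape, x_hat.shape, loss)
```

Training an autoencoder means adjusting `W_enc` and `W_dec` so that this loss becomes small.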

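Autoencoders are the neural-network counterpart of simpler dimensionality-reduction algorithms such as PCA and LDA. They were commonly used for pretraining in deep learning applications, that is, setting a network's weights to good starting values in advance so that the algorithm does not have to work as hard to converge as it would from completely random weights. In practice, the development of faster-converging training algorithms has largely removed the need for, and the value of, pretraining.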
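To make the comparison with PCA concrete, the sketch below treats PCA as a linear "encoder" and "decoder" built from a singular value decomposition. The toy data set (200 samples in 5 dimensions that vary along only 2 directions) is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 5 dimensions that really vary along only 2 directions.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing

# Centre the data, then take the top-k right singular vectors as principal axes.
mean = X.mean(axis=0)
Xc = X - mean
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
components = Vt[:k]                      # shape (k, 5)

codes = Xc @ components.T                # "encoder": project to k dimensions
X_rec = codes @ components + mean        # "decoder": map back to 5 dimensions

max_error = float(np.abs(X - X_rec).max())
print(codes.shape, max_error)
```

Because the data here lies exactly on a 2-dimensional linear subspace, two components reconstruct it up to floating-point error. An autoencoder with non-linear activations can, in principle, do the same for curved manifolds.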

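As a result, autoencoders are far less prominent in cutting-edge deep learning research than they once were. In commercial applications they also do not outperform simpler (usually information-theoretic) compression techniques such as JPEG and MP3. Today they are often used as a preprocessing step for high-dimensional data before it is passed to the t-SNE algorithm.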

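Autoencoder Architecture

A typical autoencoder architecture has three key components:

Encoder architecture: a series of layers with a decreasing number of nodes, ending in the latent view representation.
Latent view representation: the lowest-dimensional space, in which the input is compressed while its information is preserved.
Decoder architecture: the mirror image of the encoder, with the number of nodes increasing in each layer, producing an output that is (almost) identical to the input.

A well-tuned autoencoder should be able to reconstruct the input that was fed into its first layer.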
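The mirrored layout can be written down as a small helper. The function name `mirrored_layer_sizes` is hypothetical and serves only to illustrate the shape of the network; the example widths (784, 128, 32) match the dense models used later in this article.

```python
def mirrored_layer_sizes(n_inputs, encoder_sizes):
    """Return the layer widths of a symmetric autoencoder.

    encoder_sizes lists the hidden widths of the encoder, ending with the
    latent width; the decoder repeats them in reverse and ends at n_inputs.
    """
    if any(a <= b for a, b in zip([n_inputs] + encoder_sizes, encoder_sizes)):
        raise ValueError("encoder widths should strictly decrease")
    decoder_sizes = encoder_sizes[-2::-1] + [n_inputs]
    return [n_inputs] + encoder_sizes + decoder_sizes

sizes = mirrored_layer_sizes(784, [128, 32])
print(sizes)
```

The narrowest entry in the list is the latent view; everything before it belongs to the encoder and everything after it to the decoder.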

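How Autoencoders Work

Let us explore the mathematics behind autoencoders. The main idea is to learn a low-dimensional representation of a high-dimensional input. Let us work through the encoding process with an example. Consider a data representation space (the N-dimensional space used to represent the data) and data points described by two variables, x1 and x2. The data manifold is the region of the data representation space in which the real data lies.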

from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go
import numpy as np
init_notebook_mode(connected=True)

## generate sample data: evenly spaced points on the line x2 = x1
N = 50
random_x = np.linspace(2, 10, N)
random_y1 = np.linspace(2, 10, N)
random_y2 = np.linspace(2, 10, N)

trace1 = go.Scatter(x = random_x, y = random_y1, mode="markers", name="Actual Data")
trace2 = go.Scatter(x = random_x, y = random_y2, mode="lines", name="Model")
layout = go.Layout(title="2D Data Representation Space", xaxis=dict(title="x1", range=(0,12)), 
                   yaxis=dict(title="x2", range=(0,12)), height=400, 
                   annotations=[dict(x=5, y=5, xref='x', yref='y', text='This 1D line is the Data Manifold (where data resides)',
                   showarrow=True, align='center', arrowhead=2, arrowsize=1, arrowwidth=2, arrowcolor='#636363',
                   ax=-120, ay=-30, bordercolor='#c7c7c7', borderwidth=2, borderpad=4, bgcolor='orange', opacity=0.8)])
figure = go.Figure(data = [trace1], layout = layout)
iplot(figure)

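Output

Here we use two dimensions, x1 and x2, to represent the data. However, the dimensionality of this space can be reduced, for example to 1D, if we can define the following:

A reference point A on the line
The angle L that the line makes with the horizontal axis

Then any point on the line, say B, can be expressed by its distance d from A together with the angle L.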
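This idea can be checked with a little trigonometry. In the sketch below, the helper names `encode` and `decode` are hypothetical, and the values of A, B, and L are taken from the plots in this section (A at (2, 2), B at (6, 6), and a 45-degree line).

```python
import math

def encode(point, anchor, angle):
    """Signed distance from the anchor to the point, measured along the line."""
    dx, dy = point[0] - anchor[0], point[1] - anchor[1]
    return dx * math.cos(angle) + dy * math.sin(angle)

def decode(d, anchor, angle):
    """Recover the 2D point that lies d units from the anchor along the line."""
    return (anchor[0] + d * math.cos(angle), anchor[1] + d * math.sin(angle))

A = (2.0, 2.0)            # reference point on the line
L = math.pi / 4           # the line x2 = x1 makes a 45-degree angle with the axis
B = (6.0, 6.0)

d = encode(B, A, L)       # one number instead of two coordinates
B_rec = decode(d, A, L)
print(d, B_rec)
```

Since A and L are the same for every point on the line, only the single number d has to be stored per point, which is exactly the kind of compression an autoencoder is meant to discover on its own.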

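random_y3 = [2 for i in range(N)]  # horizontal reference line through A
random_y4 = random_y2 + 1          # short segment parallel to the data line, marking the distance d
trace4 = go.Scatter(x = random_x[4:24], y = random_y4[4:24], mode="lines")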
trace3 = go.Scatter(x = random_x, y = random_y3, mode="lines")
trace1 = go.Scatter(x = random_x, y = random_y1, mode="markers")
trace2 = go.Scatter(x = random_x, y = random_y2, mode="lines")
layout = go.Layout(xaxis=dict(title="x1", range=(0,12)), yaxis=dict(title="x2", range=(0,12)), height=400,
                   annotations=[dict(x=2, y=2, xref='x', yref='y', text='A', showarrow=True, align='center', arrowhead=2, arrowsize=1, arrowwidth=2, 
                                     arrowcolor='#636363', ax=20, ay=-30, bordercolor='#c7c7c7', borderwidth=2, borderpad=4, bgcolor='orange', opacity=0.8), 
                                dict(x=6, y=6, xref='x', yref='y', text='B', showarrow=True, align='center', arrowhead=2, arrowsize=1, arrowwidth=2, arrowcolor='#636363',
                                     ax=20, ay=-30, bordercolor='#c7c7c7', borderwidth=2, borderpad=4, bgcolor='yellow', opacity=0.8), dict(
                                     x=4, y=5, xref='x', yref='y',text='d', ay=-40), 
                                dict(x=2, y=2, xref='x', yref='y', text='angle L', ax=80, ay=-10)], title="2D Data Representation Space", showlegend=False)
data = [trace1, trace2, trace3, trace4]
figure = go.Figure(data = data, layout = layout)
iplot(figure)

# Same points, now drawn against the latent axes u1 and u2

random_y3 = [2 for i in range(N)]
random_y4 = random_y2 + 1
trace4 = go.Scatter(x = random_x[4:24], y = random_y4[4:24], mode="lines")
trace3 = go.Scatter(x = random_x, y = random_y3, mode="lines")
trace1 = go.Scatter(x = random_x, y = random_y1, mode="markers")
trace2 = go.Scatter(x = random_x, y = random_y2, mode="lines")
layout = go.Layout(xaxis=dict(title="u1", range=(1.5,12)), yaxis=dict(title="u2", range=(1.5,12)), height=400,
                   annotations=[dict(x=2, y=2, xref='x', yref='y', text='A', showarrow=True, align='center', arrowhead=2, arrowsize=1, arrowwidth=2, 
                                     arrowcolor='#636363', ax=20, ay=-30, bordercolor='#c7c7c7', borderwidth=2, borderpad=4, bgcolor='orange', opacity=0.8), 
                                dict(x=6, y=6, xref='x', yref='y', text='B', showarrow=True, align='center', arrowhead=2, arrowsize=1, arrowwidth=2, arrowcolor='#636363',
                                     ax=20, ay=-30, bordercolor='#c7c7c7', borderwidth=2, borderpad=4, bgcolor='yellow', opacity=0.8), dict(
                                     x=4, y=5, xref='x', yref='y',text='d', ay=-40), 
                                dict(x=2, y=2, xref='x', yref='y', text='angle L', ax=80, ay=-10)], title="Latent Distance View Space", showlegend=False)
data = [trace1, trace2, trace3, trace4]
figure = go.Figure(data = data, layout = layout)
iplot(figure)

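Output

The key question, however, is what logic or rule can be used to express point B in terms of A and the angle L. There is no fixed equation; instead, an unsupervised learning process finds the best one. Put simply, learning amounts to finding the formula that converts B into its representation in terms of A and L. Let us look at this from the autoencoder's point of view.

Consider an autoencoder with no hidden layers other than the code itself: the inputs x1 and x2 are encoded into the lower-dimensional representation d, and d is then projected back onto x1 and x2.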
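A minimal sketch of such a 2 -> 1 -> 2 autoencoder, trained with plain gradient descent in NumPy, is shown below. It assumes linear layers without biases, centred data, a mean squared error loss, and a hand-picked learning rate; none of these choices come from the models used later in the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Points on the line x2 = x1, as in the plots above, centred at the origin.
x1 = np.linspace(2, 10, 50)
X = np.column_stack([x1, x1])
Xc = X - X.mean(axis=0)

# Linear 2 -> 1 -> 2 autoencoder with small random initial weights.
W_enc = rng.normal(scale=0.1, size=(2, 1))
W_dec = rng.normal(scale=0.1, size=(1, 2))

def loss_and_grads(W_enc, W_dec):
    code = Xc @ W_enc                  # the 1D representation "d"
    err = code @ W_dec - Xc            # reconstruction error
    loss = np.mean(np.sum(err ** 2, axis=1))
    grad_dec = 2.0 / len(Xc) * code.T @ err
    grad_enc = 2.0 / len(Xc) * Xc.T @ (err @ W_dec.T)
    return loss, grad_enc, grad_dec

initial_loss, _, _ = loss_and_grads(W_enc, W_dec)
learning_rate = 0.01                   # hypothetical value chosen for this toy data
for _ in range(500):
    _, g_enc, g_dec = loss_and_grads(W_enc, W_dec)
    W_enc -= learning_rate * g_enc
    W_dec -= learning_rate * g_dec
final_loss, _, _ = loss_and_grads(W_enc, W_dec)
print(initial_loss, final_loss)
```

Nothing in the code mentions A or L explicitly; the weights simply settle on a direction along which one number per point is enough to rebuild both coordinates.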

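We will now explore sparse, stacked, and variational autoencoders. We will also visualize the latent codes produced by the autoencoders by mapping them to two dimensions with t-SNE, which helps us identify distinct clusters in the data.

Code

Importing Libraries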

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
from tensorflow import keras
import tensorflow as tf
import matplotlib.pyplot as plt
import tensorflow.keras.backend as K
from sklearn.manifold import TSNE

Reading the Dataset

train  =  pd.read_csv(r'/kaggle/input/mnist-in-csv/mnist_train.csv')
test  =  pd.read_csv(r'/kaggle/input/mnist-in-csv/mnist_test.csv')

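Next, we separate the labels, scale the pixel values to the range [0, 1], and reshape each row into a 28 x 28 image.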

y_train = train.label
X_train =train.drop('label', axis=1)/255

X_test = test.drop('label', axis = 1)/255
y_test = test['label']

X_train = X_train.values.reshape(X_train.shape[0], 28, 28)
X_test = X_test.values.reshape(X_test.shape[0], 28, 28)

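Stacked Autoencoder

A stacked autoencoder is a neural network made up of several autoencoder layers stacked on top of one another. Autoencoders are unsupervised learning algorithms used for feature learning and dimensionality reduction.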

stacked_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape = (28, 28)),
    keras.layers.Dense(128, activation = 'selu'),
    keras.layers.Dense(32, activation = 'selu'),
])

stacked_decoder = keras.models.Sequential([
    keras.layers.Dense(128, activation = 'selu', input_shape = (32, )),
    keras.layers.Dense(28*28, activation = 'sigmoid'),
    keras.layers.Reshape([28, 28])
])

stacked_ae = keras.models.Sequential([stacked_encoder, stacked_decoder])

stacked_ae.compile(loss = 'binary_crossentropy', optimizer = keras.optimizers.SGD(learning_rate=.1))

history  =  stacked_ae.fit(X_train, X_train, epochs = 25, validation_data = (X_test, X_test))

Output

We now reconstruct some images from the test data.

to_predict = X_test[:6]
prediction = stacked_ae.predict(to_predict)

def visualize_predictions(predictions, data):
    # Top row: reconstructions; bottom row: the original inputs passed in as `data`.
    fig, axes = plt.subplots(2, predictions.shape[0], figsize = (predictions.shape[0]*5, 5))
    for i, ax in zip(range(predictions.shape[0]), axes[0, :]):
        ax.imshow(predictions[i], cmap = 'Greys')
        ax.set_title(y_test[i])
    for i, ax in zip(range(predictions.shape[0]), axes[1, :]):
        ax.imshow(data[i], cmap = 'Greys')
        ax.set_title(y_test[i])
    return plt

visualize_predictions(prediction, to_predict)

Output

Now we map the latent representation to two dimensions.

# Select 1000 samples from the data
X_test_sampled = X_test[:1000, :]
y_test_sampled = y_test.iloc[:1000]

# Visualizing the latent representation - use the 32-dimensional codes directly
# (averaging over the last axis would collapse each code to a single number)
x_compressed = stacked_encoder.predict(X_test_sampled).reshape(1000, -1)

# Use TSNE
tsne = TSNE(n_jobs=-1)
X_compressed_2d  =  tsne.fit_transform(x_compressed)

# Plot in scatter plot and color by digit name
plt.figure(figsize = (12, 7))
cmap = plt.get_cmap('RdBu', 10)
sc = plt.scatter(X_compressed_2d[:, 0], X_compressed_2d[:, 1], c = y_test_sampled, alpha = .85, cmap = cmap)
cax = plt.colorbar(sc, ticks=np.arange(0,10))
plt.xlabel('tsne 1')
plt.ylabel('tsne 2')
#plt.colorbar(sc)

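Output

The reconstruction is good, but we could add convolutions to improve its quality. First, we add regularization by building a sparse autoencoder. This regularization can be applied to the encoding layer.

Sparse Autoencoder

A sparse autoencoder is a neural network that works like a regular autoencoder but has an additional constraint that encourages sparsity in the learned representation. Sparse autoencoders are used for feature learning and dimensionality reduction, where the goal is to learn a compact and sparse representation of the input data.

In a sparse autoencoder, we constrain the activations of the middle layer to be sparse by adding an L1 penalty on them. As a result, many of the middle-layer activations are pushed to zero or close to it, and the autoencoder is forced to assign non-zero values only to the most important attributes of the data.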
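The effect of the penalty can be seen in a small NumPy sketch. The function `sparse_loss` is hypothetical, it uses mean squared error instead of the binary cross-entropy used in this article, and the exact way Keras normalizes the penalty over a batch may differ; only the idea of "reconstruction error plus l1 times the sum of absolute activations" is illustrated.

```python
import numpy as np

def sparse_loss(x, x_hat, code, l1=1e-3):
    """Reconstruction error plus an L1 penalty on the code activations."""
    reconstruction = np.mean((x - x_hat) ** 2)
    penalty = l1 * np.sum(np.abs(code))
    return float(reconstruction + penalty)

x = np.array([[0.0, 1.0, 0.5, 0.25]])
x_hat = x.copy()                           # pretend the reconstruction is perfect
dense_code = np.array([[0.5, -0.5, 0.5, -0.5]])
sparse_code = np.array([[1.0, 0.0, 0.0, 0.0]])

dense_total = sparse_loss(x, x_hat, dense_code)
sparse_total = sparse_loss(x, x_hat, sparse_code)
print(dense_total, sparse_total)
```

With equal reconstruction quality, the code that spreads its activity over all units pays twice the penalty of the code that uses a single unit, so training favours the sparse one.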

stacked_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape = (28, 28)),
    keras.layers.Dense(128, activation = 'selu'),
    keras.layers.Dense(32, activation = 'selu'),
    keras.layers.ActivityRegularization(l1=1e-3)
])

stacked_decoder = keras.models.Sequential([
    keras.layers.Dense(128, activation = 'selu', input_shape = (32, )),
    keras.layers.Dense(28*28, activation = 'sigmoid'),
    keras.layers.Reshape([28, 28])
])

stacked_ae = keras.models.Sequential([stacked_encoder, stacked_decoder])

stacked_ae.compile(loss = 'binary_crossentropy', optimizer = keras.optimizers.SGD(learning_rate=.1))

history  =  stacked_ae.fit(X_train, X_train, epochs = 30, validation_data = (X_test, X_test))

Output

to_predict = X_test[:6]
prediction = stacked_ae.predict(to_predict)

def visualize_predictions(predictions, data):
    fig, axes = plt.subplots(2, predictions.shape[0], figsize = (predictions.shape[0]*5, 5))
    for i, ax in zip(range(predictions.shape[0]), axes[0, :]):
        ax.imshow(predictions[i], cmap = 'Greys')
        ax.set_title(y_test[i])
    for i, ax in zip(range(predictions.shape[0]), axes[1, :]):
        ax.imshow(data[i], cmap = 'Greys')
        ax.set_title(y_test[i])    
    return plt

visualize_predictions(prediction, to_predict)

Output

Now we look at the latent representation encoded by the autoencoder by applying t-SNE to it.

# Select 1000 samples from the data
X_test_sampled = X_test[:1000, :]
y_test_sampled = y_test.iloc[:1000]

# Visualizing the latent representation - use the 32-dimensional codes directly
# (averaging over the last axis would collapse each code to a single number)
x_compressed = stacked_encoder.predict(X_test_sampled).reshape(1000, -1)

# Use TSNE
tsne = TSNE(n_jobs=-1)
X_compressed_2d  =  tsne.fit_transform(x_compressed)

# Plot in scatter plot and color by digit name
plt.figure(figsize = (12, 7))
cmap = plt.get_cmap('RdBu', 10)
sc = plt.scatter(X_compressed_2d[:, 0], X_compressed_2d[:, 1], c = y_test_sampled, alpha = .85, cmap = cmap)
cax = plt.colorbar(sc, ticks=np.arange(0,10))
plt.xlabel('tsne 1')
plt.ylabel('tsne 2')
#plt.colorbar(sc)

Output

Building an Autoencoder with a CNN

y_train = train.label
X_train =train.drop('label', axis=1)/255

X_test = test.drop('label', axis = 1)/255
y_test = test['label']

X_train = X_train.values.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.values.reshape(X_test.shape[0], 28, 28, 1)

stacked_encoder = keras.models.Sequential([
    keras.layers.Conv2D(32, kernel_size = (3,3), activation = 'selu', input_shape = (28, 28, 1), padding = 'same'),
    keras.layers.MaxPooling2D(2, 2, padding = 'same'),
    keras.layers.Conv2D(8, kernel_size = (3,3), activation = 'selu', padding = 'same'),
    keras.layers.MaxPooling2D(2, 2, padding = 'same'),
    keras.layers.Conv2D(8, kernel_size = (3,3), activation = 'selu', padding = 'same'),

])

stacked_decoder = keras.models.Sequential([
    keras.layers.Conv2D(8, kernel_size = (3,3), activation = 'selu', input_shape = (7,7,8), padding = 'same'),
    keras.layers.UpSampling2D((2,2)),
    keras.layers.Conv2D(8, kernel_size = (3,3), activation = 'selu', padding = 'same'),
    keras.layers.UpSampling2D((2,2)),
    keras.layers.Conv2D(32, kernel_size = (3,3), activation = 'selu', padding = 'same'),
    keras.layers.Conv2D(1, kernel_size = (3,3), activation = 'sigmoid', padding = 'same'),

    
])

stacked_ae = keras.models.Sequential([stacked_encoder, stacked_decoder])

stacked_ae.compile(loss = 'binary_crossentropy', optimizer = keras.optimizers.SGD(learning_rate=.1))

history  =  stacked_ae.fit(X_train, X_train, epochs = 10, validation_data = (X_test, X_test))

Output

to_predict = X_test[:6]
prediction = stacked_ae.predict(to_predict)

def visualize_predictions(predictions, data):
    fig, axes = plt.subplots(2, predictions.shape[0], figsize = (predictions.shape[0]*5, 5))
    for i, ax in zip(range(predictions.shape[0]), axes[0, :]):
        ax.imshow(predictions[i], cmap = 'Greys')
        ax.set_title(y_test[i])
    for i, ax in zip(range(predictions.shape[0]), axes[1, :]):
        ax.imshow(data[i], cmap = 'Greys')
        ax.set_title(y_test[i])    
    return plt

visualize_predictions(prediction, to_predict)

Output

Now we visualize the low-dimensional representation of the data by taking the encoder's output and applying t-SNE.

# Select 1000 samples from the data
X_test_sampled = X_test[:1000, :]
y_test_sampled = y_test.iloc[:1000]

# Visualizing the Latent representation - Taking the average of the volume
x_compressed = stacked_encoder.predict(X_test_sampled).mean(axis = -1).reshape(1000, -1)

# Use TSNE
tsne = TSNE(n_jobs=-1)
X_compressed_2d  =  tsne.fit_transform(x_compressed)

# Plot in scatter plot and color by digit name
plt.figure(figsize = (12, 7))
cmap = plt.get_cmap('RdBu', 10)
sc = plt.scatter(X_compressed_2d[:, 0], X_compressed_2d[:, 1], c = y_test_sampled, alpha = .85, cmap = cmap)
cax = plt.colorbar(sc, ticks=np.arange(0,10))
plt.xlabel('tsne 1')
plt.ylabel('tsne 2')
#plt.colorbar(sc)

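Output

The clusters look quite distinct. The 4s sit fairly close to the 9s, which makes sense because the tops of the two digits look alike. Likewise, the 9s and 7s appear close together, which also makes sense given their similar structure. Our hidden representation is therefore meaningful.

Another Convolutional Autoencoder (with Different Filters)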

stacked_encoder = keras.models.Sequential([
    keras.layers.Conv2D(16, kernel_size = (3,3), activation = 'selu', input_shape = (28, 28, 1), padding = 'same'),
    keras.layers.MaxPooling2D(2, 2, padding = 'same'),
    keras.layers.Conv2D(32, kernel_size = (3,3), activation = 'selu', padding = 'same'),
    keras.layers.MaxPooling2D(2, 2, padding = 'same'),
    keras.layers.Conv2D(64, kernel_size = (3,3), activation = 'selu', padding = 'same'),
    keras.layers.MaxPooling2D(2, 2, padding = 'same'),
])

stacked_decoder = keras.models.Sequential([
    # Three 2x2 poolings with 'same' padding shrink 28 -> 14 -> 7 -> 4, so the decoder receives 4x4x64
    keras.layers.Conv2D(64, kernel_size = (3,3), activation = 'selu', input_shape = (4,4,64), padding = 'same'),
    keras.layers.UpSampling2D((2,2)),
    keras.layers.Conv2D(32, kernel_size = (3,3), activation = 'selu', padding = 'same'),
    keras.layers.UpSampling2D((2,2)),
    keras.layers.Conv2D(16, kernel_size = (3,3), activation = 'selu', padding = 'valid'),
    keras.layers.UpSampling2D((2,2)),
    keras.layers.Conv2D(1, kernel_size = (3,3), activation = 'sigmoid', padding = 'same'),    
])

stacked_ae = keras.models.Sequential([stacked_encoder, stacked_decoder])
# Adam is used here; swap in keras.optimizers.SGD(learning_rate=.1) to compare
optimizer = 'adam'
stacked_ae.compile(loss = 'binary_crossentropy', optimizer = optimizer)

history  =  stacked_ae.fit(X_train, X_train, epochs = 10, validation_data = (X_test, X_test))

Output

# Select 1000 samples from the data
X_test_sampled = X_test[:1000, :]
y_test_sampled = y_test.iloc[:1000]

# Visualizing the Latent representation - Taking the average of the volume
x_compressed = stacked_encoder.predict(X_test_sampled).mean(axis = -1).reshape(1000, -1)

# Use TSNE
tsne = TSNE(n_jobs=-1)
X_compressed_2d  =  tsne.fit_transform(x_compressed)


# Plot in scatter plot and color by digit name
plt.figure(figsize = (12, 7))
cmap = plt.get_cmap('RdBu', 10)
sc = plt.scatter(X_compressed_2d[:, 0], X_compressed_2d[:, 1], c = y_test_sampled, alpha = .85, cmap = cmap)
cax = plt.colorbar(sc, ticks=np.arange(0,10))
plt.xlabel('tsne 1')
plt.ylabel('tsne 2')
#plt.colorbar(sc)

Output

Sparse Autoencoder - Using Convolutions

stacked_encoder = keras.models.Sequential([
    keras.layers.Conv2D(16, kernel_size = (3,3), activation = 'selu', input_shape = (28, 28, 1), padding = 'same'),
    keras.layers.MaxPooling2D(2, 2, padding = 'same'),
    keras.layers.Conv2D(32, kernel_size = (3,3), activation = 'selu', padding = 'same'),
    keras.layers.MaxPooling2D(2, 2, padding = 'same'),
    keras.layers.Conv2D(64, kernel_size = (3,3), activation = 'selu', padding = 'same'),
    keras.layers.MaxPooling2D(2, 2, padding = 'same'),
    keras.layers.ActivityRegularization(l1=1e-1)
])

stacked_decoder = keras.models.Sequential([
    # Three 2x2 poolings with 'same' padding shrink 28 -> 14 -> 7 -> 4, so the decoder receives 4x4x64
    keras.layers.Conv2D(64, kernel_size = (3,3), activation = 'selu', input_shape = (4,4,64), padding = 'same'),
    keras.layers.UpSampling2D((2,2)),
    keras.layers.Conv2D(32, kernel_size = (3,3), activation = 'selu', padding = 'same'),
    keras.layers.UpSampling2D((2,2)),
    keras.layers.Conv2D(16, kernel_size = (3,3), activation = 'selu', padding = 'valid'),
    keras.layers.UpSampling2D((2,2)),
    keras.layers.Conv2D(1, kernel_size = (3,3), activation = 'sigmoid', padding = 'same'),    
])

stacked_ae = keras.models.Sequential([stacked_encoder, stacked_decoder])
# Adam is used here; swap in keras.optimizers.SGD(learning_rate=.1) to compare
optimizer = 'adam'
stacked_ae.compile(loss = 'binary_crossentropy', optimizer = optimizer)

history  =  stacked_ae.fit(X_train, X_train, epochs = 10, validation_data = (X_test, X_test))

Output

# Select 1000 samples from the data
X_test_sampled = X_test[:1000, :]
y_test_sampled = y_test.iloc[:1000]

# Visualizing the Latent representation - Taking the average of the volume
x_compressed = stacked_encoder.predict(X_test_sampled).mean(axis = -1).reshape(1000, -1)

# Use TSNE
tsne = TSNE(n_jobs=-1)
X_compressed_2d  =  tsne.fit_transform(x_compressed)


# Plot in scatter plot and color by digit name
plt.figure(figsize = (12, 7))
cmap = plt.get_cmap('RdBu', 10)
sc = plt.scatter(X_compressed_2d[:, 0], X_compressed_2d[:, 1], c = y_test_sampled, alpha = .85, cmap = cmap)
cax = plt.colorbar(sc, ticks=np.arange(0,10))
plt.xlabel('tsne 1')
plt.ylabel('tsne 2')
#plt.colorbar(sc)

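Output

The 4s are again fairly close to the 9s, which makes sense because the tops of the two digits look alike. Likewise, the 9s and 7s appear close together, which also makes sense given their similar structure. Our hidden representation is therefore meaningful.

Denoising Autoencoder

The goal is to corrupt the training data with noise and then use the autoencoder to recover the original. In this approach the autoencoder learns to denoise the data, which forces it to capture the data's key attributes.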
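The training pairs for such a model consist of a corrupted input and the clean target. The helper `make_noisy_pairs` below is a hypothetical sketch; the clipping step, which keeps pixel values in [0, 1], is an added assumption and is not part of this article's code.

```python
import numpy as np

def make_noisy_pairs(clean, noise_std=0.05, seed=0):
    """Return (noisy_inputs, clean_targets) for training a denoising autoencoder."""
    rng = np.random.default_rng(seed)
    noisy = clean + rng.normal(0.0, noise_std, size=clean.shape)
    # Clipping keeps pixel values in [0, 1]; this is an added assumption.
    return np.clip(noisy, 0.0, 1.0), clean

clean = np.full((4, 28, 28, 1), 0.5)      # stand-in for scaled MNIST images
noisy, targets = make_noisy_pairs(clean)
print(noisy.shape, float(np.abs(noisy - clean).mean()))
```

The model is then fitted with the noisy array as input and the clean array as target, so the only way to lower the loss is to undo the corruption.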

Denoising with Random Noise

X_train_noisy = X_train + np.random.normal(0, .05, (X_train.shape[0], 28, 28, 1))
X_test_noisy = X_test + np.random.normal(0, .05, (X_test.shape[0], 28, 28, 1))

# Plot one noisy data
plt.imshow(X_train_noisy[10].reshape(28, 28), cmap = 'Greys')

Output

stacked_encoder = keras.models.Sequential([
    keras.layers.Conv2D(32, kernel_size = (3,3), activation = 'selu', input_shape = (28, 28, 1), padding = 'same'),
    keras.layers.MaxPooling2D(2, 2, padding = 'same'),
    keras.layers.Conv2D(8, kernel_size = (3,3), activation = 'selu', padding = 'same'),
    keras.layers.MaxPooling2D(2, 2, padding = 'same'),
    keras.layers.Conv2D(8, kernel_size = (3,3), activation = 'selu', padding = 'same'),

])

stacked_decoder = keras.models.Sequential([
    keras.layers.Conv2D(8, kernel_size = (3,3), activation = 'selu', input_shape = (7,7,8), padding = 'same'),
    keras.layers.UpSampling2D((2,2)),
    keras.layers.Conv2D(8, kernel_size = (3,3), activation = 'selu', padding = 'same'),
    keras.layers.UpSampling2D((2,2)),
    keras.layers.Conv2D(32, kernel_size = (3,3), activation = 'selu', padding = 'same'),
    keras.layers.Conv2D(1, kernel_size = (3,3), activation = 'sigmoid', padding = 'same'),

    
])

stacked_ae = keras.models.Sequential([stacked_encoder, stacked_decoder])

stacked_ae.compile(loss = 'binary_crossentropy', optimizer = keras.optimizers.SGD(learning_rate=.01))

history  =  stacked_ae.fit(X_train_noisy, X_train, batch_size = 128, epochs = 10,
                           validation_data = (X_test_noisy, X_test))

Output

# Select 1000 samples from the data
X_test_sampled = X_test[:1000, :]
y_test_sampled = y_test.iloc[:1000]

# Visualizing the Latent representation - Taking the average of the volume
x_compressed = stacked_encoder.predict(X_test_sampled).mean(axis = -1).reshape(1000, -1)

# Use TSNE
tsne = TSNE(n_jobs=-1)
X_compressed_2d  =  tsne.fit_transform(x_compressed)


# Plot in scatter plot and color by digit name
plt.figure(figsize = (12, 7))
cmap = plt.get_cmap('RdBu', 10)
sc = plt.scatter(X_compressed_2d[:, 0], X_compressed_2d[:, 1], c = y_test_sampled, alpha = .85, cmap = cmap)
cax = plt.colorbar(sc, ticks=np.arange(0,10))
plt.xlabel('tsne 1')
plt.ylabel('tsne 2')
#plt.colorbar(sc)

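Output

We observe overlap between the 9s, 7s, and 4s, as described earlier. We also see a slight overlap between the 2s and 7s, which is reasonable. Denoising pushes the network to encode the most important aspects of the input: because the data is noisy, it cannot rely on unimportant features.

Denoising autoencoders not only improve dimensionality reduction by focusing on the most relevant aspects of the data; because they are trained to recover the original data from noisy inputs, they can also be used to denoise data.

Denoising with Dropout

Here we apply dropout to the input pixels and let the network reconstruct them. This is another type of denoising autoencoder. We expect that by the end of training the network will have learned to reconstruct the missing pixels.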
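What dropout does to an input image during training can be sketched in NumPy. The helper `drop_pixels` is hypothetical; it follows the usual "inverted dropout" convention of rescaling the surviving values by 1 / (1 - rate), which is what a Keras `Dropout` layer does in training mode.

```python
import numpy as np

def drop_pixels(images, rate=0.1, seed=0):
    """Zero out a random fraction of pixels and rescale the survivors."""
    rng = np.random.default_rng(seed)
    keep = rng.random(images.shape) >= rate   # True where the pixel survives
    return images * keep / (1.0 - rate), keep

images = np.ones((100, 28, 28, 1))            # stand-in for a batch of images
corrupted, keep = drop_pixels(images)
dropped_fraction = 1.0 - keep.mean()
print(round(float(dropped_fraction), 3))
```

At prediction time a dropout layer passes its input through unchanged, so the trained network sees complete images when it is evaluated.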

stacked_encoder = keras.models.Sequential([
    keras.layers.Dropout(.1, input_shape = (28, 28, 1)),
    keras.layers.Conv2D(32, kernel_size = (3,3), activation = 'selu', padding = 'same'),
    keras.layers.MaxPooling2D(2, 2, padding = 'same'),
    keras.layers.Conv2D(8, kernel_size = (3,3), activation = 'selu', padding = 'same'),
    keras.layers.MaxPooling2D(2, 2, padding = 'same'),
    keras.layers.Conv2D(8, kernel_size = (3,3), activation = 'selu', padding = 'same'),

])

stacked_decoder = keras.models.Sequential([
    keras.layers.Conv2D(8, kernel_size = (3,3), activation = 'selu', input_shape = (7,7,8), padding = 'same'),
    keras.layers.UpSampling2D((2,2)),
    keras.layers.Conv2D(8, kernel_size = (3,3), activation = 'selu', padding = 'same'),
    keras.layers.UpSampling2D((2,2)),
    keras.layers.Conv2D(32, kernel_size = (3,3), activation = 'selu', padding = 'same'),
    keras.layers.Conv2D(1, kernel_size = (3,3), activation = 'sigmoid', padding = 'same'),

    
])

stacked_ae = keras.models.Sequential([stacked_encoder, stacked_decoder])

stacked_ae.compile(loss = 'binary_crossentropy', optimizer = keras.optimizers.SGD(learning_rate=.01))

history  =  stacked_ae.fit(X_train, X_train, batch_size = 128, epochs = 10,
                           validation_data = (X_test, X_test))

Output

# Select 1000 samples from the data
X_test_sampled = X_test[:1000, :]
y_test_sampled = y_test.iloc[:1000]

# Visualizing the Latent representation - Taking the average of the volume
x_compressed = stacked_encoder.predict(X_test_sampled).mean(axis = -1).reshape(1000, -1)

# Use TSNE
tsne = TSNE(n_jobs=-1)
X_compressed_2d  =  tsne.fit_transform(x_compressed)


# Plot in scatter plot and color by digit name
plt.figure(figsize = (12, 7))
cmap = plt.get_cmap('RdBu', 10)
sc = plt.scatter(X_compressed_2d[:, 0], X_compressed_2d[:, 1], c = y_test_sampled, alpha = .85, cmap = cmap)
cax = plt.colorbar(sc, ticks=np.arange(0,10))
plt.xlabel('tsne 1')
plt.ylabel('tsne 2')
#plt.colorbar(sc)

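Output

The denoising autoencoder improves the separation between clusters. It still shows overlap between some clusters, for example between 4 and 9 and between 3 and 8, which is reasonable.

Generative Modeling

The autoencoders we have built so far perform poorly as generative models: if we sample a random vector from the latent space, we will most likely get an image that does not look like any digit from 0 to 9. Regular autoencoders work well for tasks such as anomaly detection, but not as generative models.

To build a good generative model, we need a variational autoencoder.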

stacked_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape = (28, 28)),
    keras.layers.Dense(128, activation = 'selu'),
    keras.layers.Dense(32, activation = 'selu'),
])

stacked_decoder = keras.models.Sequential([
    keras.layers.Dense(128, activation = 'selu', input_shape = (32, )),
    keras.layers.Dense(28*28, activation = 'sigmoid'),
    keras.layers.Reshape([28, 28])
])

stacked_ae = keras.models.Sequential([stacked_encoder, stacked_decoder])

stacked_ae.compile(loss = 'binary_crossentropy', optimizer = keras.optimizers.SGD(learning_rate=.1))

history  =  stacked_ae.fit(X_train, X_train, epochs = 25, validation_data = (X_test, X_test))

Output

Let us store the statistical properties of the latent space observed on the training data.

# Extract per-dimension statistics (min, max, mean, standard deviation) of the latent space.
latent = pd.DataFrame(stacked_encoder.predict(X_train), columns = [str(i) for i in range(32)])
mins = latent.min(axis = 0).values
maxs = latent.max(axis = 0).values
mean = latent.mean(axis = 0).values
stddev = latent.std(axis = 0).values
#print(latent.describe())

# Create some data by sampling at random from the latent space - from a normal distribution
codings = tf.random.normal(mean = mean, stddev = stddev, shape = [12, 32])
#codings = tf.random.uniform(minval = mins, maxval = maxs, shape = [12, 32])
images = stacked_decoder(codings).numpy()
def plot(mat):
    n = mat.shape[0]
    fig, axes = plt.subplots(1, n, figsize = (20, 4))
    for image, ax in zip(mat, axes):
        ax.imshow(image, cmap = 'Greys')
        
plot(images)

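Output

As we can see, the autoencoder performs poorly as a generative model: the generated images do not look natural. We can, however, see that it captures the rough outlines.

Variational Autoencoder (VAE)

Variational autoencoders are better suited to generative modeling, and we can use them to create new data. As we will see, the data produced by a VAE looks more realistic.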
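Two ingredients set a VAE apart from a regular autoencoder: the encoder outputs a mean and a log-variance instead of a single code, and a latent loss pulls that distribution towards a standard normal. The NumPy sketch below mirrors the formulas used in this section's Keras code; the function names are hypothetical.

```python
import numpy as np

def sample_codings(mean, log_var, rng):
    """Reparameterization trick: z = mean + sigma * eps with eps ~ N(0, 1)."""
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(log_var / 2.0) * eps

def kl_to_standard_normal(mean, log_var):
    """KL divergence from N(mean, exp(log_var)) to N(0, 1), summed over dimensions."""
    return -0.5 * np.sum(1.0 + log_var - np.exp(log_var) - mean ** 2, axis=-1)

rng = np.random.default_rng(0)
mean = np.zeros((3, 10))       # 3 samples, 10 latent dimensions
log_var = np.zeros((3, 10))    # log-variance 0 means unit variance

z = sample_codings(mean, log_var, rng)
kl_zero = kl_to_standard_normal(mean, log_var)          # code already N(0, 1)
kl_shifted = kl_to_standard_normal(mean + 1.0, log_var)  # every mean shifted by 1
print(z.shape, kl_zero, kl_shifted)
```

The latent loss is zero when the encoder already outputs a standard normal and grows as the means or variances drift away from it. That is why, after training, new images can be generated by decoding samples drawn from N(0, 1).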

y_train = train.label
X_train =train.drop('label', axis=1)/255

X_test = test.drop('label', axis = 1)/255
y_test = test['label']

# Keep the images as 28 x 28 (no channel axis) to match the VAE's Input(shape = [28, 28])
X_train = X_train.values.reshape(X_train.shape[0], 28, 28)
X_test = X_test.values.reshape(X_test.shape[0], 28, 28)

class sampling(keras.layers.Layer):
    def call(self, inputs):
        mean, log_var = inputs
        return K.random_normal(tf.shape(log_var))*K.exp(log_var/2)+mean

########### Encoder Part ###########
codings_size = 10
inputs = keras.layers.Input(shape = [28, 28])
z = keras.layers.Flatten()(inputs)
z = keras.layers.Dense(150, activation = 'selu')(z)
z = keras.layers.Dense(100, activation = 'selu')(z)
codings_mean = keras.layers.Dense(codings_size)(z) #Mean Encoding 
codings_log_var = keras.layers.Dense(codings_size)(z) #LogVar Encoding
codings = sampling()([codings_mean,  codings_log_var])
variational_encoder = keras.Model(inputs = [inputs], 
                                  outputs = [codings_mean, codings_log_var, codings])

########### Decoder Part ###########
decoder_inputs = keras.layers.Input(shape=[codings_size])
x = keras.layers.Dense(100, activation='selu')(decoder_inputs)
x = keras.layers.Dense(150, activation='selu')(x)
x = keras.layers.Dense(28*28, activation='sigmoid')(x)
outputs = keras.layers.Reshape([28, 28])(x)
variational_decoder = keras.Model(inputs = [decoder_inputs], outputs = [outputs])

########### Formalize autoencoder Model ##########
_, _, codings = variational_encoder(inputs)
reconstructions = variational_decoder(codings)
variational_ae = keras.Model(inputs = [inputs], outputs = [reconstructions])

########## Define Latent Loss ##############
latent_loss = -0.5 *K.sum(1 + codings_log_var - K.exp(codings_log_var) - K.square(codings_mean), 
                          axis = -1)
variational_ae.add_loss(K.mean(latent_loss)/784.)
variational_ae.compile(loss='binary_crossentropy', optimizer='adam')

########## Fit Model ###########
history = variational_ae.fit(X_train, X_train, epochs=70, batch_size=256,
                            validation_data=(X_test, X_test))

Output

codings = tf.random.normal(shape = [12, codings_size])
images = variational_decoder(codings).numpy()
def plot(mat):
    n = mat.shape[0]
    fig, axes = plt.subplots(1, n, figsize = (20, 4))
    for image, ax in zip(mat, axes):
        ax.imshow(image, cmap = 'Greys')
        
plot(images)

Output

As we can see, the images generated by the variational autoencoder look better than those generated by the regular autoencoder.
