Intuitive understanding of 1D, 2D, and 3D convolutions in convolutional neural networks

Can anyone please clearly explain the difference between 1D, 2D, and 3D convolutions in CNNs (deep learning), with examples?

Answers

Answer 1

I want to explain with a picture from C3D.

In a nutshell, what matters is the direction(s) in which the convolution moves and the shape of the output!

↑↑↑↑↑ 1D Convolutions - Basic ↑↑↑↑↑

only 1 direction (the time axis) to calculate conv
input = [W], filter = [k], output = [W]
ex) input = [1,1,1,1,1], filter = [0.25,0.5,0.25], output = [1,1,1,1,1] (ignoring border effects)
output shape is a 1D array
example) graph smoothing
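
To make the smoothing example concrete, here is a minimal NumPy sketch (NumPy's `np.convolve` stands in for the convolution; the `spiky` signal is made up for illustration). With zero padding at the borders, the two edge samples of the constant signal come out slightly below 1, while the interior stays at 1.

```python
import numpy as np

# Smoothing kernel from the text; its weights sum to 1.
kernel = np.array([0.25, 0.5, 0.25])

# Constant signal from the text.
flat = np.ones(5)
# mode="same" zero-pads so the output has the same length as the input.
smoothed_flat = np.convolve(flat, kernel, mode="same")
print(smoothed_flat)   # interior stays 1.0, the two border samples drop to 0.75

# A hypothetical signal with a single spike, to show the smoothing effect.
spiky = np.array([0.0, 0.0, 4.0, 0.0, 0.0])
smoothed_spiky = np.convolve(spiky, kernel, mode="same")
print(smoothed_spiky)  # the spike is spread over its neighbours
```

The kernel only ever slides along one axis, so the result is again a 1D array of length W.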

tf.nn.conv1d code - Toy Example

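# Written against the TensorFlow 1.x graph API (tf.Session).
# On TensorFlow 2.x the same calls live under tf.compat.v1, with eager execution disabled.
import tensorflow as tf
import numpy as np

sess = tf.Session()

ones_1d = np.ones(5)      # input signal, shape [W]
weight_1d = np.ones(3)    # filter, shape [k]
strides_1d = 1

in_1d = tf.constant(ones_1d, dtype=tf.float32)
filter_1d = tf.constant(weight_1d, dtype=tf.float32)

in_width = int(in_1d.shape[0])
filter_width = int(filter_1d.shape[0])

# conv1d expects input [batch, width, channels] and filter [width, in_channels, out_channels]
input_1d   = tf.reshape(in_1d, [1, in_width, 1])
kernel_1d = tf.reshape(filter_1d, [filter_width, 1, 1])
output_1d = tf.squeeze(tf.nn.conv1d(input_1d, kernel_1d, strides_1d, padding='SAME'))
print(sess.run(output_1d))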

↑↑↑↑↑ 2D Convolutions - Basic ↑↑↑↑↑

2 directions (x, y) to calculate conv
output shape is a 2D matrix
input = [W, H], filter = [k, k], output = [W, H]
example) Sobel edge filter
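
As a rough illustration of the Sobel example, the sketch below slides a 3x3 kernel over a small image in two directions. The helper `conv2d_same` and the 5x5 step-edge image are made up for this illustration; like deep-learning libraries, the helper computes cross-correlation (no kernel flip) with zero padding.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Slide `kernel` over `image` along rows and columns.

    Zero padding keeps the output the same shape as the input.
    """
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):        # direction 1: down the rows
        for j in range(image.shape[1]):    # direction 2: across the columns
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 5x5 image with a vertical step edge: dark on the left, bright on the right.
image = np.array([[0, 0, 0, 1, 1]] * 5, dtype=float)

# Sobel kernel that responds to intensity changes along the columns.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

edges = conv2d_same(image, sobel_x)
print(edges.shape)  # (5, 5): a 2D matrix, same size as the input
print(edges[2])     # middle row: strong response at the step edge
```

The negative value in the last column of the middle row is a side effect of the zero padding, which looks like a second edge at the image border.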

tf.nn.conv2d - Toy Example

ones_2d = np.ones((5,5))
weight_2d = np.ones((3,3))
strides_2d = [1, 1, 1, 1]

in_2d = tf.constant(ones_2d, dtype=tf.float32)
filter_2d = tf.constant(weight_2d, dtype=tf.float32)

in_width = int(in_2d.shape[0])
in_height = int(in_2d.shape[1])

filter_width = int(filter_2d.shape[0])
filter_height = int(filter_2d.shape[1])

# conv2d expects input [batch, height, width, channels] and filter [height, width, in_channels, out_channels]
input_2d   = tf.reshape(in_2d, [1, in_height, in_width, 1])
kernel_2d = tf.reshape(filter_2d, [filter_height, filter_width, 1, 1])

output_2d = tf.squeeze(tf.nn.conv2d(input_2d, kernel_2d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))

↑↑↑↑↑ 3D Convolutions - Basic ↑↑↑↑↑

3 directions (x, y, z) to calculate conv
output shape is a 3D volume
input = [W, H, L], filter = [k, k, d], output = [W, H, M]
d < L is important! It is what makes the output a volume
example) C3D
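
The role of d < L is easiest to see without padding (an assumption of this sketch; the toy examples in this answer use padding='SAME'). The helper `conv3d_valid` below is a hypothetical, loop-based NumPy illustration, not TensorFlow's implementation.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Slide `kernel` over `volume` along all three axes, without padding."""
    out_shape = tuple(v - k + 1 for v, k in zip(volume.shape, kernel.shape))
    out = np.zeros(out_shape)
    for x in range(out_shape[0]):
        for y in range(out_shape[1]):
            for z in range(out_shape[2]):
                window = volume[x:x + kernel.shape[0],
                                y:y + kernel.shape[1],
                                z:z + kernel.shape[2]]
                out[x, y, z] = np.sum(window * kernel)
    return out

volume = np.ones((5, 5, 5))        # input = [W, H, L] with L = 5

shallow = np.ones((3, 3, 3))       # d = 3 < L: the kernel can move along z
out_volume = conv3d_valid(volume, shallow)
print(out_volume.shape)            # (3, 3, 3): still a volume

full_depth = np.ones((3, 3, 5))    # d = L: no room left to move along z
out_flat = conv3d_valid(volume, full_depth)
print(out_flat.shape)              # (3, 3, 1): the depth collapses to one slice
```

When the filter depth equals the input depth, the third direction of movement disappears, which is the situation of a 2D convolution over a multi-channel input.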

tf.nn.conv3d - Toy Example

ones_3d = np.ones((5,5,5))
weight_3d = np.ones((3,3,3))
strides_3d = [1, 1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
in_depth = int(in_3d.shape[2])

filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])
filter_depth = int(filter_3d.shape[2])

# conv3d expects input [batch, depth, height, width, channels]
input_3d   = tf.reshape(in_3d, [1, in_depth, in_height, in_width, 1])
kernel_3d = tf.reshape(filter_3d, [filter_depth, filter_height, filter_width, 1, 1])

output_3d = tf.squeeze(tf.nn.conv3d(input_3d, kernel_3d, strides=strides_3d, padding='SAME'))
print(sess.run(output_3d))

↑↑↑↑↑ 2D Convolutions with 3D input - LeNet, VGG, ..., ↑↑↑↑↑

Even though the input is 3D, ex) 224x224x3, 112x112x32,
the output shape is not a 3D volume but a 2D matrix,
because the filter depth = L must match the number of input channels = L
2 directions (x, y) to calculate conv! not 3D
input = [W, H, L], filter = [k, k, L], output = [W, H]
output shape is a 2D matrix
what if we want to train N filters (N is the number of filters)?
then the output shape is (stacked 2D) 3D = 2D x N matrix.
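
The shape bookkeeping above can be sketched in NumPy as follows. The helper `conv2d_multichannel` is hypothetical and uses no padding, so the spatial size shrinks from 5 to 3; the point is only that the kernel spans all L channels and therefore moves in two directions.

```python
import numpy as np

def conv2d_multichannel(image, filters):
    """2D convolution of a [W, H, L] input with [k, k, L, N] filters (no padding)."""
    w, h, _ = image.shape
    k, _, _, n = filters.shape
    out = np.zeros((w - k + 1, h - k + 1, n))
    for i in range(w - k + 1):
        for j in range(h - k + 1):
            patch = image[i:i + k, j:j + k, :]   # shape [k, k, L]
            # Sum over k, k and L at once, separately for each of the N filters.
            out[i, j, :] = np.tensordot(patch, filters, axes=3)
    return out

image = np.ones((5, 5, 32))              # input = [W, H, L]
one_filter = np.ones((3, 3, 32, 1))      # a single [k, k, L] filter
many_filters = np.ones((3, 3, 32, 64))   # N = 64 filters

single = conv2d_multichannel(image, one_filter)
stacked = conv2d_multichannel(image, many_filters)
print(single.shape)    # (3, 3, 1): one 2D map
print(stacked.shape)   # (3, 3, 64): N stacked 2D maps
```

Each filter contributes one 2D map, and stacking N of them gives the 3D output that the next layer sees as its input channels.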

conv2d - LeNet, VGG, ... for 1 filter

in_channels = 32 # 3 for RGB, 32, 64, 128, ... 
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have 3d-shape with in_channels
weight_3d = np.ones((3,3,in_channels)) 
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])

input_3d   = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_3d = tf.reshape(filter_3d, [filter_height, filter_width, in_channels, 1])

output_2d = tf.squeeze(tf.nn.conv2d(input_3d, kernel_3d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))

conv2d - LeNet, VGG, ... for N filters

in_channels = 32 # 3 for RGB, 32, 64, 128, ... 
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have 3d-shape x number of filters = 4D
weight_4d = np.ones((3,3,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])

input_3d   = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

# output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
print(sess.run(output_3d))

↑↑↑↑↑ Bonus: 1x1 conv in CNN - GoogLeNet, ..., ↑↑↑↑↑

1x1 conv is confusing if you think of it as a 2D image filter like Sobel
for 1x1 conv in a CNN, the input has a 3D shape, as in the picture above.
it filters along the depth (channel) axis: each output value is a weighted sum of the L channels at one (x, y) position
input = [W, H, L], filter = [1, 1, L], output = [W, H]
output stacked shape is 3D = 2D x N matrix.
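
One way to see what a 1x1 conv does: it is the same small matrix product applied at every pixel. The NumPy sketch below uses made-up random data and shapes (5x5 spatial size, L = 32, N = 64) purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_map = rng.normal(size=(5, 5, 32))   # input = [W, H, L]
weights = rng.normal(size=(32, 64))         # N = 64 filters, each of shape [1, 1, L]

# 1x1 convolution: at every (x, y) position, mix the L channels into N outputs.
out = np.zeros((5, 5, 64))
for x in range(5):
    for y in range(5):
        out[x, y, :] = feature_map[x, y, :] @ weights

# The same computation as a single matrix product over the channel axis.
out_matmul = feature_map @ weights

print(out.shape)                      # (5, 5, 64): spatial size unchanged, channels remixed
print(np.allclose(out, out_matmul))   # the loop and the matrix product agree
```

Neighbouring pixels never interact, which is why a 1x1 conv is typically used to change the number of channels rather than to detect spatial patterns.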

tf.nn.conv2d - special case 1x1 conv

in_channels = 32 # 3 for RGB, 32, 64, 128, ... 
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have 3d-shape x number of filters = 4D; for a 1x1 conv its spatial size is 1x1
weight_4d = np.ones((1,1,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])

input_3d   = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

# output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
print(sess.run(output_3d))

Animation (2D conv with 3D inputs)

- Original link: LINK
- Author: Martin Görner
- Twitter: @martin_gorner
- Google +: plus.google.com/+MartinGorne

Bonus: 1D Convolutions with 2D input

↑↑↑↑↑ 1D Convolutions with 1D input ↑↑↑↑↑

↑↑↑↑↑ 1D Convolutions with 2D input ↑↑↑↑↑

Even though the input is 2D, ex) 20x14,
the output shape is not 2D but a 1D matrix,
because the filter height = L must match the input height = L
1 direction (x) to calculate conv! not 2D
input = [W, L], filter = [k, L], output = [W]
output shape is a 1D matrix
what if we want to train N filters (N is the number of filters)?
then the output shape is (stacked 1D) 2D = 1D x N matrix.
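
The same idea in a short NumPy sketch, using the 20x14 shape from the text. The helper `conv1d_multichannel`, the kernel width k = 3, and N = 8 are assumptions for illustration; no padding is used, so W shrinks from 20 to 18.

```python
import numpy as np

def conv1d_multichannel(sequence, filters):
    """1D convolution of a [W, L] input with [k, L, N] filters (no padding)."""
    w, _ = sequence.shape
    k, _, n = filters.shape
    out = np.zeros((w - k + 1, n))
    for t in range(w - k + 1):                # the only sliding direction
        window = sequence[t:t + k, :]         # shape [k, L]
        # Sum over k and L, separately for each of the N filters.
        out[t, :] = np.tensordot(window, filters, axes=2)
    return out

sequence = np.ones((20, 14))                  # input = [W, L]

single = conv1d_multichannel(sequence, np.ones((3, 14, 1)))   # one [k, L] filter
stacked = conv1d_multichannel(sequence, np.ones((3, 14, 8)))  # N = 8 filters
print(single.shape)    # (18, 1): one 1D output
print(stacked.shape)   # (18, 8): N stacked 1D outputs
```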

Bonus: C3D

in_channels = 32 # 3, 32, 64, 128, ... 
out_channels = 64 # 3, 32, 64, 128, ... 
ones_4d = np.ones((5,5,5,in_channels))
weight_5d = np.ones((3,3,3,in_channels,out_channels))
strides_3d = [1, 1, 1, 1, 1]

in_4d = tf.constant(ones_4d, dtype=tf.float32)
filter_5d = tf.constant(weight_5d, dtype=tf.float32)

in_width = int(in_4d.shape[0])
in_height = int(in_4d.shape[1])
in_depth = int(in_4d.shape[2])
filter_width = int(filter_5d.shape[0])
filter_height = int(filter_5d.shape[1])
filter_depth = int(filter_5d.shape[2])

# conv3d expects input [batch, depth, height, width, channels]
input_4d   = tf.reshape(in_4d, [1, in_depth, in_height, in_width, in_channels])
kernel_5d = tf.reshape(filter_5d, [filter_depth, filter_height, filter_width, in_channels, out_channels])

output_4d = tf.nn.conv3d(input_4d, kernel_5d, strides=strides_3d, padding='SAME')
print(sess.run(output_4d))

sess.close()

Input & Output in TensorFlow

Summary

Answer 2

1. 1D, 2D, or 3D in a CNN refers to the number of directions the convolution moves in, not to the dimensionality of the input or the filter.
2. For a 1-channel input, a 2D convolution is equivalent to a 1D convolution when the kernel length equals the input length along one axis. (1 direction of movement)
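
A small NumPy check of the second point, with made-up random data and shapes (20x14 input, kernel of size 3x14) and no padding: when the 2D kernel is as long as the input along one axis, it can only slide along the other axis, and the result matches a 1D convolution that treats the 14 values at each position as channels.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 14))       # single-channel 2D input, [W, L]
kernel = rng.normal(size=(3, 14))   # kernel length along the second axis equals L

# Viewed as a 2D convolution: slide along both axes (only one position fits along L).
out_2d = np.array([[np.sum(x[i:i + 3, j:j + 14] * kernel)
                    for j in range(14 - 14 + 1)]
                   for i in range(20 - 3 + 1)])

# Viewed as a 1D convolution: slide along W only, treating L as channels.
out_1d = np.array([np.sum(x[i:i + 3, :] * kernel) for i in range(20 - 3 + 1)])

print(out_2d.shape)                        # (18, 1)
print(out_1d.shape)                        # (18,)
print(np.allclose(out_2d[:, 0], out_1d))   # both views give the same numbers
```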