Pooling

Pooling#

The images that we work with are often very high in resolution. This poses a problem for convolutional neural networks, because capturing large-scale structure in a high-resolution image calls for large convolutional filters with many parameters. To reduce both the computational load and the number of parameters, pooling is often used. Pooling reduces the size of the feature maps (the images) while preserving the most important features. As well as lowering the computational cost of training, pooling can make the network less prone to overfitting by discarding unnecessary detail, and more robust to small shifts in the image, a property often described as (approximate) shift invariance.
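To make the robustness to small shifts concrete, here is a minimal sketch. The helper `block_max_pool` and the toy 4×4 arrays are assumptions made for this illustration; the helper takes the maximum over non-overlapping windows, so that the output is visibly smaller than the input.

```python
import numpy as np

def block_max_pool(x, size):
    """Max pooling over non-overlapping size x size windows (illustrative helper)."""
    h, w = x.shape
    # Crop so that both dimensions are whole multiples of the window size
    x = x[:h - h % size, :w - w % size]
    # Axes after reshape: (row block, row in block, column block, column in block)
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.zeros((4, 4))
image[0, 0] = 1.0        # a single bright pixel

shifted = np.zeros((4, 4))
shifted[1, 1] = 1.0      # the same pixel moved one step down and to the right

pooled_image = block_max_pool(image, size=2)
pooled_shifted = block_max_pool(shifted, size=2)

print(pooled_image.shape)                            # 16 values reduced to 4
print(np.array_equal(pooled_image, pooled_shifted))  # identical pooled outputs
```

The invariance is only approximate: here the bright pixel stays inside the same window, whereas a shift that crosses a window boundary would change the pooled output.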

How Does Pooling Work?#

Pooling works by sliding a window of some given size over the data and extracting a single value at each window position. For example, consider the following 3×3 matrix,

\[\begin{split} \begin{bmatrix} 3 & 4 & 1 \\ 1 & 2 & 9 \\ 4 & 2 & 3 \end{bmatrix}. \end{split}\]

If we pass a 2×2 window over this, the first set of values would be those shown in bold below,

\[\begin{split} \begin{bmatrix} \mathbf{3} & \mathbf{4} & 1 \\ \mathbf{1} & \mathbf{2} & 9 \\ 4 & 2 & 3 \end{bmatrix}. \end{split}\]

Which reduction should the pooling use? The most popular choice is probably maximum pooling, where the maximum value in the window is extracted, but other approaches exist, such as mean pooling. In maximum pooling, once the maximum value in the window has been recorded, the window is slid right and down as appropriate. Moving the 2×2 window one element at a time (a stride of 1) over the matrix above, maximum pooling gives,

\[\begin{split} \begin{bmatrix} 4 & 9 \\ 4 & 9 \end{bmatrix}. \end{split}\]
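The worked example can be checked numerically. The sketch below assumes NumPy 1.20 or later, where `sliding_window_view` is available; it collects every 2×2 window of the matrix (a stride of 1) and reduces each window to one value. The mean is included only to show how a different reduction changes the result.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.array([[3, 4, 1],
              [1, 2, 9],
              [4, 2, 3]])

# All 2x2 windows of x; the result has shape (2, 2, 2, 2):
# (window row, window column, row inside window, column inside window)
windows = sliding_window_view(x, (2, 2))

max_pooled = windows.max(axis=(2, 3))    # maximum pooling
mean_pooled = windows.mean(axis=(2, 3))  # mean pooling, for comparison

print(max_pooled)
# [[4 9]
#  [4 9]]
print(mean_pooled)
# [[2.5  4.  ]
#  [2.25 4.  ]]
```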

Implementation in Python#

An implementation of maximum pooling in Python is shown below.

import numpy as np

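def max_pooling(feature_map, size=2, stride=1):
    """Maximum pooling of a 2D array with a square window of side `size`."""
    h, w = feature_map.shape
    # Only windows that fit entirely inside the input are used (no padding)
    pooled_height = (h - size) // stride + 1
    pooled_width = (w - size) // stride + 1
    pooled = np.zeros((pooled_height, pooled_width))

    for i in range(pooled_height):
        for j in range(pooled_width):
            # Top-left corner of the current window in the input
            r, c = i * stride, j * stride
            pooled[i, j] = np.max(feature_map[r:r+size, c:c+size])

    return pooled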
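How much the feature map shrinks depends on both the window size and the stride. As a rough sketch, assuming that only windows lying entirely inside the input are kept (no padding), each dimension of length n becomes floor((n - size) / stride) + 1. The helper `pooled_shape` and the 6×6 test array below are hypothetical names introduced for this check, which again relies on `sliding_window_view` from NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def pooled_shape(h, w, size, stride):
    """Output shape when only fully contained windows are used (no padding)."""
    return (h - size) // stride + 1, (w - size) // stride + 1

feature_map = np.arange(36).reshape(6, 6)
size, stride = 2, 2

# Take every window, then keep every `stride`-th one along each direction
windows = sliding_window_view(feature_map, (size, size))[::stride, ::stride]
pooled = windows.max(axis=(2, 3))

print(pooled.shape)                      # (3, 3)
print(pooled_shape(6, 6, size, stride))  # (3, 3)
print(pooled_shape(3, 3, 2, 1))          # (2, 2), as in the 3x3 example
```

With a stride of 1 the output is only slightly smaller than the input, while a stride equal to the window size shrinks each dimension by roughly that factor.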

We can apply this to the cute dog image from earlier.

import matplotlib.pyplot as plt

pepe = np.loadtxt('../data/pepe.txt')
pepe_pooled = max_pooling(pepe, size=20)

fig, ax = plt.subplots(1, 2, figsize=(10, 5))

ax[0].imshow(pepe, cmap='gray')
ax[1].imshow(pepe_pooled, cmap='gray')
ax[0].axis('off')
ax[1].axis('off')
ax[0].set_title('Original')
ax[1].set_title('Maximum Pooling')
plt.show()

../_images/06bc73ba81bfbb9eae3f2c5580a39b0de8252b1adc170d1c378873f162007acd.png

We can see that a lot of the fine detail is lost with the maximum pooling approach; the image appears to become pixelated.
