Convolutional Neural Networks

Buse Köseoğlu
4 min readMar 21, 2021

Convolutional neural networks are a deep learning algorithm that can distinguish one photograph from another by taking a photograph as input and assigning weights to various objects and directions in this photograph. Today, it is used in many areas such as photo search services, autonomous vehicles, automatic video classification.

The reason for the convolution called “convolutional neural network” indicates that at least one of the layers of the network uses convolution instead of general matrix multiplication.

What is Convolution?

Convolution is a mathematical operation that takes two functions (f and g) to produce a third function (f * g) and express how one shape changes by the other. The convolution process refers to both the result function and the process of computing it. If we examine through the picture;

“x” is an input that is 2-dimensional representing the image and whose pixels are made up of different color channels.

“w” is the matrix that carries the weights to be applied to the input, also called the feature detector. It is called a “filter” or a “kernel”.

As a result of these mathematical operations, a “feature map” is formed.

Convolutional Layer

CNN’s most important block is the convolution layer. Neurons in the first convolutional layer are not linked to every pixel given in the introduction photo. Each neuron in the second layer is connected to a small part in the first layer. This structure allows the network to first concentrate on the low-level features on the first layer and then combine them with the higher-level features on the next layer.

Padding (Pixel Adding)

After convolution, there is a size difference between input and output. To fix this, extra pixels are added. When the input matrix size (n x n) and filter / kernel matrix size (f x f), we get a matrix of size (n-f + 1) x (n-f + 1) as a result of the convolution. For example, when we apply a filter (5 x 5) to a photo (16 x 16), the size we get after convolution becomes (12 x 12). As we can see from here, the photograph shrinks as a result of each convolution. Also, pixels in the corners and edges are used less.

Padding is done to prevent the problems described above. Padding simply adds a zero layer to the input image.

When the zero layer is added, the size of our image will be (n + 2p) x (n + 2p), and when we add a (f x f) size filter, we will get an image of (n + 2p — f + 1) x (n + 2p — f + 1). Where p is the added pixel size. If we examine the above example again, we will obtain an image (16 x 16) as a result.

There are two types of padding operations.

  1. Valid: Does not do padding.
  2. Same: “p” layer is added.

The following equation is used to determine the “p” value.

Stride

Stride operation is used to shift the matrix (filter) carrying the weights mentioned above on the image. The scrolling process is made pixel by pixel and the number of values ​​given to the stride parameter shifts the number of steps. This parameter directly affects the output size. For example, if we give the stride parameter a value of 2, the output size will be half the input size

Pooling

Pooling makes the output image smaller. It is a step used to reduce computational complexity. For example, at an input of 100x100, if the pool size is given 2, the output will be 50x50.

Maximum Pooling

Average Pooling

Turkish version.

References

Hands on Machine Learning with Scikit-Learn and Tensorflow

Deep Learning — Ian Goodfellow

https://www.geeksforgeeks.org/cnn-introduction-to-padding/

https://ayyucekizrak.medium.com/deri̇ne-daha-deri̇ne-evrişimli-sinir-ağları-2813a2c8b2a9

https://en.wikipedia.org/wiki/Convolution

https://medium.datadriveninvestor.com/convolutional-neural-network-cnn-simplified-ecafd4ee52c5

--

--