One Layer Neural Network From Scratch — Classification

Jul 28, 2023

Classification is a supervised learning method. In supervised learning, each example in the data comes with a label, and the algorithm learns from these labeled examples to make predictions on new data. Classification aims to assign each example to one of a set of classes. For example, consider data on people applying for credit. Deciding whether to grant credit to a new applicant, based on past applications that were approved or rejected, is a binary classification problem because there are exactly two classes.

In this article, we will build a single-layer artificial neural network from scratch for binary classification. Before that, we need to understand what artificial neural networks are and the mathematics behind them.

Artificial Neural Networks

Artificial Neural Networks are an artificial intelligence method inspired by the human brain. Like the brain, they consist of interconnected neurons, which are organized into layers. Each neuron performs a simple mathematical operation, and the final result is produced by the output layer. They are particularly useful for problems with complex, non-linear relationships between inputs and outputs.

Neural Network Structure for the Classification Model

The input layer takes two inputs, x1 and x2. The bias (b) and the weights (w1, w2) are the model's parameters and are updated during training. The neuron first computes a weighted sum z of its inputs:

$$z = w_1 x_1 + w_2 x_2 + b$$
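To make the equation concrete, here is a small NumPy sketch with made-up weights and inputs (all values are hypothetical, chosen only for illustration). It also shows that computing z for several examples at once is just a matrix product. The code later in this article evaluates it the same way.

```python
import numpy as np

# Hypothetical weights, bias, and inputs, chosen only for illustration
w1, w2, b = 0.5, -1.0, 0.25
x1, x2 = 2.0, 1.0

# Scalar form: z = w1*x1 + w2*x2 + b
z_scalar = w1 * x1 + w2 * x2 + b   # 1.0 - 1.0 + 0.25 = 0.25

# Matrix form: W has shape (1, 2) and X has shape (2, m), one column per example
W = np.array([[w1, w2]])
X = np.array([[x1, 0.0, -1.0],
              [x2, 3.0,  4.0]])
Z = np.matmul(W, X) + b            # shape (1, 3); the first column equals z_scalar
print(z_scalar, Z)
```

Writing the examples as columns lets one matrix product replace a Python loop over examples.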

The value z is then passed into the sigmoid activation function, which is defined as:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

The graph of the sigmoid function is an S-shaped curve. It maps any real number into the interval (0, 1) and equals 0.5 at z = 0, so its output can be read as the probability that an example belongs to class 1. For classification, we use 0.5 as the threshold: if the output is above 0.5, the example is assigned class 1; otherwise, it is assigned class 0.
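Here is a quick sketch of the sigmoid and the 0.5 threshold, using arbitrary z values chosen for illustration. Because σ(z) > 0.5 exactly when z > 0, the decision boundary of this single-layer model is the straight line w1x1 + w2x2 + b = 0.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([-4.0, -0.5, 0.0, 0.5, 4.0])   # arbitrary example inputs
probs = sigmoid(z)                          # every value lies strictly between 0 and 1
labels = (probs > 0.5).astype(int)          # apply the 0.5 threshold
print(np.round(probs, 3))
print(labels)                               # [0 0 0 1 1]
```

The value z = 0 sits exactly on the threshold. With the strict `>` comparison used here, it falls into class 0.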

Forward Propagation

In artificial neural networks, our main goal is to find the optimal weight and bias parameters. We don't know these values when training starts, so we initialize them (the code later in this article uses small random weights and a zero bias). We then compute forward, from left to right, to obtain the output, and this output is the model's prediction. Next, we need to measure how good the prediction is, which is the job of a loss function. For classification tasks, the most commonly used loss function is Log Loss.

Log Loss Function

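Log Loss, also known as Cross-Entropy Loss, is commonly used to evaluate classification models that output probabilities. It measures the difference between the predicted probabilities and the true labels, and it heavily penalizes confident predictions that turn out to be wrong. The lower the Log Loss, the better the model fits the data. For m examples with true labels y^(i) and predicted probabilities a^(i), it is defined as:

$$L = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log a^{(i)} + \left(1 - y^{(i)}\right)\log\left(1 - a^{(i)}\right)\right]$$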
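To see how Log Loss rewards and punishes predictions, this sketch evaluates it for a single example whose true label is 1, over a few hypothetical predicted probabilities. For one example the loss is −[y log(a) + (1 − y) log(1 − a)].

```python
import numpy as np

def log_loss_single(a, y):
    # Log Loss for one example: a is the predicted P(y = 1), y is the true label (0 or 1)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

# The true label is 1; watch the loss as the predicted probability moves away from 1
losses = {a: float(log_loss_single(a, 1)) for a in (0.99, 0.9, 0.5, 0.1, 0.01)}
for a, loss in losses.items():
    print(a, round(loss, 3))
# 0.99 -> 0.01, 0.9 -> 0.105, 0.5 -> 0.693, 0.1 -> 2.303, 0.01 -> 4.605
```

The loss grows slowly at first but rises sharply as a confident prediction becomes wrong. This is why the model is pushed hardest on its worst mistakes.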

Backward Propagation

In the backpropagation phase, our goal is to update the parameters in a way that minimizes the error. We use gradient descent to update the parameters.

Gradient Descent

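Gradient descent starts from initial (for example, random) parameter values. It then repeatedly moves them a small step in the direction opposite to the gradient of the loss, which is the direction in which the loss decreases fastest. The aim is to reach the minimum of the loss. For a single sigmoid neuron trained with Log Loss, the loss is convex, so this minimum is the global one. Each step works as follows:

1. The derivative of the loss function with respect to each parameter is calculated. The loss function is the Log Loss described above.

2. Each parameter θ is then updated with the following rule, where α is the learning rate:

$$\theta := \theta - \alpha \frac{\partial L}{\partial \theta}$$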

The learning rate α determines the size of the steps taken toward the minimum. If it is too small, training may take a very long time. If it is too large, the updates can overshoot the minimum or even diverge. Choosing a suitable learning rate is therefore crucial. Applied to our model's parameters, the update rules are:

$$w_1 := w_1 - \alpha \frac{\partial L}{\partial w_1}, \qquad w_2 := w_2 - \alpha \frac{\partial L}{\partial w_2}, \qquad b := b - \alpha \frac{\partial L}{\partial b}$$
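The effect of the learning rate is easy to see on a one-parameter problem. This sketch applies the same update rule to a hypothetical toy loss f(w) = (w − 3)², whose minimum is at w = 3, instead of Log Loss. It runs 20 steps from w = 0 with three different learning rates.

```python
def gradient_descent(learning_rate, steps=20, w=0.0):
    # Toy loss f(w) = (w - 3)**2, whose minimum is at w = 3
    for _ in range(steps):
        grad = 2 * (w - 3)               # derivative of f at the current w
        w = w - learning_rate * grad     # update rule: w := w - alpha * df/dw
    return w

for lr in (0.01, 0.1, 1.1):
    print(lr, round(gradient_descent(lr), 3))
# 0.01: still far from 3 (too slow); 0.1: close to 3; 1.1: oscillates and diverges
```

For this particular function, each step multiplies the distance to the minimum by (1 − 2α). Any α above 1 therefore makes the iterates jump back and forth across the minimum with growing amplitude.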

Building an Artificial Neural Network from Scratch

1. Let’s define the sigmoid activation function that we will use. Like the rest of the code, it relies on NumPy, imported as np.

import numpy as np

def sigmoid(z):
    return 1/(1 + np.exp(-z))

2. Next, let’s determine the model structure. For this, we need the input and output dimensions. The X parameter holds the independent variables (features), and the Y parameter holds the dependent (target) variable. Both are arranged with one column per example, so X has shape (n_x, m) and Y has shape (1, m), where m is the number of examples.

def layer_sizes(X, Y):
    n_x = X.shape[0]
    n_y = Y.shape[0]
    
    return (n_x, n_y)
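Datasets often arrive with one row per sample, while the functions here expect one column per sample. This sketch uses hypothetical toy data to show the conversion. It also shows why Y should be a 2-D array of shape (1, m): with a flat array of shape (m,), Y.shape[0] would return m instead of 1.

```python
import numpy as np

def layer_sizes(X, Y):
    n_x = X.shape[0]  # number of input features (rows of X)
    n_y = Y.shape[0]  # number of outputs (rows of Y)
    return (n_x, n_y)

# Hypothetical toy data in the common "one row per sample" layout: 4 samples, 2 features
data = np.array([[1.0, 2.0],
                 [2.0, 1.0],
                 [-1.0, -2.0],
                 [-2.0, -1.0]])
labels = np.array([1, 1, 0, 0])

# Transpose X to (2, 4) and reshape Y to (1, 4) so each column is one sample
X = data.T
Y = labels.reshape(1, -1)
print(X.shape, Y.shape)     # (2, 4) (1, 4)
print(layer_sizes(X, Y))    # (2, 1)
```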

3. Next, we initialize the parameters, as mentioned in the forward propagation section. The weights will be multiplied with the input matrix, so the function takes the input and output dimensions and shapes the parameters accordingly: the weight matrix W has shape (n_y, n_x) and the bias b has shape (n_y, 1). The weights are drawn from a standard normal distribution and scaled by 0.01, and the bias starts at zero.

def initialize_parameters(n_x, n_y):
    W = np.random.randn(n_y, n_x) * 0.01
    b = np.zeros((n_y, 1))

    parameters = {"W": W,
                  "b": b}
    
    return parameters
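A short check of the shapes this function produces, assuming two input features and one output. The seed is only there to make the run reproducible. Although b has shape (1, 1), NumPy broadcasting adds it to every column of W X, so each example receives the same bias.

```python
import numpy as np

def initialize_parameters(n_x, n_y):
    W = np.random.randn(n_y, n_x) * 0.01   # small random weights
    b = np.zeros((n_y, 1))                 # zero bias
    return {"W": W, "b": b}

np.random.seed(0)  # fixed seed so the run is reproducible
params = initialize_parameters(n_x=2, n_y=1)
print(params["W"].shape, params["b"].shape)   # (1, 2) (1, 1)

# W @ X has shape (1, 5); the (1, 1) bias is broadcast across all 5 columns
X = np.ones((2, 5))
Z = params["W"] @ X + params["b"]
print(Z.shape)  # (1, 5)
```

Scaling by 0.01 keeps the initial z values close to 0. There, the sigmoid is steep and its gradient is not vanishingly small.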

4. With the parameters initialized, we can perform forward propagation. The result of the equation z = w1x1 + w2x2 + b, computed for all examples at once in matrix form as Z = WX + b, is passed into the sigmoid function to obtain the model's predictions. The np.matmul() function computes the matrix product of two arrays.

def forward_propagation(X, parameters):
    W = parameters["W"]
    b = parameters["b"]
    
    Z = np.matmul(W, X) + b
    A = sigmoid(Z)

    return A
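This sketch runs forward propagation on a tiny hypothetical input. With all-zero parameters, z is 0 for every example, so every prediction is exactly 0.5, which means the model has no preference yet. The second set of parameters is hand-picked for illustration, not learned.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_propagation(X, parameters):
    Z = np.matmul(parameters["W"], X) + parameters["b"]
    return sigmoid(Z)

X = np.array([[1.0, 2.0, -1.0],
              [2.0, 1.0, -2.0]])   # 2 features, 3 examples

zero_params = {"W": np.zeros((1, 2)), "b": np.zeros((1, 1))}
print(forward_propagation(X, zero_params))   # [[0.5 0.5 0.5]]

# Hypothetical hand-picked parameters: positive weights favor class 1 for positive inputs
params = {"W": np.array([[1.0, 1.0]]), "b": np.array([[0.0]])}
A = forward_propagation(X, params)
print(A.shape)  # (1, 3): one probability per example
```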

5. Next, we define the Log Loss (cost) function described above. Here we are simply translating the formula into code.

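def compute_cost(A, Y):
    m = Y.shape[1]  # number of examples

    # Keep predictions away from exactly 0 and 1 so that np.log never receives 0
    A = np.clip(A, 1e-15, 1 - 1e-15)
    logprobs = - np.multiply(np.log(A), Y) - np.multiply(np.log(1 - A), 1 - Y)
    cost = 1/m * np.sum(logprobs)

    return cost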
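A useful sanity check for the cost function is that a model predicting 0.5 for every example should score ln 2 ≈ 0.693. The initial weights are small, so the first cost printed during training should be close to this value. The "better" predictions below are hypothetical.

```python
import numpy as np

def compute_cost(A, Y):
    m = Y.shape[1]
    logprobs = -np.multiply(np.log(A), Y) - np.multiply(np.log(1 - A), 1 - Y)
    return 1 / m * np.sum(logprobs)

Y = np.array([[1, 0, 1, 0]])
A_uninformed = np.full((1, 4), 0.5)           # always predicts 0.5
A_better = np.array([[0.9, 0.1, 0.9, 0.1]])   # hypothetical, mostly correct predictions

print(compute_cost(A_uninformed, Y))  # ln(2), about 0.693
print(compute_cost(A_better, Y))      # about 0.105
```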

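6. With the cost function defined, it’s time for backward propagation.

The goal of backward propagation is to minimize the cost function by updating the parameters. For this, we need the derivatives that multiply the learning rate α in the update rules, namely ∂L/∂W and ∂L/∂b. For a sigmoid output combined with Log Loss, the chain rule simplifies nicely. Since ∂L/∂a = (a − y) / (a(1 − a)) and ∂a/∂z = a(1 − a), their product is

$$\frac{\partial L}{\partial z} = a - y$$

Averaged over the m examples, this gives

$$\frac{\partial L}{\partial W} = \frac{1}{m}(A - Y)X^{T}, \qquad \frac{\partial L}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(a^{(i)} - y^{(i)}\right)$$

The function takes the following arguments: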
• A: Output of forward propagation function
• X: Input data
• Y: Target

As an output, our function returns gradient values for weights and bias.


def backward_propagation(A, X, Y):
    m = X.shape[1]

    dZ = A - Y
    dW = 1/m * np.dot(dZ, X.T)
    db = 1/m * np.sum(dZ, axis = 1, keepdims = True)
    
    grads = {"dW": dW,
             "db": db}
    
    return grads
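A common way to confirm that analytical gradients like dZ = A − Y are right is a numerical gradient check. The sketch below compares them with central finite differences, (J(p + ε) − J(p − ε)) / 2ε, on small random toy data. The data, ε, and helper names are illustrative assumptions, not part of the original code.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(W, b, X, Y):
    A = sigmoid(W @ X + b)
    return -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

def analytic_grads(W, b, X, Y):
    # Same formulas as backward_propagation: dZ = A - Y
    m = X.shape[1]
    dZ = sigmoid(W @ X + b) - Y
    return dZ @ X.T / m, np.sum(dZ, axis=1, keepdims=True) / m

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))          # 2 features, 5 examples (random toy data)
Y = np.array([[1, 0, 1, 1, 0]])
W = rng.normal(size=(1, 2))
b = np.array([[0.1]])

dW, db = analytic_grads(W, b, X, Y)

# Central finite differences, one parameter at a time
eps = 1e-6
dW_num = np.zeros_like(W)
for j in range(W.shape[1]):
    step = np.zeros_like(W)
    step[0, j] = eps
    dW_num[0, j] = (cost(W + step, b, X, Y) - cost(W - step, b, X, Y)) / (2 * eps)
db_num = (cost(W, b + eps, X, Y) - cost(W, b - eps, X, Y)) / (2 * eps)

print(np.max(np.abs(dW - dW_num)), abs(db[0, 0] - db_num))  # both should be tiny
```

If the derivation were wrong, for example a missing 1/m factor, the two estimates would differ by far more than rounding error.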

7. Now that we have the gradients, let’s write the function that updates the parameters. It takes the current parameters, the gradients returned by backward propagation, and the learning rate, which determines the step size. All that remains is to apply the gradient descent update rules from the Gradient Descent section: subtract the learning rate times each gradient from the corresponding parameter.

def update_parameters(parameters, grads, learning_rate=1.2):
    
    W = parameters["W"]
    b = parameters["b"]
    
    dW = grads["dW"]
    db = grads["db"]
    
    W = W - learning_rate * dW
    b = b - learning_rate * db
    
    parameters = {"W": W,
                  "b": b}
    
    return parameters
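To watch a single update in action, this sketch starts from all-zero parameters on four hypothetical, linearly separable examples. It takes one gradient descent step with the default learning rate of 1.2 and compares the cost before and after.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(params, X, Y):
    A = sigmoid(params["W"] @ X + params["b"])
    return -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

def grads(params, X, Y):
    m = X.shape[1]
    dZ = sigmoid(params["W"] @ X + params["b"]) - Y
    return {"dW": dZ @ X.T / m, "db": np.sum(dZ, axis=1, keepdims=True) / m}

def update_parameters(parameters, grads, learning_rate=1.2):
    return {"W": parameters["W"] - learning_rate * grads["dW"],
            "b": parameters["b"] - learning_rate * grads["db"]}

X = np.array([[1.0, 2.0, -1.0, -2.0],
              [2.0, 1.0, -2.0, -1.0]])
Y = np.array([[1, 1, 0, 0]])
params = {"W": np.zeros((1, 2)), "b": np.zeros((1, 1))}

before = cost(params, X, Y)                 # ln(2), about 0.693
params = update_parameters(params, grads(params, X, Y))
after = cost(params, X, Y)                  # about 0.065
print(before, after, params["W"])           # W becomes [[0.9 0.9]]
```

Here dW = [[−0.75, −0.75]] and db = 0, so one step moves both weights to 0.9. This already separates the two classes.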

8. With all the helper functions written, it’s time to combine them into our neural network model. First, we get the dimensions of the features and the target variable. The num_iterations parameter plays the role of the number of epochs. Every iteration uses the whole dataset, so each iteration is one epoch: a training cycle in which all samples are shown to the network and the parameters are updated based on them. The function returns the parameters learned by the model, which are then used to make predictions.

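def nn_model(X, Y, num_iterations=10, learning_rate=1.2, print_cost=False):
    n_x, n_y = layer_sizes(X, Y)

    parameters = initialize_parameters(n_x, n_y)

    for i in range(num_iterations):
        # Forward pass: predicted probabilities for all examples
        A = forward_propagation(X, parameters)

        # Log Loss of the current predictions
        cost = compute_cost(A, Y)

        # Backward pass: gradients of the loss with respect to W and b
        grads = backward_propagation(A, X, Y)

        # Gradient descent step
        parameters = update_parameters(parameters, grads, learning_rate)

        if print_cost:
            print("Cost after iteration %i: %f" % (i, cost))

    # Return the learned parameters so they can be used for prediction
    return parameters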
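Here is a compact, self-contained version of the same training loop, run on hypothetical toy data. Label noise near the line x1 + x2 = 0 keeps the data from being perfectly separable. This version also records the cost at every iteration so you can confirm that training reduces it. The data generator, seed, and iteration count are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def nn_model(X, Y, num_iterations=10, learning_rate=1.2):
    # Compact sketch of the same loop: forward pass, cost, gradients, update
    n_x, n_y = X.shape[0], Y.shape[0]
    m = X.shape[1]
    W = np.random.randn(n_y, n_x) * 0.01
    b = np.zeros((n_y, 1))
    costs = []
    for _ in range(num_iterations):
        A = sigmoid(W @ X + b)                                   # forward propagation
        costs.append(float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))))  # Log Loss
        dZ = A - Y                                               # backward propagation
        dW = dZ @ X.T / m
        db = np.sum(dZ, axis=1, keepdims=True) / m
        W = W - learning_rate * dW                               # gradient descent step
        b = b - learning_rate * db
    return {"W": W, "b": b}, costs

# Hypothetical toy data: class 1 roughly when x1 + x2 > 0, with noise near the boundary
rng = np.random.default_rng(42)
X = rng.normal(size=(2, 200))
Y = (X[0] + X[1] + rng.normal(scale=0.5, size=200) > 0).astype(int).reshape(1, -1)

np.random.seed(1)  # makes the weight initialization reproducible
parameters, costs = nn_model(X, Y, num_iterations=100)
print(round(costs[0], 3), round(costs[-1], 3))  # starts near 0.693, then drops
```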

9. Our neural network model is complete. Now we can write the function that makes predictions on new data. It uses the learned parameters to predict which class each example belongs to. Since this is binary classification, an example is assigned class 1 if its predicted probability is greater than 0.5, and class 0 otherwise.

def predict(X, parameters):
    A = forward_propagation(X, parameters)
    # 1 if the predicted probability is above 0.5, else 0
    predictions = (A > 0.5).astype(int)

    return predictions
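Once predictions are available, accuracy is simply the fraction that match the true labels. The sketch below uses hypothetical "learned" parameters whose decision boundary is the line x1 + x2 = 0, plus four made-up examples, one of which it gets wrong.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(X, parameters):
    A = sigmoid(parameters["W"] @ X + parameters["b"])
    return (A > 0.5).astype(int)   # 1 if P(class 1) > 0.5, else 0

# Hypothetical parameters; the boundary is the line x1 + x2 = 0
parameters = {"W": np.array([[2.0, 2.0]]), "b": np.array([[0.0]])}
X_new = np.array([[1.0, -2.0, 0.5, -0.1],
                  [0.5, 1.0, -1.0, -0.3]])
Y_true = np.array([[1, 0, 1, 0]])

preds = predict(X_new, parameters)
accuracy = np.mean(preds == Y_true)
print(preds)      # [[1 0 0 0]]
print(accuracy)   # 0.75
```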

That’s it. A more complex model can be built by adding hidden layers. Libraries such as TensorFlow and PyTorch perform these operations, including computing the gradients, behind the scenes. In the next article, we will look at how to add hidden layers. Thank you for reading.