# Welcome!

Welcome to the neural network simulator. This application covers several topics regarding neural networks: a simple one-neuron model, linear regression, multi-class classification, gradient descent, linearity and non-linearity, and image recognition. To continue, either follow the tutorial or explore on your own if you already have prior knowledge of these topics.

# Computer Gardener

Your friend, a true green thumb enthusiast, entrusts you with the care of their lush apartment filled with a variety of plants while they travel abroad for a long vacation. As you stand amidst the greenery, a sense of bewilderment washes over you. How much sunlight does each plant crave? How do you measure the perfect amount of water without drowning them? As a good friend, you want to do everything possible to give these plants the best chance of survival.

After a bit of research, you discover a baseline: 10 hours of sunshine and 500 ml of water ensure the survival of the plants. However, reality isn't always that straightforward. What about cloudy days, unexpected shade, or missing a day of watering? How do you adapt and find the optimal balance to keep these delicate plants thriving without constant monitoring?

Luckily your friend is also a software engineer and has emailed you this simple model. Try changing the percentage of sunlight and water you are able to provide and see how likely the plant is to survive.

# Take a Look Inside the Neuron

You notice the purple circle in the middle of the model, called a neuron. You have probably heard about the neurons in a human brain before, but what does this one do? To understand how the model can take the water and sunlight amounts and turn them into something like a percentage, we need to take a closer look at the training process.

The training has four phases: first the weighted sum is calculated; then the sigmoid function is applied to the weighted sum to get a prediction; after that the error is calculated to determine how far off the model was; and lastly the weights and bias are corrected for the next cycle, making the model more accurate. Start by entering some random numbers into the highlighted fields for the weights and bias. Once you have gone through one training iteration (also called an epoch), you'll have the option to fast-forward through as many as you want. Below the training section you will find a list of terms and explanations for any questions you might have.

**Expected output** – 0 means dead, 1 means alive.

**Accuracy** – How much of the training data your current model has predicted correctly so far.

**Weights and biases** – In a neural network, weights determine the importance of each input feature. A higher weight means the
corresponding input feature has a greater impact on the output. The bias allows the model to shift the
activation function to fit the data better. By adjusting weights and bias, the neural network learns to
make accurate predictions.

**Sigmoid function** – The sigmoid function is a mathematical function that maps any real-valued
number into a value between 0 and 1. It is often used in neural networks to convert the weighted sum
of inputs into a probability-like output. The sigmoid function is defined as: σ(y) = 1 / (1 + e^{-y})
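The formula can be checked with a few lines of code (a minimal sketch, not the simulator's own implementation):

```python
import math

def sigmoid(y):
    # Maps any real number into (0, 1): sigma(y) = 1 / (1 + e^(-y))
    return 1 / (1 + math.exp(-y))

print(sigmoid(0))   # 0.5: a weighted sum of 0 gives a 50/50 prediction
print(sigmoid(5))   # close to 1
print(sigmoid(-5))  # close to 0
```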

**Error calculation** – When training a neural network, error calculation measures the difference
between the expected output and the actual prediction. It plays a crucial role in adjusting the model's weights and biases during the
learning process.

**Learning rate** – The learning rate is a hyperparameter that controls how much to change the model
in response to the estimated error each time the model weights are updated. A higher learning rate
means the model weights will be updated more significantly. It’s a crucial factor that can affect the speed and quality of learning. Too
high a learning rate can cause the model to converge too quickly to a suboptimal solution, whereas too
low a learning rate can make the training process excessively slow. For this example the learning
rate has been fixed to the value 5.

**Updating weights and bias** – This is a fundamental step in the training of neural networks. It
involves adjusting these parameters based on the calculated error, learning rate, and input values.
By iteratively updating weights and bias, the neural network aims to minimize prediction errors and
improve its accuracy over time.

**Epochs** – In the context of neural networks, an epoch refers to one complete pass through the entire
training dataset. During each epoch, the model's parameters are adjusted based on the training data to
minimize the loss function. Multiple epochs are often required to adequately train a neural network,
allowing it to learn and generalize from the data.
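The four phases described above can be sketched as a single training step in code. This is a simplified illustration with made-up starting values; the simulator's exact update rule may differ in detail:

```python
import math

def sigmoid(y):
    return 1 / (1 + math.exp(-y))

def train_step(weights, bias, inputs, expected, learning_rate=5):
    # Phase 1: weighted sum of the inputs
    y = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Phase 2: sigmoid turns the sum into a prediction between 0 and 1
    prediction = sigmoid(y)
    # Phase 3: error - how far off the prediction was
    error = expected - prediction
    # Phase 4: correct weights and bias for the next cycle
    weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
    bias = bias + learning_rate * error
    return weights, bias, prediction

# One plant: 80% sunlight, 90% water, and it survived (expected output 1)
weights, bias, prediction = train_step([0.1, 0.2], 0.0, [0.8, 0.9], 1)
```

Running this step repeatedly over the whole training set is exactly what fast-forwarding through epochs does.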

# The Final Model

After many cycles the training will result in the desired weights and bias that give the most accurate results based on the training data. Using those we can make a model that predicts the likelihood of your plants surviving. This is what would be happening inside the example model's neuron.

**Use the inputs to calculate the weighted sum:** y = 29.33 * sunlight amount + 29.89 * water amount - 35.12

**Make final prediction by using the weighted sum in the sigmoid function:** σ(y) = 1 / (1 + e^{-y})

**The final output:** the likelihood of survival, as a percentage.
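Putting the two steps together, the final model's prediction can be reproduced in a few lines of code, using the trained weights and bias given above (the example input values are made up):

```python
import math

def survival_likelihood(sunlight, water):
    # Weighted sum with the trained weights and bias
    y = 29.33 * sunlight + 29.89 * water - 35.12
    # Sigmoid turns the sum into a likelihood between 0 and 1
    return 1 / (1 + math.exp(-y))

# 80% sunlight and 70% water: a near-certain survivor
print(round(survival_likelihood(0.8, 0.7) * 100, 1))
```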

# How Much Should I Study?

The morning before an important exam you find yourself thinking: "I wish I knew how much I needed to study to get a good grade in this class." Luckily your professor is a machine learning fan and has left the class this model, which estimates, based on your participation throughout the course and the hours you spend studying for the exam, what grade you are most likely to get.

To get an estimate for your grade you need to input how many assignments for this class you have completed (there were a lot of them, and they were mainly meant to help students grasp different concepts on their own). In addition, figure out how many lessons you missed. Then finally you can adjust the hours you will be studying today and see how it affects your grade. Maybe you already put in so much work during the term that studying today won't affect it much, or maybe you have missed so much material that even studying the whole day won't save you from a bad grade.

# Training a Linear Regression Model

You probably noticed this model also has a neuron, but this time it works a little differently. This model uses something called linear regression and operates under the simple assumption that, overall, the more work you put in and the more you study, the more likely you are to get a good grade. The goal of the model is to find a line that accurately represents the correlation between studying and score.

The training process now has only three phases: first the coefficients and intercept are used to determine the value of y. Here, instead of a probability-based prediction, the model predicts the exact score for a student; below you will also see a visual representation of y. Next, the model calculates the difference between the actual and the predicted score. Finally, using that error, the coefficients and intercept are recalculated. Start by entering some random numbers into the highlighted fields for the coefficients and intercept. Once you have gone through one training iteration (also called an epoch), you'll have the option to fast-forward through as many as you want. Below the training section you will find a list of terms and explanations for any questions you might have.

**Expected result** – The test score this particular student in the training data got.

**Average error** – On average, how many points off the model was from the true test score.

**Coefficients and intercept** – In a linear regression model, coefficients determine the relationship between each input feature and the
output. The intercept allows the model to shift the prediction to fit the data better. By adjusting
coefficients and intercept, the model learns to make accurate predictions.

**Linear regression** – Linear regression is a statistical method used to model the relationship between a dependent variable and
one or more independent variables. The model aims to find the best-fit line through the data points that
minimizes the sum of squared differences between observed and predicted values.

**Error calculation** – When training a linear regression model, error calculation measures the
difference between the expected result and the predicted score. It plays a crucial role in adjusting
the model's coefficients and intercept during the learning process.

**Learning rate** – The learning rate is a hyperparameter that controls how much to change the model
in response to the estimated error each time the model weights are updated. A higher learning rate
means the model weights will be updated more significantly. It’s a crucial factor that can affect the speed and quality of learning. Too
high a learning rate can cause the model to converge too quickly to a suboptimal solution, whereas too
low a learning rate can make the training process excessively slow. For this example the learning
rate has been fixed to the value 0.01.

**Updating coefficients and intercept** – This is a fundamental step in the training of linear regression models.
It involves adjusting these parameters based on the calculated error, learning rate, and input values.
By iteratively updating coefficients and intercept, the model aims to minimize prediction errors and
improve its accuracy over time.

**Epochs** – In the context of neural networks, an epoch refers to one complete pass through the entire
training dataset. During each epoch, the model's parameters are adjusted based on the training data to
minimize the loss function. Multiple epochs are often required to adequately train a neural network,
allowing it to learn and generalize from the data.
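The three training phases above can be sketched as one update step in code. This is a simplified illustration with a made-up student; the simulator's exact update rule may differ in detail:

```python
def train_step(coeffs, intercept, inputs, expected, learning_rate=0.01):
    # Phase 1: use coefficients and intercept to predict the score
    predicted = sum(c * x for c, x in zip(coeffs, inputs)) + intercept
    # Phase 2: error - difference between actual and predicted score
    error = expected - predicted
    # Phase 3: recalculate coefficients and intercept using the error
    coeffs = [c + learning_rate * error * x for c, x in zip(coeffs, inputs)]
    intercept = intercept + learning_rate * error
    return coeffs, intercept, predicted

# One student: 3 hours studied, 2 lessons missed, 10 assignments done, scored 85
coeffs, intercept, predicted = train_step([0, 0, 0], 0, [3, 2, 10], 85)
```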

# The Final Model

After many cycles the training will result in the desired coefficients and intercept that give the most accurate score predictions based on the training data. Using those we can make a model that predicts the score and grade most likely to be achieved. This is what would be happening inside the example model's neuron.

**Use the inputs to calculate the score:** score = 1.33 * hours studied - 4.08 * lessons missed + 2.14 * assignments completed + 94

**The final output:** the score and grade most likely to be received.
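The formula above translates directly into code (the example inputs are made up):

```python
def predict_score(hours_studied, lessons_missed, assignments_completed):
    # Trained coefficients and intercept from the final model
    return (1.33 * hours_studied
            - 4.08 * lessons_missed
            + 2.14 * assignments_completed
            + 94)

# 4 hours of studying, 3 missed lessons, 2 completed assignments:
# 1.33*4 - 4.08*3 + 2.14*2 + 94 = 91.36
print(round(predict_score(4, 3, 2), 2))
```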

# What's for Dinner?

You have probably faced the problem of not knowing what to eat. However, you usually know whether you want it to be sweet, spicy, salty, and so on. So you decide to make your life a little easier today.

Now that you already know a little about neural networks, you have implemented the model below. It takes in your preferences, so tick the flavors you feel like most. For the output it has some common foods, and based on your choices the model will suggest the food you most likely should eat.

# But Something Seems Off

The model seems to be making predictions, but they are not very good. This can only mean one thing: it needs more training to learn to classify these foods as accurately for you as possible.

First, you will see a list of all the foods the model can give as output. For each food, tick the flavors you think apply most. There are no right or wrong answers; this just initialises the flavors of each food for the model. Second, after clicking submit you will be taken to the training process, where you can follow along as the model trains on the data you provided in the previous step. You may also adjust the learning rate and epochs to speed up or slow down training. To view a sped-up version of the training click go; otherwise click next to see it step by step. At the top of the page you can hover over the food items to see their parameters. Below the training section you will find a list of terms and explanations for any questions you might have.

**Learning rate** – The learning rate is a hyperparameter that controls how much to change the model
in response to the estimated error each time the model weights are updated. A higher learning rate
means the model weights will be updated more significantly. It’s a crucial factor that can affect the speed and quality of learning. Too
high a learning rate can cause the model to converge too quickly to a suboptimal solution, whereas too
low a learning rate can make the training process excessively slow.

**Epochs** – In the context of neural networks, an epoch refers to one complete pass through the entire
training dataset. During each epoch, the model's parameters are adjusted based on the training data to
minimize the loss function. Multiple epochs are often required to adequately train a neural network,
allowing it to learn and generalize from the data.

**Weights and biases in a multi-class model** – In a multi-class model, a weighted sum is calculated for each class. This involves multiplying the input
features by a unique set of weights and adding a bias term specific to each class. Thus, there is a
different set of weights and biases for each class, allowing the model to learn distinct patterns and make
accurate predictions for each category.
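A rough sketch of this idea in code, with entirely made-up flavor weights and only three foods for brevity (the real model learns its weights and biases during training):

```python
# Hypothetical weights, one row per food class, ordered [sweet, spicy, salty]
weights = {
    "salad":     [0.2, -1.0, 0.5],
    "ice cream": [2.0, -2.0, -1.0],
    "curry":     [-0.5, 2.5, 0.8],
}
# Hypothetical bias term specific to each class
biases = {"salad": 0.1, "ice cream": -0.2, "curry": 0.0}

def classify(flavors):
    # One weighted sum per class; the class with the highest sum wins
    scores = {
        food: sum(w * x for w, x in zip(ws, flavors)) + biases[food]
        for food, ws in weights.items()
    }
    return max(scores, key=scores.get)

print(classify([1, 0, 0]))  # sweet only -> "ice cream"
```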

# Separating the Dead From the Alive

For the one-neuron model, we had an example of predicting the likelihood of a plant's survival. Displayed here is its training data: the green dots represent plants that survived and the red dots plants that died. Is there a line that will separate the green dots to one side and the red to the other?

Try changing the slope and intercept of this line to correctly classify these plants. Aim for a score of 100, which means all points have been classified correctly. If you want to try again, generate some new data. Below you will find some useful terms and explanations for any questions you might have.

**Slope** – The slope of a line indicates how steep the line is. In mathematical terms, it
represents the rate of change between the two variables on the graph. If you're adjusting the
slope in our tasks, you're changing how quickly the value on the y-axis (e.g., plant growth)
increases or decreases as the value on the x-axis (e.g., sunshine or water) changes.

**Intercept** – The intercept is the point where the line crosses the y-axis. It represents the
value of the y-variable when the x-variable is zero. Adjusting the intercept in our tasks shifts the
entire line up or down without changing its slope, allowing you to fine-tune your predictions.

**Classification** – Classification is a fundamental technique in data science used to assign items
into predefined categories. In our plant growth task, the goal is to classify whether a plant will
survive or not based on the amount of sunshine and water it receives. By adjusting the slope and
intercept of the line, you create a boundary that separates the categories.
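Checking which side of the line a point falls on, and scoring the result, is simple arithmetic. A sketch with made-up plant data, not the simulator's actual dataset:

```python
def classify_point(x, y, slope, intercept):
    # A point above the line y = slope * x + intercept is predicted "alive"
    return "alive" if y > slope * x + intercept else "dead"

def score(points, slope, intercept):
    # Percentage of points whose true label matches the prediction
    correct = sum(
        1 for x, y, label in points
        if classify_point(x, y, slope, intercept) == label
    )
    return 100 * correct / len(points)

# Hypothetical data: (sunlight, water, outcome)
plants = [(2, 8, "alive"), (8, 9, "alive"), (3, 1, "dead"), (7, 2, "dead")]
print(score(plants, slope=0.0, intercept=5.0))  # 100.0: all classified correctly
```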

# Predict the Test Scores

For the linear regression model, it calculated y, the predicted score. The goal of the model was to find coefficients and an intercept that resulted in the most accurate scores, meaning the error was as small as possible. Here is a representation of some data, and the goal is to find the line with the smallest error. The ideal line would pass perfectly through every individual data point, but for this test score data that is simply unattainable.

Try changing the slope and intercept of this line to make the error as small as possible. If you want to try again, generate some new data. Below you will find some useful terms and explanations for any questions you might have.

**Mean Square Error** (MSE) is a metric used to measure the accuracy of a predictive model, particularly
in regression tasks. It quantifies the difference between the predicted values and the actual values
by averaging the squares of these differences. Here's the formula for MSE:

MSE = (1/n) ∑_{i=1}^{n} (y_{i} - ŷ_{i})^{2}, where n is the number
of data points, y_{i} is the actual value for the i-th data point, and ŷ_{i} is the
predicted value for the i-th data point.

MSE is important because it provides a single value representing the overall error of a model, making it easier to compare different models. It also penalizes larger errors, encouraging the model to minimize significant deviations. In our linear regression task, MSE helps us adjust the line to fit the data points as closely as possible, ensuring precise predictions based on the input data by minimizing the MSE.
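The formula translates directly into code (the example scores are made up):

```python
def mean_squared_error(actual, predicted):
    # Average of the squared differences between actual and predicted values
    n = len(actual)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n

scores = [82, 90, 75]
predictions = [80, 88, 79]
print(mean_squared_error(scores, predictions))  # (4 + 4 + 16) / 3 = 8.0
```

Because the differences are squared, the single 4-point miss contributes four times as much as each 2-point miss, which is why MSE penalizes large errors so strongly.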

# Gradient Descent

Gradient descent is an optimization algorithm used to minimize a function by iteratively moving towards the minimum value of the function. In simpler terms, it's like walking down a hill, always taking steps in the direction that decreases your elevation the most until you reach the lowest point.

This is what you intuitively used for these tasks – adjusting the line little by little until achieving your desired goal. This is also what the one neuron and linear regression model used when correcting weights and bias or coefficients and intercept each cycle, moving closer to the desired line step by step.
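A minimal sketch of the idea, minimizing a simple one-variable function rather than a full model's error:

```python
def gradient_descent(gradient, start, learning_rate=0.1, epochs=100):
    # Walk downhill: repeatedly take a step opposite the gradient
    x = start
    for _ in range(epochs):
        x = x - learning_rate * gradient(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(round(minimum, 3))  # converges to 3.0, the bottom of the "hill"
```

The learning rate here plays the same role as in the earlier models: too large and the steps overshoot the bottom, too small and the walk takes forever.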

# Cutting Shapes

In the gradient descent section we viewed the visual representation of the training data for the plant model and the test-score-predicting model. The data there was linear, meaning we could either fit one line as close to all the data points as possible, or separate the dead plants from the living with a single line.

However, not all data can be separated with one line. Our food-predicting model had more than two possible classes, so it would be impossible to divide all the data in two; we could, however, divide for example salad and ice cream with a line. In addition, even if we do have only two classes, they can still be distributed in a way where we need more than one line to make a clear separation.

Take a look at some of the pictures below with different sets of data. Try to figure out how you would draw a line, or lines, to separate the data properly.

# Sometimes One Is Not Enough

When a single line isn't enough to separate data, we must consider more complex decision boundaries involving multiple lines. This scenario arises when data points are distributed in such a way that a single linear boundary cannot adequately divide the classes. For instance, in multi-class classification problems like the food predictor, there are more than two classes (one for each food), and separating each class from the others might require multiple lines or more complex shapes.

Neural networks excel in such tasks by using multiple layers to learn complex patterns. Each layer in a neural network can be thought of as contributing a piece of the decision boundary, and when combined, they form a boundary capable of accurately separating the classes. This multi-layer approach enables neural networks to handle scenarios where multiple lines are necessary for effective classification.

Here you have some data and the ability to create more layers, so you can try to solve the problem of separating this data correctly. An example solution is provided as well. Below you will find some useful terms and explanations for any questions you might have.

**Layers** – Layers in neural networks are the building blocks of the network. Each layer consists
of a set of neurons, which receive input from the previous layer and transform it to produce an output
that is passed to the next layer. The depth of a neural network is defined by the number of layers it
has, including input, hidden, and output layers.

**Epochs** – In the context of neural networks, an epoch refers to one complete pass through the entire
training dataset. During each epoch, the model's parameters are adjusted based on the training data to
minimize the loss function. Multiple epochs are often required to adequately train a neural network,
allowing it to learn and generalize from the data.

# XOR Problem

The XOR (exclusive OR) problem is a classic example in the field of machine learning that illustrates the limitation of linear classifiers. In the XOR problem, data points are arranged in a way that makes it impossible to separate them with a single linear boundary. Specifically, the data points sit at opposite corners of a square, with each diagonal pair belonging to a different class. This configuration means that no straight line can separate the classes without error.

To solve the XOR problem, non-linear decision boundaries are required. Neural networks provide a robust solution to this problem by leveraging their ability to model complex, non-linear relationships. By using hidden layers, a neural network can learn to transform the input space into a representation where the classes become linearly separable. This process involves adjusting the weights of the connections between neurons to minimize the prediction error through a series of iterations, known as epochs.
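One classic hand-built solution can be sketched in code. The weights here are chosen by hand for illustration, not learned by training:

```python
def step(x):
    # A simple threshold activation: fire (1) when the sum is positive
    return 1 if x > 0 else 0

def xor(a, b):
    # Hidden layer: two neurons, each carving out one half-plane
    h1 = step(a + b - 0.5)    # fires when a OR b
    h2 = step(-a - b + 1.5)   # fires when NOT (a AND b)
    # Output layer: fires only when both hidden neurons fire
    return step(h1 + h2 - 1.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))
```

Neither hidden neuron solves XOR on its own; it is their combination in the output layer that produces the non-linear boundary.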

# Are You Happy or Sad?

Have you ever wondered how we, as humans, can instantly tell if a face is happy or sad, even with just a quick glance? It’s a fascinating ability that we often take for granted. Our brains are incredibly skilled at picking up on subtle cues—whether it's the curve of a smile, the position of the eyebrows, or the shape of the eyes. This process is so intuitive for us that it seems almost automatic.

But how does this work in the digital world? When you design emojis or any other image that represents emotions, you need to translate those human intuition cues into something a computer can understand.

You have been given the task of designing some new emojis. All emojis must be sorted into sad or happy, depending on which category fits them better. Luckily for you, this is not something you need to think about, since you have also been provided with a model that can classify anything you draw on the 6x6 grid as happy or sad.

To draw a face you need to click on the square you want to color black. If you wish to see some examples of a stereotypically sad or a happy face, click on the respective buttons.

# But a Computer Doesn't Have Eyes...

We know a computer cannot actually see what you drew the way we humans see. For us, it is easy to recognize the smile curving up or down. A computer relies on math.

You should be familiar with coefficients from the previous models. Here the coefficients have already been fixed to the values used in the model above. For the input, each white square represents 0 and each black square represents 1. When you click the next button, this model goes through the grid square by square, calculating along the way, until it finally makes a prediction. If you continue clicking next after that, a new input face appears and the cycle begins again. Below you will find some useful terms and explanations for any questions you might have.

**Input grid** – Each white square is 0 and each black square is 1; the grid uses a random sample face from this model's training data.

**Coefficients grid** – This grid contains the coefficients (weights) associated with each pixel in
the input grid. Each coefficient represents the importance or influence of its corresponding pixel value
in the final prediction. The coefficients are used in the calculation to determine how each pixel contributes
to the overall classification.

**Calculation grid** – This grid displays the intermediate results of the calculation process. Each
cell shows the product of the input pixel value and its corresponding coefficient. This helps visualize
how each pixel's value, when multiplied by its weight, contributes to the overall sum used in the prediction.
Only the cells where the value does not equal zero will be highlighted in purple.

**Intercept** – The intercept is a constant value added to the weighted sum of the pixels. It helps
adjust the final result to better fit the data. In a mathematical model, it shifts the sum to account
for any baseline value that isn't captured by the coefficients alone. Here it is fixed to be 0.38.

**Weighted sum** – The sum of the intercept and all the cells in the calculation grid.

**Prediction** – The prediction is the final output of the model after applying the sigmoid function
to the calculated sum. It represents the probability that the input image (emoji) belongs to a particular
category. A value greater than 0.5 indicates a positive classification (happy), while a value below 0.5
indicates a negative classification (sad).
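The whole process can be sketched in a few lines of code. The 0.38 intercept comes from the description above, but the pixel coefficients here are left as a parameter; in the model they are the fixed values shown in the coefficients grid:

```python
import math

GRID = 6  # the 6x6 drawing grid

def classify_face(pixels, coefficients, intercept=0.38):
    # pixels and coefficients are 6x6 grids (lists of lists);
    # white squares are 0, black squares are 1
    weighted_sum = intercept
    for row in range(GRID):
        for col in range(GRID):
            # Calculation grid: each input pixel times its coefficient
            weighted_sum += pixels[row][col] * coefficients[row][col]
    # Sigmoid turns the weighted sum into a probability
    prediction = 1 / (1 + math.exp(-weighted_sum))
    return "happy" if prediction > 0.5 else "sad"
```

Going square by square in the simulator is exactly this double loop, performed one cell at a time.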