The 7 Steps of Machine Learning



From detecting skin cancer, to sorting cucumbers, to finding escalators in need of maintenance, machine learning has given computer systems entirely new capabilities.

But how does it really work under the hood? Let's walk through a basic example, and use it as an excuse to talk about the process of getting answers from your data using machine learning.

Let's pretend that we've been asked to create a system that answers the question of whether a given drink is wine or beer. The question-answering system that we build is called a "model", and this model is created via a process called "training". The goal of training is to create an accurate model that answers our questions correctly most of the time. But in order to train a model, we need to collect data to train on. This is where we begin.



If you are new to machine learning and want a quick overview first, check out an introductory article before continuing.

Wine or beer?


Our data will be collected from glasses of wine and beer. There are many aspects of these beverages that we could collect data on, everything from the amount of foam to the shape of the glass.

For our purposes, we will pick just two simple ones: the color (as a wavelength of light) and the alcohol content (as a percentage). The hope is that we can split our two types of drinks along these two factors alone. We will call these our "features" from now on.

The first step in our process will be to run out to the local grocery store, buy a bunch of different beers and wines, and get some equipment to take our measurements - a spectrometer to measure the color, and a hydrometer to measure the alcohol content. Our grocery store apparently has an electronics hardware section.

Collecting data


Once we have our equipment and beverages, it's time for the first real step of machine learning: collecting data. This step is very important, because the quality and quantity of data that you gather will directly determine how good your predictive model can be. In this case, the data we collect will be the color and the alcohol content of each drink.


This will yield a table of color, alcohol %, and whether the drink is beer or wine. This will be our training data.
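To make this concrete, here is a minimal sketch in Python of what that training data might look like. The measurement values are invented for illustration; they are not real readings:

    # Hypothetical training data: color (wavelength in nm), alcohol (%),
    # and a label for each drink (0 = beer, 1 = wine).
    training_data = [
        # (color_nm, alcohol_pct, label)
        (610, 4.7, 0),   # a lager
        (605, 5.2, 0),   # a pale ale
        (640, 12.5, 1),  # a red wine
        (635, 13.1, 1),  # another red wine
        (600, 4.1, 0),   # a pilsner
        (645, 14.0, 1),  # a port-style wine
    ]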

Data preparation


After a few hours of measurements, we have gathered our training data. Now it's time for the next step of machine learning: data preparation, where we load our data into a suitable place and prepare it for use in our machine learning training.
We will first put all our data together and then randomize the order. We don't want the order of our data to affect what we learn, since that's not part of determining whether a drink is beer or wine. In other words, we want the determination of what a drink is to be independent of which drink came before or after it in the sequence.
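Randomizing the order can be as simple as a single shuffle. A sketch, assuming the training_data list from above:

    import random

    random.seed(42)                # fixed seed, so the shuffle is reproducible
    random.shuffle(training_data)  # randomize the order in place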

This is also a good time to do any pertinent visualizations of your data, to see if there are any relevant relationships between the different variables that you can take advantage of, as well as to show you if there are any data imbalances. For example, if we collected far more data points about beer than wine, the model we train would be heavily biased toward guessing that virtually everything it sees is beer, since it would be right most of the time. However, in the real world, the model may see beer and wine in equal proportions, which means that it would be guessing "beer" wrong half the time.

We also need to split the data into two parts. The first part, used for training our model, will be the majority of the dataset. The second part will be used for evaluating our trained model's performance. We don't want to use the same data that the model was trained on for evaluation, since the model could then just memorize the "questions", just as you wouldn't use the same questions from your math homework on the exam.
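Continuing the sketch, an 80/20 split (more on that ratio in the evaluation section) might look like this:

    # Hold out the last 20% of the shuffled data for evaluation.
    split_index = int(len(training_data) * 0.8)
    train_set = training_data[:split_index]
    eval_set = training_data[split_index:]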

Sometimes the data we collect needs other forms of adjusting and manipulation. Things like de-duping, normalization, error correction, and more. These would all happen during the data preparation step. In our case, we don't have any further data preparation needs, so let's move on.

Choosing a model


The next step in our workflow is choosing a model. There are many models that researchers and data scientists have created over the years. Some are very well suited for image data, others for sequences (such as text or music), some for numerical data, and others for text-based data. In our case, we have just two features, color and alcohol %, so we can use a small linear model, which is a fairly simple one that will get the job done.
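In code, the linear model we've chosen is just the equation of a line generalized to our two features: a weighted sum plus a bias. A minimal sketch (the function name is ours, not from any particular library):

    import numpy as np

    def linear_model(x, W, b):
        # A weighted sum of the features plus a bias. Which side of zero
        # the result lands on decides which side of the line x falls on.
        return np.dot(x, W) + b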


Training


Now we move on to what is often considered the bulk of machine learning - the training. In this step, we will use our data to incrementally improve our model's ability to predict whether a given drink is wine or beer.


In some ways, this is similar to someone learning to drive for the first time. At first, they don't know how any of the pedals, knobs, and switches work, or when they should be used. However, after lots of practice and correcting for their mistakes, a licensed driver emerges. Moreover, after a year of driving, they have become quite adept. The act of driving and reacting to real-world data has adapted their driving abilities, honing their skills.


We will do this on a much smaller scale with our beverages. In particular, the formula for a straight line is y = m * x + b, where x is the input, m is the slope of the line, b is the y-intercept, and y is the value of the line at position x. The values we have available to adjust, or "train", are m and b. There is no other way to affect the position of the line, since the only other variables are x, our input, and y, our output.
In machine learning, there are many m's, since there may be many features. These m values are usually formed into a matrix, which we denote W, for the "weights" matrix. Similarly for b, we arrange the b's together and call them the biases.




The training process involves initializing some random values for W and b and attempting to predict the outputs with those values. As you might imagine, it does pretty poorly at first. But we can compare our model's predictions with the outputs that it should have produced, and adjust the values in W and b so that we get more accurate predictions.

This process is then repeated. Each repetition or cycle of updating weights and biases is called a training “step”.
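To make those repeated updates concrete, here is a minimal gradient-descent sketch in NumPy, building on the train_set from earlier. The logistic (sigmoid) squashing and the exact update rule are our assumptions; the article itself doesn't prescribe a specific training algorithm:

    import numpy as np

    # Unpack the training set into a feature matrix and a label vector.
    X = np.array([[c, a] for c, a, _ in train_set], dtype=float)
    y = np.array([label for _, _, label in train_set], dtype=float)

    # Standardize each feature so color and alcohol are on comparable scales.
    train_mean, train_std = X.mean(axis=0), X.std(axis=0)
    X = (X - train_mean) / train_std

    rng = np.random.default_rng(0)
    W = rng.normal(size=2)  # one random starting weight per feature
    b = 0.0                 # starting bias

    learning_rate = 0.1
    for step in range(1000):
        # Squash the linear output into a 0-1 "probability of wine".
        p = 1.0 / (1.0 + np.exp(-(X @ W + b)))
        error = p - y  # how far off each prediction is
        # Nudge W and b in the direction that reduces the error.
        W -= learning_rate * (X.T @ error) / len(y)
        b -= learning_rate * error.mean()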

Let's see what this means for our dataset in this case. When we first start training, it is that we draw a random line through the data. So as each phase of training progresses, the line gradually moves forward, closer to an ideal division of wine and beer.

Evaluation


Once training is complete, it's time to see if the model is any good, using evaluation. This is where the dataset that we set aside earlier comes into play. Evaluation allows us to test our model against data that has never been used for training, so we can see how the model might perform against data it has not yet seen. This is meant to be representative of how the model might perform in the real world.
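Continuing our sketch, evaluation just runs the trained W and b against the held-out eval_set and measures accuracy (the 50% decision threshold is the usual convention, assumed here):

    # Build feature and label arrays from the held-out evaluation set,
    # reusing the scaling statistics computed on the training data.
    X_eval = np.array([[c, a] for c, a, _ in eval_set], dtype=float)
    y_eval = np.array([label for _, _, label in eval_set], dtype=float)
    X_eval = (X_eval - train_mean) / train_std

    p_eval = 1.0 / (1.0 + np.exp(-(X_eval @ W + b)))
    predictions = (p_eval >= 0.5).astype(float)  # 1 = wine, 0 = beer
    accuracy = (predictions == y_eval).mean()
    print(f"held-out accuracy: {accuracy:.0%}")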

A good rule of thumb I use for a training-evaluation split is somewhere on the order of 80/20 or 70/30. Much of this depends on the size of the original source dataset. If you have a lot of data, perhaps you don't need as big a fraction for the evaluation dataset.

Parameter tuning


Once you're done with evaluation, it's possible that you'll want to see if you can further improve your training in some way. We can do this by tuning our parameters. There were a few parameters that we implicitly assumed when we did our training, and now is a good time to go back, test those assumptions, and try other values.

One example is how many times we run through the training dataset during training. What I mean by that is we can "show" the model our complete dataset multiple times, rather than just once. This can sometimes lead to higher accuracy.




Another parameter is the "learning rate". This defines how far we shift the line during each step, based on the information from the previous training step. These values all play a role in how accurate our model can become, and how long the training takes.
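In the training sketch above, both of these are ordinary variables, so hyperparameter tuning amounts to re-running training with different settings and comparing the evaluation results. A hypothetical grid search, where train_and_evaluate is a stand-in wrapping the training and evaluation code sketched earlier:

    # Try each combination and keep the one with the best held-out accuracy.
    # train_and_evaluate is a hypothetical helper, not defined here.
    results = {}
    for learning_rate in (0.01, 0.1, 1.0):    # how far each step moves the line
        for num_steps in (100, 1000, 10000):  # how many training steps to run
            results[(learning_rate, num_steps)] = train_and_evaluate(
                learning_rate, num_steps
            )
    best_settings = max(results, key=results.get)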

For more sophisticated models, the initial conditions can play a significant role in determining the outcome of training. Differences can be seen depending on whether a model starts off training with values initialized to zeros versus some distribution of values, which leads to the question of which distribution to use.

The possibly long journey of parameter tuning


As you can see, there are many considerations at this phase of training, and it's important that you define what makes a model "good enough"; otherwise, you might find yourself tweaking parameters for a very long time.

These parameters are typically referred to as "hyperparameters". The adjustment, or tuning, of these hyperparameters remains a bit of an art, and is more of an experimental process that heavily depends on the specifics of your dataset, model, and training process.

Once you're happy with your training and hyperparameters, guided by the evaluation step, it's finally time to use your model to do something useful!


Prediction


Machine learning is using data to answer questions. So prediction, or inference, is the step where we get to answer some questions. This is the point of all this work, where the value of machine learning is realized.


We can finally use our model to predict whether a given drink is wine or beer, given its color and alcohol percentage.
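With the sketch we've been building, prediction on a new, unlabeled drink looks like this (the new drink's measurements are, again, invented):

    # A new, unlabeled drink: hypothetical color (nm) and alcohol (%).
    new_drink = np.array([618.0, 5.0])
    new_drink = (new_drink - train_mean) / train_std  # same scaling as training

    p_wine = 1.0 / (1.0 + np.exp(-(new_drink @ W + b)))
    print("wine" if p_wine >= 0.5 else "beer")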


Big picture


The power of machine learning is that we were able to determine how to differentiate between wine and beer using our model, rather than using human judgement and manual rules. You can extend the ideas presented today to other problem domains as well, where the same principles apply:


Collecting data

Preparing that data

Choosing a model

Training

Evaluation

Hyperparameter tuning

Prediction

TensorFlow Playground


Check out TensorFlow Playground for more ways to train and play with parameters. It's a fully browser-based machine learning sandbox where you can try different parameters and run training against mock datasets.

