
Understanding Machine Learning

An introduction to Machine Learning and why we need it.


What is machine learning?

Machine learning is a key discipline in the extraction of useful things from annoyingly large amounts of data. It belongs to the family of statistical analysis. While other methods rely on rules designed by a human, machine learning methods use algorithms that “discover” rules through maths.

In the real world, there are some things that we can measure and some things that we can’t. Things we can measure include: sizes and weights, colours and brightnesses, times, MRI scans of people’s heads, audio recordings of woodland animals, today’s house prices in Blackpool. Things we can’t measure include: what someone is thinking, which animals are living in a woodland, house prices in Blackpool in the year 2072. The world does its thing, including having physical laws and complicated psychological entities (aka people) that continually change the state and composition of the world.

That all sounds rather complicated though, so scientists make simplified models in an attempt to approximately explain how the things we can measure can predict the things that we can’t. They use their experience (aka empirical evidence) and mathematical wit (aka theoretical evidence) to sculpt their models to the best of their ability, and then hope the models behave on the whole like the real world does.

That also sounds rather complicated though, so machine learning scientists take a step outside of their personal experiences and instead let the models build themselves, by exposing them to as much experience of the real world as they can in the form of data. The scientists spend most of their time gathering and preparing this data and mathematically formulating what the “right outcome” of a model is: it should obviously behave as much like the real world as possible. They then press run and hope for the best (in a rigorous, scientific manner, of course).

In the items below, a ( * ) signifies that we’ll look at that item in more detail later in this guide.

Is machine learning better than other methods?

The machine learning process of “discovering” takes a long time ( * ), can be hard (often impossible) to interrogate ( * ), and can lead to very different results depending on the example data used ( * ). Under the right conditions ( * ), it can perform significantly better than human-designed models.

How can machine learning do better than human-designed models?

Human-designed models inherently need to be expressible by a person (for example, written down as an equation or described in words), since that’s the way we think: symbolically (see Wittgenstein, Chomsky, and others). Machine learning avoids this constraint by minimising the human design involved ( * ), and hence allows for much more sophisticated models than could be described by a person.


How does machine learning work?

There are two completely distinct phases involved when creating a machine learning model:

  • Training

  • Trained

Machine learning is only used during the training phase; it is the tool that trains the model.

During training, the neural network needs to get a grip on how the big complicated world works. It does this by being shown matched examples of the things that we can measure and the things that we will want to know. The way that this knowledge is instilled in our neural network is the crux of machine learning: as shown in the diagram above, machine learning is applied by the machine learning wizard/ess using magic spells such as “back-propagation” and “adaptive moment estimation gradient descent”. In most cases, the things that we want to know are either difficult to measure or can’t be measured just yet. For example, perhaps they need an expert to make a judgement, or we’re trying to predict something that will happen in the future like a decision to make a purchase. This is why in the vast majority of cases, machine learning is performed using historical data—that is, data that we already know the outcome of.
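If you’d like to see what those spells look like when they’re cast, below is a minimal sketch of a training loop, assuming PyTorch and a completely made-up dataset of measurements paired with known outcomes. Back-propagation appears as loss.backward() and adaptive moment estimation appears as torch.optim.Adam.

```python
# A minimal sketch of a training loop, assuming PyTorch.
# The dataset (measurements paired with known outcomes) is invented
# purely for illustration.
import torch
import torch.nn as nn

# Things we can measure (10 numbers per example), matched with the thing
# we want to know (a single number), taken from historical data.
measurements = torch.randn(100, 10)
known_outcomes = torch.randn(100, 1)

# A small neural network model whose internal parameters start off random.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

# "Adaptive moment estimation gradient descent" (Adam).
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    predictions = model(measurements)            # the model's best guess
    loss = loss_fn(predictions, known_outcomes)  # how far off it is
    optimiser.zero_grad()
    loss.backward()                              # "back-propagation"
    optimiser.step()                             # nudge the parameters
```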

After training, the trained neural network can’t learn any more, because all that it can see are the things that we can measure. It can make its best guess about the things that we want to know, but without knowing what actually happens (perhaps we no longer have an expert’s judgement, or the thing we want to know really is in the future now) it is not training any more. Because no learning is happening, no machine learning is involved any more.

What do we mean by trained? You have two colleagues. One is trained, the other is untrained. Which one do you expect to perform better? An untrained model still “works”, it just probably doesn’t do what you want it to do. Just like with our everyday use of the word “training”, machine learning is used to make an untrained model perform better. And just like with our everyday use of the word, a model can be more well-trained or less well-trained, and an already trained model can be put back into training to become better at its job. A trained model is no longer being “machine learning-ed”—it’s just doing its job. It may even be doing it quite well(!).

How does machine learning train a model? Machine learning is a mathematical tool that efficiently fiddles with the internals of an existing model to make it behave more like you want it to. With Data™️ ( * ).
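To make “fiddling with the internals” a little more concrete, here is a tiny sketch in plain Python, with invented numbers: a model with a single internal parameter is nudged, step by step, until its guesses line up with the data.

```python
# A pure-Python sketch of the "fiddling": nudge one internal parameter
# so the model's guesses move closer to the data. All numbers invented.

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (measured, wanted) pairs
w = 0.0  # the model's single internal parameter: prediction = w * measured

for step in range(100):
    # How wrong is the model on average, and in which direction should w move?
    gradient = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= 0.05 * gradient  # fiddle with the internals, a little at a time

print(w)  # ends up close to 2, because the data roughly follows y = 2x
```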

What is a model and which models should we use? A model is another word for “an equation which approximately mimics the real world”. Weather forecasters have models, which look like big complicated fluid dynamics equations (link, if you dare…). In computer vision, we use neural network models because they are extremely generic. While the weather forecast models are designed according to specific physical laws between air speed, temperature, etc., neural networks only enforce that “there may or may not be some or many connections between each of our variables”. This means that neural networks can be—and are—used to model a huge variety of real-world phenomena (including the weather!).
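As a heavily simplified, entirely invented illustration of that difference, here is a hand-designed model written down as an explicit equation, next to a tiny neural-network-style model that only assumes “the inputs are somehow connected to the output”:

```python
# A toy contrast between a hand-designed model and a generic one.
# Both are just functions from "things we can measure" to a prediction;
# the numbers and the scenario are invented for illustration.
import numpy as np

def hand_designed_house_price(floor_area_m2, num_bedrooms):
    # A human wrote this equation down from domain knowledge.
    return 1200 * floor_area_m2 + 8000 * num_bedrooms + 30000

def tiny_neural_network(inputs, W1, b1, W2, b2):
    # A generic model: it only says "inputs may be connected to the output
    # through some intermediate quantities". The values of W1, b1, W2, b2
    # (the parameters) are what training has to discover.
    hidden = np.maximum(0, inputs @ W1 + b1)  # ReLU non-linearity
    return hidden @ W2 + b2
```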

Is a neural network still an equation? Yes, but it would take you many thousands of pages to write the equation out. This has two important negative effects when compared to non-machine learning models: 1) it takes a long time; 2) it is difficult (or impossible) to interrogate. Rather than give yet another explanation about what a neural network is, I recommend this intermediate article or this advanced article. However, you don’t need to understand what a neural network is to understand any of this document.

Why does training take a long time? When we build any model, we need to figure out what value each parameter in our model should be. Non-machine learning models tend to have between 1 and 100 parameters. Figuring out the value for one hundred parameters can be done almost instantly when we use a calculator (aka a computer). On the other hand, neural network models tend to have millions or billions (nowadays, even trillions) of parameters which need to be calculated. Having this many parameters is necessary when we’re trying to do something inherently difficult like identifying vague items in an image (such as a “scratch”, which can have an almost infinite variety of appearances) or creating a human-like response to a question. However, the more parameters we have, the more computational work (aka calculations) is needed to calculate their values (NB: the calculation of these values is what we mean when we talk about “training”). Incidentally, this computational requirement is why the global workload of machine learning is a significant contributor to climate change (Nature paper).
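To give a feel for the numbers, here is a small sketch (assuming PyTorch) that counts the parameters of a one-line linear model versus a still-quite-small neural network:

```python
# A sketch of how parameter counts differ, assuming PyTorch.
import torch.nn as nn

# A hand-designed-sized model: a straight line through 10 measurements.
linear_model = nn.Linear(10, 1)

# A small neural network for, say, image patches of 32x32 pixels.
neural_network = nn.Sequential(
    nn.Linear(32 * 32, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 1),
)

def count_parameters(model):
    return sum(p.numel() for p in model.parameters())

print(count_parameters(linear_model))    # 11 parameters
print(count_parameters(neural_network))  # roughly 790,000 parameters
```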

Why is a neural network model difficult to interrogate? If I hand you thousands of pages of equations and ask you to explain it to me in two sentences, you’re gonna have a bad time. This is known as the “explainability” problem in machine learning. There is a catch-22: we need big, complex models like neural networks to achieve complicated real world tasks, such as finding objects in images; but big, complex models like neural networks are almost impossible to understand. In response, the current way of dealing with the explainability problem in machine learning is to state “we know our model works, but we don’t know why”. Having said that, it is vital to note that we do know why machine learning works. Machine learning works because it is a mathematical optimisation across variables, which has been well-studied for hundreds of years. The particular thing that is difficult is explaining the decisions made by a complex model (like a neural network) that has been optimised via machine learning.


Understanding machine learning and data

Think back to earlier when we discussed human-designed scientific models, like the weather forecasting models. You know which humans design really bad scientific models? Babies. Why do babies design really bad scientific models? Good question. Despite being adorable, babies have a general tendency to draw erroneous or incomplete conclusions about how the world works due to an overall lack of experience. Some real world phenomena require relatively few experiences to learn. For example, “It has caused me pain” ⇒ “Don’t do it”. More complex phenomena need a lot of experiences. Some hypothesise that this is the fundamental reason behind the dearth of pre-school clinical physicians, or “doctlets” (come on babies, get your act together).

When a model is trained via machine learning, all of its worldly experiences come in the form of the data that we provide it with, known as training data. When I say “all of”, I really mean all of. While living beings like us pick up little bits of general knowledge here and there over the course of our lifetimes, most machine learning training takes place over the course of a few hours or days. While we have several senses and even more stimulus responses to mould our understanding, most datasets consist of a single sense (like images) and only a very small bit of additional information to go with it (like “this scooter is parked well”).

Any worldly experience not contained in the training dataset is entirely unexperienced by our neural network, so just like our adorable, pathetic baby friends, there’s no telling what baffling response our neural network will have when it eventually encounters that experience in the real world. What’s more, if the training dataset only contains a handful of examples of a specific scenario, there’s a distinct possibility that the neural network has misinterpreted the link between what we can measure and what we want to know (see the figure below).

Imagine that we have a training dataset with only two images, the image on the left and the image in the middle. We label the first image as “Bike” and the second image as “Scooter”. We want our neural network to tell us that the right-hand image is also “Bike”. However, for some reason it keeps predicting “Scooter”. Can you see what it has unintentionally learned instead? Answer: Our model has learned something random that still seems to distinguish the first two images. For example, maybe it thinks that “Bike” means “without helmet” or “facing away from camera”. How can we avoid this problem? Perhaps if we had a larger training dataset which had more bike photos with people with and without helmets and facing in different directions, our machine learning process would result in the neural network learning to look at what vehicle people are riding instead.
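Here is a toy version of that failure, using invented hand-crafted features in place of real images (and assuming scikit-learn). With only two training examples, every feature that happens to differ between them explains the labels equally well, so the model is free to latch onto the wrong one:

```python
# A toy illustration of the two-image problem above, with invented
# hand-crafted features instead of real images. Assumes scikit-learn.
from sklearn.tree import DecisionTreeClassifier

# Each example: [is_bike_shaped, rider_has_helmet, facing_camera]
training_data = [
    [1, 0, 0],  # the "Bike" training image: no helmet, facing away
    [0, 1, 1],  # the "Scooter" training image: helmet, facing camera
]
training_labels = ["Bike", "Scooter"]

model = DecisionTreeClassifier().fit(training_data, training_labels)

# A new bike, but ridden by someone wearing a helmet and facing the camera.
new_image = [[1, 1, 1]]
print(model.predict(new_image))
# All three features separate the two training images equally well, so the
# tree picks one essentially arbitrarily; if it happened to pick the helmet
# or the camera direction, the new bike comes back labelled "Scooter".
```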

To instil our neural network with reliable capability, we need to expose it to as much relevant worldly experience as possible. Note that with the current state of artificial intelligence, the word relevant forces us to train different neural networks for different tasks. This is because what is relevant and what is irrelevant is very task-specific. In the same way that an architect would need to be radically retrained to be a deep-sea diver, or an astrophysicist can’t be relied upon to bake an edible black forest gateau, neural networks are pretty universally really bad at nearly everything. The only thing we can hope them to be good at is the thing that we’ve trained them to do before.

That’s why the answer to “how much data do we need?” is always “as much as we can possibly get”—the training dataset needs to be sufficient to teach our naïve, semi-blind, infantile neural network to get a sense of what the real world looks and behaves like. In that way, the “how much data” question is a bit like asking “how many hours of piano lessons does my child need before they become so good at the piano I can retire on their world-class talent?”. I wish I could answer that question but all I can definitely say is “probably more”.
