There are several steps in the process of creating a new machine learning (ML) model and keeping that model updated so it can continue to be useful. This article will walk you through the high-level steps in the lifecycle of a model.
Step 1: Make a plan
This step is often overlooked, but it’s arguably the most important one in the process. Before you jump into creating training data or training a model, it’s helpful to take a step back and define the problem you’re trying to solve.
Ask yourself: What images or video do you need? Is that data available? If so, how much of it is available, and where is it stored? Remember: the best quality models will come from consistent imagery, and you’ll want as many images of your target object(s) as you can get your hands on. Better to have too much than not enough.
Okay, so you have your data all lined up. Now ask yourself: can a person clearly and reliably see the object(s) you’re looking for? Are there objective markers or flags that make it obvious what that object is and how to define it, or is it ambiguous or subjective? Your target object needs to be explainable so annotators can find and label it consistently and reliably. If the object is not easy to explain and verify, your training data will be inconsistent. Just like with people, inconsistency ‘confuses’ the model and the model will perform poorly.
If your target object is too complicated, it may be easier to break it down into smaller tasks, or you may need to find a different approach to solving the problem. In our platform, creating training data is done in sequential steps, which makes it easy to break things down into smaller, simpler steps.
Step 2: Get data into the platform
This step is known as data “ingestion”. It’s essentially the same as uploading images or video to your favorite cloud storage platform, but there are a few extra behind-the-scenes steps that structure your media into a format that’s ready for machine learning. Thankfully, these steps are all done automatically, so all you’ll need to do is upload!
There is one thing you can do to make life easier later on: keep your media as organized as possible ahead of time. Keep videos with videos, and images with images. If you have several projects you want to work on, try to keep the media for those projects from mixing together.
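If your media is currently mixed together in one folder, a small script can do the sorting before you upload. The sketch below is only an illustration: the extension lists, the function names, and the images/videos folder layout are assumptions, not platform requirements.

```python
from pathlib import Path
import shutil

# Assumed extension lists; adjust them to the formats you actually use.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi"}


def media_kind(filename: str) -> str:
    """Return "images", "videos", or "other" based on the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "images"
    if suffix in VIDEO_EXTENSIONS:
        return "videos"
    return "other"


def organize(source_dir: str, project_name: str) -> None:
    """Move each file in source_dir into <source_dir>/<project_name>/<kind>/."""
    source = Path(source_dir)
    # Snapshot the listing first so newly created folders are not revisited.
    for path in list(source.iterdir()):
        if not path.is_file():
            continue
        target_dir = source / project_name / media_kind(path.name)
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))
```

For example, calling organize("raw_uploads", "cake_detector") (both names are hypothetical) would leave you with one folder per project, with images and videos kept apart inside it.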
Step 3: Create training data
This is a really big step in the process. Head on over to our full article about model training to read more information.
Once your images or video are in the platform and organized, it’s time to create training data! Training data is what puts the learning in "machine learning."
For computer vision, training data takes the shape of labels that you apply to your images or video. The type of label depends on the type of model you’re building (see our article on CV model types), but essentially the label is what tells the model “this is the thing I’m looking for in this image.” Gathering several hundred or even several thousand high-quality labels is what will create a really great model. This process of adding labels to imagery or video is called annotation.
You'll need to first decide what you're going to actually annotate. Take a look at the plan you made in Step 1—what are the specific objects or features you want a model to detect? You'll need to create Categories for these in your CrowdAI account. (At CrowdAI, we call these Categories, but elsewhere you may hear the term classes or even ontology, which have similar meanings. We think Categories is a bit easier to understand, though.)
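To make the idea of a label concrete, here is one hypothetical way a bounding-box label could be represented in code. The field names, the BoxLabel class, and the CATEGORIES set are all assumptions made for illustration; they do not describe how the CrowdAI platform stores labels.

```python
from dataclasses import dataclass

# Hypothetical Categories taken from the plan you made in Step 1.
CATEGORIES = {"cake", "cupcake"}


@dataclass(frozen=True)
class BoxLabel:
    """One bounding-box label: "this Category is at this place in this image"."""

    image_name: str
    category: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def __post_init__(self) -> None:
        # Reject labels that use a Category nobody defined up front.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.x_min >= self.x_max or self.y_min >= self.y_max:
            raise ValueError("box must have positive width and height")

    def area(self) -> int:
        """Area of the box in pixels."""
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


label = BoxLabel("party_001.jpg", "cake", x_min=40, y_min=60, x_max=240, y_max=210)
print(label.area())  # 200 px wide x 150 px tall -> 30000
```

Checking every label against a fixed set of Categories is one simple way to keep annotators consistent with each other.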
Next, you’ll need to decide who is going to create these labels in the platform for you. You can do some yourself, gather together a band of helpful colleagues, press your summer interns into service, or even have CrowdAI do it for you.
Annotation does take some time, especially for more complex models. Even if you’re just annotating every instance of cake you find in 2,000 images, it can take a person around 3-5 seconds to mentally process an image and another 2-3 seconds to take a simple action (like draw a box around the cake). At an average of 7 seconds per image, that’s still about 4 hours of non-stop work for a single person, and that doesn’t include any bathroom breaks!
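The same arithmetic works for any dataset size. Here is a small sketch that uses the 7 seconds per image from the example above as its default; the function name is made up, the even split between annotators is an assumption, and your own annotators may be faster or slower.

```python
def annotation_hours(
    num_images: int,
    seconds_per_image: float = 7.0,
    num_annotators: int = 1,
) -> float:
    """Estimate non-stop annotation time in hours per person.

    Assumes the images are split evenly between annotators.
    """
    total_seconds = num_images * seconds_per_image
    return total_seconds / 3600 / num_annotators


# 2,000 images x 7 s = 14,000 s, which is just under 4 hours for one person.
print(round(annotation_hours(2000), 1))  # -> 3.9
```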
Thankfully, there are several ways we can speed up or even automate this step in the process, and we’ll go into more detail on those in a separate article.
Step 4: Train a model
Once you’ve created enough training data, it’s time to put that data to work and train a model!
Training a deep learning computer vision model can be pretty tricky. It’s a highly technical process involving a lot of data science. If you’re not familiar with the science, tools, and techniques, it’s a bit like sitting down at a piano for the first time and trying to recreate one of Bach’s piano concertos!
The process of training a model takes some time. In a nutshell, it looks like this:
1. Take the images and labels you created (your “training data”) and package them up for training.
2. Find the correct neural network architecture (the “cake recipe”) in our library.
3. Run the training data through the neural network to see how it performs. Did it find the target object(s) correctly?
4. Based on those results, tweak one of the settings of the neural network and try again.
5. Repeat steps 3 and 4 hundreds of times.
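To give a feel for the evaluate-and-tweak loop described above, here is a deliberately tiny toy version. It is not how the platform trains models: the "model" is a single made-up threshold instead of a neural network, the data is invented, and every name in it is hypothetical. Real training adjusts far more settings with far more sophisticated methods, but the overall rhythm of try, measure, tweak, and repeat is similar.

```python
# Toy "training data": (feature value, label) pairs. In a real project the
# feature would be image pixels and the label would come from annotation.
TRAINING_DATA = [(0.12, 0), (0.22, 0), (0.32, 0), (0.62, 1), (0.78, 1), (0.91, 1)]

# The settings we are willing to try: 0.0, 0.05, 0.10, ..., 1.0.
CANDIDATE_SETTINGS = [i / 20 for i in range(21)]


def predict(threshold: float, feature: float) -> int:
    """A one-setting stand-in for a model: answer 1 when feature > threshold."""
    return 1 if feature > threshold else 0


def accuracy(threshold: float, data) -> float:
    """Fraction of examples the toy model gets right."""
    correct = sum(predict(threshold, x) == y for x, y in data)
    return correct / len(data)


def train(data, candidates):
    """Try each setting in turn and keep the first one that scores best."""
    best_setting, best_score = None, -1.0
    for setting in candidates:
        score = accuracy(setting, data)  # Run the data through the model.
        if score > best_score:  # Keep the tweak only if it helped.
            best_setting, best_score = setting, score
    return best_setting, best_score


print(train(TRAINING_DATA, CANDIDATE_SETTINGS))
```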
This is where we make things easy for you, as the platform will automatically do all the heavy lifting at this stage. Thankfully, there are plenty of deep learning experts at CrowdAI, and we’ve baked their combined knowledge and expertise directly into the platform. So you can sit back and relax while we do the work.
Step 5: Put the model to work
This is a really big step in the process. Head on over to our full article about model inference to read more in-depth information.
Once we have a working model, you’re ready to put that model to work, or “deploy” the model.
Deploying a model means setting it up to look at new imagery that it has never seen before and to label the target objects in that imagery. In model training, you were the one adding labels to images, but now that your model is trained, it’s ready to take over that task for you. This process of a model analyzing an image or video and adding labels is known as inference, as the model is using its knowledge to infer if and where the target object(s) are present.
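In code, inference boils down to "hand the model a new image, get labels back." The sketch below shows that shape using a hypothetical stand-in model that returns canned answers. The function names, the (category, confidence) format, and the 0.5 confidence cutoff are all assumptions for illustration, not the platform's actual interface.

```python
from typing import Callable, Dict, List, Tuple

Detection = Tuple[str, float]  # (category, confidence between 0 and 1)


def run_inference(
    model: Callable[[str], List[Detection]],
    image_names: List[str],
    min_confidence: float = 0.5,
) -> Dict[str, List[Detection]]:
    """Apply the model to each new image and keep only confident detections."""
    results = {}
    for name in image_names:
        detections = model(name)
        results[name] = [d for d in detections if d[1] >= min_confidence]
    return results


def fake_model(image_name: str) -> List[Detection]:
    """Hypothetical stand-in for a trained model; returns canned answers."""
    canned = {
        "new_001.jpg": [("cake", 0.93)],
        "new_002.jpg": [("cake", 0.31)],
    }
    return canned.get(image_name, [])


# Only the confident detection in new_001.jpg survives the 0.5 cutoff.
print(run_inference(fake_model, ["new_001.jpg", "new_002.jpg", "new_003.jpg"]))
```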
At this point, you’ve successfully created AI. Mazel tov! 🎉
Step 6: Monitor and improve the model
Now that you have a working model, you can automate analyzing your imagery and video for whatever you trained the model to do. However, the lifecycle doesn’t actually end here—just like a piece of machinery, an ML model needs to be maintained if it’s to perform at its best.
Each time the model is shown new imagery and creates new labels (that is, each time it runs inference), it has also created potential new training data. You can use the output from the model to improve your operations however you like, but you can also use that same output later as training data to keep updating and improving your model.
For example, did the model get it 100% right, or did it miss something? Either way, feeding this back into the beginning of the model lifecycle as training data will help keep the model up-to-date and prevent what we call “drift” in model performance.
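One simple way to picture this feedback loop: have a person review a batch of the model's output, measure how often the model was wrong, and watch that number over time. The sketch below assumes one label per image and uses made-up function names and a made-up tolerance of 0.05 for deciding that performance has drifted. It illustrates the idea and is not a description of the platform's monitoring.

```python
def review_batch(predictions: dict, reviewed: dict):
    """Compare model output with human-reviewed labels for the same images.

    Returns the fraction of reviewed images the model got wrong and the
    names of those images. All reviewed labels, right or wrong, can then
    be fed back in as training data.
    """
    mistakes = [
        name for name in reviewed if predictions.get(name) != reviewed[name]
    ]
    error_rate = len(mistakes) / len(reviewed) if reviewed else 0.0
    return error_rate, mistakes


def is_drifting(error_rates, baseline: float, tolerance: float = 0.05) -> bool:
    """Flag drift when the latest error rate exceeds baseline + tolerance."""
    return bool(error_rates) and error_rates[-1] > baseline + tolerance


predictions = {"a.jpg": "cake", "b.jpg": "no cake", "c.jpg": "cake", "d.jpg": "cake"}
reviewed = {"a.jpg": "cake", "b.jpg": "cake", "c.jpg": "cake", "d.jpg": "cake"}

error_rate, mistakes = review_batch(predictions, reviewed)
print(error_rate, mistakes)  # -> 0.25 ['b.jpg']
print(is_drifting([0.05, error_rate], baseline=0.05))  # -> True
```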