Machine Learning for the Average Joe (or Jane)

The basics of machine learning for those of us who aren't AI engineers

What is Machine Learning?

Machine learning is a field within artificial intelligence. It deals with the process behind systems that use inferences and statistical models to find connections in data. Patterns in the data allow the model to draw conclusions and make predictions. Greenscreens.ai’s machine learning models are used to predict prices of future loads based on patterns found in historical load data.

Artificial Intelligence vs Machine Learning

Before we dive into machine learning, it’s important to note the difference between Machine Learning and Artificial Intelligence. While the terms are often used interchangeably, machine learning is actually a subset of the broader category of Artificial Intelligence (AI).

Artificial Intelligence is the general ability of computer systems to emulate human thought and behavior, while machine learning refers to the technologies and algorithms that enable systems to identify patterns, make decisions, and improve their capabilities through experience and data. AI is an umbrella term with many subsets that work to achieve human-like outcomes. Other subsets within AI include deep learning, natural language processing, and robotics.

The focus of artificial intelligence is on developing computers or robots that can mimic human behavior. The ultimate goal for AI-enabled programs is to provide information automatically and trigger actions without the need for specific human interaction. Artificial intelligence is already present in many of the technologies we use today. In fact, machine learning is the way most people interact with AI day to day.

Popular uses for artificial intelligence include:

  • Providing recommendations on streaming sites for related videos

  • Troubleshooting a problem with a chatbot

  • Smart devices or voice assistants like Siri

Many companies continue to explore AI as a means of increasing productivity and optimizing their workforce with automation and chatbots.

The key differences:

Artificial Intelligence

  • The goal is to make computer systems imitate human behavior to solve complex problems.

  • AI design means working to create an intelligent system that performs various tasks.

Machine Learning

  • The goal is to allow machines to learn from the data so they can provide an accurate output.

  • ML design means working to create machines that can perform specific tasks.

How Does Machine Learning Work?

Many different types of AI models can be used to make predictions based on previous data. Different models have different structures, and each has its strengths and weaknesses. In most cases the machine “learns” by testing examples and receiving feedback on how close each result is to being correct. It uses this feedback to adjust the weight it assigns to each factor in determining a result. Over time, the model gets better and better at predicting outcomes.
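To make the feedback loop above concrete, here is a minimal sketch in Python. It is a toy, not the Greenscreens model: the loads, the single "price per mile" weight, and the learning rate are all invented for illustration. The program guesses a price, measures how far off it was, and nudges its weight in the direction that shrinks the error.

```python
# Toy illustration only: learn one weight (price per mile) from made-up loads.
# This is not the Greenscreens model; the data and learning rate are invented.

def train_rate_per_mile(loads, learning_rate=0.1, epochs=500):
    """Fit price = weight * miles by repeatedly nudging the weight toward lower error."""
    weight = 0.0
    for _ in range(epochs):
        for miles, actual_price in loads:
            predicted_price = weight * miles
            error = predicted_price - actual_price    # feedback: how far off was the guess?
            weight -= learning_rate * error * miles   # adjust the weight to shrink the error
    return weight

# Hypothetical examples: (distance in thousands of miles, price in thousands of dollars).
example_loads = [(0.5, 1.0), (1.0, 2.0), (1.5, 3.0)]
learned_weight = train_rate_per_mile(example_loads)
print(f"learned rate: about ${learned_weight:.2f} per mile")
```

Every made-up load here costs exactly $2 per mile, so the weight settles very close to 2. Real models juggle many weights at once, but the guess, compare, and adjust cycle is the same idea.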

What is a Machine Learning Model?

A machine learning model is a smart program that learns patterns and relationships from data, which lets it make predictions and perform actions without being told exactly what to do. The algorithms and parameters that make up a machine learning model can be optimized through training for better performance and more accurate predictions.


Predictive Machine Learning for Freight Pricing

Features and Predictions

Each sample (a load in the case of Greenscreens) is represented by a set of features. Features represent important information about the load, such as its origin, its destination, and the distance between them. Every historical load is also accompanied by its price, which tells the model what kind of prediction to offer, given the conditions encapsulated in the load’s features.
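As a rough picture of what one sample might look like in code, the sketch below pairs a set of features with a known price. The class names, field names, and numbers are hypothetical and chosen only for illustration; they do not describe the actual Greenscreens data format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LoadFeatures:
    """Inputs the model sees for a load (hypothetical field names)."""
    origin: str
    destination: str
    distance_miles: float

@dataclass(frozen=True)
class HistoricalLoad:
    """A training sample: the features plus the outcome the model learns to predict."""
    features: LoadFeatures
    price_usd: float

# Invented example values, for illustration only.
sample = HistoricalLoad(
    features=LoadFeatures(origin="Boston, MA", destination="San Antonio, TX", distance_miles=2000.0),
    price_usd=4200.0,
)
print(sample.features.origin, "->", sample.features.destination, sample.price_usd)
```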

Let's imagine a machine learning model designed to shoot hoops like a top NBA player. It needs to reliably sink baskets in any situation at any time. It would be easy to create a machine that could repeatedly sink baskets from the same spot with no one else on the court, but our hypothetical AI needs to keep doing it from various points with the defense actively trying to stop it. Not even Steph Curry makes 100% of his shots, because so many variables are at play. That’s where machine learning comes in. The AI notes and analyzes all of those variables -- where the shooter is, who the shooter is, what the defense is doing -- and finds the combination most likely to score in each situation. It can handle a huge amount of data, and it gets better as it takes more shots, so it doesn't take long for it to achieve a high shooting percentage.

Now let's apply that to freight pricing. A Greenscreens AI can take a lot of variables (origin, destination, weight, fuel, etc.) into account. It notes how each variable affects rate pricing in the context of all the other variables, and predicts a rate.

A human could arrive at the same prediction, given enough time, but a machine learning system can offer reliable predictions a lot more efficiently. It can even uncover dependencies and trends that might be hard for a busy human to catch, such as time-of-day effects on capacity and price. Machine learning multiplies the speed and efficiency of the human’s work. The model, however, needs to start with real-world information: in this case, historical loads.

Predictions vs Historical Averages

For years, brokerages have relied on historical averages alone to predict current freight costs. Historical data is critical to predictive models, but used alone it can only capture past trends. Predictive AI models combine historical and current data with machine learning algorithms to predict what will happen next.

Due to the volatility of the freight market, using simple historical averages to predict current rates will often produce inaccurate or incomplete results. AI models use advanced statistical analysis to make predictions and measure the Confidence of those predictions. They also deal with data scarcity in a much more consistent and controlled way than plain averages. Predictive models can take into account many different features of a load, such as load size, origin, and destination, identify correlations that would be difficult for the human eye to recognize, and provide a rate that accounts for all of those features and for loads with similar features.

Greenscreens automates the process of statistical analysis in order to deliver you reliable and easy-to-read rate predictions.

Training and Selecting a Prediction Model

We start by dividing our historical data into three sets: a training set, a validation set, and a test set. Training the model means feeding the training set into the model along with the expected outcome (price), so the model can learn to generate predictions for future loads. The training set typically contains the majority of the historical data. The validation and test sets are held back so they can measure the model’s performance on previously unseen data.
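A three-way split like the one described above might be sketched as follows. The 70/15/15 proportions, the fixed random seed, and the function name are assumptions made for this example; the text only says that the training set usually holds the majority of the data.

```python
import random

def split_loads(loads, train_fraction=0.7, validation_fraction=0.15, seed=42):
    """Shuffle the loads, then cut them into (training, validation, test) lists."""
    shuffled = list(loads)
    random.Random(seed).shuffle(shuffled)   # fixed seed so the split is repeatable
    train_end = int(len(shuffled) * train_fraction)
    validation_end = train_end + int(len(shuffled) * validation_fraction)
    return (
        shuffled[:train_end],                # used to train the candidate models
        shuffled[train_end:validation_end],  # used to choose between models
        shuffled[validation_end:],           # held back for the final check
    )

historical_loads = list(range(100))  # stand-ins for 100 historical load records
train_set, validation_set, test_set = split_loads(historical_loads)
print(len(train_set), len(validation_set), len(test_set))
```

Shuffling before cutting keeps each set a fair mix of the whole history, and no load ends up in more than one set.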

The training phase leaves us with multiple competing models that differ in the way they operate, the features they use, or the weight they give to different parameters. For instance, one model might assign more importance to origin and distance, while another prioritizes load size. We need to select the model that provides the most accurate rate predictions, either manually or by an automatic selection workflow.

The next step uses the validation set, which contains samples previously unseen by the model, to evaluate the models and select the most accurate ones. The models are given the features of the loads in the validation set without their outcomes (in this case, price), and each model's predictions are compared to the actual values from the sample. Each model is graded on how closely it predicted the validation set as a whole. We choose the most accurate model and refine it further with the values from the validation set.

We still need an objective assessment of the selected model’s results on previously unseen data. Because we just used the validation set to select the best-performing model, scoring the model on that same set again could not rule out that its success involved a bit of luck (i.e. random elements). So we use the test set, which is still new to the model and thus a good representation of the model’s accuracy with unseen data. That final test gives an objective evaluation of the model’s real-world performance. The Greenscreens app shows the model’s performance against the test set in the Rate Accuracy Report, which measures the accuracy of the rate predicted by Greenscreens compared to the actual booked rate of the load, so you can gauge the results.

Why do we use new data sets to evaluate the model? If we tested using the original training set, all we would find out was how good the model was at parroting back information it already had. Using a new set tells us how well the model does with new information. For instance, if we trained the model with a training set that included a shipment from Boston to San Antonio, testing the model with the same set would only test whether it could repeat the same rate every time. That wouldn't be much use, so we test with a different set.
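The select-then-test workflow in this section can be sketched in a few lines. Everything here is hypothetical: the two candidate "models" are simple invented formulas, the loads are made up, and the error measure (average percentage difference) is an assumption. The sketch also skips the refinement step mentioned above and only shows the selection and the final check.

```python
def mean_abs_pct_error(model, loads):
    """Average of |predicted - actual| / actual over (miles, actual_price) pairs."""
    return sum(abs(model(miles) - price) / price for miles, price in loads) / len(loads)

# Two hypothetical candidate models left over from training (invented formulas).
candidates = {
    "flat_per_mile": lambda miles: 2.0 * miles,
    "base_plus_per_mile": lambda miles: 300.0 + 1.8 * miles,
}

# Hypothetical held-out loads as (miles, actual_price) pairs.
validation_set = [(500, 1200.0), (1000, 2100.0), (1500, 3000.0)]
test_set = [(800, 1750.0), (1200, 2450.0)]

# Step 1: pick the candidate with the lowest error on the validation set.
best_name = min(
    candidates,
    key=lambda name: mean_abs_pct_error(candidates[name], validation_set),
)

# Step 2: score the winner on the test set, which played no part in the choice.
test_error = mean_abs_pct_error(candidates[best_name], test_set)
print(best_name, f"{test_error:.1%}")
```

The test-set score is the number to trust, because neither training nor selection ever saw those loads.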

Data Sources

Data sources provide the information from which the model learns. In the case of Greenscreens, the model uses a few different data sources. The primary data set used to predict rates for your brokerage is your own historical load data. Origin and destination, pickup date, transport type, transport mode, carrier and customer costs, and linehaul cost are the most important features for rate prediction. Greenscreens.ai’s models also learn from a wider set of brokerage data. As of May 2023, the Greenscreens model employs data from 120+ brokerages. It can also learn from external data sources and macroeconomic features for increased accuracy.


Greenscreens.ai Rate Predictions

Target Buy Rate

The Target Buy Rate is the predicted buy rate on a lane for a given brokerage. Greenscreens predicts this rate when you enter a lane and transport type into our user interface or Greenscreens-integrated TMS. This rate is influenced by your historical data and reflects your company’s individual buying behavior against the current market conditions, but is also often influenced by external data, such as the data we receive from other brokerages (the Greenscreens.ai network).

Target Sell Rate

The Target Sell Rate is determined as a markup of the Target Buy Rate. The markup value can be a percentage or a dollar value. It can be determined through the creation of pricing rules, or generated by the customer’s historical margin on the lane. Without rules or historical data, the markup will default to a baseline of 15%. Here’s an example of a Target Sell Rate calculation: let’s say that for a particular shipment we predict a Target Buy Rate of $1,000. Currently the brokerage has a pricing rule stating that all loads originating in Seattle should have a 10% markup. The sell rate in this case would be $1,100. A markup of 20% would return a rate of $1,200, and so on.
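The markup arithmetic above can be written as a short function. This is a sketch under stated assumptions, not the Greenscreens implementation: the function and parameter names are invented, and giving a dollar markup priority over a percentage when both are supplied is an arbitrary choice made for the example.

```python
DEFAULT_MARKUP_PERCENT = 15.0  # baseline from the text when no rule or history applies

def target_sell_rate(buy_rate, markup_percent=None, markup_dollars=None):
    """Apply a dollar or percentage markup to a buy rate; fall back to the 15% baseline."""
    if markup_dollars is not None:           # assumption: a dollar markup wins if both are given
        return buy_rate + markup_dollars
    if markup_percent is None:
        markup_percent = DEFAULT_MARKUP_PERCENT
    return buy_rate + buy_rate * markup_percent / 100

print(target_sell_rate(1000, markup_percent=10))   # Seattle rule from the example: 1100.0
print(target_sell_rate(1000, markup_percent=20))   # 1200.0
print(target_sell_rate(1000))                      # no rule or history, 15% baseline: 1150.0
print(target_sell_rate(1000, markup_dollars=150))  # flat dollar markup: 1150
```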

Network Rate

The Greenscreens Network Rate represents the expected rate in the market based on the aggregated data in the Greenscreens network. The Network Rate has no bias towards any one brokerage’s data. It can be lower or higher than the Target Buy Rate, depending on how the brokerage books freight on the underlying lane or similar lanes.

Similar Lanes

So how does Greenscreens handle lanes that don’t see much traffic? No one has historical data on a shipment from Butte, MT to Eureka Springs, AR. Greenscreens can still come up with an accurate prediction on the lane, because there’s plenty of data available for shipments from Spokane to Little Rock that use the same corridor.

A Greenscreens machine learning model recognizes similarities and connections between lane features, such as origin and destination, which lets it make reliable predictions even when historical data for a particular lane isn’t available. The Network Rate widget in the Greenscreens user interface provides numbers for similar historical lanes from Greenscreens’ network, so you can compare the predicted lane to similar ones. This is especially useful when making predictions for lanes with little or no historical data.


Accuracy and Confidence

Model accuracy

AI models are sophisticated tools that can produce accurate and insightful price predictions for truck brokers. It’s important to understand what machine learning can and cannot do. The goal of a machine learning model is not to produce exact predictions 100% of the time, but to minimize the average margin of error across a large data sample.

Greenscreens keeps track of the average margin of error (expressed as a percentage) in a model’s predictions to ensure that the AI is working as it should. Since the accuracy of predictions will always show some amount of variation, we use the overall margin of error to assess a model’s performance. Even allowing for these variations, an AI with a low margin of error will still be far and away more accurate than a simple average of rates.
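One common way to express an average margin of error as a percentage is the mean absolute percentage error, sketched below. The text does not state the exact formula Greenscreens uses, so treat this as an assumed stand-in; the function name and the rates are invented.

```python
def average_margin_of_error(predicted_rates, booked_rates):
    """Mean absolute percentage difference between predicted and booked rates."""
    if len(predicted_rates) != len(booked_rates) or not booked_rates:
        raise ValueError("need two non-empty lists of equal length")
    errors = [abs(p - b) / b for p, b in zip(predicted_rates, booked_rates)]
    return 100 * sum(errors) / len(errors)

predicted = [1000.0, 2100.0, 1500.0]   # hypothetical predictions
booked = [1050.0, 2000.0, 1500.0]      # hypothetical actual booked rates
print(f"{average_margin_of_error(predicted, booked):.1f}%")
```

Individual predictions miss by different amounts (about 4.8%, 5%, and 0% here), which is why the average across many loads is the more useful measure.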

What Do We Mean by Confidence?

Every Greenscreens rate prediction is accompanied by a Confidence Level. This is an AI-generated score to help identify the amount of work that might be needed to find capacity at a specific price. It is not an indicator of rate accuracy, but is intended to take the guesswork out of how hard it will be to cover a load at the given price based on the current market conditions.

The Confidence Level is primarily determined by three things:

  • Density of Historical Data
    Confidence levels tend to be higher on lanes where there is a larger quantity of historical data. The model takes into account both broker-specific data and data provided by the entire Greenscreens.ai network.

  • Market Volatility
    The model takes note of market fluctuations. It considers the rate at which truck rates change over time, both in frequency and magnitude. It also takes into account capacity supply and demand conditions in origin and destination markets. The model feels more confident when markets are less volatile, as pricing is unlikely to change drastically in a short period of time.

  • Spread of Potential Outcomes
    The Target Buy Rate is the midpoint of a predicted range of potentially successful rates, influenced by historical data. When this range is narrower, the Confidence Level tends to be higher. If a broker has prices that fluctuate substantially, the model may predict a wider range of potential rates, and confidence will be lower.

Confidence Suggestions

  • Low Confidence (62% and below): We suggest doing one or more of these things:

    • Get multiple bids from carriers before accepting a price.

    • Consider starting negotiations with the Start Rate shown in the user interface.

    • Review the Greenscreens Network Rate and compare its Confidence Level with that of the Target Buy Rate.

    • Give yourself additional lead time in booking the load.

    • Review the data shown by the Similar Lanes feature.

    • Add some additional margin to make sure you’re covered.

  • Medium Confidence (63% - 75%): We suggest getting multiple bids from carriers before accepting a price or adding some additional margin to be sure you're covered.

  • High Confidence (76% - 87%) & Very High Confidence (88% - 100%): A high or very high Confidence Level suggests that you can book now at the given rate.
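The Confidence bands listed above translate directly into a small lookup. The function name is hypothetical, and the handling of fractional scores (for example, treating 62.5 as Medium) is an assumption, since the list only gives whole-number boundaries.

```python
def confidence_band(confidence_percent):
    """Map a Confidence Level (0-100) to the band names used in the list above."""
    if not 0 <= confidence_percent <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence_percent <= 62:
        return "Low"
    if confidence_percent <= 75:
        return "Medium"
    if confidence_percent <= 87:
        return "High"
    return "Very High"

for score in (50, 70, 80, 95):
    print(score, confidence_band(score))
```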

Target Confidence vs Network Confidence

For every prediction, Greenscreens provides a Network Rate prediction and a Target Buy Rate prediction. The Network Rate prediction is produced by the network model, which uses our entire network's load data. The Target Buy Rate is generated by a model trained to predict a specific brokerage’s rates based on its individual buying behavior and buying power, using both the brokerage’s data and data from Greenscreens’ entire network. Since the Network Rate prediction model works with a different data set, Confidence Levels will vary between these two predictions. This lets you compare your rates to those in the Greenscreens.ai network.

The Network Rate Confidence Level is not specific to any one brokerage, but represents the likelihood that any brokerage within the Greenscreens.ai network will be able to buy a specific lane at a specific price. If a brokerage has a smaller spread of truck rates on a given lane, with more volume consistency, the Target Buy Rate may have a higher Confidence Level. If a brokerage has no historical volume on a lane (and the model was unable to rely on history from similar brokerages or from strongly correlated lanes) but the network has a large quantity of historical volume on it, the Network Rate Confidence will usually be higher.

Greenscreens recommends taking both the Network Rate Confidence and the Target Buy Rate Confidence into consideration when deciding how to cover a specific lane. We provide a Better Rate icon in our UI to indicate which rate has a higher Confidence Level.

Predictions and Confidence of Unseen Lanes

Machine learning lets us predict rates for previously unseen lanes by isolating and filtering for specific load features. The models recognize similarities between certain features in the context of historical rates, which allows them to make reliable predictions even if the lane is new to the brokerage. This is another example of the efficiency of machine learning: a human could easily look up a lane with a similar origin and destination, but a machine learning model can find many kinds of similarities and connections across your entire shipping history very quickly and come up with a faster and more reliable prediction, backed up by more extensive data.

It may seem odd for a brokerage to receive high Confidence on a lane they’ve never moved, but the multitude of data points machine learning models take into account lets them provide an accurate rate prediction with high Confidence. Greenscreens’ Accuracy Report breaks out the margin of error numbers for new lanes, so you can see exactly how well the model is doing at predicting rates for lanes with no historical data.

