understanding holdout, in-sample, and out-of-sample in the algo optimizer

a plain-english primer on the algo optimizer's holdout validation — what in-sample and out-of-sample mean, what training % does, how to read the OOS deltas, and how to avoid the 3 most common misreads.

Written by Brad

summary: the algo optimizer has a powerful feature called holdout validation that protects you from overfit results. this article explains the concepts in plain english — what's in-sample vs. out-of-sample, what training % does, and how to read the OOS deltas in your results without misinterpreting them.

if you've used the algo optimizer and noticed terms like in-sample, out-of-sample, training period %, and OOS Δ — this is the primer. for the full UI walkthrough, see algo optimizer.

the problem this solves — overfitting in plain english

when you run an optimizer over thousands of setting combinations, the engine is searching for the version of your algo that performed best on a specific slice of history. but "performed best on history" isn't the same thing as "will perform well going forward."

with enough combinations to choose from, the optimizer will eventually find a setup that fits the noise in your backtest period — the random ups and downs that happened to favor a particular configuration. that setup looks great in the results table, then falls apart the moment you trade it live. that's overfitting.

holdout validation is the safety net. it splits your historical data into two slices and only lets the optimizer "see" the first one, the training slice. once the winners are found, the engine runs them against the held-back slice, the data they never saw, and shows you what happened. setups that hold up on unseen data are real candidates. setups that fall apart are likely overfit and should be skipped.
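to see why picking the best of many combinations is risky, here is a minimal sketch. it is not the optimizer's engine: `simulate_overfit` and all of its parameters are hypothetical, and every "setup" is a coin flip with no edge at all. the point is only that the best in-sample score looks impressive anyway, and fades on unseen trades.

```python
import random

def simulate_overfit(n_setups=1000, n_train=50, n_test=200, seed=7):
    """Pick the best of many zero-edge setups in-sample, then score it on unseen trades."""
    rng = random.Random(seed)

    def win_rate(n_trades):
        # every trade is a fair coin flip, so the true win rate is 50%
        wins = sum(rng.random() < 0.5 for _ in range(n_trades))
        return wins / n_trades

    # "search": score every setup on the training slice and keep the best score
    best_in_sample = max(win_rate(n_train) for _ in range(n_setups))

    # "verdict": the winner has no real edge, so unseen trades are fresh coin flips
    out_of_sample = win_rate(n_test)
    return best_in_sample, out_of_sample

best_is, oos = simulate_overfit()
print(f"best in-sample win rate: {best_is:.0%}")
print(f"same setup out-of-sample: {oos:.0%}")
```

the gap between the two printed numbers is pure selection luck, which is exactly what the held-back slice is there to expose.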

the 3 concepts

in-sample data (a.k.a. the training set)

the slice of history the optimizer is allowed to see while searching for winning settings. every trade in this window is used to score each combination and rank the top 20.

when you set training period % to 80, the in-sample data is the first 80% of dates in your backtest period.

out-of-sample data (a.k.a. the test set, or the held-back data)

the slice of history the optimizer is not allowed to see during the search. it's held back. once the top 20 winners are found on the in-sample data, the engine runs each one on this out-of-sample slice and reports how it performed there.

with training period % at 80, out-of-sample is the final 20% of dates in your backtest period.
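the split described above can be sketched in a few lines. this is an illustration under assumptions, not the optimizer's code: `split_by_training_pct` is a hypothetical helper, and how the real engine rounds the boundary date is not documented here.

```python
from datetime import date, timedelta

def split_by_training_pct(dates, training_pct=80):
    """Split dates chronologically into (in_sample, out_of_sample)."""
    ordered = sorted(dates)
    # assumption: the boundary is rounded down to a whole date
    cut = len(ordered) * training_pct // 100
    return ordered[:cut], ordered[cut:]

# 100 consecutive calendar days as a stand-in for a backtest period
start = date(2024, 1, 1)
backtest_dates = [start + timedelta(days=i) for i in range(100)]

in_sample, out_of_sample = split_by_training_pct(backtest_dates, training_pct=80)
print(len(in_sample), len(out_of_sample))  # 80 20
```

the split is by time, not random: every in-sample date comes before every out-of-sample date.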

holdout (the technique)

"holdout" is the name of the technique itself — the act of holding back a slice of data so the optimizer can't peek at it during the search. when you flip the validation toggle from standard to holdout, you're turning on this technique.

one note on vocabulary: you'll hear that held-back slice called a few different things — out-of-sample, walk forward, holdout, the test set. they all point at the same idea: settings tested on data the optimizer never got to see while it was searching. across edgeful we mostly say out-of-sample (OOS).

what training period % does

training period % is the slider that controls how the split is made. it defaults to 80.

  • training period % = 80 — the optimizer sees the first 80% of dates while searching. the last 20% is held back for testing

  • training period % = 70 — the optimizer sees 70%, tests on 30%

  • training period % = 90 — the optimizer sees 90%, tests on 10%

80/20 is the sensible default. higher training % gives the optimizer more data to find winners but less to validate them on. lower training % does the opposite — the search gets weaker, the verdict gets harsher.

anything below 60 starves the search, and the top 20 start fitting the small training slice. anything above 90 leaves so little out-of-sample data that the verdict becomes noise. start at 80.
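a quick way to feel the trade-off is to count how many dates land in each slice. the sketch below assumes a 250-date backtest period (roughly one year of trading days, chosen only for illustration) and a hypothetical `slice_sizes` helper.

```python
def slice_sizes(total_dates, training_pct):
    """Return (training_dates, held_back_dates) for a given training %."""
    training = total_dates * training_pct // 100
    return training, total_dates - training

# 250 dates is an illustrative figure, not a recommendation
for pct in (60, 70, 80, 90, 95):
    training, held_back = slice_sizes(250, pct)
    print(f"training % = {pct}: {training} dates searched, {held_back} held back")
```

at 95, only 13 of the 250 dates are left for the verdict, which is why very high settings make the out-of-sample read unreliable.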

standard vs holdout — when to use which

standard runs the optimizer on the full backtest period. every trade in the window is used to find the top 20. it's fast and it uses all your data.

  • use standard for — fast iteration, exploratory runs, narrowing down which ranges and constraints to try

holdout splits the period into training and held-back slices. slower than standard (the engine effectively runs each top combination twice), but gives you a verdict on whether the winners actually hold up.

  • use holdout for — verifying winners before you trust them with real money, comparing candidate setups, anything you're seriously considering pushing live

a practical workflow: run standard first with wide ranges to see what kinds of setups win. then re-run with holdout on tighter ranges around those winners to confirm they survive out-of-sample.

reading the OOS deltas in your results

once a holdout run finishes, the results table picks up two extra columns:

  • OOS win Δ — how much the win rate changed from training to out-of-sample. negative numbers mean the win rate dropped on unseen data

  • OOS PF Δ — how much the profit factor changed from training to out-of-sample
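as a rough sketch of what the two columns measure: the exact formulas the optimizer uses are not shown in this article, so the code below assumes each Δ is the out-of-sample metric minus the training metric, with win rate in percentage points and profit factor as gross profit divided by gross loss. all function names and trade values are made up.

```python
def win_rate(pnls):
    """Share of trades that closed with a profit, in percent."""
    return 100 * sum(p > 0 for p in pnls) / len(pnls)

def profit_factor(pnls):
    """Gross profit divided by gross loss."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    return gross_profit / gross_loss if gross_loss else float("inf")

def oos_deltas(train_pnls, oos_pnls):
    """Out-of-sample metric minus training metric; negative means it dropped."""
    win_delta = win_rate(oos_pnls) - win_rate(train_pnls)
    pf_delta = profit_factor(oos_pnls) - profit_factor(train_pnls)
    return win_delta, pf_delta

# made-up per-trade profit and loss, not real results
train = [100, -50, 100, -50, 100, 100, -50, 100, -50, 100]  # 6 wins of 10, PF 3.0
oos = [100, -50, -50, 100, -50]                             # 2 wins of 5, PF 1.33

win_delta, pf_delta = oos_deltas(train, oos)
print(f"OOS win Δ: {win_delta:+.1f}  OOS PF Δ: {pf_delta:+.2f}")
# OOS win Δ: -20.0  OOS PF Δ: -1.67
```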

concrete read on the numbers:

  • OOS win Δ within ±5%, OOS PF Δ within ±0.3 — the strategy held up. real candidate

  • OOS win Δ drops 10%+ or OOS PF Δ drops 0.5+ — the edge didn't survive out-of-sample. overfit; skip

  • OOS Δs are positive — rare but real. the strategy did better on unseen data. don't get too excited about a single run; check the strategy health card and rerun on a longer period to confirm
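the rules of thumb above can be written as a small lookup. this is a sketch, not a feature of the optimizer: `read_oos_deltas` is hypothetical, win Δ is assumed to be in percentage points, and the order of the checks plus the "in between" bucket (drops between the two thresholds) are assumptions, since the article gives no rule for that range.

```python
def read_oos_deltas(win_delta, pf_delta):
    """Map OOS win Δ (percentage points) and OOS PF Δ to a rough verdict."""
    # a large drop in either metric means the edge did not survive
    if win_delta <= -10 or pf_delta <= -0.5:
        return "overfit, skip"
    # both metrics improved on unseen data: rare, so confirm before trusting it
    if win_delta > 0 and pf_delta > 0:
        return "better on unseen data, confirm on a longer period"
    # small moves in either direction count as holding up
    if abs(win_delta) <= 5 and abs(pf_delta) <= 0.3:
        return "held up, real candidate"
    # assumption: anything else falls in an undefined middle ground
    return "in between, needs a closer look"

print(read_oos_deltas(-2.0, -0.1))   # held up, real candidate
print(read_oos_deltas(-12.0, -0.1))  # overfit, skip
```

whatever this returns, the strategy health card still gets the final say when the two disagree.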

always cross-check the OOS Δ columns against the strategy health card on the report tab. the card runs 7 independent checks on top of the holdout — if it flags concentration risk or performance decay even when the OOS Δs look small, trust the card. the two systems catch different failure modes.

3 common misreads

these come up over and over once people start running holdout.

1. "a lower training % is stronger validation": no. lowering training % from 80 to 70 doesn't make the result more rigorous. it shrinks the training data the optimizer can search over and gives you more test data, so the search gets weaker, not stronger. 80/20 is the right default.

2. "my strategy validates at 70% but breaks at 80% — should I use 70%?" — no, the opposite. a strategy that only validates at one specific training % is leaning on a particular slice of recent data. a strategy you can trust holds up across multiple training % values. run the same algo at 70, 80, and 90 and compare. if the winners shuffle dramatically, the algo is fitting noise. if the same setups stay near the top with small OOS Δs across all three, that's confluence.

3. "the OOS Δs are slightly negative — does that mean the strategy doesn't work?" — not necessarily. a small drop on unseen data is normal and expected; live trading is almost always a little worse than the cleanest backtest period. the question is how much. ±5% on win rate and ±0.3 on profit factor are the rules of thumb for "held up." beyond that, the edge is decaying meaningfully.

what to do with this

a practical sequence once you've understood the concepts:

  1. run a standard optimization first to see what kinds of settings rank well

  2. re-run with holdout at 80/20 to verify the top picks survive out-of-sample

  3. check the OOS Δ columns and the strategy health card on the report tab — both should agree the setup is solid

  4. run the same algo at 70/30 and 90/10 as a robustness check — if the same setups stay near the top, you have a real edge
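the robustness check in step 4 comes down to asking which setups show up near the top at every training %. a minimal sketch, assuming you have noted the top setups from each run by hand; `shared_top_setups` and the setup names are hypothetical.

```python
def shared_top_setups(rankings):
    """Return the setups that appear in the top list at every training %."""
    top_sets = [set(setups) for setups in rankings.values()]
    return set.intersection(*top_sets)

# hypothetical setup names and rankings, purely illustrative
rankings = {
    70: ["A", "B", "C", "D"],
    80: ["B", "A", "E", "C"],
    90: ["A", "C", "B", "F"],
}

stable = shared_top_setups(rankings)
print(sorted(stable))  # ['A', 'B', 'C']
```

a large overlap suggests the winners are not tied to one particular split. a near-empty overlap is the "winners shuffle dramatically" case, which points to noise.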

finding an algo that holds up takes customization, testing, and time. the optimizer compresses the search part — but reading the holdout results correctly is what turns the output from a number generator into an actual edge-finding tool.