We have rebuilt the rating engine that powers EquiAnalytix. Tested across more than 800,000 walk-forward predictions, the new TPR is calibrated to within 0.07 of a percentage point. Here is what changed, what V1 was getting wrong, the new variables that go into it, and why this should be your first stop when analysing any race.
THE SHORT VERSION
- The number now means what it says. A TPR of 25 is a 25% chance to win, calibrated to within 0.07 of a percentage point across 836,398 historical predictions.
- The model picks more winners. V2 captures 73% of all winners inside its top four ranked horses, 5.5 percentage points more than V1.
- It works in every code. Flat, All-Weather, Hurdle, Chase and NH Flat — V2’s top pick wins between 24.7% and 30.1% of the time.
- Four new variable families feed it. Pace dynamics, class profile, form trajectory and handicap-mark intelligence now contribute over a third of the model’s decision-making.
- It tells you when to trust the market — and when not to. When V2 agrees with the favourite, that favourite wins 36.6% of the time. When V2 dissents, the same favourite wins just 24.7%.
Every horse on the EquiAnalytix dashboard carries a TPR. Most readers know that number as a quality indicator, the higher the better, but in the new framework TPR is something more specific and more useful. It is the calibrated probability that the horse wins today’s race, expressed on a 0 to 100 scale. A TPR of 25 is not a vibe. It is a statement that horses in that band, across more than 800,000 walk-forward predictions, win 24.3% of the time.
That sentence is the entire point of what follows. The previous version of TPR could not have made it.
Why we rebuilt it
The original TPR rating, which we will call V1, was a random forest model that had powered EquiAnalytix since launch. It worked well enough as a ranking tool, separating likely contenders from outsiders, but its numerical outputs had stopped meaning what the dashboard implied they meant. When V1 said a horse had a figure of 50 or better, those horses won 16% of the time. When V1 said 30 to 50, the actual win rate was just under 10%. The model was projecting confidence it could not back up.
This is what statisticians call a calibration failure. The rankings were broadly sensible, the probabilities were not. And because the dashboard treated the number as a probability, subscribers reading “TPR 45” reasonably believed they were looking at a 45% chance. They were not.
So in April we lifted the bonnet, took the engine out and rebuilt it. But we did not stop there. The data going into the model needed work too.
Step one: a richer set of inputs
Before touching the model itself, we rebuilt the variable layer. The old TPR knew speed figures, condition fit (going, distance, track), trainer and jockey form, and basic breeding. That covered most of the standard form-book analysis a racing reader does. It missed the four things that have, in our testing, repeatedly separated the winner from the rest of the field.
Pace dynamics. Where does this horse run relative to the rest of the field? Is there a confirmed front-runner expected to dictate the gallop, and how many rivals are likely to challenge for that lead? PaceScore quantifies a horse’s running style from their last few outings on a 1 to 4 scale. PacePosition ranks them within the field, and RacePaceForecast counts the number of front-runners expected. A lone front-runner on a track that suits leaders is a structural edge the form book often misses and one we cover in detail on the Pace & Draw Bias hub.
Class profile. Beyond raw speed figures, where has this horse been competing? ClassDelta tracks whether they are moving up, holding steady, or dropping in grade compared to their recent average. A horse with a ClassDelta of minus five has been racing at a much higher level than today’s race and could be well ahead of these rivals on ability alone.
Form trajectory. A T-1 figure of 95 means different things depending on what came before it. RecentTrend captures whether a horse is improving or regressing run-on-run. PeakToRecent measures how far they are from their best. FormStdDev tells you whether they are reliable or volatile. These are signals an experienced reader of a form-book finds intuitively. The new model finds them automatically.
Handicap mark intelligence. For handicap races, ORDelta compares a horse’s current official rating to the mark at which they last won. ORTrajectory tracks whether the handicapper has been raising or lowering them over recent runs. A horse dropping back towards their winning mark while showing signs of life is the classic “well-handicapped” profile that the racing press has always written about. We codified it.
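For readers who like to see the mechanics, the class and handicap signals above can be sketched in a few lines. The function names, scales and formulas here are illustrative assumptions for demonstration, not the production definitions:

```python
def class_delta(todays_level: float, recent_levels: list) -> float:
    """Today's race level minus the average level of recent runs, on an
    assumed scale where higher means better class. Negative values mean
    the horse has been competing in higher grades than today's race."""
    return todays_level - sum(recent_levels) / len(recent_levels)

def or_delta(current_or: int, last_winning_or: int) -> int:
    """Current official rating minus the mark the horse last won from.
    At or below zero is the classic well-handicapped profile."""
    return current_or - last_winning_or

# A horse whose last three runs were at level 75, dropped into a level-70
# race today, and currently rated 2lb below its last winning mark:
print(class_delta(70, [75, 75, 75]))  # -5.0
print(or_delta(78, 80))               # -2
```

Read alongside the ClassDelta description above: a value of minus five flags a horse dropping well below the grade it has been racing at.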
These four categories were not just bolted on. They were tested for predictive value before earning their place in the model. They earned it.
Step two: a better algorithm
The model itself makes three structural changes from the old one. None of them are flashy. All of them matter.
It uses a different algorithm. The random forest is gone, replaced by a gradient-boosted tree model (XGBoost). Boosting handles the heavy class imbalance of horse racing, where most runners lose, far more gracefully than a forest does.
It is calibrated separately from its training. After the model produces its raw probability, a second step (isotonic regression) maps that raw score to the actual historical win rate at each level. This is the step that V1 was missing. It is also the reason V2’s numbers can be read at face value.
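To make the isotonic step concrete, here is a teaching-sized pool-adjacent-violators sketch in pure Python. It is a stand-in for whatever library implementation production uses (scikit-learn's `IsotonicRegression` would be a typical choice, assumed here rather than confirmed):

```python
def pav(values, weights=None):
    """Pool Adjacent Violators: least-squares non-decreasing fit.
    Applied to win/lose outcomes sorted by raw model score, the fitted
    values are the calibrated win probabilities at each score level."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block: [mean, total_weight, count]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # merge backwards while monotonicity is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            tw = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / tw, tw, c1 + c2])
    fitted = []
    for m, _, c in blocks:
        fitted.extend([m] * c)
    return fitted

# Win/lose outcomes (1 = won) sorted by the model's raw score, low to high:
outcomes = [0, 0, 1, 0, 1, 1]
print(pav(outcomes))  # -> [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
```

The output is the non-decreasing curve that maps raw scores to historical win rates, which is exactly the face-value property the text describes.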
It weights recent data more heavily. Horse racing changed materially around 2020, and a model that treats 2015 and 2024 races as equal is fitting itself to a sport that no longer exists. V2 uses a four-year half-life, which means the most recent four years carry roughly the same total weight as everything before them combined.
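The decay itself is one line, with the half-life as the only parameter. For a long history, an exponential decay with half-life h gives the most recent h years the same total weight as everything older combined, which is the property described above:

```python
def sample_weight(age_years: float, half_life_years: float = 4.0) -> float:
    """Exponential time-decay weight for a training example: a race
    half_life_years old counts half as much as one run today."""
    return 0.5 ** (age_years / half_life_years)

print(sample_weight(0.0))  # 1.0  -> today's race, full weight
print(sample_weight(4.0))  # 0.5  -> four years old, half weight
print(sample_weight(8.0))  # 0.25 -> eight years old, quarter weight
```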
The headline result: it says what it means
The cleanest way to assess any probability model is to bucket its predictions and ask whether the actual win rates match. If the model says “20 to 30%” 65,000 times, the horses in that bucket had better win between 20 and 30% of the time.
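The bucketing check is simple enough to sketch. The bucket edges below are illustrative, not necessarily the ones behind the published figures:

```python
BUCKETS = (0, .05, .10, .20, .30, .50, 1.0)  # illustrative edges

def calibration_table(preds, outcomes, edges=BUCKETS):
    """Bucket predicted win probabilities and compare the mean prediction
    in each bucket with the actual win rate. Returns rows of
    (low_edge, high_edge, count, mean_prediction, actual_win_rate)."""
    rows = []
    for lo, hi in zip(edges, edges[1:]):
        bucket = [(p, y) for p, y in zip(preds, outcomes) if lo <= p < hi]
        if not bucket:
            continue
        mean_pred = sum(p for p, _ in bucket) / len(bucket)
        win_rate = sum(y for _, y in bucket) / len(bucket)
        rows.append((lo, hi, len(bucket), mean_pred, win_rate))
    return rows

def calibration_error(preds, outcomes, edges=BUCKETS):
    """Count-weighted mean absolute gap between predicted and actual win
    rates: the 'average calibration error' quoted in the text."""
    rows = calibration_table(preds, outcomes, edges)
    n = sum(r[2] for r in rows)
    return sum(r[2] * abs(r[3] - r[4]) for r in rows) / n
```

A perfectly calibrated model scores zero; a model that predicts 25% for horses that win 10% of the time accumulates a 15-point gap in that bucket.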
Here is the result for V2 against V1, across 836,398 walk-forward predictions covering 2019 to April 2026:

The navy bars are V2’s actual win rates. The gold dashed line marks where a perfectly calibrated model would land. V2 is on or beside that line in every band. The largest miss anywhere up to a 50% predicted probability is 0.15 of a percentage point. Weighted across all 836,398 predictions, V2’s average calibration error is 0.07 of a percentage point.
The light-blue bars are V1. From the 10% predicted band upwards, the bars flatten. Whether V1 predicted 15% or 50% made little difference to whether the horse won, because V1 was using high confidence on horses that did not deserve it. Its average calibration error across the same predictions was 25.3 percentage points.
That is the difference. V2’s number means what the number says. V1’s did not.

How many winners does V2 capture?
Calibration tells you the numbers are trustworthy. The next question, the one every racing reader actually asks, is simpler: if I work my way down your top picks, how often is the winner in there?

Across the 88,084 races in our walk-forward set, V2’s top-ranked horse wins 29.4% of the time. Extend that to the top two and 48.3% of all winners come from those two picks. Top three captures 62.2%. Top four captures 73.0%. By the time you have read the top five horses on the dashboard, you have considered the eventual winner in over 81% of races.
The market favourite still captures more winners at every level, because bookmakers price information our model cannot see: race-day moves, jockey instructions, stable whispers. That is the market doing its job. But what matters is the V2 line against the V1 line. That is the apples-for-apples test of whether the rebuild was worth it.
The gain is consistent and material:
- Top-1: V2 captures 29.4% of winners, V1 captured 23.7%. That is 24% more winners in the top pick alone.
- Top-3: V2 captures 62.2% of winners, V1 captured 55.9%. A 6.3 percentage point gain on every race the platform analyses.
- Top-4: V2 captures 73.0% of winners, V1 captured 67.5%. The same 5.5 percentage point gain holds at deeper coverage.
For a typical six-race card, that gain compounds. Across the meeting, a top-4 shortlist drawn from V2 will, on average, contain the winner of one race more than the same shortlist drawn from V1. Day after day, that adds up.
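The top-k capture numbers above come from a straightforward count per race, which can be sketched as:

```python
def top_k_capture(races, k):
    """Fraction of races whose winner appears in the model's top-k ranks.
    Each race is a list of (model_probability, won) tuples."""
    hits = 0
    for runners in races:
        ranked = sorted(runners, key=lambda r: r[0], reverse=True)
        if any(won for _, won in ranked[:k]):
            hits += 1
    return hits / len(races)

races = [
    [(0.40, 1), (0.30, 0), (0.15, 0), (0.15, 0)],  # winner is the top pick
    [(0.35, 0), (0.25, 0), (0.20, 1), (0.20, 0)],  # winner ranked third
]
print(top_k_capture(races, 1))  # 0.5
print(top_k_capture(races, 3))  # 1.0
```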

Where the new model gets its signal
One of the questions we get most often is what actually drives a high TPR. The answer is not “speed figures” alone, even though that is the assumption most form students make about any rating system. It is genuinely a blend across nine analytical categories, none of which dominate.

Connections (trainer and jockey form, partnership statistics) and speed figures (T1, T2, T3 and the going-adjusted variant) each contribute around a fifth of the model’s decision-making. That should not surprise anyone who has ever read a form book. What might surprise readers is what the rest of the model does.
Pace dynamics, the category that did not exist in V1, contributes 13.9% of model signal on its own. That is more than breeding or class profile, both of which the V1 model had access to. Across the four new categories added in V2, the model draws over a third of its analytical weight (36.3%) from variables that did not feed into the rating a few months ago. The numerical TPR you see today is a meaningfully different number, drawn from a meaningfully wider read of each race.
This breadth is the point. Any single category, on its own, will be wrong on some days. Speed figures fail when a horse is dropping in class or stepping back up. Trainer form fails when a yard has just had a busy week. Breeding fails when the form book has already shown what a horse does. The model wins when several of these signals point the same way at once. That is what convergence means, and it is what we built the new variable layer to do.

Hit rate by rank, when V2 disagrees with the market

V2’s top-rated horse wins one race in four (26.0%). V1’s wins 22.3% of the time. That is a meaningful gap, not a rounding-error gap, and it widens further when the two models disagree. On the 58% of races where they pick different horses for top spot, V2’s pick wins 22.7% of the time against V1’s 15.8%, roughly a 1.44 times improvement exactly where the models part company.
The starting price favourite wins more often than either, at 33.5%. As above, this is the market doing its job, capturing information the model will never have access to. The right question is not whether V2 can beat the favourite; it is what V2 does relative to the favourite.
Confirming and denying the market
This is where the new model earns its place on the dashboard. Look at what happens to the starting price favourite when we condition on V2’s view of the race:

Market favourites are not all equal. When the market and the model converge on the same horse, you are looking at a horse that wins 36.6% of the time at an average starting price of 2.91. When the model dissents, dropping the same favourite outside its top three, that horse wins 24.7% at an average starting price of 3.89. The market has missed something the model can see, often related to pace, going suitability, or trainer momentum that the price formers have not fully discounted. This is exactly the kind of signal our Market Movers hub is designed to surface.
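The conditioning described above reduces to a simple split. A sketch, with assumed data shapes (each runner represented as a model probability, an SP-implied probability and a win flag, none of which are the production schema):

```python
def favourite_win_rate_by_model_view(races, agree_rank=3):
    """Split races by whether the model ranks the SP favourite inside its
    top `agree_rank`, then report the favourite's win rate in each group.
    Each race is a list of (model_prob, sp_implied_prob, won) tuples."""
    agree, dissent = [], []
    for runners in races:
        fav = max(runners, key=lambda r: r[1])          # shortest-priced horse
        by_model = sorted(runners, key=lambda r: r[0], reverse=True)
        model_rank = by_model.index(fav) + 1
        (agree if model_rank <= agree_rank else dissent).append(fav[2])
    rate = lambda xs: sum(xs) / len(xs) if xs else float("nan")
    return rate(agree), rate(dissent)
```

Run over the full walk-forward set with the dashboard's own definitions, this is the split that produces the 36.6% versus 24.7% figures quoted above.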
This is not a shortcut to picking winners. It is a structural read of the race. Knowing which favourites the data agrees with, and which it does not, changes how you weight every race on the card.
It works in every code we tested
One sensible worry with any rebuild is that the model fits one type of racing well at the expense of the others. We checked.

Across Flat Turf, All-Weather, Hurdle, Chase, and NH Flat (bumpers), V2’s top-rated horse wins between 24.7% and 30.1% of the time. The random-pick benchmark, the rate you would expect from drawing a name out of a hat, is around 9%. Hit rate is highest in National Hunt Flat, where breeding signals dominate and the model leans hard on sire and damsire metrics that the market is slower to integrate. It is lowest on Flat Turf, where the market is most efficient, but still nearly three times what chance would produce.
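The random-pick benchmark quoted above is just the average of one-over-field-size across races, e.g.:

```python
def random_pick_baseline(field_sizes):
    """Expected hit rate from picking one horse at random per race:
    the mean of 1/field_size across races."""
    return sum(1 / n for n in field_sizes) / len(field_sizes)

# With typical field sizes of 8 to 14 runners, the baseline lands near 9%:
print(round(random_pick_baseline([8, 10, 11, 12, 14]), 3))  # 0.094
```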
How a single race looks through the new model
The aggregate numbers tell you the model works. What they do not show is how the model reads any individual race. The cleanest way to demonstrate that is with one.
Take 16 February 2024 at Wolverhampton: a 10-runner handicap over 9.5 furlongs on the All-Weather Standard surface. Here is how the dashboard read the race:

Top five horses by V2 rank. Wolverhampton, 16 February 2024, 9.5f AW handicap.
Three different “top picks” went to three different horses. The V1 model had POLLING DAY at the top of its ranking. The market had AL RUFAA as the 3.50 favourite. The new V2 model rated WADACRE GOMEZ and CIVIL LAW as joint top-rated, both on a 16.7% chance, with the next horse 2.5 percentage points back.
What did V2 see that V1 and the market missed? Three things, all visible on the dashboard.
The speed-figure dominance was real. Wadacre Gomez carried a T1 of 160, the highest last-time-out figure in the field by a clear 37 points. V1 saw the same number but compressed it against other ratings and ranked the horse fourth. V2 recognised that this kind of T1 advantage on a horse with proven track-and-distance form (TrackTPR 160, DistTPR 160) carries more predictive weight than the V1 framework gave it.
The pace setup was perfect. Wadacre Gomez carried a PaceScore of 4.0, the highest in the field, with a PacePosition of 1. Only one rival (Polling Day, PaceScore 3.7) had a meaningfully forward style. On Wolverhampton’s All-Weather, a lone or near-lone front-runner with track form is structurally advantaged. V1 did not see pace at all. The market gave the angle only partial credit.
The condition fit was deep. GoingTPR of 130, DistTPR and TrackTPR both at 160. The horse had run to this level on this surface at this distance at this track. The figures all converged.
The other piece worth noting is V2’s number itself. Wadacre Gomez was rated a 16.7% chance, while the starting price implied around 23%, so the market actually held him in higher regard than the model did; the edge here was in the ranking, not the price. He won. V2’s top-two picks went on to finish first and third. V1’s top pick finished fourth. The market’s top pick finished second.

Why this should be your first stop when analysing a race
The case for using the new TPR as your starting point on any race is built on three things, all backed by the numbers above.
It tells you what you can actually do with the number. A V2 of 25 means a 25% chance. You can compare that to the price on offer, work out whether the bookmakers are over or under-rating the horse, and decide whether the position has value. Under V1 you could not do that, because the number did not mean what it said. The same applies further down the field: a horse showing V2 of 8% is genuinely an 8% chance, neither a contender being undersold nor an outsider being puffed up.
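Comparing the number to the price is a two-line calculation. A sketch, ignoring the bookmaker's overround (which shades real implied probabilities slightly high):

```python
def implied_probability(decimal_odds: float) -> float:
    """Convert a decimal price to the win chance it implies,
    before accounting for the bookmaker's margin."""
    return 1.0 / decimal_odds

def value_edge(tpr: float, decimal_odds: float) -> float:
    """Positive when the calibrated TPR (0-100 scale) exceeds the price's
    implied chance, i.e. the market may be under-rating the horse."""
    return tpr / 100.0 - implied_probability(decimal_odds)

print(round(value_edge(25, 5.0), 3))  # 0.05: TPR says 25%, price implies 20%
print(round(value_edge(25, 3.0), 3))  # negative: price already shorter than TPR
```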
It works across every code, every field size, and every type of horse. Flat Turf, All-Weather, Hurdle, Chase and NH Flat all show V2 top-rated win rates between 24.7% and 30.1%. Smaller fields, larger fields, novice events, listed races: the calibration holds in every slice we tested. You do not have to swap analytical frameworks depending on what race is in front of you.
It compresses what an experienced reader does into a single number. The most experienced form students in the country look at speed figures, going fit, distance fit, track form, trainer momentum, jockey booking, pace shape, class movement, handicap angle and breeding pattern. Each one of those reads contributes to a final view of the race. The new TPR is that read, executed consistently, in seconds, across every runner in every race the platform covers. You do not stop thinking. You stop missing things.
What this means when you read the dashboard
Three practical changes follow from the rebuild.
Take TPR literally. A horse with TPR 28 has a 28% chance of winning. Not “looks competitive”. Not “in the mix”. A genuine 28% chance.
Pay attention to where V2 and the market disagree. When the top-rated horse on the dashboard is also the market favourite, you are in agreement-territory and looking at a roughly 37% chance with a roughly 2.9 starting price. When the dashboard’s top pick is at an inflated price because the market disagrees, you are looking at a structural position. The convergence cases — where V2 and the market both rate a horse highly — are the strongest analytical starting points on any race card.
Do not follow the top pick blindly. Across all 88,084 races in our walk-forward set, level-staking the V2 top pick at starting price loses 11.5%. The model is a probability tool, not a betting system. The analytical edge comes from comparing the TPR against the price on offer and identifying where the model’s read of the race is most structurally sound.
The honest framing
No model that publishes its calibration this honestly is going to claim it has solved racing. V2 has not. What it has done is replace a rating system that meant the wrong thing with one that means what it says, on data the model never saw at the time it was trained, across more races than most readers will see in a lifetime.
Every number in this article was generated by the walk-forward methodology described above, on the same dataset, with no peeking. The 0.07 percentage point calibration error, the 73% top-4 capture, the 24.7% to 30.1% cross-code hit rates — all of it is out-of-sample. The model never saw the races it was being tested on.
If you have been on the dashboard for a while, the TPR you are looking at this week is not the same number it was a month ago. It is sharper, honest about what it does not know, and finally lined up with what the column header claims to be. Read it that way and the data starts doing more of the work for you.
Frequently asked questions
What is TPR?
TPR (Total Performance Rating) is EquiAnalytix’s proprietary machine-learning rating. In the V2 framework it is the calibrated probability, on a 0 to 100 scale, that a horse wins today’s race. A TPR of 25 corresponds to a 25% chance, validated against 836,398 historical predictions to within 0.07 of a percentage point.
What is walk-forward testing?
It is a way of testing a model only on races it could not have seen at the time it was trained. The model trains on data up to a cut-off year, then predicts the next year. The cut-off rolls forward each year. Every number in this article was produced by predicting one year at a time using only data available before that year. There is no peeking.
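In code, the rolling cut-off looks like this (the minimum training window is an assumption for illustration):

```python
def walk_forward_splits(years, min_train_years=3):
    """Yield (train_years, test_year) pairs: train on everything up to a
    cut-off, predict the next year, then roll the cut-off forward."""
    for i in range(min_train_years, len(years)):
        yield years[:i], years[i]

for train, test in walk_forward_splits([2015, 2016, 2017, 2018, 2019]):
    print(f"train on {train[0]}-{train[-1]}, predict {test}")
```

Each test year is scored by a model that has never seen it, which is what makes every figure in this article out-of-sample.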
How does V2 compare to V1?
V2 captures 5.5 percentage points more winners than V1 inside its top four ranked horses (73.0% vs 67.5%), and its average calibration error fell from 25.3 percentage points to 0.07. V2 also adds four new variable families (pace dynamics, class profile, form trajectory and handicap-mark intelligence) that together contribute 36.3% of model decision-making.
Does the V2 top pick make a profit at level stakes?
No. Across all 88,084 races in the walk-forward set, level-staking the V2 top pick at starting price loses 11.5%. The model is a probability tool, not a betting system. The analytical edge comes from comparing the TPR against the price on offer, not from following every top pick mechanically.
Does V2 work in jumps racing as well as Flat?
Yes. V2’s top-rated horse wins 24.7% of the time on Flat Turf, 25.3% on All-Weather, 28.0% over Hurdles, 26.3% in Chases and 30.1% in NH Flat. The calibration holds in every code we tested.
How often is V2 updated?
The model is retrained as new data flows in and recalibrated against the most recent walk-forward results. The daily TPR ratings on the dashboard always reflect the latest production version of the model.
See V2 on the dashboard
Members get the daily TPR ratings across UK, Irish, Hong Kong and UAE racing, plus the full pace, class, condition, handicap and breeding variables that feed into the model.
