Case Study · AI UX · OPENLANE

From Forecast to Signal

Designing AI pricing UX for a user segment whose default posture is skepticism of probabilistic models.

8 min read · May 2026

Setup

The model exists. The integration shipped. Adoption is the question.

OPENLANE (Canada's largest wholesale vehicle auction platform) built an in-house machine learning pricing model that predicts a vehicle's value today and at 30, 60, and 90 days out. Internally, it was celebrated as a competitive differentiator, and it justified a meaningful price increase on the Market Guide subscription (Market Guide is OPENLANE's wholesale comp data tool). It was integrated into every product surface that already showed historical sold data, and it got its own search-by-VIN entry point in the Market Guide UI. By the time I joined OPENLANE in September 2025, that direction was already well into development and locked into the UX and UI behavior.

The model itself is sound. The question this case study takes up is what a dealer is supposed to do with a probabilistic forecast over a 30-60-90 day curve when their entire decision habit is built around hard comparables.

The dealer segment OPENLANE serves grounds every decision in hard data: auction comps, recent transactions, bills of sale, receipts. When a dealer appraises a vehicle, they are looking for the specific comparable that justifies their number to a banker, an owner, or a buyer. A probabilistic forecast curve is conceptually foreign to that mental model. Worse, a forecast can be falsified by a single counterexample. The moment a dealer finds one vehicle whose actual value diverged from the 30-day prediction, the model's credibility is gone for that dealer.

After rollout, I saw this surface firsthand, not only through internal partners and sales reps but directly on dealer visits. Dealers would pull up a specific vehicle, point to one case where the model was off, and say, "well, look at this one." The objection is valid. Model accuracy is demonstrable in aggregate, but a single counterexample shifts the conversation, and recovering lost trust is harder than earning it the first time.

The rollout produced two distinct trust problems.

Reliability: dealers were hitting cases where the model produced a forecast that should not have been shown, because the vehicle fell outside the model's confident operating range.

Comparability: the forecast lived in the same UI card as recent OPENLANE sales, with similar visual weight. Dealers assumed the two numbers were meant to be compared, even though one was VIN-specific and the other described a broad cohort. When they diverged, the forecast read as wrong.

Options I considered

By the time I had user research in hand, the integration was past the point of changing scope.

The forecast was already surfaced everywhere historical sold data lived, plus a separate VIN search entry point. Four directions were on the table.

1. Lean further into the forecast as product. Gate the forecast on VIN age and odometer ranges (10 months to 20 years, 1,000 to 400,000 km) so it never shows when the model is operating outside its confident envelope. Standardize the "not enough info" messaging. Re-style the Market insights card: a contextual YMMT (Year, Make, Model, Trim: the four attributes used to identify a specific vehicle configuration) header, a "New" pill on the forecast component, and mileage anchors on the historical comps so the dealer sees what they are comparing against. The team was already on this path, and the comprehension fixes shipped.

2. Condition trust through outcomes. Follow the vRank pattern. vRank (OPENLANE's retail ranking score, a single number that tells a dealer where to price a vehicle listing relative to the local retail market) earned trust by being right consistently, not by being explainable. Keep showing the forecast, let outcomes do the work, and wait it out. (vAuto, Cox Automotive's dominant retail analytics platform, widely used by franchise dealers and one of OPENLANE's main competitive references in the dealer tools space, has a comparable tool.)

3. Push the model output into a different surface entirely. Stop framing the forecast as a destination. Use it as an input to internal pricing tools, but do not put a 30-60-90 day curve in front of the dealer at all.

4. Reframe the model output as an input to a signal, not as the product. Instead of showing the dealer "this vehicle will be worth $X in 30 days," show them "you are underwater at wholesale, it is time to act." The forecast curve disappears from the surface, and a decision-shaped output takes its place, one that maps to the dealer's existing mental model.
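The confidence gating in option 1 amounts to an envelope check in front of the forecast. The sketch below is a minimal illustration under stated assumptions, not OPENLANE's implementation: only the numeric ranges come from this case study, while the `Vehicle` type, the function names, the inclusive bounds, and the message wording are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union

# Confidence envelope quoted in option 1; inclusive bounds are an assumption.
MIN_AGE_MONTHS = 10
MAX_AGE_MONTHS = 20 * 12  # 20 years
MIN_ODOMETER_KM = 1_000
MAX_ODOMETER_KM = 400_000

# Hypothetical copy standing in for the standardized "not enough info" message.
NOT_ENOUGH_INFO = "Not enough info to forecast this vehicle."


@dataclass
class Vehicle:
    age_months: int   # hypothetical field: vehicle age in months
    odometer_km: int


def within_envelope(v: Vehicle) -> bool:
    """True when age and odometer both fall inside the confident range."""
    return (MIN_AGE_MONTHS <= v.age_months <= MAX_AGE_MONTHS
            and MIN_ODOMETER_KM <= v.odometer_km <= MAX_ODOMETER_KM)


def forecast_or_message(v: Vehicle, forecast: Dict[str, int]) -> Union[Dict[str, int], str]:
    """Show the forecast only inside the envelope; otherwise show the standard message."""
    return forecast if within_envelope(v) else NOT_ENOUGH_INFO


# Usage: a 42,300 km unit at an assumed age of 36 months, then a 6-month-old unit.
print(forecast_or_message(Vehicle(36, 42_300), {"today": 28_450}))
print(forecast_or_message(Vehicle(6, 5_000), {"today": 31_000}))
```

The design point is that the gate runs before anything reaches the card, so a vehicle outside the envelope never produces a number a dealer could falsify.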

Figure 01 · Market insights card · the comprehension fixes that shipped under option 1

Market insights for 2023 Porsche Taycan 4S AWD

Pricing forecast NEW

OPENLANE wholesale data as of May 24, 2026

$28,450 today · 30d · 60d · 90d

VIN · 42,300 km · Inspection-grade 3

Historical sales (24)

OPENLANE wholesale data, past 90 days

Lowest $24,100 (61,200 km) · Avg $26,950 · Highest $31,200 (28,400 km)

A contextual VIN header, explicit scope labels on each section, and mileage anchors on the highest and lowest comps were the comprehension fixes that landed. The forecast now reads as being about this VIN while the historical band is clearly about a cohort. Dealers can still disagree with either, but they disagree on the right terms.

The decision and why

Why option 4 is the right answer, and the trigger logic that makes it credible.

Option 1 was the path the rollout was already on. It produced an initial usage bump followed by decline. Our American counterparts, the team running a similar pricing model on the US data set, saw the same shape: discovered, tried, not returning. Option 2 (vRank-style trust through outcomes) is plausible, but the product shape is wrong. vRank works because a single number maps to a single decision (where to price a retail listing). A 30-60-90 day depreciation curve is not a single number. It is a curve over a time horizon, in a segment whose decisions live in days and hours, so the mismatch exists before model accuracy is even relevant. Option 3 (kill the dealer surface) gives up the AI appeal and would have been hard to defend given the internal investment.

Option 4 is the path I have been pushing for. Surface the conclusion the forecast supports instead of the forecast curve itself: "predicted wholesale margin if you sell today is negative, time to act," or "this unit is aging past the segment median, here is the wholesale exit price the forecast supports." The model still runs underneath, but the dealer never sees a probability distribution. What they see is a decision they can defend, a number tied to an action.
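A hedged sketch of what "surface the conclusion, not the curve" could look like: the forecast goes in, and at most one decision-shaped message comes out. Everything here is hypothetical and illustrates the two example signals above, not the production signal taxonomy: the `UnitContext` fields, the segment-median aging check, the use of today's forecast as the supported exit price, and the message copy are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UnitContext:
    cost_basis: float             # what the dealer has invested in the unit
    forecast: Dict[str, float]    # model output, e.g. {"today": ..., "30d": ...}
    days_in_stock: int
    segment_median_days: int      # hypothetical aging benchmark for the segment


def decision_signal(u: UnitContext) -> Optional[str]:
    """Collapse the forecast curve into at most one decision-shaped message."""
    exit_today = u.forecast["today"]
    if exit_today - u.cost_basis < 0:
        # Predicted wholesale margin if sold today is negative.
        return "Underwater at wholesale: time to act."
    if u.days_in_stock > u.segment_median_days:
        # Unit is aging; point to the exit price the forecast supports today.
        return f"Aging past segment median: forecast supports a wholesale exit near ${exit_today:,.0f}."
    return None  # no signal; the curve itself stays inside the system


# Usage with forecast values from Figure 02 and an illustrative cost basis.
unit = UnitContext(cost_basis=29_000, forecast={"today": 28_450, "30d": 27_800},
                   days_in_stock=20, segment_median_days=45)
print(decision_signal(unit))
```

Returning `None` by default matters as much as the messages: silence is the common case, which keeps each surfaced signal credible.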

For the signal to be trustworthy, the trigger has to be conservative. We do not surface a signal every time the model produces a number. We surface one only when the forecast diverges meaningfully from the historical baseline, in a statistically defensible way.

# diff = absolute distance between today's prediction and the
# historical average for comparable vehicles
diff = abs(TodaysPrediction - HistoricalAverage)

if not enough comps or stdev <= 0:
    if diff / HistoricalAverage >= 0.25:
        surface signal
else:
    if diff >= 2 * stdev:
        surface signal

Two standard deviations when we have enough comps to compute a usable spread, a 25% relative gap when we do not. The dealer only sees a signal when the model has earned the right to surface one: the math stays inside the system, and what reaches the dealer is the conclusion.
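The trigger pseudocode can be turned into a runnable Python sketch. A few details are assumptions the case study does not specify: `MIN_COMPS`, the threshold for "enough comps", is a hypothetical value; the spread is computed as a sample standard deviation with `statistics.stdev`; and the guard for an empty comp set is added here because a historical average cannot be computed without comps.

```python
from statistics import stdev
from typing import Sequence

MIN_COMPS = 5          # hypothetical: the case study does not define "enough comps"
FALLBACK_GAP = 0.25    # 25% relative gap when no usable spread exists
SIGMA_MULTIPLIER = 2   # otherwise require a two-standard-deviation gap


def should_surface_signal(todays_prediction: float, comp_prices: Sequence[float]) -> bool:
    """Conservative trigger: surface only when the forecast clearly departs from comps."""
    if not comp_prices:
        return False  # added guard: no comps means no historical average to compare to
    historical_average = sum(comp_prices) / len(comp_prices)
    diff = abs(todays_prediction - historical_average)

    # statistics.stdev (sample standard deviation) needs at least two points.
    spread = stdev(comp_prices) if len(comp_prices) >= 2 else 0.0

    if len(comp_prices) < MIN_COMPS or spread <= 0:
        # Fallback branch: relative gap against the historical average.
        return historical_average > 0 and diff / historical_average >= FALLBACK_GAP
    # Statistical branch: distance measured in standard deviations.
    return diff >= SIGMA_MULTIPLIER * spread


comps = [24_100, 26_000, 27_000, 28_000, 31_200]  # illustrative prices, not real data
print(should_surface_signal(28_450, comps))  # False: within two standard deviations
print(should_surface_signal(33_000, comps))  # True: beyond two standard deviations
```

With these illustrative comps the mean is $27,260 and the sample standard deviation is roughly $2,632, so the forecast has to sit more than about $5,264 away before a signal fires.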

For this release, I proposed two options for the trigger logic. The more precise version used the statistical approach above. The simpler version was a plain-language disclaimer surfaced whenever forecast and historical sales diverged: "Today's forecast may differ from the historical sales average. This can happen when the VIN details, condition, mileage, or options differ from recent comparable sales." The simpler version shipped. It is honest and clear. The statistical trigger is still the right long-term answer, and the argument for it still stands.

Figure 02 · The conceptual shift · forecast as product vs forecast as input to a signal

Live today

Forecast as product

$28,450 today · $27,800 at 30d · $27,100 at 60d · $26,400 at 90d

The dealer is asked to read a curve. Probabilistic. Hard to act on. One vehicle that does not follow the curve destroys trust in the whole model.

Where the product needs to go

Forecast as input to a signal

Underwater at wholesale

Predicted wholesale exit is below your cost basis. Listing today protects an estimated $1,200 vs holding 30 more days.

The dealer is asked to make a decision. The output is shaped like a choice they already know how to make, and the probabilistic machinery stays in the system rather than on the screen.

The trade-offs I accepted

Pushing back on internal momentum is the slow path. I took it.

Pushing back on senior leadership's framing. The pricing forecast was widely regarded as a flagship application of in-house ML, so reframing it from "the product" to "an input to a signal" amounted to telling internal stakeholders that the surface they had celebrated was the surface the user did not want. That conversation is harder when the audience is invested in the original framing, and the only way to hold it is to lead with the user evidence: the dealer skepticism patterns documented in the objection-handling material and the US team's matching usage curve. The data has to carry the argument; the conclusion follows from there.

Slower path to ship. Signal surfacing requires new product work that the forecast integration never needed: trigger logic, a signal taxonomy, and surfacing rules per product context. The forecast was a single integration; the signal is a system. In the comprehension PRD, the statistical trigger was scoped, agreed on, and then moved to "Not in scope" for the release, with the simpler plain-language disclaimer shipping instead. That was the right call for the release on the calendar in front of us, but the fuller trigger remains the right long-term answer.

Some loss of AI demo appeal. A 30-60-90 day depreciation curve makes a great screenshot, while a small alert that reads "you are underwater at wholesale" does not. The marketing surface gets quieter as the user surface gets louder, and that is a trade I accepted on purpose.

The consequence

What shipped, what didn't, and the metric that would prove the signal version right.

The forecast-as-product version is live. The comprehension improvements shipped: confidence gating on age and odometer, standardized "not enough info" messaging, clearer differentiation between forecast and historical sales in the Market insights card, mileage anchors on historical comps, and French translations for Bill 96 compliance. The objection patterns documented for sales (price too high or low vs market, outside the dealer's historic range, preference for Black Book or Manheim, "I don't understand how the model arrived at this number") are exactly what user research predicted.

The conceptual shift to signal is in flight. Our strategy for the portfolio of dealer tools frames it as Trust and Interpretability, the second-ranked problem we need to solve. In the meantime the forecast has been integrated directly into our inventory management surface, surfacing margin signals alongside inventory data since early May 2026, and early engagement in that context is running above the standalone forecast baseline.

The full signal version has a roadmap but no shipped product yet, and the next leg of work is to build a surface that prompts an action rather than just delivering insight. That work sits inside a longer arc toward a single pane of glass where dealers manage their capital in one place.

If we land the signal version, the success metric should not be forecast engagement. It should be dealer action conditioned on the signal: how often does a surfaced "underwater at wholesale" message lead to a wholesale listing or a price adjustment, and what is the realized margin compared to similar units that did not get the signal? That metric ties the model directly to dollars saved or earned.
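The proposed metric can be sketched as two numbers: the action rate among signaled units, and the difference in mean realized margin between signaled and non-signaled units. This is a hypothetical illustration: the `UnitOutcome` record is invented, and it assumes the comparison set has already been restricted to similar units upstream. A raw difference in means is descriptive, not causal, so a real evaluation would still need careful matching or a holdout group.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class UnitOutcome:
    got_signal: bool        # an "underwater at wholesale" message was surfaced
    acted: bool             # a wholesale listing or price adjustment followed
    realized_margin: float  # sale price minus cost basis


def action_rate(units: List[UnitOutcome]) -> Optional[float]:
    """Share of signaled units where the dealer took the suggested action."""
    signaled = [u for u in units if u.got_signal]
    if not signaled:
        return None
    return sum(u.acted for u in signaled) / len(signaled)


def margin_lift(units: List[UnitOutcome]) -> Optional[float]:
    """Mean realized margin of signaled units minus that of comparable unsignaled units."""
    with_signal = [u.realized_margin for u in units if u.got_signal]
    without_signal = [u.realized_margin for u in units if not u.got_signal]
    if not with_signal or not without_signal:
        return None
    return mean(with_signal) - mean(without_signal)


# Usage with made-up outcomes for four comparable units.
units = [UnitOutcome(True, True, -200.0), UnitOutcome(True, False, -900.0),
         UnitOutcome(False, False, -1100.0), UnitOutcome(False, False, -700.0)]
print(action_rate(units), margin_lift(units))
```

Both functions return `None` rather than a misleading zero when one side of the comparison is empty.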

What I would do differently

Push the "so what for the dealer workflow" question earlier than internal AI excitement.

I would have pushed earlier on the "so what for the dealer workflow" question. By the time user research explicitly named the conceptual mismatch, the rollout was already deep. Nobody was asking what a 30-60-90 day depreciation curve actually tells a dealer in an appraisal moment. Asking that question before the integration was scoped would have changed the shape of the work.

I would have led with vRank as the reference model earlier and more explicitly. It is the closest thing in the dealer's tool stack to a black-box AI model they trust. Studying how it earned that trust would have revealed sooner that a depreciation curve over a 90-day horizon is the wrong product shape for this user segment, regardless of how accurate the model is.

Early in this role I was too deferential about a direction I had real concerns with. The data proved the concerns out. The instinct I am taking forward is that when the research is telling me something is wrong, that signal is worth more than the calendar cost of raising it before momentum sets in.
