How to pressure-test a big decision with AI before you commit

Stress-Test Any Big Decision With AI First

Treat AI as an adversary, not an oracle. Feed it your decision plus the case against it, then run Gary Klein's premortem: assume the choice failed and ask why. Models like Claude Opus 4.7 and GPT-5.5 widen the range of options you consider, but a 2026 Science study found that leading models endorse users 49% more often than humans do, so you have to force disagreement.

I learned to distrust my own certainty the hard way, by shipping confident decisions that quietly fell apart for reasons I could have named in advance. The mistake was never a lack of intelligence; it was the inside view. Daniel Kahneman, in Thinking, Fast and Slow, describes how we build plans from the vivid specifics in front of us and ignore the boring base rates of how similar bets actually turned out. AI is unusually good at supplying that outside view on demand, but only if you point it away from agreement.

Here is the failure mode you have to design around. In March 2026, Myra Cheng and Dan Jurafsky published a study in Science, bluntly titled "Sycophantic AI decreases prosocial intentions and promotes dependence," that tested eleven leading models, including ChatGPT, Claude, and Gemini. On average the models endorsed the user 49 percent more often than humans did, and even on cases where the internet had clearly decided the user was in the wrong, they sided with the user 51 percent of the time. A model that flatters you is worse than no advisor at all, because it launders your existing bias as a second opinion. So the first rule of pressure-testing is never to ask "Is this a good idea?" The model will almost always say yes, and that yes means nothing.

Instead I run four passes. First, the steelman of the opposite. I paste my decision and instruct the model to make the strongest possible case for the choice I am rejecting, with no hedging. Second, the premortem, drawn from psychologist Gary Klein's technique: I tell the model to assume it is eighteen months later and the decision has failed badly, then write the post-incident report explaining exactly how. The research Klein cites on this prospective-hindsight framing found it lifts the number of plausible reasons people generate by roughly 30 percent, and a model working from the full brief will usually list far more failure causes than I would alone. Third, the base-rate pass: I ask for the reference class. What usually happens to founders who raise at this stage, hire this role first, or enter this market this way, and what were the realistic outcome distributions rather than the headline successes?
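Because I reuse these passes on every big call, it helps to keep the prompts as templates rather than retyping them. What follows is a minimal sketch in Python, assuming you paste the output into whatever chat interface you already use; it calls no model API. The `DecisionBrief` structure, the `build_passes` function, and the exact prompt wording are my own hypothetical illustration, not a feature of any tool. The one design rule it enforces is the one above: no prompt ever asks whether the idea is good.

```python
from dataclasses import dataclass


@dataclass
class DecisionBrief:
    """Hypothetical structure: the decision in plain prose plus the option I'm rejecting."""
    decision: str
    reasoning: str
    rejected_option: str
    context: str  # stage, role, or market, used to name the reference class


def build_passes(brief: DecisionBrief, months_later: int = 18) -> dict[str, str]:
    """Return one adversarial prompt per pass. None of them asks 'is this a good idea?'."""
    header = f"Decision: {brief.decision}\nMy reasoning: {brief.reasoning}\n\n"
    return {
        # Pass 1: steelman the option I am rejecting, with no hedging allowed.
        "steelman": header + (
            f"Make the strongest possible case for {brief.rejected_option} instead. "
            "Do not hedge and do not balance it with arguments for my choice."
        ),
        # Pass 2: premortem, using prospective hindsight (the failure is assumed, not asked about).
        "premortem": header + (
            f"Assume it is {months_later} months later and this decision has failed badly. "
            "Write the post-incident report explaining exactly how it failed, "
            "listing every plausible cause."
        ),
        # Pass 3: base rates, asking for the reference class and checkable sources.
        "base_rate": header + (
            f"What is the reference class for this decision ({brief.context})? "
            "What usually happens, and what is the realistic distribution of outcomes "
            "rather than the headline successes? Cite sources I can check."
        ),
    }


if __name__ == "__main__":
    # Illustrative brief; every value here is made up for the example.
    brief = DecisionBrief(
        decision="Hire a VP of Sales before we have repeatable revenue",
        reasoning="Our founders are spending half their week on deals",
        rejected_option="waiting two more quarters and hiring two account executives",
        context="seed-stage B2B software companies",
    )
    for name, prompt in build_passes(brief).items():
        print(f"--- {name} ---\n{prompt}\n")
```

The fourth pass needs no new prompt of its own: the same three texts go, unchanged, to a second model, so any difference in the answers comes from the models rather than from how I phrased the question.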

The fourth pass is where current tools matter. I run the same decision brief through more than one model and read where they disagree, because divergence is signal. Claude Opus 4.7, released in April 2026, was built to verify its own outputs before reporting back and tends to be careful with caveats; GPT-5.5 is stronger at long-horizon, multi-step reasoning across a large brief. Perplexity's Deep Research, running on Claude under the hood, ties claims to citations so I can check whether a "base rate" is real or invented. Perplexity even ships a Model Council mode on its top tier that fires the same question at three frontier models at once and surfaces where they converge and split, which is exactly the structure a good decision needs.

None of this removes the judgment, and that is the point I want to be precise about. The models will hallucinate a confident statistic, they will miss the political reality inside your company, and they cannot feel the thing your gut is flagging at 3am. What they reliably do is enlarge the option set and expose assumptions you smuggled in unexamined. This is the same move I make as a coach: a good question does not hand you the answer, it widens the field you are choosing from before you narrow. I have written more about why that widening matters in the art of discovery, because most bad executive decisions are not wrong answers to the question asked, they are confident answers to the wrong question.

So the practical sequence is short. Write the decision and your reasoning in plain prose. Ask the model to attack it, run the premortem, and pull the reference class. Run it through a second model and read the disagreements. Then close the laptop and decide as a human who now knows more about how this could go wrong. The goal is not to outsource the call. It is to make sure that when you commit, you are committing with your eyes open rather than your confidence high. Used this way, AI is not a decision-maker. It is the colleague honest enough to tell you the idea might fail, which is the rarest and most valuable thing in the room.