Sartorial.

How AI Shopping Agents Make Buying Decisions

A 16,000-round study tested how AI shopping agents respond to eight promotional tactics. The results challenge how most merchants think about agent-readiness.

By Sumit Jagdale · CEO, CTO

Most of the executives who run e-commerce businesses believe they understand what makes AI shopping agents tick.

That belief is what makes a particular piece of research so worth sitting with.

In an exploratory survey of 50 e-commerce executives across the U.S. and UK, the majority said they had already noticed traffic or conversion changes they attribute to AI shopping systems. They are actively trying to improve how those systems engage with their products. And many of them are confident: they believe the cues that persuade human shoppers tend to influence AI purchasing behavior in similar ways, and that they already understand which elements of their sites matter most.

The research team that ran that survey then ran a different study. They tested eight common promotional mechanisms across four AI models in more than 16,000 simulated shopping rounds. The executives' confidence, it turns out, is misplaced.


What 16,000 shopping rounds found

The study, published by Jafar Sabbah of Bayes Business School in Harvard Business Review in May 2026, built a simulation of how AI agents interact with e-commerce product pages. Four AI systems: GPT-4.1-mini, GPT-5, Gemini 2.5 Pro, and Gemini 2.5 Flash Lite. Eight promotional mechanisms. Products rotated across everyday categories including a phone, a fitness watch, a washing machine, and a mouse pad. 1,000 rounds per AI system per product, yielding more than 16,000 choice situations total.
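As a quick sanity check on the scale, the design grid can be sketched in a few lines. This is a minimal illustration, not the researchers' code: it counts only the four products named above, and the variable names are my own. The study's "more than 16,000" figure presumably reflects its full product rotation, which is not enumerated here.

```python
from itertools import product

# The four AI systems named in the study.
MODELS = ["GPT-4.1-mini", "GPT-5", "Gemini 2.5 Pro", "Gemini 2.5 Flash Lite"]

# Only the products named in this article; the study's full list may be longer.
NAMED_PRODUCTS = ["phone", "fitness watch", "washing machine", "mouse pad"]

ROUNDS_PER_CELL = 1_000  # rounds per AI system per product

# Every (model, product) pairing is one cell of the design grid.
cells = list(product(MODELS, NAMED_PRODUCTS))
total_rounds = len(cells) * ROUNDS_PER_CELL

print(len(cells), total_rounds)  # 16 16000
```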

The headline finding was pointed: star ratings were the only badge that consistently pushed selection upward across every AI model tested and every category, mirroring the well-established human reliance on quality signals. Effects from every other mechanism varied by product and context, sometimes dramatically.

Scarcity cues had no effect on some models. Countdown timers produced no stable pattern. Strike-through pricing, which works on humans by creating an anchor that frames the discount as a gain, did not produce a consistent response. Bundling sometimes increased selection, sometimes had no effect, and in at least one case reduced it.

That variation is itself a signal. AI agents are responding to these cues, but not for the reasons the cues work on humans. The psychological mechanisms behind them (loss aversion, anchoring, urgency, and social proof in the form of purchase counts) are either absent in these systems or operate very differently than decades of human behavioral research would predict.


When more persuasion makes things worse

The study's most strategically significant finding concerns the direction of travel.

The researchers noted a pattern linking model capability to promotional receptivity. The non-reasoning models (Gemini 2.5 Flash Lite and GPT-4.1-mini) were generally more responsive to promotional cues. The reasoning models (GPT-5 and Gemini 2.5 Pro) were less responsive. In several cases, they appeared to penalize overt persuasion cues, as though reading heavy promotional language as a negative quality signal.

The authors were careful to frame this as a pattern, bounded by model and category, rather than a universal rule. The same badge could produce opposite effects on the same model depending on the product involved. The directional conclusion is still clear: the marketing playbook optimized for human psychology is losing its footing with capable AI systems, and with the most advanced reasoning models, heavy promotion can work against the seller.

The study's own framing: "The direction of travel is toward agents where more persuasion produces less selection." Merchants that have spent years engineering urgency and scarcity into their product pages are building habits and infrastructure that may increasingly cost them every time a capable reasoning system evaluates their catalog.


The decision gap compounds as agents take on more autonomy

One study of current agent behavior would be significant on its own. What makes it a strategic priority is what McKinsey QuantumBlack's automation curve shows about where things are going.

McKinsey describes six distinct levels of commerce autonomy, from Level 0 (pre-agentic auto-ship subscriptions) through Level 5 (fully networked multiagent coordination). Today's most common experiences sit at Levels 1 and 2: a system compares options and summarizes trade-offs, or assembles a purchase-ready basket for human approval. Meaningful surfaces, but still the conservative end of the curve.

At Level 3, the consumer authorizes the system to execute purchases within defined guardrails. Orders are placed, substitutions handled, exceptions escalated only when something falls outside the rules. The next step, Level 4, operates against standing intents: continuously monitoring needs, optimizing against longer-term goals, with the shopper reviewing outcomes rather than approving individual transactions. Level 5 is where AI systems negotiate directly with other AI systems across merchants and platforms, removing the human from the transaction loop almost entirely.
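One way to keep the six levels straight is to write them down as an ordered type. The sketch below is my own shorthand: the member names are illustrative labels rather than McKinsey's terminology, and splitting "compare" and "basket" across Levels 1 and 2 is my reading of the description above.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """Six levels of commerce autonomy; names are illustrative only."""
    AUTO_SHIP = 0        # pre-agentic auto-ship subscriptions
    COMPARE = 1          # compares options and summarizes trade-offs
    BASKET = 2           # assembles a purchase-ready basket for human approval
    AUTHORIZE = 3        # executes purchases within defined guardrails
    STANDING_INTENT = 4  # optimizes against standing intents over time
    MULTIAGENT = 5       # agents negotiate directly with other agents


def agent_executes_unreviewed(level: AutonomyLevel) -> bool:
    """True once an agent places orders without per-purchase human approval.

    Assumption: this starts at Level 3, where the consumer authorizes
    execution within guardrails.
    """
    return level >= AutonomyLevel.AUTHORIZE


for level in AutonomyLevel:
    print(level.value, level.name, agent_executes_unreviewed(level))
```

The useful property is the ordering: everything from Level 3 upward acts on a merchant's signals without a human checking each choice.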

The decision gap documented by the HBR study compounds with each of those steps. At the Assist stage, where the system only compares options and summarizes trade-offs for a human, a misread of your promotional signals produces a flawed summary. Manageable. At the Authorize stage (Level 3), the system makes purchasing decisions on someone else's behalf without a human reviewing each choice. At Level 4, operating on standing intents, it shapes a consumer's ongoing supply of goods across months. Each progression raises the cost of having the wrong model of what your products are for and who they serve.

McKinsey's analysis of Level 4 is direct: "It's no longer enough to expose a catalog; retailers must expose the rules and policies that determine what 'good' looks like." That applies to operational logic, and it applies equally to brand reasoning. A system operating against a standing intent needs your substitution rules. It needs your fit logic. It requires the authored context that tells it what you will and will not claim about your products.

The promotional mechanisms the HBR study tested were designed for a shopper who arrives once, evaluates options, and clicks. A Level 4 purchasing system arrives repeatedly, on standing instructions, for a consumer who may not review each individual transaction. The mismatch between the signals merchants are optimizing for and the signals that class of buyer requires is the gap that compounds.


The executive survey reveals the real problem

The executives in Sabbah's survey have noticed the change in their traffic. They are actively adapting. The problem sits upstream of attention: they are applying a channel-optimization frame to a buyer who requires a fundamentally different response.

Treating AI systems as a ranking surface, as a new distribution channel to optimize the way search was optimized, produces exactly the wrong investments. It leads to promotional tuning at the precise moment when promotional signals are becoming liabilities with more capable systems.

Lareina Yee, a McKinsey senior partner, framed the scale of the shift: "Before long, nearly all retailers will have to grapple with the fact that a significant percentage of their customers will not be human users but rather AI agents."

Becca Coggins, also a McKinsey senior partner, described the response it demands: "To thrive, brands must rethink the full stack of engagement, not for the people they've worked to understand but for the agents now acting on their behalf."

Rethinking the full stack is a different project from tuning your badges. It requires a different starting question: what does this buyer actually need to make a sound decision on a consumer's behalf?


What the agent actually needs

The channel-trap framing that I explore in a companion piece shows why SEO-adjacent thinking fails specifically when the buyer is an algorithm. The short version relevant here: a shopping system arrives carrying the consumer's prompt, which may be something like "find the best-reviewed waterproof jacket under $200 that ships in two days." Its job is to evaluate whether your product satisfies those parameters, and its decision logic runs on the instructions it received. Human browsing intent, the kind that scarcity cues and countdown timers were built to convert, has no equivalent in that flow.
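To make that flow concrete, here is a deliberately simplified toy in Python. It is not how any of the tested models work internally. The `Product` fields, the `pick` function, and the sample catalog are all hypothetical. It only illustrates the shape of the logic: the prompt's constraints and the ratings drive the choice, and urgency badges never enter it.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    rating: float        # average star rating
    price: float         # in dollars
    waterproof: bool
    ship_days: int
    badges: tuple = ()   # promotional copy; ignored by pick()


def pick(products, max_price=200.0, max_ship_days=2):
    """Return the best-reviewed waterproof product that meets the prompt."""
    eligible = [
        p for p in products
        if p.waterproof and p.price < max_price and p.ship_days <= max_ship_days
    ]
    # Highest rating wins; None if nothing satisfies the constraints.
    return max(eligible, key=lambda p: p.rating, default=None)


catalog = [
    Product("Storm Shell", 4.7, 180.0, True, 2),
    Product("Flash Deal Parka", 4.2, 150.0, True, 2,
            badges=("Only 3 left!", "Sale ends in 10:00")),
    Product("Summit Pro", 4.9, 240.0, True, 1),   # over budget
    Product("Trail Wind", 4.8, 120.0, False, 1),  # not waterproof
]

choice = pick(catalog)
print(choice.name)  # Storm Shell
```

In this toy, the scarcity and countdown badges on the parka change nothing. The only way a product moves up is by satisfying the stated parameters with better evidence.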

The HBR study makes the structural implication concrete. Star ratings succeeded because they are a verifiable quality signal: evidence that speaks to a judgment the system can make on the consumer's behalf. Every other mechanism the study tested was built around psychological shortcuts. Loss aversion, anchoring, urgency, social validation through purchase counts. Those tactics work on people because people are susceptible to them. They work inconsistently on AI systems, and with capable reasoning models they can work in reverse, because heavy promotion reads as a reason for skepticism.

What AI buyers actually require, and increasingly so as autonomy rises, is merchant-authored reasoning. A companion piece, "What AI agents actually need from your catalog," addresses the data-needs side of this directly. The full picture involves five dimensions that no product feed carries today: fit (who this product is for and when it is the right choice), substitution (what to recommend when the first option is unavailable), policy (what the merchant will and will not promise), routing (which product belongs in front of which query), and voice (how the brand explains itself and what it declines to say).

Structured metadata carries none of that. What is missing is reasoning, and it has to be authored by the merchant, because no platform can generate it on the brand's behalf.
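As a rough sketch of what authoring those five dimensions could look like, the Python below models them as a record with a completeness check. The `BrandReasoning` class, its field names, and the sample text are hypothetical. No existing feed format or protocol is implied.

```python
from dataclasses import dataclass, fields


@dataclass
class BrandReasoning:
    """Hypothetical merchant-authored record, one field per dimension."""
    fit: str = ""           # who the product is for and when it is the right choice
    substitution: str = ""  # what to recommend when the first option is unavailable
    policy: str = ""        # what the merchant will and will not promise
    routing: str = ""       # which product belongs in front of which query
    voice: str = ""         # how the brand explains itself and what it declines to say


def missing_dimensions(record: BrandReasoning) -> list:
    """List the dimensions the merchant has not authored yet."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


jacket = BrandReasoning(
    fit="Day hikers in wet climates; not intended for alpine use.",
    policy="Two-day shipping only within the contiguous U.S.",
)

print(missing_dimensions(jacket))  # ['substitution', 'routing', 'voice']
```

The fields are free text on purpose: the point is that someone at the merchant has to write these judgments down, not that a schema can infer them.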


The segment-level implication

McKinsey QuantumBlack estimates that AI agents could mediate $3 trillion to $5 trillion of global consumer commerce by 2030. That figure represents a buyer segment, not a traffic source.

The strategic error the market is making right now is scaling up the answer to the wrong question. "How do we perform better on AI ranking surfaces?" is a reasonable starting point if you believe agents are a new channel. It is the wrong frame if agents are a distinct buyer segment with their own decision logic, their own data needs, and their own relationship to persuasion.

The HBR data establishes that the decision logic is categorically different. The McKinsey automation curve shows that the segment is growing in scope and consequence. A companion piece on the agentic commerce inversion describes why the standard commerce stack was not built for this buyer and what that means for merchants trying to compete.

The merchants who treat this as a segmentation problem, who ask what their agent buyers actually need rather than how to rank better in agent outputs, will build the right capabilities. The merchant-authored brand-truth layer is what distinguishes a recommendable product from one that is merely present in an agent's catalog. Product feeds carry the inventory. The reasoning that makes one merchant's inventory more compelling than another's has to travel with it. Today, in almost every catalog operating through discovery protocols, that reasoning is absent.

The research does not make this a future problem. It makes it a present one, at the exact moment when executives believe they have it figured out.


Sumit Jagdale is the founder of Sartorial.
