Sartorial.

agentic-commerce

What AI Agents Need From Your Product Catalog

AI agents are a distinct buyer segment with different data needs. Here is what that means for your catalog and why most merchants are missing it.


By Sumit Jagdale · CEO, CTO

Run a shopping query through an AI agent and the evaluation happens before any product page loads. Working only from the fields your catalog exposes through the protocol surface, the system decides whether your product belongs in the recommendation. Human browsing involves a click, a page view, and copy written for a reader who has already signaled interest by arriving on your site. The agent's evaluation happens upstream of all of that.

Most merchants have not thought carefully about the distinction. The catalog fields they maintain were optimized for search crawlers and the human eye: written to rank, to attract a click, to convert someone already looking. An AI buyer arrives through a different entrance and runs a different kind of evaluation, one where your PDP never renders and your carefully written product copy exists only as input to a decision process. That mismatch is what it looks like to have a buyer segment your catalog was never designed to serve.


The stack has a gap that nobody is filling

Bessemer Venture Partners mapped the agentic-commerce stack into three layers: execution, intelligence, and agent. The execution layer is where nearly all current investment flows. Platforms are constructing the pipes. The Universal Commerce Protocol (UCP), backed by Google, Shopify, Etsy, Wayfair, Target, and Walmart, gives AI buyers a standardized surface to query catalogs. The Agentic Commerce Protocol (ACP), OpenAI and Stripe's parallel effort, handles the transaction side. The infrastructure is real and moving fast.

BVP named what sits between the execution layer and the buyer: the intelligence layer, described as "the reasoning, context, and brand truth that sits between catalog and agent." Their assessment was blunt. This is the least-developed part of the stack: no major platform has filled it, and third-party entrants are just beginning to arrive.

That gap is where buying decisions currently form without the reasoning they need. Nobody has authored what belongs there. When an AI buyer evaluates your catalog, the execution layer delivers the fields cleanly; the intelligence layer returns silence.

The implications compound as models advance. In the early days of mobile commerce, brands that adapted for smaller screens captured the buyers who moved there first. The dynamic here is analogous, but the required adaptation differs in kind. Optimizing screen layout for mobile is a rendering problem. Building the intelligence layer is an authorship problem, and no platform solves authorship on your behalf.


What the audit found

Sartorial's field audit of 2,483 top US Shopify merchants ran the same signed catalog query across every merchant and examined what came back. The protocol worked. The pipes delivered cleanly. The data they delivered exposed a shared, structural gap.

Zero merchants exposed reviews or ratings through the agent surface. Gymshark, Taylor Stitch, and UNRL all maintain strong review profiles on their own product pages. Each of them drops that signal entirely at the protocol boundary, returning no trace when an AI buyer queries their catalog. Stars exist on the website. The query response carries none of them.
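To make the boundary drop concrete, here is a minimal sketch that compares a storefront product record with the record an agent-facing query returns. Everything in it is an assumption for illustration: the payload shapes, the field names (`rating`, `review_count`, `aggregate_rating`), and the sample product are invented, and this is not the UCP schema or the method the audit used.

```python
# Hypothetical rating-related field names; not taken from any real protocol schema.
RATING_KEYS = {"rating", "review_count", "aggregate_rating"}


def rating_fields_present(product: dict) -> set:
    """Return the rating-related keys that carry a non-empty value."""
    return {k for k in RATING_KEYS if product.get(k) not in (None, "", [], {})}


def dropped_at_boundary(storefront: dict, agent_response: dict) -> set:
    """Rating keys present on the storefront record but missing from the agent response."""
    return rating_fields_present(storefront) - rating_fields_present(agent_response)


# Invented sample records: the storefront knows the rating, the agent surface does not.
storefront_record = {"title": "Travel Blazer", "price": 248.0, "rating": 4.7, "review_count": 312}
agent_record = {"title": "Travel Blazer", "price": 248.0, "available": True}

print(sorted(dropped_at_boundary(storefront_record, agent_record)))
# prints ['rating', 'review_count']
```

A non-empty result means the merchant holds the signal but the agent never receives it.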

The omission matters because of what the research says about ratings. A 2026 HBR study tested four AI models (GPT-4.1-mini, GPT-5, Gemini 2.5 Pro, and Gemini 2.5 Flash Lite) across eight common promotional mechanisms in more than 16,000 simulated shopping rounds. The headline finding was precise: star ratings were the only badge that consistently pushed selection upward across all four models and every product category. Every other mechanism produced effects that varied by model and by product, often dramatically.

Discounts, countdown timers, bundling, and scarcity cues produced no stable pattern. Strike-through pricing did not produce the anchoring effect it has on human shoppers. Scarcity signals had no effect on some models and, in at least one case, pushed GPT-5 toward lower selection rather than higher.

The more capable the model, the more skeptical it was. GPT-5 and Gemini 2.5 Pro were less responsive to promotional cues overall, and in several cases appeared to penalize them, interpreting aggressive promotional language as a signal of low quality or manipulation. The study's conclusion was direct: "more persuasion produces less selection" as frontier models advance.

The one signal that consistently worked across every model tested: ratings. The clearest form of social proof. The exact signal every merchant in the audit drops at the protocol boundary.

Merchants are maintaining the cues that move human buyers while abandoning the evidence that moves AI buyers, at precisely the surface where AI buyers arrive.


A distinct segment, not an optimization problem

The instinct many merchants reach for is to treat AI optimization as a new acquisition surface, like search engine optimization but for AI. Add structured data, feed cleaner attributes, include some agent-specific schema. The channel-trap thinking runs deep because it has worked before.

An acquisition surface is something you optimize to attract a buyer. An AI buyer executes a mandate. It arrives with goals, constraints, and instructions from the human who sent it, evaluates your catalog against those instructions, and decides. Persuasion cues designed to exploit cognitive shortcuts have no architecture to exploit. The psychological mechanisms that make scarcity badges effective for humans rely on fear of loss and urgency; these are not properties of a language model evaluating a product list.

The HBR study surfaced a concrete version of this gap. The 50 e-commerce executives surveyed alongside the simulation research mostly believed that cues effective for human shoppers would also influence AI buyers. The simulation data contradicted that belief, consistently, across more than 16,000 rounds. The confidence gap between what executives expected and what the models did was the most pointed finding in the paper.

The pattern holds across the commercial landscape. Promotional infrastructure in e-commerce was developed over decades with human psychology as its constant substrate: loss aversion, anchoring, social proof, scarcity. Each of those mechanisms is a lever on a specific cognitive pattern. An AI buyer has none of them. What it has is the data your catalog exposes and the mandate it was sent to execute, and those are the only inputs that shape the decision.

What the research shows about how AI shoppers decide goes deeper into the HBR evidence and the behavioral patterns underneath it. The segment-strategy conclusion here is more compressed: AI buyers run a different decision process, and current catalog infrastructure was built for a different buyer type entirely.


What feeds leave out

Ratings are the tractable starting point. They map cleanly to a structured field and the case for their value is direct. Close the social-proof gap first. The remaining gap runs deeper still, into the authored reasoning that no platform will generate on a merchant's behalf.

BVP identified substitution logic as a "strategic battleground" in the intelligence layer. When a buyer's first-choice product is unavailable, the session needs a recommendation and a reason. Feeds carry availability booleans. They carry zero authored logic about what to suggest instead. The buyer either abandons the session or guesses based on surface similarity, neither of which reflects the merchant's actual intent.
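As a rough illustration of the difference between an availability boolean and authored substitution logic, the sketch below attaches merchant-written substitutes, each with a reason, to a catalog item. The `CatalogItem` and `Substitution` types, the `substitutes` field, and the SKUs are all hypothetical; no current feed schema is implied to carry them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Substitution:
    sku: str     # what to offer instead
    reason: str  # merchant-authored explanation an agent can relay


@dataclass
class CatalogItem:
    sku: str
    available: bool  # the part feeds already carry
    # Hypothetical authored layer: ordered by the merchant's preference.
    substitutes: list = field(default_factory=list)


def suggest(item_sku: str, catalog: dict) -> Optional[Substitution]:
    """Return the first authored substitute that is in stock, or None."""
    item = catalog[item_sku]
    if item.available:
        return None  # first choice is in stock, nothing to substitute
    for sub in item.substitutes:
        target = catalog.get(sub.sku)
        if target is not None and target.available:
            return sub
    return None  # no authored answer: the agent is left to guess


# Invented sample catalog.
catalog = {
    "JKT-NAVY": CatalogItem(
        "JKT-NAVY",
        available=False,
        substitutes=[Substitution("JKT-CHARCOAL", "Same cut and fabric; charcoal reads slightly more formal.")],
    ),
    "JKT-CHARCOAL": CatalogItem("JKT-CHARCOAL", available=True),
}

choice = suggest("JKT-NAVY", catalog)
print(choice.sku, ":", choice.reason)
```

The lookup is trivial by design. The part that cannot be generated is the ordered list and the reasons, which only the merchant can write.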

Fit logic belongs in the same category. A buyer handling a request for a jacket that "travels well and works for both meetings and dinner" is doing reasoning work. It needs to know which products actually answer that prompt, in terms the catalog does not contain, because catalog copy was written for keyword ranking rather than constraint evaluation. The merchant has considered opinions about which item answers that query. Those opinions live nowhere in the feed.

Policy context shapes comparisons at the moment of selection. A brand with a strong warranty wants that information present when a buyer compares it against a competitor offering a 30-day return window. The feed schema has no field for it.

Voice and routing logic have the same problem. The merchant's authored judgment about which product belongs in front of which prompt, and how the brand should be represented in a conversation, has no place to live in current feed structures. A catalog with four similar jacket styles has opinions about which one leads for "casual weekend," which one belongs in a "gift for dad" session, which one is the right answer for a petite frame asking about travel-friendly options. Those editorial judgments are currently invisible to every AI buyer querying the catalog, because there is nowhere in the feed to put them.
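One hedged way to picture those editorial judgments as data is a small table of merchant-authored routing notes keyed by buyer intent. The intent labels mirror the examples above; the SKUs, rationales, and the `lead_product` helper are invented for illustration, and how an agent would derive an intent label from a free-form prompt is out of scope here.

```python
from typing import Optional, Tuple

# Hypothetical merchant-authored routing notes: intent -> (lead SKU, rationale).
# SKUs and rationales are invented sample data.
ROUTING_NOTES = {
    "casual weekend": ("JKT-CHORE", "Unstructured and machine washable."),
    "gift for dad": ("JKT-FIELD", "Forgiving fit; sizing is easy to exchange."),
    "petite travel": ("JKT-TRAVEL-P", "Shorter body length in a wrinkle-resistant fabric."),
}


def lead_product(intent: str) -> Optional[Tuple[str, str]]:
    """Look up the authored lead product for an intent label.

    The label is lowercased and whitespace-normalized before lookup.
    Returns None when the merchant has authored nothing for that intent.
    """
    key = " ".join(intent.lower().split())
    return ROUTING_NOTES.get(key)


print(lead_product("  Gift for Dad "))
# prints ('JKT-FIELD', 'Forgiving fit; sizing is easy to exchange.')
```

A `None` result is the current default for every intent: the judgment exists in the merchant's head, but the agent has nothing to read.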

The full five-dimension breakdown lives in the brand-truth layer post, and a worked product-page example shows how the output shifts when authored reasoning is present. This post stays at the segment level. The point is structural: this buyer type needs a category of information that current feeds were never designed to carry, and richer schema fields do not solve it.

BVP's framing is exact. This authored-reasoning tier requires people with genuine knowledge of the products, the customer base, and what the brand is willing to promise. Richer taxonomy and cleaner execution-layer data improve how products are found. The intelligence layer governs whether they are chosen, and no platform generates what belongs there. It comes from the merchant.


Where Sartorial fits

Becca Coggins, McKinsey's global lead for retail and CPG, framed the strategic stakes: "Companies have spent decades refining consumer journeys, fine-tuning every click, scroll, and tap. But in the era of agentic commerce, the consumer no longer travels alone. To thrive, brands must rethink the full stack of engagement, not for the people they've worked to understand but for the agents now acting on their behalf."

McKinsey's ecosystem map for agentic commerce names the tier where this work belongs: Adapters and Enablers. Infrastructure companies build the pipes. Adapters and Enablers make what travels through those pipes useful to the new buyer type, translating catalog data into something a buying system can actually reason over. Sartorial is building that authored-reasoning artifact within this tier.

The agentic-commerce inversion covers the structural shift in how product discovery now works, and the bot-builder to bot-server story addresses the merchant-side architectural change underneath it. The specific point of this post is narrower: serving AI buyers well means building for their data needs, which are structural requirements of a distinct segment and cannot be met by optimizing for a different buyer type's decision logic.

Merchants building the intelligence layer now are doing so because they understand that the AI buyer reading their catalog is making a consequential decision on a human's behalf. The authored reasoning they put into that layer travels through every protocol in operation, at every interaction. The gap between merchants who have built it and those who have not widens each round.

Merchants treating AI readiness as a metadata problem will be present in the catalog. Presence and selection are different outcomes.


Sumit Jagdale is the founder of Sartorial.

agentic-commerce · catalog-strategy · AI-shopping-agents · brand-truth · intelligence-layer · buyer-segments
