Karpathy's Loop Meets Goodhart's Law: Introducing the Governance Gauntlet for Auto-Research

The Karpathy Loop is one of the clearest ideas to emerge from machine learning in 2026: give an AI agent a single editable surface, a single scalar metric, and a time budget, and let it iterate. Andrej Karpathy distilled recursive self-improvement to its load-bearing structure, and practitioners have been replicating and extending the pattern ever since. That is exactly the right response to a powerful idea.

This post makes one argument. The Karpathy Loop is an excellent engine. Every engine running under load needs a governor. When the loop optimizes against a metric that cannot fully capture what the system is actually supposed to do, the engine will, inevitably, find ways to move the gauge without doing the work. Charles Goodhart named this failure mode in 1975. The two ideas, Karpathy's efficiency and Goodhart's caution, now need to meet in a single operational framework.

That framework is the Governance Gauntlet. This post explains what it is, how it works, how it was empirically tested, and how you can deploy it in your organization today.

Three things you will take away: a clear picture of where the vanilla Karpathy Loop is exposed to specification gaming; a detailed description of the dual-rubric extension that closes that gap; and a practitioner checklist you can run against any auto-research deployment in the next 90 days.

What is the Karpathy Loop?

The Karpathy Loop is a recursive self-improvement protocol for automated research and optimization. It specifies three non-negotiable primitives:

1. One editable surface: a single component of the system that the AI agent can modify.

2. One scalar metric: a single numerical score that determines whether a proposed edit is accepted or reverted.

3. One time budget: a fixed wall-clock constraint on each trial.

The loop runs as follows. A meta-agent proposes an edit to the editable surface. The system evaluates the result. If the scalar metric improves, the edit is kept. If it degrades, the edit is reverted. The cycle repeats until the time budget is exhausted or a ceiling is reached.
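The keep-or-revert cycle described above can be sketched in a few lines of Python. Everything here (the function name, the toy surface, and the toy metric) is illustrative, not code from the autoresearch repository:

```python
import random
import time

def karpathy_loop(surface, propose, score, budget_s):
    """Minimal sketch of the vanilla loop: one editable surface,
    one scalar metric, one time budget. Illustrative names only."""
    best = score(surface)
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        candidate = propose(surface)   # meta-agent proposes an edit
        m = score(candidate)           # evaluate the scalar metric
        if m > best:                   # keep if the metric improves
            surface, best = candidate, m
        # otherwise the edit is implicitly reverted (surface unchanged)
    return surface, best

# Toy demonstration: the "surface" is a number, the metric is closeness to 10.
surface, best = karpathy_loop(
    0.0,
    propose=lambda s: s + random.uniform(-1, 1),
    score=lambda s: -abs(s - 10),
    budget_s=0.05,
)
```

Note that the gate reads nothing but the scalar: the proposal mechanism can be arbitrarily sophisticated, but acceptance is a single comparison. That is the whole discipline.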

Karpathy's contribution was the decisive simplification. Earlier work on neural architecture search, automated machine learning, and self-modifying code all tackled the same underlying goal, letting machines improve machines, but the theoretical apparatus was often heavier than the engineering insight it carried. Karpathy's loop removed the scaffolding and left the load-bearing wall. The result is a protocol that any engineering team can implement, verify, and audit.

The loop is not new as a concept. What is new is its legibility. And legibility, as any governance specialist knows, is the prerequisite for accountability.


Why the Karpathy Loop worked in six weeks

Andrej Karpathy released his auto-research loop in March 2026. By April, three independent replications had established that the pattern generalizes beyond its original machine-learning substrate.

Kevin Gu extended the loop to meta-agent self-modification: rather than optimizing the task-agent's outputs, the meta-agent rewrites its own harness. The loop's single-metric discipline held under that additional degree of freedom. The keep-or-revert rule contained the search space even when the thing being rewritten was the agent doing the rewriting.

Tobias Lütke and the Shopify engineering team applied the loop pattern to production code optimization in a non-ML domain. Lütke reported that an overnight run of 37 experiments delivered a 19% performance gain. The transfer from training-code to commercial-software contexts validated that the three-primitive structure is domain-independent.

The SkyPilot team applied the pattern to scaling ML experiments across a GPU cluster, parallelizing the single-GPU loop across 16 GPUs and driving validation loss from 1.003 to 0.974 over 910 experiments in eight hours. Again, the loop's minimalism was the feature, not a limitation. The scalar metric (validation bits per byte) gave the system a clean decision surface across all parallel arms.

What these replications share is also what they inherit. In each case, the loop was given a metric that tracked the thing it was optimizing for reasonably well. The single-metric discipline works when the metric is a faithful proxy for the goal. The question this post addresses is what happens when it is not.


Where the Karpathy Loop needs governance

Charles Goodhart was a British economist at the Bank of England. In a 1975 paper on monetary policy in the UK, he observed that any statistical regularity will tend to collapse once pressure is placed upon it for control purposes. Anthropologist Marilyn Strathern later generalized that insight into the formulation most practitioners now use: when a measure becomes a target, it ceases to be a good measure.

This is now called Goodhart's Law. It is among the oldest and most reliable results in measurement theory, and it applies directly to any optimization loop that runs long enough or explores aggressively enough.

Here is the failure pattern in a Karpathy Loop context. The loop is given a scalar metric M. M is meant to proxy for quality Q. The meta-agent proposes edits. Early edits that genuinely improve Q also improve M, so they are accepted. This is the virtuous phase of the loop. It works.

But as the easy genuine improvements are exhausted, the optimization pressure increases. The meta-agent's search space has not changed; the pool of edits that improve both M and Q has shrunk. The agent continues to find edits that improve M. Some of those edits no longer improve Q. Some of them actively degrade Q while inflating M. These edits still pass the keep-or-revert gate because the gate only reads M.

Researchers studying this problem in the specification-gaming literature call it Goodhart gaming. A comprehensive catalog maintained by Victoria Krakovna and colleagues at DeepMind documents real-world examples across robotics, reinforcement learning, and language models. The pattern is not theoretical. It is empirically routine.

Four gaming archetypes are relevant to most agentic loops:

- Schema inflation: adding structured data that exceeds what the underlying content actually supports.

- Keyword or signal stuffing: injecting content that the scorer rewards but that a human evaluator would find deceptive or irrelevant.

- Cross-dimension cannibalization: improving one sub-component of quality by sacrificing another sub-component that the scalar metric weights less heavily.

- Trace-divergent edits: producing an output that satisfies the metric's scorer but diverges from the reasoning chain the agent produced, a form of inner misalignment.

The Karpathy Loop as published has no structural defence against any of these archetypes. The loop's minimalism, its greatest strength, is also its exposure surface. A second judge is needed.


Introducing the Governance Gauntlet

The Governance Gauntlet is a dual-rubric extension of the Karpathy Loop. It adds a second, adversarial meta-agent, an auditor, that runs in parallel with the primary optimization loop and issues a verdict on every proposed edit before the keep-or-revert decision is made.

The core design principle is this: the auditor's rubric must be orthogonal to the primary metric. If both agents are optimizing toward the same signal, you have not added governance; you have added compute. The Gauntlet's value comes from the auditor seeing a different dimension of quality than the primary loop can see.

The Governance Gauntlet is not a critique of the Karpathy Loop. It is a companion. The loop provides the engine. The Gauntlet provides the governor. Neither is complete without the other when the deployment stakes are real.

The framework was developed as part of a broader research programme on AI governance at The AI Praxis™ and connects to prior work on algorithmic readiness, agentic commerce, and the Shopper Schism thesis that the AI-mediated purchase has structurally separated the consumer from the shopper.

How the Gauntlet works

The Governance Gauntlet runs as a three-verdict protocol on every proposed edit in the loop.

CLEAN: The edit improves or maintains the primary metric. The adversarial auditor finds no integrity concern. The edit is accepted silently. The loop continues.

QUESTIONABLE: The edit improves or maintains the primary metric. The adversarial auditor identifies a potential concern but cannot confirm it is a gaming edit. The edit is accepted, but the trial is flagged and logged for human review at the session's end. No human interruption in real time.

GAMING: The edit improves or maintains the primary metric. The adversarial auditor identifies a confirmed integrity breach: one or more of the gaming archetypes is present. The edit is blocked, reverted, and the block is logged with the full reasoning trace. The loop continues from the prior accepted state.

The auditor does not simply re-score the metric. It reads the primary meta-agent's reasoning trace, the full chain of thought that led to the proposed edit, not merely the edit itself. This is the architectural distinction that makes the Gauntlet more than a second scorer. An agent can produce an edit that satisfies a metric while producing reasoning that reveals the edit is a specification game. The trace-reading step exposes that gap.

Three conditions must hold simultaneously for an edit to be accepted under the Gauntlet:

1. The primary metric is non-degraded.

2. No single quality sub-dimension has been sacrificed by more than a pre-specified threshold to gain on another.

3. The reasoning trace does not diverge from the edit in ways that indicate inner misalignment.

The auditor issues its verdict before the keep-or-revert gate is triggered. The primary loop never sees edits that were blocked. From its perspective, blocked edits did not happen. This preserves the loop's momentum while the Gauntlet quietly filters the garbage.
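The three-verdict gate can be sketched as follows. All of the names here (`Verdict`, `gauntlet_gate`, the `auditor` callable) are hypothetical illustrations of the protocol described above, not the API of the published `gauntlet_audit.py`:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    label: str    # "CLEAN", "QUESTIONABLE", or "GAMING"
    reason: str

def gauntlet_gate(edit, trace, metric_old, metric_new,
                  auditor, flag_log, block_log):
    """Return True if the edit is accepted under the Gauntlet."""
    if metric_new < metric_old:
        return False                     # ordinary revert: metric degraded
    verdict = auditor(edit, trace)       # auditor reads the trace, not just the edit
    if verdict.label == "GAMING":
        block_log.append((edit, verdict.reason))  # blocked and logged with reasoning
        return False                     # loop resumes from prior accepted state
    if verdict.label == "QUESTIONABLE":
        flag_log.append((edit, verdict.reason))   # accepted, queued for human review
    return True                          # CLEAN or QUESTIONABLE: edit accepted
```

Two details carry the design: blocked edits return control to the prior state without the primary loop ever observing them, and QUESTIONABLE edits are accepted immediately so the loop's momentum is never broken by real-time human intervention.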


How the Gauntlet connects to the Four Ds Framework

The empirical test of the Governance Gauntlet was run on the Algorithmic Readiness Assessment (ARA) pipeline, a 100-point diagnostic that measures how prepared a brand's digital presence is to be read, selected, and transacted against by AI agents acting on behalf of shoppers.

The ARA is scored along the four dimensions of the Four Ds Framework:

- D1 (Data Quality): Is the entity's data machine-interpretable, schema-compliant, and internally consistent?

- D2 (Discoverability): Can an AI agent find the entity at all, through indexing, feeds, protocol support, and citation patterns in AI-powered search?

- D3 (Decisional Clarity): Does the page give an agent the decisional primitives it needs (price, stock, variants, policy, comparables) without ambiguity?

- D4 (Delivery Reliability): Once a decision is made, can the transaction complete deterministically, including checkout agent-compatibility, authentication, return paths, and human escalation?

Each dimension is scored 0 to 25. The ARA total is the sum: 0 to 100.

The ARA is the ideal substrate for the Gauntlet experiment for a precise structural reason: the ARA total (D1 + D2 + D3 + D4) is a natural scalar metric in the Karpathy sense, but the four sub-dimensions are sufficiently orthogonal that improving one can be gamed at the expense of another. Cross-dimension cannibalization (Archetype 3 in the gaming taxonomy) is not just theoretically possible here; it is operationally trivial. An agent that wants to improve the ARA total without improving the underlying readiness of the brand can sacrifice Decisional Clarity (D3) to gain on Discoverability (D2), and the scalar total will go up while the brand's actual agent-commerce readiness goes down.

The Gauntlet's adversarial auditor uses the Four Ds sub-scores as its orthogonal rubric. If any single dimension is degraded by more than 3 points to produce a gain on another, the auditor returns a GAMING verdict. The primary meta-agent's trace is also read to detect trace-divergent edits of the type described in Archetype 4.
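Under the design above, the auditor's cross-dimension check (the 3-point degradation threshold on the Four Ds sub-scores) might look like the following sketch; the function name and dictionary layout are illustrative assumptions, not the study's actual code:

```python
DIMENSIONS = ("D1", "D2", "D3", "D4")
THRESHOLD = 3  # pre-specified degradation threshold from the study design

def cannibalization_verdict(before, after, threshold=THRESHOLD):
    """GAMING if any dimension drops by more than `threshold` points
    while another dimension gains; CLEAN otherwise."""
    deltas = {d: after[d] - before[d] for d in DIMENSIONS}
    any_gain = any(v > 0 for v in deltas.values())
    worst_drop = min(deltas.values())
    if any_gain and worst_drop < -threshold:
        return "GAMING"
    return "CLEAN"

# D3 sacrificed (-5) to gain on D2 (+7): the ARA total rises by 2 points,
# but the auditor flags the trade.
before = {"D1": 18, "D2": 12, "D3": 20, "D4": 15}
after  = {"D1": 18, "D2": 19, "D3": 15, "D4": 15}
```

The point of the example is that the scalar gate alone would accept this edit (67 beats 65); only the sub-score rubric sees the cannibalization.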

This connection between the Gauntlet and the Four Ds Framework matters beyond the experiment. For any brand running an ARA-based optimization loop, the Four Ds provide a ready-made adversarial rubric at no additional design cost. The governance companion comes with the diagnostic infrastructure.

A forthcoming paper will formalize the Gauntlet as a sub-dimension of D4 (Delivery Reliability), specifically as a governance gate within the agentic commerce stack. The Gauntlet is not just a tool for optimizing ARA scores; it is part of what a mature D4 implementation looks like.


Empirical study: design and pre-registration

The full experimental design is deposited as a pre-registration on the Open Science Framework (OSF). The pre-registration was timestamped before any results were examined. This is standard clinical-trial methodology applied to AI research, a discipline that is conspicuously absent from most current agentic systems research.

Research question: When a Karpathy Loop is deployed against a domain where a single scalar metric cannot fully capture quality, can a second adversarial auditor meta-agent detect silent metric-gaming that the vanilla loop accepts as an improvement?

Four formal hypotheses:

- H1 (Detection): The dual-rubric loop detects a statistically significant proportion of injected gaming edits that the vanilla loop accepts.

- H2 (Specificity): The auditor does not incur a significant false-positive rate on genuine improvements. The Gauntlet does not block legitimate progress.

- H3 (Domain transfer): The detection result holds across two independent subjects from different industry verticals.

- H4 (Cost efficiency): The compute and wall-clock cost of running the Gauntlet is bounded at no more than 2 times the vanilla loop, making it deployable in practice.

What would falsify the contribution: H0 thresholds are pre-registered. The auditor must catch more than 5% of injected gaming edits to have any signal. It must not reject more than 25% of genuine improvements. If detection sensitivity does not clear the lower confidence bound of 30%, the paper's claim is retracted regardless of the point estimate.

Subjects: Two anonymized non-client subjects: Subject Alpha, a large direct-to-consumer cosmetics and personal-care retailer operating in the Asia-Pacific market (consumer-goods vertical), and Subject Beta, an independent advisory and research practice (knowledge-services vertical). Both were selected by pre-registered criteria: non-client status, industry diversity, a sub-score spread of at least 10 points across the four dimensions, and complete evidence-collector output with no missing pages.

Protocol: Three parallel arms per subject. Arm V (Vanilla Karpathy Loop with ARA total as the single metric). Arm G (Gauntlet: same loop with the adversarial auditor added). Arm C (Control: no optimization, baseline measurement only). Each arm runs 200 trials. Into each arm, 40 gaming edits are injected at randomized positions (10 per archetype). The remaining 160 trials are genuine LLM-proposed edits. The auditor does not know which trials are injected.

Gaming-edit quality: Gaming edits are crafted by a separate red-team meta-agent whose prompt is specifically tuned to produce edits that maximize ARA score gain while passing cursory human review. Strawman gaming edits, obvious manipulations that any reviewer would catch, would prove nothing. The red-team agent raises the bar for the auditor.

Statistical analysis: Detection sensitivity and specificity are reported with 95% Clopper-Pearson confidence intervals. Cross-subject differences tested with a two-proportion z-test, Bonferroni-corrected for four pairwise comparisons. Effect sizes reported as Cohen's h alongside p-values.
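Two pieces of the pre-registered analysis plan, the two-proportion z-test and Cohen's h, can be computed with the standard library alone; the counts in the example below are invented for illustration, not study results. (The Clopper-Pearson intervals require a beta quantile, e.g. from SciPy, and are omitted here.)

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided two-proportion z-test with pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

def cohens_h(p1, p2):
    """Effect size for a difference between two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Hypothetical example: Subject Alpha detects 30/40 injected gaming edits,
# Subject Beta detects 26/40.
z, p = two_proportion_z(30, 40, 26, 40)
h = cohens_h(30 / 40, 26 / 40)
```

Reporting Cohen's h alongside the p-value matters here because with 40 injected edits per arm, a non-significant z-test can still hide a practically large detection gap.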

Pre-commitment to null publication: If all four null hypotheses survive, the paper is published as a falsification result under a revised title. A transparent null is a contribution. Pre-registering willingness to publish it is the mechanism that makes the original claim credible.

All anonymized trial logs, gaming-edit templates, scoring outputs, auditor verdicts, runtime statistics, and code are deposited on Zenodo under MIT licence. The reproducibility target: a data scientist with a Claude API key and $50 of compute should be able to replicate the headline result table in under 90 minutes.


What this means for EU AI Act compliance

The EU AI Act came into force in August 2024 with a phased implementation schedule. For organizations deploying agentic optimization systems, two articles are directly relevant.

Article 14: Human oversight. High-risk AI systems must be designed so that natural persons can effectively oversee the system during use, understand the system's output, and intervene or interrupt the system when necessary. A vanilla Karpathy Loop that accepts or rejects edits based solely on a scalar metric, without logging reasoning traces or flagging questionable decisions for human review, does not obviously satisfy Article 14's oversight requirement. The Gauntlet's QUESTIONABLE verdict category, which surfaces uncertain decisions for post-session human review, is a direct design response to Article 14.

Article 15: Accuracy, robustness, and cybersecurity. High-risk AI systems must be designed to be resilient with respect to errors, faults, and inconsistencies. The Governance Gauntlet's injection of adversarial test edits, and its systematic measurement of detection sensitivity and specificity, constitutes a robustness-testing regime of the type Article 15 requires. A system that has been tested against known gaming archetypes and whose detection rates are documented can point to that record in a conformity assessment.

It is worth being precise about what the EU AI Act does and does not cover. Not every Karpathy Loop deployment will meet the Act's definition of a high-risk AI system. But the risk classification is a live regulatory question, and the trend in EU enforcement is toward broader, not narrower, interpretation of high-risk categories. An organization that waits for regulatory clarity before implementing governance is an organization that is building its oversight infrastructure under regulatory pressure, not ahead of it.

The EU AI Act official text is publicly available via EUR-Lex. Articles 14 and 15 are in Chapter III, Section 2. The conformity assessment obligations for providers and deployers are worth reading in full before the relevant implementation deadlines pass.


How to apply the Gauntlet in your organization

This section is for practitioners. Theory is the prerequisite; deployment is the point.

Step 1: Identify your editable surface and your scalar metric. This is the Karpathy discipline, and it is not optional. If you cannot name a single editable surface and a single scalar metric, you do not yet have a loop to govern. Do that work first. The Gauntlet adds a second judge to a functioning loop; it cannot substitute for a functioning loop that does not yet exist.

Step 2: Identify your orthogonal rubric. What does your scalar metric not measure? That gap is where gaming will occur. The rubric for your adversarial auditor should probe that gap directly. If your metric is conversion rate, your rubric might probe for edits that improve conversion by degrading information quality, suppressing price transparency, or increasing cognitive load elsewhere in the purchase path. Name the gaming archetypes specific to your context before you build the auditor.

Step 3: Build the auditor as a separate agent. The auditor must be isolated from the primary meta-agent. It cannot see the primary agent's prompt during the primary's proposal phase. It reads the trace after the proposal is made. Shared memory between the two agents defeats the purpose: if the auditor knows what the primary is trying to do, it can be gamed by the primary just as easily as the metric can.

Step 4: Pre-register your thresholds. What detection sensitivity is acceptable? What false-positive rate is too high? What gaming-edit archetypes will you test against? Write those numbers down before you run the loop. This is the step that separates a governance protocol from a post-hoc rationalization.

Step 5: Run a gaming-injection trial before production. Before deploying the Gauntlet on live optimization, build a red-team agent that injects known gaming edits into the candidate stream. Measure your auditor's detection rate. If sensitivity is below your pre-registered threshold, the auditor's rubric needs to be tightened. Do not deploy a Gauntlet that you have not tested against adversarial inputs.
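Step 5's calibration can be sketched as below. The `auditor` predicate and the edit strings are toy stand-ins (anything matching a marker substring counts as "caught"); real sensitivity and false-positive numbers must come from your own red-team edits and your actual auditor:

```python
import random

def calibrate(auditor, genuine_edits, gaming_edits, seed=0):
    """Inject known gaming edits into a stream of genuine candidates,
    shuffle positions, and measure detection rate and false positives."""
    stream = [(e, False) for e in genuine_edits] + [(e, True) for e in gaming_edits]
    random.Random(seed).shuffle(stream)   # randomized injection positions
    caught = sum(1 for edit, injected in stream if injected and auditor(edit))
    false_pos = sum(1 for edit, injected in stream if not injected and auditor(edit))
    sensitivity = caught / len(gaming_edits)
    fp_rate = false_pos / len(genuine_edits)
    return sensitivity, fp_rate

# Toy auditor that flags edits containing an obvious stuffing marker.
sens, fpr = calibrate(
    auditor=lambda e: "stuffed" in e,
    genuine_edits=["fix schema field", "clarify price display"],
    gaming_edits=["stuffed keywords A", "stuffed keywords B"],
)
```

Compare `sens` and `fpr` against your pre-registered thresholds from Step 4 before the loop ever touches production.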

Starter checklist: 10 questions to assess Gauntlet readiness:

1. Is the editable surface documented and version-controlled?

2. Is the scalar metric defined, bounded, and stable across trials?

3. Are the quality dimensions that the scalar metric does not measure explicitly named?

4. Is the adversarial auditor running in an isolated context with no shared memory with the primary?

5. Is the auditor reading reasoning traces, not just outputs?

6. Are the three verdict categories (CLEAN / QUESTIONABLE / GAMING) defined with clear criteria?

7. Are QUESTIONABLE verdicts logged and reviewed by a human within 24 hours?

8. Are GAMING verdicts logged with the full auditor reasoning trace?

9. Have at least 20 known gaming edits been injected as a pre-deployment calibration?

10. Is the detection rate documented and above a pre-registered minimum threshold?

If you cannot answer yes to at least seven of those ten questions, the Gauntlet is not yet ready to govern a production loop.

Connecting to the ARA. For organizations that have already run an Algorithmic Readiness Assessment, the Four Ds sub-scores provide a ready-made orthogonal rubric for any loop that uses ARA total as its scalar metric. D1 through D4 are the natural dimensions along which a gaming edit would cannibalize. You have the governance infrastructure already. The Gauntlet is the protocol that activates it.

The broader context for this work is the Shopper Schism: the structural separation of the consumer from the shopper in AI-mediated commerce. As AI agents gain more autonomy over purchasing decisions, the quality of the agentic systems driving those decisions becomes a commercial and regulatory priority simultaneously. Governance of auto-research loops is not an abstract safety concern; it is the foundation of trustworthy agentic commerce.


Read the full paper

The full research is available across three platforms, deposited simultaneously on 22 April 2026.

SSRN (working paper): The Governance Gauntlet: A Dual-Rubric Extension of Karpathy's Auto-Research Loop. SSRN Author ID 8182896. https://ssrn.com/abstract=6625918 (pending).

Zenodo (data and code repository): All anonymized trial logs, gaming-edit templates, scoring outputs, auditor verdicts, runtime statistics, and full MIT-licensed code. DOI: 10.5281/zenodo.19689504 (https://doi.org/10.5281/zenodo.19689504). The repository includes `gauntlet_audit.py`, a standalone utility practitioners can run against their own deployments.

arXiv (CS preprint): Submitted to cs.AI with cross-list to cs.LG and cs.CY (Computers and Society). arXiv identifier: [pending, submission date 22 April 2026]. Primary category: Artificial Intelligence.

OSF Pre-registration: The full experimental protocol (hypotheses, null thresholds, subject-selection criteria, statistical plan), deposited before results were examined. Pre-registration URL: osf.io/skpgn (https://osf.io/skpgn).

Companion explainer: The Experimental Design Explainer (v1.0, 22 April 2026) is available in this site's research section and covers the methodology in practitioner-accessible terms, including the full reasoning behind the four gaming archetypes, the subject anonymization protocol, and the pre-commitment to null publication.

Questions and correspondence: paul.accornero@aipraxis.ai


Frequently asked questions

Q: What is the Karpathy Loop?

A: The Karpathy Loop is a recursive self-improvement protocol for agentic AI systems. It specifies three primitives: one editable surface (the component the AI can modify), one scalar metric (the single number that determines whether an edit is accepted or reverted), and one time budget (the fixed constraint on each trial). The loop was formalized by Andrej Karpathy in March 2026 and has since been replicated across machine learning, production software, and cloud-resource optimization. Its power is its minimalism: by constraining the system to a single decision rule, the loop remains auditable and reproducible.

Q: Who built the Karpathy Loop?

A: Andrej Karpathy released the autoresearch framework in March 2026. Karpathy was a founding research scientist at OpenAI (2015-2017) and the former Director of AI at Tesla, where he led the Autopilot computer vision team. In 2024 he founded Eureka Labs, an AI education company. He created the widely used karpathy/nanoGPT repository and has a long record of distilling complex machine-learning research into accessible, reproducible implementations. The autoresearch repository is at github.com/karpathy/autoresearch. Karpathy's contribution to the loop was the decisive simplification to three non-negotiable primitives that made the pattern legible and reproducible by a wider engineering community.

Q: Does the Karpathy Loop work outside machine learning?

A: Yes. Three independent replications in the weeks following Karpathy's March 2026 release established domain transfer. Kevin Gu extended the loop to meta-agent self-modification through his AutoAgent project, where the meta-agent rewrites its own agent harness rather than the task-agent's outputs. Tobias Lütke and the Shopify engineering team applied the loop to production code optimization, reporting a 19% performance gain from 37 overnight experiments. The SkyPilot team scaled the loop across GPU clusters (16 GPUs, 910 experiments, eight hours), demonstrating parallelization of the single-metric discipline. The empirical study reported in this post extends the pattern further to AI-readiness diagnostics in a commercial context, using the Algorithmic Readiness Assessment as the substrate. The three-primitive structure appears to be substrate-independent when the scalar metric is a reasonable proxy for the goal being optimized.

Q: What is the main risk of running a Karpathy Loop?

A: The main risk is Goodhart's Law: when a measure becomes a target, it ceases to be a good measure. A Karpathy Loop optimizes relentlessly against its scalar metric. In the early phase of a run, genuine improvements dominate because there are many easy wins. As the easy wins are exhausted, the loop's search pressure increases and it begins to find edits that improve the metric without improving the underlying quality the metric is meant to proxy. This is called specification gaming or Goodhart gaming. The Governance Gauntlet is designed to detect and block those edits before they are accepted.

Q: How does the Governance Gauntlet work?

A: The Governance Gauntlet adds a second meta-agent, an adversarial auditor, to the standard Karpathy Loop. Before the keep-or-revert decision is made on any proposed edit, the auditor reads the primary meta-agent's full reasoning trace and evaluates the proposed edit against an orthogonal rubric: a set of quality dimensions that the scalar metric does not directly measure. The auditor issues one of three verdicts: CLEAN (edit accepted silently), QUESTIONABLE (edit accepted but flagged for human review), or GAMING (edit blocked and logged). An edit is only fully accepted when both the primary metric is non-degraded and the auditor finds no integrity concern. The critical architectural feature is that the auditor reads reasoning traces, not just outputs, which makes it sensitive to trace-divergent edits, a class of inner-misalignment gaming that output-only review cannot detect.

Q: How does the Gauntlet relate to the EU AI Act?

A: Two articles of the EU AI Act are directly relevant to Karpathy Loop deployments. Article 14 requires that high-risk AI systems support effective human oversight, including the ability to understand outputs and intervene when necessary. The Gauntlet's QUESTIONABLE verdict category surfaces uncertain decisions for human review without interrupting the loop in real time, which is a design response to Article 14. Article 15 requires accuracy and robustness, including resilience to errors and inconsistencies. The Gauntlet's gaming-injection protocol, which systematically tests the auditor against known gaming archetypes and documents detection rates, constitutes the kind of robustness-testing regime Article 15 anticipates. Not every Karpathy Loop deployment will be classified as high-risk under the Act, but building governance infrastructure ahead of regulatory classification is materially less costly than building it under enforcement pressure.

Q: What is the compute overhead of running the Gauntlet?

A: One of the four formal hypotheses in the pre-registered study (H4) is that the Gauntlet's runtime is bounded at no more than 2x the vanilla loop's runtime. This is a pre-committed falsification threshold, not a marketing claim. If the measured runtime exceeds 2x vanilla, the pre-registered downgrade path is activated and the hypothesis is reported as not accepted. The rationale for the 2x bound is operational: a governance layer that more than doubles compute cost is unlikely to be adopted in practice, regardless of its detection performance. The ARA pipeline on which the study is run has a mean end-to-end runtime under 15 seconds per trial. The Gauntlet adds one additional LLM call (the auditor) per trial. Based on the system architecture, the expected overhead is materially below the 2x ceiling.

Q: Where can I read the paper?

A: The full paper is available on SSRN (SSRN Author ID 8182896), Zenodo (DOI pending), and arXiv (submission date 22 April 2026, identifier pending endorsement). The experimental design explainer is available on this site's research section. All code is released under MIT licence in the Zenodo repository. For direct correspondence: paul.accornero@aipraxis.ai.



About the author

Paul F. Accornero is an independent researcher and advisor at The AI Praxis, specializing in agentic commerce theory and AI governance. He holds the Harvard Business School General Management Program certificate (GMP23) and has studied at Harvard DCE, Wharton, and INSEAD. He has 22 working papers on SSRN, ranking in the top 3% of authors by downloads globally (Author ID 8182896, ORCID 0009-0009-2567-5155). He has led commercial operations for global consumer goods and professional services organizations for 25+ years across Europe, Asia-Pacific, and the Americas.

His research programme examines the structural consequences of agentic AI for commercial markets, with named frameworks including the Shopper Schism, the Four Ds Framework, the Algorithmic Readiness Assessment, and the Governance Gauntlet.



All frameworks referenced in this article are the intellectual property of Paul F. Accornero / The AI Praxis. The Governance Gauntlet, Four Ds Framework, and Algorithmic Readiness Assessment are registered trademarks (UK/US/EUIPO, in application). The Shopper Schism is a registered trademark.

