Summary
In November 2025, the World Economic Forum and Capgemini published AI Agents in Action: Foundations for Evaluation and Governance, introducing a structured framework for classifying AI agents, evaluating them, assessing their risks, and governing them. The report's central artifact is the agent card: a structured description of an agent's capabilities, behavior, and operational context, inspired by Model Cards for Model Reporting (Mitchell et al., 2019). The report proposes seven classification dimensions, a multi-metric evaluation methodology, a five-step risk assessment lifecycle, nine baseline governance mechanisms, and a progressive governance model that scales oversight with agent capability.
The Agent Alignment Protocol (AAP) and Agent Integrity Protocol (AIP) implement what the WEF report recommends. AAP's Alignment Card is a machine-readable, protocol-level artifact that maps to all seven WEF classification dimensions and extends them with enforceable behavioral contracts, auditable decision trails, and multi-agent compatibility verification. AIP provides the continuous monitoring infrastructure the WEF calls for at every governance level.
This document maps the WEF framework to the AAP/AIP protocol suite across all four WEF pillars (Classification, Evaluation, Risk Assessment, and Progressive Governance) and addresses the WEF's forward-looking analysis of multi-agent ecosystem risks.
Key distinction: The WEF agent card describes an agent. The AAP Alignment Card binds it. The WEF tells organizations what to ask about their agents. AAP provides the machine-readable, verifiable answers. AIP provides the continuous assurance that those answers remain true at runtime.
1. The WEF Framework Architecture
The WEF report structures responsible agent deployment around three major sections and four foundational pillars:
1.1 Report Structure
| WEF Section | Content | AAP/AIP Relevance |
|---|---|---|
| Section 1: Technical Foundations | 3-layer agent architecture (Application, Orchestration, Reasoning), protocols (MCP, A2A, AP2), cybersecurity | AAP extends A2A agent cards; AIP addresses prompt injection and zero-trust |
| Section 2: Evaluation and Governance | Classification dimensions, evaluation criteria, risk assessment lifecycle, progressive governance | Alignment Card (classification), AP-Traces (evaluation), violation typing (risk), autonomy envelope (governance) |
| Section 3: Multi-Agent Ecosystems | Emerging risks, failure modes, governor agents, trust frameworks | Value Coherence Handshake, Braid grounding, AIP daimonion |
1.2 Four Foundational Pillars
| Pillar | WEF Purpose | AAP/AIP Implementation |
|---|---|---|
| Classification | Establish agent characteristics and operational context | Alignment Card — JSON-schema-validated, well-known endpoint, versionable, expirable |
| Evaluation | Generate evidence of performance and limitations | AP-Trace verification, AIP Integrity Checkpoints, drift detection |
| Risk Assessment | Analyse potential harm using classification and evaluation | Typed violations (FORBIDDEN_ACTION through CARD_MISMATCH) with severity levels, concern categories |
| Progressive Governance | Scale oversight proportionally to capability and context | Autonomy envelope + principal.relationship + AIP monitoring intensity + fail-open/fail-closed |
1.3 Provider vs. Adopter Perspectives
The WEF report distinguishes two stakeholder perspectives that shape how the framework is applied. AAP addresses both:
| WEF Perspective | WEF Responsibility | AAP/AIP Role |
|---|---|---|
| Provider | Build responsibly, supply documentation, ensure ethical guidelines | The Alignment Card is the provider's documentation artifact — structured, versioned, served at /.well-known/alignment-card.json |
| Adopter | Procure responsibly, deploy safely, ensure organizational compliance | AP-Trace verification and AIP monitoring give adopters independent assurance that provider claims hold in production |
2. Classification: Dimension-by-Dimension Mapping
The WEF's classification pillar introduces seven dimensions (Figure 6, p. 14), organized into Agent Characteristics (dimensions 1–5) and Operational Context (dimensions 6–7). The agent card is the primary artifact.
2.1 Function
WEF definition: What task is the agent designed to perform? (Free-text field)
AAP mapping: The Alignment Card's bounded_actions array declares the agent's permitted functions as an explicit, machine-parseable list. Where the WEF asks organizations to describe function in natural language, AAP requires it as structured data that can be verified against observed behavior.
| WEF Concept | AAP Field | Type |
|---|---|---|
| Agent function/task | autonomy_envelope.bounded_actions | String array |
| Function constraints | autonomy_envelope.forbidden_actions | String array |
AAP extension: The WEF describes function; AAP also describes anti-function — what the agent must never do, regardless of context. The forbidden_actions field has no WEF equivalent. A violation of forbidden_actions generates a FORBIDDEN_ACTION violation at CRITICAL severity — the highest in the system.
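The function and anti-function split can be illustrated with a minimal sketch. It assumes a card fragment shaped like the table above. The action strings and the `classify_action` helper are hypothetical and are not part of the AAP specification.

```python
# Hypothetical card fragment: field names follow the mapping table,
# the action strings are invented for illustration.
card = {
    "autonomy_envelope": {
        "bounded_actions": ["search_flights", "compare_prices"],
        "forbidden_actions": ["share_payment_data"],
    }
}


def classify_action(card: dict, action: str) -> str:
    """Return 'forbidden', 'bounded', or 'unbounded' for a proposed action."""
    envelope = card["autonomy_envelope"]
    # Forbidden is checked first so it wins even if an action is listed twice.
    if action in envelope.get("forbidden_actions", []):
        return "forbidden"
    if action in envelope.get("bounded_actions", []):
        return "bounded"
    # Anything the card does not mention falls outside the declared function.
    return "unbounded"
```

In this sketch a "forbidden" result corresponds to the FORBIDDEN_ACTION violation described above, and "unbounded" to an action outside the declared function.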
2.2 Role
WEF definition: Is the agent specialized (narrow task) or generalist (broad capabilities)? (Sliding scale: Specialist ↔ Generalist)
AAP mapping: Role specialization is captured through the combination of bounded_actions (scope breadth) and principal.relationship (how the agent relates to its human principal).
| WEF Concept | AAP Field | Values |
|---|---|---|
| Specialist vs. generalist | bounded_actions array length | Narrow (few actions) vs. broad (many) |
| Operational role | principal.relationship | delegated_authority, advisory, autonomous |
AAP extension: The WEF's role dimension is descriptive. AAP's principal.relationship field is prescriptive — it determines how the agent should behave when it encounters uncertainty. An advisory agent recommends and waits. A delegated_authority agent acts within bounds. An autonomous agent operates within declared values. This role classification directly affects runtime behavior and, via AIP, determines monitoring intensity.
2.3 Predictability
WEF definition: Is the agent deterministic or non-deterministic? (Sliding scale: Deterministic ↔ Non-deterministic)
The WEF introduction (p. 6) explicitly identifies "behavioural drift" as a novel risk that traditional governance cannot manage. The report notes that unlike conventional software, agents "simulate reasoning and adapt their behaviour through feedback mechanisms."
AAP mapping: Predictability is addressed through the audit commitment and a division of labor between the two protocols: AAP verifies non-deterministic behavior post hoc from logged traces, while AIP monitors non-deterministic reasoning in real time.
| WEF Concept | AAP/AIP Field | Function |
|---|---|---|
| Behavioral predictability | audit_commitment.trace_format | Structured logging of non-deterministic decisions |
| Non-deterministic monitoring | AIP Integrity Checkpoints | Real-time analysis of thinking blocks between turns |
| Behavioral change over time | AIP IntegrityDriftAlert | Cross-session behavioral divergence detection |
| Tamper evidence | audit_commitment.tamper_evidence | append_only, signed, or merkle trail integrity |
AAP extension: The WEF asks whether an agent is predictable. AAP and AIP assume non-determinism is the default and provide infrastructure to observe it. AP-Traces record what the agent considered and chose. AIP Integrity Checkpoints reveal what it was thinking — with thinking_block_hash for verification. Drift detection via IntegrityDriftAlert surfaces when behavior changes, with typed drift_direction values: injection_pattern, value_erosion, autonomy_creep, deception_pattern. The question shifts from "is it predictable?" to "is its unpredictability observable and characterized?"
2.4 Autonomy
WEF definition: The degree of independent planning, decision-making, and action. (Sliding scale: Low ↔ High)
The WEF draws an analogy to SAE International's driving automation levels (Level 0–5) and notes that autonomy and authority "are not inherent system properties but design choices" that "can also be calibrated during assessment or adjusted in real time" (p. 14).
AAP mapping: This is the most direct mapping. AAP's autonomy envelope is a formal, machine-readable specification of exactly what the WEF means by "autonomy level."
| WEF Concept | AAP Field | Function |
|---|---|---|
| Autonomy level | autonomy_envelope (composite) | Complete autonomy specification |
| What agent can do independently | autonomy_envelope.bounded_actions | Permitted autonomous actions |
| When agent must stop and ask | autonomy_envelope.escalation_triggers | Condition-based escalation rules (each with condition, action, reason) |
| Financial limits on autonomy | autonomy_envelope.max_autonomous_value | Currency-denominated ceiling (amount + currency) |
| Who to escalate to | principal.escalation_contact | Endpoint for escalation notifications |
| Real-time calibration | AIP recommended_action | continue, log_and_continue, pause_for_review, deny_and_escalate |
AAP extension: The WEF describes autonomy as a spectrum. AAP decomposes it into enforceable fields: what you can do (bounded_actions), what you can't (forbidden_actions), when you must ask (escalation_triggers), and how much you can spend (max_autonomous_value). This decomposition makes autonomy auditable — AAP's verify_trace function checks every logged decision against the autonomy envelope and flags violations by type and severity. AIP's real-time recommended_action field implements the WEF's observation that autonomy should be "adjusted in real time."
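As one illustration of an enforceable autonomy field, the sketch below checks a proposed spend against max_autonomous_value. It assumes the ceiling is stored as an amount string plus a currency code. The `exceeds_autonomous_value` helper and the handling of missing ceilings and currency mismatches are assumptions, not behavior taken from the AAP specification.

```python
from decimal import Decimal

# Hypothetical envelope: field names follow the table above, values are invented.
envelope = {
    "max_autonomous_value": {"amount": "500.00", "currency": "USD"},
}


def exceeds_autonomous_value(envelope: dict, amount: str, currency: str) -> bool:
    """True when a proposed spend should be escalated instead of executed."""
    ceiling = envelope.get("max_autonomous_value")
    if ceiling is None:
        return False  # assumption: no declared ceiling means no financial limit
    if currency != ceiling["currency"]:
        return True  # assumption: escalate rather than guess an exchange rate
    # Decimal avoids float rounding surprises when comparing money amounts.
    return Decimal(amount) > Decimal(ceiling["amount"])
```

A caller would escalate to principal.escalation_contact whenever this returns True, and proceed autonomously otherwise.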
2.5 Authority
WEF definition: The actions an agent is permitted to take, from read-only access to full administrative control. (Sliding scale: Low ↔ High)
The WEF notes that autonomy and authority "can be combined in different ways" and are design choices informed by "intended functions, risk considerations and oversight requirements" (p. 14).
AAP mapping: Authority maps to the combination of the autonomy envelope (behavioral permissions) and the principal block (delegation chain).
| WEF Concept | AAP Field | Function |
|---|---|---|
| System permissions | autonomy_envelope.bounded_actions | What the agent is permitted to do |
| Permission boundaries | autonomy_envelope.forbidden_actions | Hard limits regardless of context |
| Data access scope | autonomy_envelope.escalation_triggers | Conditions that constrain data access |
| Delegation chain | principal.type + principal.relationship | Who delegated authority and how |
| Permission expiry | expires_at | Authority has a time limit |
| Authority verification | verify_trace → UNBOUNDED_ACTION | Detects actions outside granted authority |
AAP extension: AAP adds verifiable delegation chains. When principal.type is "agent", the card records that authority was delegated from another agent, enabling accountability tracing through multi-agent workflows. The WEF recognizes this need under "multi-agent ecosystem risks" but does not specify a mechanism. AAP's Value Coherence Handshake provides one — agents verify value compatibility before coordination proceeds.
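The time-limited authority expressed by expires_at can be sketched as a simple check. The sketch assumes expires_at is an ISO 8601 timestamp with an explicit UTC offset. The `card_is_expired` helper and the example card_id are hypothetical.

```python
from datetime import datetime, timezone


def card_is_expired(card: dict, now: datetime) -> bool:
    """True when the card's expires_at is at or before `now`."""
    expires_at = card.get("expires_at")
    if expires_at is None:
        return False  # assumption: a card without expires_at never expires
    # fromisoformat parses offsets such as +00:00 into an aware datetime.
    return datetime.fromisoformat(expires_at) <= now


# Hypothetical card: only the fields this check needs.
card = {"card_id": "card-example", "expires_at": "2026-01-01T00:00:00+00:00"}
```

A verifier that finds an expired card would report it as the CARD_EXPIRED violation listed in the evaluation metrics.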
2.6 Use Case
WEF definition: The specific application domain and environment where the agent performs its function. (Free-text field)
AAP mapping: Use case context is captured in the Alignment Card's values block and optional extensions.
| WEF Concept | AAP Field | Function |
|---|---|---|
| Application domain | values.declared | Domain-specific values (e.g., principal_benefit, minimal_data) |
| Domain constraints | values.conflicts_with | Values the agent explicitly rejects |
| Value definitions | values.definitions | Maps each value ID to name, description, priority |
| Value hierarchy | values.hierarchy | lexicographic, weighted, or contextual resolution |
| Domain-specific extensions | extensions | Protocol-specific or domain-specific metadata (namespaced) |
AAP extension: The WEF's use case dimension is classification metadata. AAP's values system is evaluable — AP-Trace verification checks whether values_applied in each decision are consistent with values.declared in the card. An agent that claims principal_benefit as a value but consistently acts otherwise will generate UNDECLARED_VALUE violations and drift alerts with drift_direction: "value_erosion".
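The consistency check between values_applied and values.declared can be sketched as a set comparison. The `undeclared_values` helper and the `engagement_maximization` value ID are hypothetical. The actual AAP verification logic may differ.

```python
def undeclared_values(card: dict, trace_entry: dict) -> list:
    """Return values applied in a decision that the card never declared."""
    declared = set(card["values"]["declared"])
    applied = trace_entry.get("values_applied", [])
    # Preserve the order in which the trace listed the values.
    return [value for value in applied if value not in declared]


# Hypothetical card and trace entry using field names from the table above.
card = {"values": {"declared": ["principal_benefit", "minimal_data"]}}
trace_entry = {"values_applied": ["principal_benefit", "engagement_maximization"]}
```

Each value returned by this sketch would correspond to one UNDECLARED_VALUE finding for that trace entry.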
2.7 Environment
WEF definition: Operational environment complexity — simple, complex, or multi-system. (Sliding scale: Simple ↔ Complex)
The WEF defines a complex environment as one with "incomplete or noisy information, unpredictable outcomes, changing conditions over time, continuous ranges of possible actions or states, and interactions with other agents whose behaviour also affects results" (p. 14).
AAP mapping: Environment complexity is addressed through the protocol's composability features.
| WEF Concept | AAP Field | Function |
|---|---|---|
| Single-system vs. multi-system | A2A Agent Card alignment block | AAP extends A2A for cross-system use |
| External system interactions | /.well-known/alignment-card.json | Discoverable card for any system to retrieve |
| Zero-trust assumptions | AIP fail-closed mode | Block agent on analysis failure in high-security environments |
| Cross-agent coordination | Value Coherence Handshake | Pre-coordination compatibility check |
| Environment observability | AIP window_summary | Rolling integrity statistics: size, verdicts, integrity_ratio, drift_alert_active |
AAP extension: The WEF notes that agents in complex environments must "operate under zero-trust security assumptions." AAP's well-known endpoint convention enables any system to retrieve the agent's behavioral contract and verify it before granting access. AIP's fail-open vs. fail-closed configuration (FailurePolicy.mode) allows organizations to match their failure policy to environment risk — exactly the proportional governance the WEF calls for in complex environments.
3. Evaluation: Metrics and Evidence
The WEF's Evaluation pillar (Section 2.2, pp. 18–20) establishes four evaluation principles and specific performance metrics. The report emphasizes that evaluation should be "structured, context-aware and continuous" (p. 19).
3.1 Evaluation Principles → AAP/AIP Infrastructure
| WEF Evaluation Principle | WEF Description | AAP/AIP Implementation |
|---|---|---|
| Contextualization | Reflect the tools, workflows, and edge cases the agent will encounter in practice | AP-Traces record context for each decision — the actual operational conditions, not lab conditions |
| Multidimensional assessment | Define success across accuracy, robustness, latency tolerance, compliance, and user trust | verify_trace produces multi-dimensional results: violation counts by type and severity, not a single pass/fail. AIP output analysis (v0.5.0) adds output accuracy tracking against alignment card values as a continuous assessment dimension |
| Temporal and behavioural monitoring | Track performance over time to detect regressions, shifts in behaviour, or failures to adapt | AIP IntegrityDriftAlert with integrity_similarity ratio and sustained_checks count — continuous monitoring by design |
| Provider-deployer collaboration | Transparent documentation enables deployers to validate reliability and apply safeguards | Alignment Card at /.well-known/ is the transparent documentation; AP-Traces provide the evidence base |
3.2 Evaluation Metrics → AAP/AIP Evidence
The WEF report (Figure 8, p. 19) identifies specific evaluation metrics. AAP/AIP provides the evidence infrastructure for each:
| WEF Metric | AAP/AIP Evidence Source |
|---|---|
| Task success rate | AP-Trace verify_trace — ratio of traces with zero violations to total traces |
| Task completion time | AP-Trace timestamps (timestamp per entry) enable latency analysis |
| Error types | Typed violations: FORBIDDEN_ACTION, UNBOUNDED_ACTION, MISSED_ESCALATION, UNDECLARED_VALUE, CARD_EXPIRED, CARD_MISMATCH |
| Tool call success | AP-Trace action field logs tool invocations; verification flags UNBOUNDED_ACTION for unauthorized tool use |
| Edge case robustness | AIP concern categories — reasoning_corruption and autonomy_violation surface edge case failures. Output analysis adds output_misalignment and output_injection_compliance for detecting subtle failures where thinking appears clean but output diverges |
| Trust indicators | AIP integrity_ratio (rolling ratio of clear verdicts in the integrity window) — a quantitative trust metric |
| Capabilities | Alignment Card bounded_actions declares capabilities; AP-Traces verify they match observed behavior |
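
Two of the metrics above are simple ratios, sketched here under stated assumptions. The function names are hypothetical, the inputs are plain Python lists rather than real AP-Trace or AIP objects, and returning 0.0 for an empty input is an assumption rather than specified behavior.

```python
def task_success_rate(violations_per_trace: list) -> float:
    """Share of traces with zero violations, given a violation count per trace."""
    if not violations_per_trace:
        return 0.0  # assumption: no traces means no evidence of success
    clean = sum(1 for count in violations_per_trace if count == 0)
    return clean / len(violations_per_trace)


def integrity_ratio(verdicts: list) -> float:
    """Share of 'clear' verdicts in an integrity window."""
    if not verdicts:
        return 0.0  # assumption: an empty window carries no trust signal
    return verdicts.count("clear") / len(verdicts)
```

For example, four traces with violation counts `[0, 2, 0, 0]` give a success rate of 0.75 in this sketch.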
3.3 Audit Logs
The WEF report emphasizes that "audit logs are central throughout this life cycle, providing structured records of agent activity and the rationale behind it" (p. 19). AAP's audit_commitment block formalizes this:
| WEF Audit Requirement | AAP Field | Implementation |
|---|---|---|
| Structured records | audit_commitment.trace_format | "ap-trace-v1" — standardized, schema-validated |
| Retention policy | audit_commitment.retention_days | Explicit retention period |
| Queryable logs | audit_commitment.queryable + query_endpoint | API-accessible trace history |
| Tamper resistance | audit_commitment.tamper_evidence | append_only, signed, or merkle |
| Rationale capture | AP-Trace alternatives_considered + selection_reasoning | Why the agent chose what it chose |
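
One way to picture tamper evidence for an append-only trail is a hash chain, sketched below. This is a generic technique and not necessarily how AAP implements append_only, signed, or merkle. The record layout, the use of SHA-256, and all function names are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first record


def entry_hash(entry: dict, prev_hash: str) -> str:
    """Hash an entry together with the previous record's hash."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, entry: dict) -> None:
    """Append an entry, linking it to the hash of the previous record."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"entry": entry, "prev_hash": prev, "hash": entry_hash(entry, prev)})


def verify_log(log: list) -> bool:
    """Recompute every link; any edited or reordered record breaks the chain."""
    prev = GENESIS
    for record in log:
        if record["prev_hash"] != prev:
            return False
        if record["hash"] != entry_hash(record["entry"], prev):
            return False
        prev = record["hash"]
    return True
```

Editing any stored entry after the fact changes its recomputed hash, so `verify_log` returns False for the altered trail.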
4. Risk Assessment: Lifecycle Mapping
The WEF's Risk Assessment pillar (Section 2.3, pp. 21–22) proposes a five-step lifecycle (Table 1). AAP/AIP provides tooling at each step:
| WEF Step | WEF Objective | AAP/AIP Tooling | Outputs |
|---|---|---|---|
| 1. Define context | Establish scope, boundaries, criteria for managing risk | Alignment Card defines the agent's identity, values, autonomy bounds — the risk context | Card serves as "context definition" + "risk evaluation criteria" |
| 2. Identify risks | Identify potential technical, organizational, and ecosystem risks | forbidden_actions pre-declares known risks; values.conflicts_with declares value-level risks; AIP concern categories enumerate risk types | Risk register derived from card + concern categories |
| 3. Analyse risks | Assess probability and impact, considering autonomy, authority, predictability, and operational context | verify_trace produces violation counts by type and severity; AIP IntegrityDriftAlert surfaces drift_direction with typed categories | Risk analysis scores from verification results |
| 4. Evaluate risks | Rank and prioritize risks, determine tolerability | AAP violation severities (CRITICAL, HIGH, MEDIUM, LOW) provide built-in risk ranking; AIP severity levels in concerns enable prioritization | Risk ranking from violation/concern severity distributions |
| 5. Manage risks | Implement response actions, monitor continuously | AIP recommended_action (continue, log_and_continue, pause_for_review, deny_and_escalate) implements graduated response; drift alerts feed monitoring | Control actions, evidence logs, monitoring reports |
4.1 Violation Severity as Risk Taxonomy
AAP's verification engine provides a ready-made risk taxonomy that maps to the WEF's risk categories:
| Violation Type | Severity | WEF Risk Category | WEF Governance Area |
|---|---|---|---|
| FORBIDDEN_ACTION | CRITICAL | Authority violation, safety hazard | Access control |
| CARD_MISMATCH | CRITICAL | Identity/integrity failure | Traceability and identity |
| UNBOUNDED_ACTION | HIGH | Autonomy overreach | Access control, Human oversight |
| MISSED_ESCALATION | HIGH | Governance failure | Human oversight |
| CARD_EXPIRED | HIGH | Lifecycle management failure | Long-term management |
| UNDECLARED_VALUE | MEDIUM | Value misalignment | Trustworthiness and explainability |
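
The taxonomy above can be used directly for risk ranking, as in this minimal sketch. The type-to-severity pairs come from the table. The numeric ranks and the `rank_violations` helper are assumptions introduced for illustration.

```python
# Violation type -> severity, copied from the table above.
VIOLATION_SEVERITY = {
    "FORBIDDEN_ACTION": "CRITICAL",
    "CARD_MISMATCH": "CRITICAL",
    "UNBOUNDED_ACTION": "HIGH",
    "MISSED_ESCALATION": "HIGH",
    "CARD_EXPIRED": "HIGH",
    "UNDECLARED_VALUE": "MEDIUM",
}

# Assumed ordering: a lower rank means the violation is handled first.
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def rank_violations(violation_types: list) -> list:
    """Order violations from most to least severe (stable within a severity)."""
    return sorted(violation_types, key=lambda v: SEVERITY_RANK[VIOLATION_SEVERITY[v]])
```

Because Python's sort is stable, violations of equal severity keep the order in which they were reported.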
5. Progressive Governance: Mechanisms Mapping
The WEF's Progressive Governance pillar (Section 2.4, pp. 24–27) proposes that governance scales with agent capability. It identifies nine baseline governance mechanisms (Table 2, p. 26) and two human oversight models (HITL/HOTL).
5.1 Human Oversight Models → principal.relationship
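The WEF defines two governance configurations for human oversight, HITL and HOTL (p. 26). The third row covers the high-autonomy case, which is implied rather than defined as a model: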
| WEF Model | WEF Definition | AAP Field | AAP Behavior |
|---|---|---|---|
| HITL (Human-in-the-loop) | Agent suggests or prepares actions; final decisions require explicit human approval | principal.relationship: "advisory" | Agent recommends and waits; escalation is the default |
| HOTL (Human-on-the-loop) | Agent acts within defined boundaries; humans monitor, receive alerts, and retain ability to intervene or override | principal.relationship: "delegated_authority" | Agent acts within bounded_actions; escalation on trigger conditions |
| (Implied high-autonomy) | Agent operates independently within declared value framework | principal.relationship: "autonomous" | Agent operates within values; AIP heightened monitoring with stricter conscience prompt evaluation |
AIP implements progressive monitoring intensity based on this mapping: advisory agents receive standard monitoring, delegated_authority agents receive standard monitoring with escalation enforcement, and autonomous agents receive heightened monitoring with deeper conscience evaluation (consultation_depth: "deep").
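The relationship-to-monitoring mapping described above can be written as a lookup table. This is a sketch only: the dictionary keys `intensity` and `escalation_enforcement` and the `monitoring_profile` helper are hypothetical, and each profile lists only what the paragraph above states.

```python
# Profiles keyed by principal.relationship; key names are assumptions.
MONITORING_PROFILES = {
    "advisory": {"intensity": "standard"},
    "delegated_authority": {"intensity": "standard", "escalation_enforcement": True},
    "autonomous": {"intensity": "heightened", "consultation_depth": "deep"},
}


def monitoring_profile(card: dict) -> dict:
    """Look up the monitoring profile for a card's principal.relationship."""
    relationship = card["principal"]["relationship"]
    try:
        return MONITORING_PROFILES[relationship]
    except KeyError:
        # Unknown relationships are rejected rather than silently defaulted.
        raise ValueError(f"unknown principal.relationship: {relationship!r}") from None
```

Rejecting unknown relationship values is a design choice in this sketch, so that a typo in a card cannot quietly lower monitoring intensity.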
5.2 Baseline Governance Mechanisms → AAP/AIP
Table 2 (p. 26) defines nine governance mechanisms that form the baseline for all agents. AAP/AIP provides protocol-level implementation for each:
| WEF Governance Area | WEF Mechanism | AAP/AIP Implementation |
|---|---|---|
| Access control | Enforce least-privilege access; define task boundaries | autonomy_envelope.bounded_actions (permitted), forbidden_actions (denied), max_autonomous_value (financial ceiling) |
| Legal and compliance | Data protection impact assessments; privacy and regulation compliance | values.declared encodes compliance values; extensions namespace for regulatory metadata (e.g., "eu_ai_act" extension); audit_commitment enables DPIA evidence |
| Testing and validation | Sandbox runs, controlled pilots, input-output filters, third-party audits | verify_trace against Alignment Card is the validation engine; AP-Traces from sandbox runs provide audit evidence; AIP input analysis (prompt injection detection) acts as input filter |
| Monitoring and logging | Implement logging for all agent actions; anomaly alerts and dashboards | AP-Traces (audit_commitment.trace_format: "ap-trace-v1"), AIP Integrity Checkpoints (per-turn analysis), IntegrityDriftAlert (anomaly detection), OTel export (@mnemom/aip-otel-exporter) for dashboards |
| Human oversight | Define HITL/HOTL models; require policy review before deployment; set supervisory triggers | principal.relationship (HITL/HOTL mapping above), escalation_triggers (supervisory triggers with condition, action, reason), principal.escalation_contact (escalation endpoint) |
| Traceability and identity | Assign unique agent identifiers; tag outputs to the responsible agent instance | card_id + agent_id (unique identification), AP-Trace entries linked to card_id, AIP checkpoints linked to agent_id + session_id |
| Long-term management | Protocols for ongoing monitoring, updates, and eventual decommissioning | expires_at (card expiry enforces lifecycle review), CARD_EXPIRED violation (HIGH severity) triggers re-evaluation, AIP drift detection surfaces long-term behavioral change |
| Trustworthiness and explainability | Implement explainability tools; establish trust metrics | AIP reasoning_summary (natural-language explanation of thinking block analysis), AP-Trace alternatives_considered + selection_reasoning (decision rationale), AIP integrity_ratio (quantitative trust metric) |
| Manual redundancy | Establish procedures for human takeover of critical business cases | escalation_triggers define when to transfer to human, principal.escalation_contact routes to the right person, AIP recommended_action: "deny_and_escalate" forces handoff on critical concerns |
6. Technical Foundations: Protocol Alignment
The WEF's Section 1 covers the technical architecture of AI agents, including communication protocols and cybersecurity. AAP/AIP aligns with and extends this technical layer.
6.1 Communication Protocols
The WEF report discusses MCP (Anthropic, late 2024), A2A (Google, April 2025), and AP2 (Google, September 2025) as the emerging protocol landscape (pp. 9–10). The report notes that "technical compatibility alone does not guarantee successful coordination between agents" and that the agent card concept is inspired by Model Cards (Mitchell et al., 2019, endnote 8).
| WEF Protocol | AAP/AIP Relationship |
|---|---|
| MCP | AAP extensions namespace supports MCP-specific metadata; Alignment Card can describe MCP-connected agents |
| A2A | AAP extends A2A Agent Cards with the alignment block — adding behavioral contracts to capability descriptions |
| AP2 | AAP's max_autonomous_value maps to AP2's auditable transaction limits |
| Agent Cards (A2A) | AAP Alignment Card is the A2A agent card plus enforceable alignment posture |
6.2 Cybersecurity
The WEF report (Section 1.3, pp. 11–12) identifies prompt injections and agent misuse as key threats, noting that "agents can be misused" and "exploited through design flaws or prompt injections."
| WEF Security Concern | AIP Implementation |
|---|---|
| Prompt injection | AIP concern category: prompt_injection — dedicated detection in every Integrity Checkpoint |
| Agent misuse | AIP concern category: deceptive_reasoning + undeclared_intent — surfaces hidden goals |
| Zero-trust model | AIP FailurePolicy.mode: "fail_closed" — blocks agent on any analysis failure |
| Audit trails for attribution | AP-Traces + Integrity Checkpoints provide complete forensic record per session |
| Identity verification | /.well-known/alignment-card.json enables any party to verify agent identity before interaction |
7. Multi-Agent Ecosystem Risks
The WEF's Section 3 (pp. 28–29) identifies five emerging failure modes in multi-agent ecosystems plus four emerging ecosystem patterns. AAP/AIP addresses all five failure modes.
7.1 Five Failure Modes
| WEF Risk | WEF Description | AAP/AIP Solution |
|---|---|---|
| Orchestration drift | "When agents are plugged into other agents without shared context or coordination logic, workflows can become brittle or unpredictable" | Value Coherence Handshake: Before coordination, agents exchange Alignment Cards, compute compatibility score via value_coherence_check → coherence_result. If coherence.compatible is false, coordination pauses and principals are notified. This is the "shared context" the WEF calls for. |
| Semantic misalignment | "When two agents interpret the same instruction differently, it can lead to conflicting actions or duplicated effort" | Braid grounding protocol: Agents detect semantic divergence via SSM analysis and initiate vocabulary calibration. values.conflicts_with pre-declares known semantic conflicts. Value definitions (values.definitions) with name, description, priority reduce interpretation ambiguity. |
| Security and trust gaps | "Without shared trust frameworks, agents may inadvertently expose sensitive data or interact with malicious actors" | Well-known endpoint discovery (zero-trust — any party retrieves and verifies the card), AIP prompt_injection concern category, AIP fail-closed mode for sensitive environments. Alignment Card serves as the "shared trust framework" the WEF calls for. |
| Interconnectedness and cascading effects | "Failures in tightly linked agents or systems can propagate across networks, creating a chain of disruptions" | AIP IntegrityDriftAlert with drift_direction typing enables early detection before cascading. CARD_MISMATCH (CRITICAL) immediately flags identity inconsistencies across agent boundaries. Escalation chain via principal.escalation_contact ensures human notification when integrity degrades. |
| Systemic complexity | "As the number and diversity of interacting agents grow, the likelihood of emergent behaviours and cascading failures increases, making them more difficult to anticipate, trace or diagnose" | AP-Traces with linked_trace_id enable cross-agent forensics. AIP provides per-agent integrity windows (window_summary) that can be aggregated for system-level health. audit_commitment.queryable with query_endpoint enables cross-agent trace correlation. |
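
The pre-coordination check in the first row can be pictured with a small sketch. The real Value Coherence Handshake message format and scoring are not given here, so the Jaccard-overlap score, the result keys other than `compatible`, the `value_coherence` helper, and the `transparency` and `engagement_maximization` value IDs are all assumptions.

```python
def value_coherence(card_a: dict, card_b: dict) -> dict:
    """Compare two cards' values; incompatible if either rejects the other's values."""
    a, b = card_a["values"], card_b["values"]
    declared_a, declared_b = set(a["declared"]), set(b["declared"])
    # A conflict is any value one agent declares that the other lists in conflicts_with.
    conflicts = sorted(
        (declared_a & set(b.get("conflicts_with", [])))
        | (declared_b & set(a.get("conflicts_with", [])))
    )
    union = declared_a | declared_b
    # Assumed score: Jaccard overlap of the two declared value sets.
    score = len(declared_a & declared_b) / len(union) if union else 1.0
    return {"compatible": not conflicts, "score": score, "conflicts": conflicts}


# Hypothetical cards using field names from the classification tables.
card_a = {"values": {"declared": ["principal_benefit", "minimal_data"],
                     "conflicts_with": ["engagement_maximization"]}}
card_b = {"values": {"declared": ["principal_benefit", "transparency"]}}
card_c = {"values": {"declared": ["engagement_maximization"]}}
```

In this sketch `card_a` and `card_b` are compatible, while `card_a` and `card_c` are not, which is the case where coordination would pause and principals would be notified.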
7.2 Emerging Ecosystem Patterns
The WEF identifies four patterns shaping multi-agent ecosystems:
| WEF Pattern | AAP/AIP Relevance |
|---|---|
| Agent-to-agent commerce | max_autonomous_value provides financial guardrails for agent-to-agent transactions |
| Internet of agents | Well-known endpoint discovery enables internet-scale agent identity; Alignment Cards provide the interoperability standard |
| Trust frameworks for inter-agent collaboration | Value Coherence Handshake is a trust framework — it verifies identity, capabilities, and behavioral standards before collaboration |
| Governor/auditor agents | AIP's daimonion architecture — the integrity analysis that monitors the agent's thinking — is architecturally equivalent to the WEF's "governor" or "auditor" agent concept, but implemented as a protocol-level service rather than a separate agent |
7.3 Governor Agents → AIP Daimonion
The WEF report (p. 28) envisions "dedicated 'governor' or 'auditor' agents" that "monitor, audit or regulate the actions of other agents, validating transactions, detecting anomalies and correcting unsafe or unintended behaviours." The report notes these "enable scalable oversight in complex ecosystems."
AIP's daimonion (integrity analysis service) implements this concept at the protocol level:
| WEF Governor Agent Function | AIP Daimonion Implementation |
|---|---|
| Monitor agent actions | Integrity Checkpoints analyze every thinking block between turns |
| Validate transactions | Verdicts (clear, review_needed, boundary_violation) per checkpoint |
| Detect anomalies | Six concern categories: prompt_injection, value_misalignment, autonomy_violation, reasoning_corruption, deceptive_reasoning, undeclared_intent |
| Correct unsafe behavior | recommended_action: "deny_and_escalate" blocks the agent on critical concerns |
| Scalable oversight | Protocol-level service, not a separate agent — scales with infrastructure, not agent deployment |
The WEF warns that governor agents "risk overreliance on agents supervising other agents." AIP mitigates this by keeping the daimonion's output advisory in fail-open mode (the agent proceeds, with concerns logged) and blocking only in fail-closed mode (explicit organizational choice). The human principal retains ultimate authority via principal.escalation_contact.
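The difference between the two failure modes can be sketched as a single decision function. It follows the paragraph above: fail-open is advisory and fail-closed blocks. The `should_block` helper is hypothetical, the `"fail_open"` string is assumed by symmetry with `"fail_closed"`, and whether other recommended actions also block is left out because the text does not say.

```python
def should_block(mode: str, analysis_failed: bool, recommended_action: str) -> bool:
    """Decide whether the agent is blocked under a given failure policy mode."""
    if mode not in ("fail_open", "fail_closed"):
        raise ValueError(f"unknown FailurePolicy.mode: {mode!r}")
    if mode == "fail_open":
        # Advisory mode: the agent proceeds and concerns are only logged.
        return False
    # Fail-closed: block on analysis failure or on a critical recommendation.
    return analysis_failed or recommended_action == "deny_and_escalate"
```

Keeping the decision in one place makes the organizational choice explicit: switching a deployment from advisory to blocking changes only the `mode` argument.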
8. Summary Mapping Tables
8.1 Classification Dimensions
| WEF Dimension | WEF Agent Card | AAP Alignment Card | Extension |
|---|---|---|---|
| Function | Natural language description | bounded_actions + forbidden_actions | Machine-parseable, verifiable, includes anti-function |
| Role | Specialist ↔ Generalist scale | principal.relationship + action scope | Prescriptive — affects runtime behavior and monitoring intensity |
| Predictability | Deterministic ↔ Non-deterministic scale | AP-Traces + AIP Checkpoints + drift detection | Observable unpredictability with typed drift directions |
| Autonomy | Low ↔ High scale | Autonomy envelope (actions, triggers, limits) | Decomposed, auditable, enforceable, real-time adjustable |
| Authority | Low ↔ High scale | Delegation chain + autonomy envelope + expiry | Verifiable delegation chains through multi-agent workflows |
| Use Case | Free-text application domain | values (declared, definitions, hierarchy, conflicts) + extensions | Evaluable values — verification checks consistency over time |
| Environment | Simple ↔ Complex scale | Well-known endpoints + Value Coherence + fail-closed | Zero-trust discoverable, multi-agent compatible, environment-proportional |
8.2 Pillars and Governance
| WEF Pillar | WEF Recommendation | AAP/AIP Implementation |
|---|---|---|
| Classification | Agent card with 7 dimensions | Alignment Card — JSON schema, well-known endpoint, versioned, expirable |
| Evaluation | Contextualized, multidimensional, temporal, collaborative | AP-Trace verification + AIP integrity checks + drift detection + OTel export |
| Risk Assessment | 5-step lifecycle (define, identify, analyse, evaluate, manage) | Typed violations with severity + concern categories + drift alerts + graduated response |
| Progressive Governance | 9 baseline mechanisms + HITL/HOTL + proportional scaling | Autonomy envelope + principal.relationship + AIP monitoring intensity + fail-open/closed |
8.3 Multi-Agent Risks
| WEF Risk | AAP/AIP Solution |
|---|---|
| Orchestration drift | Value Coherence Handshake — pre-coordination compatibility |
| Semantic misalignment | Braid grounding + value definitions + conflicts_with |
| Security and trust gaps | Well-known endpoints + prompt injection detection + fail-closed |
| Interconnectedness and cascading effects | Drift alerts + CARD_MISMATCH detection + escalation chains |
| Systemic complexity | Cross-agent trace correlation + per-agent integrity windows + queryable audit |
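The pre-coordination compatibility check in the first two rows can be sketched as a comparison of declared values against each agent's conflicts_with lists. This is a simplified illustration, not the Value Coherence Handshake as specified in AAP: the find_conflicts function, the dictionary representation of an agent's values, and the example value names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumed representation: each agent maps a declared value to its conflicts_with list.
ValueMap = Dict[str, List[str]]


def find_conflicts(a: ValueMap, b: ValueMap) -> Set[Tuple[str, str]]:
    """Return value pairs where one agent declares a value the other marks as conflicting."""
    conflicts: Set[Tuple[str, str]] = set()
    for mine, theirs in ((a, b), (b, a)):             # check both directions
        for value, conflicts_with in mine.items():
            for other in conflicts_with:
                if other in theirs:                    # the counterpart declares it
                    conflicts.add(tuple(sorted((value, other))))
    return conflicts


# Hypothetical agents with hypothetical value names.
buyer = {"price_minimization": ["upsell_maximization"], "transparency": []}
seller = {"upsell_maximization": [], "transparency": []}
auditor = {"transparency": []}

print(find_conflicts(buyer, seller))    # one conflicting pair
print(find_conflicts(buyer, auditor))   # empty set: safe to coordinate
```

Running the check before any task is exchanged means an incompatibility surfaces as an explicit result rather than as drift discovered mid-workflow.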
9. Lineage and Standards Context
The WEF agent card concept exists within a broader standards lineage that AAP builds upon:
| Standard/Framework | Relationship to AAP |
|---|---|
| Model Cards (Mitchell et al., 2019) | Foundational concept (WEF endnote 8); AAP extends from static documentation to enforceable behavioral contract |
| A2A Agent Cards (Google, 2025) | AAP extends A2A cards with the alignment block for behavioral verification |
| OECD AI Principles (WEF reference 2) | AAP's values system and audit commitment implement OECD transparency and accountability principles |
| NIST AI RMF (WEF reference 3) | AAP/AIP maps to all four NIST NCCoE focus areas (see companion NIST comment document) |
| ISO/IEC standards (WEF reference 4) | AAP's JSON schema validation and well-known endpoint conventions follow ISO-style specification patterns |
| EU AI Act | AAP's audit infrastructure and AIP's monitoring directly address Article 50 transparency requirements (enforcement August 2026) |
10. Conclusion
The WEF's AI Agents in Action framework is among the most comprehensive governance blueprints for autonomous agents published to date by a major international body. It identifies what organizations need to know about their agents (classification), how to generate evidence about their behavior (evaluation), how to reason about potential harms (risk assessment), and how oversight should scale with capability (progressive governance).
AAP and AIP provide the protocol-level implementation of that blueprint:
- The Alignment Card is the WEF agent card — not as a descriptive document, but as a machine-readable, verifiable, enforceable behavioral contract.
- AP-Traces and Integrity Checkpoints provide the evaluation infrastructure the WEF calls for — contextualized, multidimensional, temporal.
- Violation typing and concern categories provide the risk assessment taxonomy — with built-in severity rankings that map to the WEF's governance areas.
- The autonomy envelope, principal.relationship, and AIP monitoring provide the progressive governance, with HITL/HOTL mapping and proportional monitoring intensity.
- The Value Coherence Handshake, Braid grounding, and AIP daimonion address the multi-agent ecosystem risks the WEF identifies as emerging challenges.
The WEF tells organizations what questions to ask. AAP and AIP provide the infrastructure for agents to answer them — verifiably.
References
- World Economic Forum & Capgemini. AI Agents in Action: Foundations for Evaluation and Governance. November 2025.
- Agent Alignment Protocol (AAP) Specification v0.1.0. Mnemom Research, February 2026.
- Agent Integrity Protocol (AIP) Specification v0.1.5. Mnemom Research, February 2026.
- Alignment and Integrity Infrastructure for Autonomous Agents. Mnemom Research, February 2026.
- Mitchell, M., Wu, S., Zaldivar, A., et al. Model Cards for Model Reporting. FAT* '19, 2019.
This document is released under CC BY 4.0. Copyright 2026 Mnemom LLC.
