# MDMP-NATIVE AI DECISION PLATFORM

## Architectural Blueprint and Product Specification

**Jeep Marshall**
LTC, US Army (Retired)
Airborne Infantry | Special Operations | Process Improvement
[email protected]
March 2026

---

## SERIES NOTE

This is Paper 7 in the [[Home|Herding Cats in the AI Age]] series. [[The-Super-Intelligent-Five-Year-Old|Paper 1]] established that AI needs doctrine, not more intelligence. [[Paper-2-The-Digital-Battle-Staff|Paper 2]] showed the military already built the coordination frameworks the AI industry lacks. [[Paper-3-The-PARA-Experiment|Paper 3]] proved those principles work in a live laboratory. [[Paper-4-The-Creative-Middleman|Paper 4]] demonstrated the consequences when coordination is absent. [[Paper-5-When-the-Cats-Talk-to-Each-Other|Paper 5]] showed that two models can negotiate a coordination protocol in real time. [[Paper-6-When-the-Cats-Form-a-Team|Paper 6]] proved that four models assigned military staff roles produce demonstrably better strategic decisions than a solo baseline. Paper 7 now asks the next operational question: what does a production platform look like that scales this coordination pattern from a proof-of-concept to a system any team can use?

---

## EXECUTIVE SUMMARY

Paper 6's proof-of-concept validated that doctrine-structured multi-model ensembles produce measurably better strategic analysis than a single AI model, but at the cost of 2x processing time. The ensemble surfaced 6 strategic insights the solo baseline missed, including 2 rated HIGH value. Most critically, Paper 6 demonstrated that a structured MDMP (Military Decision Making Process) framework makes the difference: without doctrine, four models produce chaos. With doctrine, they staff a decision. This paper translates that proof-of-concept into a production platform specification.
The thesis is straightforward: a conversational, MDMP-native platform that accepts natural voice or text input, converts it into structured tasks, deploys AI agents to work those tasks in parallel under defined roles, and surfaces decision-ready outputs aligned to military planning doctrine is both technically viable and strategically necessary. The platform is not a generic multi-agent orchestrator. It is doctrine first, then software. The MDMP structure is the constraint that makes coordination work. This paper provides the technical architecture, deployment model, access tiers, and roadmap for building such a system.

---

## 1. THE PROBLEM PAPER 6 SOLVED

### 1.1 The Ensemble Hypothesis

Before Paper 6, the field evidence on multi-agent AI coordination was mixed at best. A UC Berkeley study identified 14 distinct failure modes across multi-agent systems with failure rates ranging from 41% to 86.7%.[^1] A Google/MIT collaboration found that multi-agent systems degraded performance on sequential tasks by 39–70%.[^2] The assumed answer — "more models = better decisions" — was empirically false.

Paper 6 tested a different hypothesis: more models with *structure* beats one model without it, and solo models with structure beat any ensemble without it. The structure in question was military doctrine — the MDMP.

### 1.2 The Paper 6 Design

Four frontier AI models were assigned military staff roles:

- **Commander (Claude Opus 4.6):** Synthesis and decision authority
- **S2 Intelligence Officer (Gemini 3):** Environmental scan, threat analysis, assumption challenge
- **S3 Operations Officer (ChatGPT/GPT-4o):** Course of action development and feasibility analysis
- **Devil's Advocate (Grok/SuperGrok):** Contrarian analysis, failure mode identification, red team

All four models received an identical mission briefing on the series' own publication strategy — a real decision with real stakes, not a contrived benchmark. The solo baseline (Claude) ran the full MDMP alone.
The ensemble models ran independently, then the Commander synthesized their outputs.

### 1.3 Key Findings

**What the Ensemble Surfaced That Solo Missed:**

1. **Serialization Strategy (Gemini)** — HIGH Value: Break Paper 2's 33,500 words into 10 "Operational Briefs" for sustained content distribution. Solo Claude treated it as monolithic.
2. **Hybrid COA (ChatGPT)** — MEDIUM-HIGH Value: "Doctrine → Articles → Case Studies → Book" sequence. ChatGPT synthesized multiple approaches into a hybrid generating more content touchpoints.
3. **Consultancy Threat (Gemini)** — MEDIUM Value: McKinsey/Deloitte rebranding LSS for AI creates a specific, named competitive threat with urgency.
4. **Credibility Attack Surface (Grok)** — HIGH Value: Specific vulnerabilities ("no PhD," "CMDP lacks reproducibility," "doctrine is just fancy for protocols") that the author must pre-empt.
5. **Professionalize or Shelve Challenge (Grok)** — MEDIUM Value: Build a GitHub repo with CMDP code/simulations to compete on implementation, not essays alone.
6. **DOW/SOCOM Alignment (Gemini)** — MEDIUM Value: Position the series as "Field Manual for the Agent Network" to align with Department of War 2026 initiatives.

**What Solo Did Better:**

- **Operational detail:** Week-by-week execution plans with hour estimates, named tools, and decision gates
- **Risk register:** 8 risks with likelihood, impact, and specific mitigations
- **DARPA integration:** Factored hard constraints (April 10 and April 13-17 deadlines) into the timeline
- **Assumption validation:** Identified 5 specific assumptions requiring validation, with deadlines

**The Refined Thesis:** "A doctrine-structured multi-model ensemble surfaces strategic blind spots that solo analysis misses, while solo analysis produces superior operational execution detail. The optimal pattern is ensemble for strategy, solo for operations — and doctrine is the constant that makes both work."

---

## 2. THE PLATFORM VISION

### 2.1 Core Principle: Doctrine First, Software Second

The platform is not a general-purpose multi-agent orchestrator. It is MDMP-native.[^17] The MDMP structure is not a bolt-on feature or an optional workflow. It is the foundation.

Why? Because Paper 6 proved that without structure, multi-model coordination fails. The MDMP provides the structure: defined phases, assigned roles, synthesis procedures, and decision gates. These are not procedural niceties. They are the constraints that make the system work.

### 2.2 User Perspective: Voice-First Input

A user speaks naturally into the system: "We have a market entry decision for the European AI regulation market. We need to assess whether to enter before the EU AI Act is finalized or wait for regulatory clarity. We need this decision by Friday."

The platform converts this natural language input into structured MDMP tasks:

1. **Receipt of Mission:** Parse the decision problem, extract the deadline, identify stakeholders
2. **Mission Analysis:** What are the facts, assumptions, constraints, and risks?
3. **COA Development:** What are the strategic options?
4. **COA Analysis:** What are the failure modes and second-order effects?
5. **COA Comparison:** Which option is optimal against defined criteria?
6. **COA Approval:** Commander selects, reasoning documented
7. **Orders Production:** Actionable next steps, timelines, accountability

Throughout this process, AI agents work in parallel:

- The intelligence agent scans the regulatory landscape
- The operations agent develops market entry scenarios
- The devil's advocate tears each scenario apart
- A commander synthesizes everything and presents decision-ready outputs

The human sees each stage of analysis and can inject decisions, redirect analysis, or request deeper dives at any MDMP phase.
### 2.3 Output Layer: Decision Documents

The system produces outputs aligned to military OPORD (Operations Order) structure:

- **SITREP (Situation Report):** Current state, constraints, environmental factors
- **Mission Restated:** What we are deciding and why
- **Commander's Intent:** Strategic objectives and the decision-making logic
- **Courses of Action:** Multiple options with analysis
- **Selected Course of Action:** The decision and reasoning
- **Execution Plan:** Immediate next steps, accountability, timeline
- **Lessons Learned Registry:** What we learned for future decisions

This structure is not arbitrary. OPORD format forces clarity. It prevents decisions from being based on vague intuitions. It creates an audit trail.

---

## 3. ARCHITECTURE OVERVIEW

### 3.1 Input Layer

**Voice-to-Text Pipeline:**

- Whisper API for transcription (proven quality, ~98% accuracy at 8+ second clips)[^3]
- Speaker identification (multi-user support with role tagging)
- Real-time transcription (streaming to the processing layer, no batch delays)

**Text Chat Fallback:**

- Web/mobile chat interface for users who prefer typing
- Prompt templates for common decision types (resource allocation, market entry, personnel decisions, etc.)
- Structured prompt injection for role-bounded agent inputs

### 3.2 Task Engine

**Inbox Queue:** All inputs become discrete tasks in a message queue. No input is lost. No decision is left in an uncompleted state.

**Task Parser AI:** A dedicated Claude or Gemini model (small and fast) converts raw input into MDMP-structured work items:

- Decision problem statement
- Context and constraints
- Decision deadline
- Primary stakeholder (the decision-maker)
- Secondary stakeholders (who has input authority)

**Priority Sorter:** Routes tasks to the appropriate MDMP phase. A task arriving mid-decision might jump to COA Analysis. A task arriving at 4 PM with a Friday deadline gets priority queued.
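As a sketch of the Task Parser's output contract and the Priority Sorter's deadline heuristic (the `DecisionTask` fields mirror the decision schema in Section 3.6; the `priority` thresholds are illustrative assumptions, not the production rules):

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class DecisionTask:
    """One MDMP work item produced by the Task Parser."""
    problem_statement: str
    deadline: datetime
    decision_maker: str
    stakeholders: list[str] = field(default_factory=list)
    phase: str = "receipt_of_mission"  # current MDMP phase

def priority(task: DecisionTask, now: datetime) -> int:
    """Priority Sorter heuristic: tighter deadlines sort first.
    0 = urgent (< 48h), 1 = near-term (< 1 week), 2 = routine."""
    remaining = task.deadline - now
    if remaining < timedelta(hours=48):
        return 0
    if remaining < timedelta(weeks=1):
        return 1
    return 2

# A Friday-deadline task jumps ahead of a routine budget decision.
now = datetime(2026, 3, 16)
tasks = [
    DecisionTask("FY27 tooling budget", datetime(2026, 5, 1), "CFO"),
    DecisionTask("EU market entry go/no-go", datetime(2026, 3, 20), "CEO"),
]
queue = sorted(tasks, key=lambda t: priority(t, now))
```

The queue itself would live in SQS or Redis (Section 6); the sorter only computes the ordering key.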
### 3.3 Agent Layer: Multi-AI Ensemble

**Slot-Based Architecture (Pluggable):**

| Role | Primary Model | Backup Model | Function | Update Frequency |
|------|---------------|--------------|----------|------------------|
| Commander (Synthesis) | Claude Sonnet | Claude Opus | Decision synthesis, conflict resolution, final recommendation | Per decision |
| S2 Intelligence | Gemini 3 | Claude Opus | Environmental scan, competitive landscape, threat analysis | Per COA |
| S3 Operations | GPT-4o | Claude Opus | Course of action development, feasibility analysis, resource modeling | Per COA |
| Devil's Advocate | Grok/SuperGrok | Claude Opus | Red team analysis, failure modes, second-order effects | Per COA |
| Scribe (Optional) | Claude Haiku | — | Meeting notes, transcript synthesis, lesson learned extraction | Per session |

Each slot can be filled by different models based on budget, capability needs, or availability. The architecture is model-agnostic. The MDMP structure is model-independent.

The slot-based pluggable architecture implements dependency injection: the MDMP framework defines the interface (G2 intelligence, G3 operations), and any compatible model fills the slot. This is the software engineering principle of programming to interfaces, not implementations.[^21] The organizational chart is the contract. The model is the implementation. Swapping implementations does not require redesigning the organization.
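The dependency-injection principle can be sketched in a few lines (a minimal illustration; `StaffAgent` and the adapter classes are hypothetical names, not the platform's actual API):

```python
from typing import Protocol

class StaffAgent(Protocol):
    """The slot interface: any model adapter that can analyze a mission
    brief for its assigned staff role satisfies the contract."""
    role: str
    def analyze(self, mission_brief: str) -> str: ...

class GeminiS2:
    """Adapter wrapping a Gemini call for the S2 intelligence slot."""
    role = "S2 Intelligence"
    def analyze(self, mission_brief: str) -> str:
        return f"[{self.role}] environmental scan of: {mission_brief}"

class ClaudeS2:
    """Backup adapter for the same slot: different model, same interface."""
    role = "S2 Intelligence"
    def analyze(self, mission_brief: str) -> str:
        return f"[{self.role}] backup scan of: {mission_brief}"

def run_staff(agents: list[StaffAgent], brief: str) -> dict[str, str]:
    """The orchestrator depends only on the interface, never on a concrete
    model. Swapping GeminiS2 for ClaudeS2 changes nothing here."""
    return {agent.role: agent.analyze(brief) for agent in agents}

outputs = run_staff([GeminiS2()], "EU market entry decision")
```

The organizational chart is the `Protocol`; each model vendor is one interchangeable implementation of it.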
### Figure 1: MDMP Platform Architecture

```mermaid
%%{init: {"theme": "base", "themeVariables": {"darkMode": true, "background": "#0f172a", "mainBkg": "#1e3a8a", "nodeBorder": "#1e3a8a", "clusterBkg": "#1e293b", "clusterBorder": "#334155", "titleColor": "#f8fafc", "primaryColor": "#1e3a8a", "primaryTextColor": "#f8fafc", "primaryBorderColor": "#1e3a8a", "lineColor": "#6b7280", "edgeLabelBackground": "#1e293b"}}}%%
flowchart TB
    classDef primary fill:#1E3A8A,stroke:#1E3A8A,color:#FFFFFF,stroke-width:2px
    classDef accent fill:#D97706,stroke:#D97706,color:#FFFFFF,stroke-width:2px
    classDef success fill:#059669,stroke:#059669,color:#FFFFFF,stroke-width:2px
    classDef warning fill:#DC2626,stroke:#DC2626,color:#FFFFFF,stroke-width:2px
    classDef muted fill:#6B7280,stroke:#6B7280,color:#FFFFFF,stroke-width:1px
    classDef box fill:#1e293b,stroke:#1E3A8A,color:#cbd5e1,stroke-width:2px
    classDef default fill:#334155,stroke:#6B7280,color:#f8fafc,stroke-width:1px

    subgraph INPUT[Input Layer]
        direction LR
        VOICE[Voice Input]
        TEXT[Text Input]
        BRIEF[Mission Brief Upload]
    end

    subgraph TIERS[Tiered Access]
        direction TB
        T1[Free - Students and ROTC Cadets]
        T2[Pro - Unit Staff Officers]
        T3[Enterprise - 2M Government]
    end

    subgraph PIPELINE[MDMP Processing Pipeline]
        direction TB
        P1[Phase 1 - Receipt of Mission]
        G1A{{Gate 1-2}}
        P2[Phase 2 - Mission Analysis]
        G2A{{Gate 2-3}}
        P3[Phase 3 - COA Development]
        G3A{{Gate 3-4}}
        P4[Phase 4 - COA Analysis / Wargaming]
        G4A{{Gate 4-5}}
        P5[Phase 5 - COA Comparison]
        G5A{{Gate 5-6}}
        P6[Phase 6 - COA Approval]
        G6A{{Gate 6-7}}
        P7[Phase 7 - Orders Production]
        P1 --> G1A --> P2 --> G2A --> P3 --> G3A --> P4 --> G4A --> P5 --> G5A --> P6 --> G6A --> P7
    end

    subgraph AGENTS[Agent Slot Layer - Pluggable]
        direction TB
        AG1[G1 Personnel Slot]
        AG2[G2 Intelligence Slot]
        AG3[G3 Operations Slot]
        AG4[G4 Sustainment Slot]
        AG5[G5 Plans Slot]
        AG6[G6 Comms Slot]
        CMD[Commander Slot - Synthesis]
    end

    subgraph QC[Quality Controls]
        direction LR
        CB[Circuit Breaker]
        SPC[Control Chart]
        FA[Force-Advance Log]
    end

    subgraph OUTPUT[Output Layer]
        direction LR
        OPORD[OPORD]
        FRAGO[FRAGO]
        WARNO[WARNO]
        AUDIT[Audit Trail]
    end

    VOICE --> P1
    TEXT --> P1
    BRIEF --> P1
    TIERS -.->|access controls| PIPELINE
    AGENTS -->|parallel analysis| PIPELINE
    QC -->|resilience| PIPELINE
    P7 --> OPORD
    P7 --> FRAGO
    P7 --> WARNO
    P7 --> AUDIT

    class VOICE,TEXT,BRIEF box
    class P1,P2,P3,P4,P5,P6,P7 primary
    class G1A,G2A,G3A,G4A,G5A,G6A warning
    class AG1,AG2,AG3,AG4,AG5,AG6,CMD accent
    class OPORD,FRAGO,WARNO,AUDIT success
    class CB,SPC,FA muted
    class T1,T2,T3 muted
```

*Figure 1: MDMP-Native AI Decision Platform — architecture showing input layer, 7-phase MDMP pipeline with acceptance gates, pluggable G-staff agent slots, and tiered output. Each gate implements force-advance logic and circuit breaker patterns for graceful degradation.*

**Key Feature: Role Isolation**

Each agent operates independently until the synthesis phase. The S2 officer does not see S3's output before generating its analysis. The Devil's Advocate does not know which COA was recommended before attacking all three. This prevents groupthink and preserves distinct analytical perspectives.[^18]

### 3.3.1 Error Handling and Resilience

Multi-agent systems fail. One model times out. One API rate-limits. One agent hallucinates. The platform must degrade gracefully: survive partial failures and surface what is known.

**Agent Timeout Handling:** Each agent has a 30-second default timeout (configurable per role: intelligence agents 45s, operations agents 60s, scribe 15s). If an agent does not return before timeout, the orchestrator:

1. Logs the timeout with agent name, phase, and deadline.
2. Marks the agent's slot as "timed_out" in the decision JSON.
3. If the agent is critical (Commander, S3), escalates to the backup model immediately.
4. If the agent is secondary (Devil's Advocate), continues with 3-agent analysis (logged as a reduced ensemble).
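A minimal sketch of that timeout path, assuming an async orchestrator (the agent call, role names, and timeout table are illustrative, not the production implementation):

```python
import asyncio

# Per-role timeouts in seconds, per Section 3.3.1 (illustrative config).
TIMEOUTS = {"Commander": 30, "S2": 45, "S3": 60, "Devils_Advocate": 30, "Scribe": 15}
CRITICAL_ROLES = {"Commander", "S3"}

async def call_agent(role: str, brief: str) -> str:
    """Stand-in for a real model API call."""
    await asyncio.sleep(0.01)
    return f"{role} analysis of {brief}"

async def run_with_timeout(role: str, brief: str, slots: dict) -> None:
    """Apply the four-step timeout policy for one agent slot."""
    try:
        slots[role] = await asyncio.wait_for(
            call_agent(role, brief), timeout=TIMEOUTS.get(role, 30)
        )
    except asyncio.TimeoutError:
        slots[role] = "timed_out"           # step 2: mark the slot
        if role in CRITICAL_ROLES:          # step 3: escalate critical roles
            slots[role] = await call_agent(f"{role}-backup", brief)
        # step 4: secondary roles simply continue as a reduced ensemble

async def run_ensemble(brief: str) -> dict:
    slots: dict = {}
    await asyncio.gather(*(run_with_timeout(r, brief, slots) for r in TIMEOUTS))
    return slots
```

`asyncio.gather` runs all slots concurrently, so one slow agent delays only its own slot, never the whole staff.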
**Retry Logic:** Transient failures (API rate limit, network timeout) trigger exponential backoff: 1s, 2s, 4s, max 3 retries. Permanent failures (auth failure, model deprecated) log immediately and do not retry. Backoff is per-task, not per-request, preventing cascade failures across the queue.

**Fallback Activation:** Each agent role has a primary and backup model (see the slot table in Section 3.3). If the primary model becomes unavailable (API down, quota exceeded, model deprecated), the orchestrator switches to the backup model. The decision JSON records: `"s2_model": "gemini-3"`, `"s2_backup_activated": true`. Output quality may degrade, but analysis continues.

**Hallucination Detection:** S2 (intelligence) and S3 (operations) outputs are cross-referenced. If S3 claims "market window closes in 12 months" but S2's environmental scan found "market window closes in 8 months," the contradiction is flagged: `"contradiction_detected": {"claim_a": "...", "claim_b": "...", "severity": "HIGH"}`. The human is alerted. Analysis continues, but the conflict is surfaced in the Commander's recommendation.

**Graceful Degradation:** If 1 of 4 agents fails to return analysis (timeout, error):

- The 3-agent ensemble produces output.
- The decision JSON marks the missing agent's slot.
- The Commander synthesizes 3 analyses and notes the gap.
- Post-decision review flags the missing perspective.

If 2 of 4 agents fail: escalate to the human. The system will not produce a recommendation with less than half the ensemble.

Graceful degradation when a model is unavailable is ensemble robustness in the formal sense: the system's output quality degrades proportionally to the lost model's contribution weight, not catastrophically. A four-model ensemble that loses one model retains 75% of its analytical surface. It does not fail. It narrows.
The design principle is that no single model is load-bearing — the ensemble distributes analytical responsibility across roles, and the loss of any one role produces a bounded degradation, not a system failure.

**Circuit Breaker Pattern:** If a single API (e.g., GPT-4o, Gemini) fails 3 times in 5 minutes, the orchestrator opens a circuit breaker: no further requests to that API for 5 minutes. Alternative models are used. After 5 minutes, a single test request is sent. If successful, the circuit closes and normal operation resumes. This prevents hammering a degraded service. The circuit breaker pattern prevents cascade failure — the same failure mode UC Berkeley's MAST taxonomy identifies as "error propagation in sequential multi-agent workflows."[^22]

### 3.4 MDMP Processing Pipeline

The seven-step MDMP structure is the spine of the system:

**Phase 1: Receipt of Mission**

- Parse the decision problem statement
- Extract explicit constraints (deadline, budget, stakeholder list)
- Identify the decision-maker (who has authority)
- Flag any ambiguities or missing context

**Phase 2: Mission Analysis**

- S2 Intelligence: environmental scan, threat analysis
- Identify facts vs. assumptions
- Extract constraints
- Initial risk register

**Phase 3: COA Development**

- S3 Operations: generate 3–5 mutually exclusive courses of action
- Each COA is fully described (what, how, timeline, resource requirements)
- Devil's Advocate: preliminary attack on each COA

**Phase 4: COA Analysis**

- S3 Operations: detailed feasibility analysis for each COA
- S2 Intelligence: implications of each COA in the broader environment
- Devil's Advocate: full red team analysis — what could go wrong, what am I missing
- Risk modeling: likelihood, impact, mitigation for each COA

**Phase 5: COA Comparison**

- S3 Operations: structured comparison matrix
- Score each COA against weighted criteria
- Rank the options

**Phase 6: COA Approval**

- Commander: review all analysis
- Select the recommended course of action
- Document reasoning (why this one, what other considerations mattered)
- Human decision-maker: approve, reject, or request additional analysis

**Phase 7: Orders Production**

- Scribe: translate the selected COA into actionable next steps
- Timeline: week-by-week or day-by-day depending on decision urgency
- Assign ownership: who is accountable for each action
- Define success metrics: how will we know this is working

### 3.4.1 Phase Acceptance Gates

Each MDMP phase transition requires explicit gate approval. A gate is a decision point: is the data sufficient to advance, or is re-analysis required? Gates prevent premature advancement and ensure decision quality degrades gracefully under time pressure.

The MDMP pipeline maps precisely onto DMAIC — the Lean Six Sigma improvement cycle.
The author already knew the math; military doctrine and process improvement arrived at the same structure from different directions:

| DMAIC Phase | MDMP Equivalent | Platform Implementation |
|-------------|-----------------|------------------------|
| Define | Mission Analysis | Problem framing, decision scope, success metrics extraction |
| Measure | IPB + COA Development | Data collection, baseline assessment, option generation |
| Analyze | COA Analysis + Wargaming | Root cause analysis, failure mode identification, option evaluation |
| Improve | COA Selection + Orders Production | Solution implementation, execution plan, accountability assignment |
| Control | Execution + Assessment + Lessons Learned | Monitoring, course correction, outcome tracking, CPI feedback |

This equivalence is not decorative. It means every quality improvement tool from the DMAIC toolkit — control charts, Pareto analysis, root cause fishbones, process sigma calculations — applies directly to MDMP phase performance measurement. The platform can report a process sigma for its decision pipeline the same way a manufacturing system reports sigma for its production line.

**Rolled Throughput Yield across phase gates:** Each MDMP phase gate has a first-pass yield — the probability that a decision advances without requiring rework. Rolled Throughput Yield across all gates: RTY = Π(FPY_phase_i). If each of seven phases passes at 90% FPY, the pipeline has RTY = 0.9⁷ ≈ 47.8% — less than half of decisions pass all gates without rework at that rate. This is why force-advance logic exists: in time-compressed operations, perfect quality across every gate is the wrong optimization target. The platform logs which gates were force-advanced and why, enabling post-decision RTY analysis and systematic gate improvement over time.[^23]

**Gate Structure:**

| Phase | Gate Criteria | Owner | Force-Advance? |
|-------|---------------|-------|----------------|
| 1 → 2 | Problem statement clear; deadline explicit; stakeholders identified | Task Parser | Yes (logged) |
| 2 → 3 | Facts/assumptions documented; constraints listed; risk register initialized with ≥3 risks | S2 Intelligence | Yes (logged) |
| 3 → 4 | ≥3 COAs generated, mutually exclusive, each fully described | S3 Operations | Yes (logged) |
| 4 → 5 | COA analysis complete; Red Team objections documented; scoring rubric finalized | Devil's Advocate | Yes (logged) |
| 5 → 6 | COA comparison matrix complete; ranking clear; tiebreaker criteria defined | S3 Operations | Yes (logged) |
| 6 → 7 | Commander recommendation documented; human has reviewed analysis; approval/defer decision logged | Commander | Yes (logged) |

**Gate Logic:**

- If gate criteria are met: advance immediately.
- If gate criteria are not met and the deadline permits: trigger re-analysis on the blocked phase.
- If gate criteria are not met and the deadline is imminent: the human can force-advance. Log the override: timestamp, gate, reason, decision-maker.

Failed gates do not halt the system. They trigger re-analysis. If the human forces an advance, the gap is logged for post-decision review (lessons learned).

**Rationale:** Gates enforce discipline while respecting time pressure. In a 2-hour decision cycle, all gates may be force-advanced (logged). In a 2-week cycle, most gates are met before advancement.

### 3.5 Human-in-the-Loop Design

**Humans decide. AI advises.** At three critical junctures, humans retain authority:

1. **After COA Comparison:** Before Commander synthesis, the human reviews what the agents generated. Any human can say "wait, I need to understand the intelligence analysis better" and request deeper analysis on specific aspects.
2. **Before COA Selection:** The human makes the final decision. AI recommends. Human decides.
3. **Before Orders Execution:** The human reviews the execution plan. No task leaves the system without human eyes on it.
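The gate logic and the RTY arithmetic from Section 3.4.1 can be sketched together (a minimal illustration under stated assumptions; the function names are not the platform's actual API):

```python
import math

def gate_decision(criteria_met: bool, deadline_imminent: bool,
                  human_forces: bool, log: list[str], gate: str) -> str:
    """One MDMP phase gate: advance, force-advance (logged), or re-analyze."""
    if criteria_met:
        return "advance"                          # criteria met: advance immediately
    if deadline_imminent and human_forces:
        log.append(f"FORCE-ADVANCE at gate {gate}")  # override is always logged
        return "advance"
    return "reanalyze"                            # otherwise rework the blocked phase

def rolled_throughput_yield(first_pass_yields: list[float]) -> float:
    """RTY is the product of per-gate first-pass yields."""
    return math.prod(first_pass_yields)

override_log: list[str] = []
outcome = gate_decision(False, True, True, override_log, "2 -> 3")
rty = rolled_throughput_yield([0.9] * 7)  # ~0.478: most decisions need rework somewhere
```

Post-decision review then reads `override_log` to compute which gates were forced and how RTY trends over time.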
**Override Logging:** Every human intervention is documented:

- What analysis did the human override?
- Why (what information changed the recommendation)?
- Timestamp and decision-maker

This creates an audit trail. It also creates organizational learning: why did the machine recommend A but the human chose B? Was the human right? We learn over time.

### 3.6 Data Layer: JSON-First Backend

**Why JSON?**

- Industry standard, portable, scalable
- No proprietary lock-in
- Data exportable at any time
- Future migration from JSON to PostgreSQL is a software problem, not an architecture problem[^19]

**Schema:**

```json
{
  "decision": {
    "id": "UUID",
    "problem_statement": "...",
    "deadline": "ISO-8601",
    "stakeholders": ["..."],
    "created": "ISO-8601",
    "decision_maker": "UUID"
  },
  "mission_analysis": {
    "facts": ["..."],
    "assumptions": ["..."],
    "constraints": ["..."],
    "risks": [...]
  },
  "coas": [
    {
      "id": "COA-1",
      "title": "...",
      "description": "...",
      "s2_analysis": "...",
      "s3_analysis": "...",
      "devil_advocate": "...",
      "score": 3.65
    }
  ],
  "selected_coa": "COA-1",
  "commander_recommendation": "...",
  "human_decision": "APPROVE | REJECT | DEFER",
  "execution_plan": [...],
  "lessons_learned": [...]
}
```

Every decision is a JSON document. Every analysis is timestamped. Every decision is traceable.

### 3.6.1 Multi-Tenant Isolation

Production deployment requires strict tenant isolation to prevent cross-customer data leakage. Every decision object carries a `tenant_id` field. Database security enforces row-level access control: a query for decisions belonging to Tenant A always returns only Tenant A's data, enforced at the PostgreSQL policy layer.

**Isolation mechanisms:**

1. **Row-Level Security (RLS):** A PostgreSQL policy on all tables restricts queries to `WHERE tenant_id = current_setting('app.tenant_id')`. Tenant context is set at connection initialization, not at query time, preventing accidental bypass.
2. **Logical Isolation:** Each tenant has isolated API keys.
A stolen key grants access only to that tenant's decisions and their derived analyses. Key scope is enforced at the API gateway (FastAPI middleware validates the tenant_id from the incoming request against the authenticated key's owner).
3. **Blast Radius Containment:** If Tenant A's API key is compromised, exposure is limited to Tenant A's data. Message queues are partitioned by tenant_id. No cross-tenant processing occurs.
4. **Rate Limiting Per Tenant:** Token quotas and request rate limits are per-tenant, tracked in a separate quota table. One tenant's spike in usage does not starve other tenants' processing.
5. **Credential Rotation:** API keys expire after 90 days (configurable per tier). Rotation is manual (for now; automation is Phase 2). Expired keys return 401 immediately.

**Implementation detail:** Decision JSON documents include `"tenant_id": "UUID"` at the root. All joins filter by tenant_id before materializing results. No exceptions. This prevents query-time accidents from exposing cross-tenant data.

### 3.7 Output Layer: Decision Documents

**SITREP Dashboard:**

- Current decision status
- Timeline to deadline
- Key insights from each staff section
- Outstanding decisions or approvals

The SITREP dashboard is a real-time control chart: decision quality metrics plotted against upper and lower control limits derived from the platform's historical performance baseline. Violations trigger investigation, not automatic rollback — distinguishing special cause variation (a broken model, a corrupted input, a hallucination cascade) from common cause variation (normal analytical noise within expected bounds).[^24] The commander sees not just the current decision status but whether current performance falls within the system's normal operating envelope.
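A minimal control-chart check, assuming the standard Shewhart limits of mean ± 3 standard deviations (the baseline quality scores below are invented for illustration):

```python
from statistics import mean, stdev

def control_limits(baseline: list[float]) -> tuple[float, float]:
    """Shewhart limits from the historical baseline: mean +/- 3 sigma."""
    mu, sigma = mean(baseline), stdev(baseline)
    return mu - 3 * sigma, mu + 3 * sigma

def check_point(value: float, baseline: list[float]) -> str:
    """Common cause (within limits) vs. special cause (investigate)."""
    lcl, ucl = control_limits(baseline)
    return "in_control" if lcl <= value <= ucl else "investigate"

# Hypothetical decision-quality scores from past decision cycles.
baseline = [3.4, 3.6, 3.5, 3.7, 3.5, 3.6, 3.4, 3.5]
status_ok = check_point(3.55, baseline)   # normal analytical noise
status_bad = check_point(1.0, baseline)   # e.g., a hallucination cascade
```

A violation flags the point for investigation; the chart never auto-rolls-back, matching the policy above.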
**OPORD (Operations Order):**

- Mission statement
- Commander's intent
- Selected course of action and reasoning
- Execution timeline
- Accountability (who owns what)
- Success metrics

**Lessons Learned Registry:**

- Auto-generated from decision analysis
- Connected to historical decisions (is this similar to a past decision?)
- Tagged by decision domain (market entry, resource allocation, personnel, etc.)

---

## 4. MDMP PROCESSING PIPELINE: DETAILED FLOW

### 4.1 Mission Receipt

A user inputs (voice or text): "We're deciding whether to acquire company X. They have 40 engineers, $2M ARR, 15 accounts. Acquisition would cost $50M. We need to decide in 2 weeks."

The Task Parser generates:

```json
{
  "decision_type": "acquisition",
  "problem_statement": "Go/no-go acquisition decision on company X",
  "context": {
    "target": "Company X",
    "metrics": {
      "headcount": 40,
      "arr": "$2M",
      "accounts": 15,
      "price": "$50M"
    }
  },
  "deadline": "2026-03-28",
  "constraint_budget": "$50M",
  "stakeholders": [
    {"role": "CEO", "type": "decision_maker"},
    {"role": "CFO", "type": "stakeholder"},
    {"role": "VP Engineering", "type": "stakeholder"}
  ]
}
```

This goes to the Mission Analysis phase.

### 4.2 Mission Analysis

**S2 Intelligence Analysis:**

- Competitive landscape: where is company X positioned?
- Market conditions: is this the right time to acquire?
- Regulatory or IP considerations
- Team stability risk
- Integration complexity

**Output:** A structured intelligence estimate identifying what we know, what we assume, and what we need to verify.
### 4.3 COA Development

**S3 Operations generates three COAs:**

**COA 1: "Acquire Now"**

- Full acquisition at the $50M asking price
- Target integration in 90 days
- Retain all 40 engineers
- Accelerate the product roadmap with the acquired team

**COA 2: "Negotiate and Acquire"**

- Counter-offer at $35M
- Selective team acquisition (20 core engineers)
- Slower integration (180 days)
- Risk of losing key talent if negotiations fail

**COA 3: "Do Not Acquire"**

- Hire directly to fill the gap (estimated $1.5M/year for 5 years + 12-month ramp time)
- Build internally vs. acquire
- Lower immediate capital cost but longer time to capability

### 4.4 COA Analysis

**S3 Operations (Detailed Analysis):**

- COA 1: $50M cash outlay, 90-day integration risk, fast time to market
- COA 2: $35M potential outlay (if negotiation succeeds), 180-day integration, risk of talent retention failure
- COA 3: $7.5M total cost, 18-month ramp, retained capital flexibility

**S2 Intelligence (Environmental Impact):**

- A competitor is also sniffing around Company X (time-sensitive)
- The market window for the product closes in 12 months
- The regulatory environment may tighten next year (affects post-acquisition integration)

**Devil's Advocate (Red Team):**

- COA 1: The integration failure rate for mid-market acquisitions is 40–60%.[^12] You might spend $50M and still lose the team.
- COA 2: The asking price is artificially low (negotiation will fail). You'll end up at COA 1 pricing or lose the deal.
- COA 3: By the time you hire and onboard, the market window is closed. This is a $50M opportunity cost, not a cost savings.
### 4.5 COA Comparison

**Weighted Decision Matrix:**

| Criterion | Weight | COA 1 | COA 2 | COA 3 |
|-----------|--------|-------|-------|-------|
| Speed to Market | 25% | 5 — 90 days | 3 — 180 days | 1 — 18 months |
| Capital Efficiency | 20% | 2 — $50M outlay | 4 — $35M potential | 5 — $7.5M |
| Team Retention Risk | 20% | 3 — moderate risk | 2 — high risk | 5 — no risk (hire new) |
| Integration Complexity | 15% | 2 — high | 3 — moderate-high | 5 — low (own team) |
| Competitive Position | 20% | 5 — fast to capability | 4 — moderate | 1 — too slow |
| **Weighted Score** | **100%** | **3.55** | **3.20** | **3.20** |

COA 1 wins, but the Devil's Advocate's integration risk caveat carries weight.

### 4.6 COA Approval

**Commander Recommendation:** "COA 1 with mitigated risk. The market window and competitive threat justify the $50M price. The integration risk is real but manageable with a 90-day structured integration plan (separate team, preserved decision authority, clear success metrics). I recommend COA 1 with the following risk mitigations: hire an integration lead with M&A experience, define month-1 through month-3 wins in advance, establish a weekly executive sync, plan for 25% post-acquisition churn and have a replacement hiring plan ready."

**Human Decision:** The CEO approves COA 1 with the risk mitigations noted.
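The weighted scoring can be reproduced directly (a sketch; the weights and raw 1–5 scores are taken from the matrix above, the code itself is illustrative):

```python
# Criterion weights and raw scores (1-5) from the weighted decision matrix.
weights = {"speed": 0.25, "capital": 0.20, "retention": 0.20,
           "integration": 0.15, "competitive": 0.20}
scores = {
    "COA 1": {"speed": 5, "capital": 2, "retention": 3, "integration": 2, "competitive": 5},
    "COA 2": {"speed": 3, "capital": 4, "retention": 2, "integration": 3, "competitive": 4},
    "COA 3": {"speed": 1, "capital": 5, "retention": 5, "integration": 5, "competitive": 1},
}

def weighted_score(coa: dict[str, int]) -> float:
    """Sum of raw score x criterion weight across all criteria."""
    return round(sum(coa[c] * w for c, w in weights.items()), 2)

ranked = sorted(scores, key=lambda name: weighted_score(scores[name]), reverse=True)
# COA 1 ranks first at 3.55
```

Encoding the matrix as data rather than prose lets the S3 agent emit the table and the platform verify the arithmetic in one step.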
### 4.7 Orders Production

**Execution Plan:**

| Week | Action | Owner | Success Metric |
|------|--------|-------|----------------|
| W1 | Negotiate final terms, legal due diligence | CEO, General Counsel | LOI signed by Friday |
| W1-2 | Product roadmap integration planning | VP Product | Unified roadmap draft |
| W2 | Retention agreements with key engineers | CEO, VP Eng | All 15+ core team members signed |
| W3 | Integration lead hired, 100-day plan drafted | CEO | Integration lead starts, plan reviewed |
| M2 | Month 2 integration milestones met | Integration Lead | First product release post-acquisition |
| M3 | Full team integration complete | Integration Lead | Post-integration engagement survey >3.5/5 |

---

## 5. TIERED ACCESS MODEL

### Tier 1: Free (Cadet/Student)

**Target:** ROTC cadets, JROTC students, military students, civilian undergraduates learning decision-making

**Capabilities:**

- Access to the MDMP pipeline with a limited monthly token budget (100K tokens/month)
- Single AI model (Claude or GPT-4o, user choice)
- Up to 5 decisions per month
- Student guide and teaching materials included

**Pricing:** Free

**Government Contractor Pathway:** DoD, Fort Liberty, Naval Academy, and Air Force Academy students have unlimited free access.
### Tier 2: Professional

**Target:** Enterprise planners, consulting firms, small military units, startup leadership teams

**Capabilities:**
- Full multi-agent ensemble (3–4 models, user-configurable)
- Unlimited token budget (pay-as-you-go, $0.03/K tokens)
- Unlimited decisions
- Advanced analytics (decision velocity, outcome tracking, team performance)
- Custom agent roles (add specialist agents for specific domains)
- Integration with external tools (calendar, project management, email)

**Pricing:** $500/month base + token overage

### Tier 3: Enterprise/Government

**Target:** Department of War, SOCOM, Fortune 500 strategy teams, defense contractors

**Capabilities:**
- Dedicated deployment (on-premise or private cloud)
- Custom MDMP templates per unit or organization
- DARPA/DoD integration pathway
- SLA and audit logging for the current deployment; security compliance on the roadmap (FedRAMP in Phase 3, IL4/IL5 planned)
- Custom agent tuning per organization's decision patterns
- Post-decision outcome tracking and an organizational-learning feedback loop

**Pricing:** $500K–$2M annually, depending on deployment scope and customization

---

## 6.
TECHNOLOGY STACK

### Backend Infrastructure

| Layer | Technology | Rationale | Cost Model |
|-------|-----------|-----------|-----------|
| API Gateway | FastAPI (Python) | Lightweight, async-first, built for AI pipelines | Open source |
| Message Queue | AWS SQS or Redis | Decouples input from processing, absorbs spikes | $0.50/M messages (SQS) |
| Orchestration | LangChain or custom | Routes tasks to agents, manages parallel processing and error handling | LangChain free tier |
| Database | PostgreSQL (prod) / JSON (staging) | Scalable, ACID-compliant, full-text search for lessons learned | Self-hosted or AWS RDS |
| Voice Processing | Whisper API + WebRTC | Real-time transcription, low latency | $0.006/minute[^4] |
| Agent Runtime | Claude API + GPT-4o + Gemini API | Multi-model routing, parallel execution | Pay-as-you-go per model |

### Frontend

| Component | Technology | Rationale |
|-----------|-----------|-----------|
| Web UI | React 18 + TypeScript | Type-safe, component reuse, SSR-capable |
| Mobile | React Native or native iOS/Android | Cross-platform, native performance |
| Voice Input | WebRTC + Whisper | Browser-based recording, cloud transcription |
| Real-time Updates | WebSocket + Redux | Live analysis updates as agents work |
| Document Export | ReportLab (Python) + pdfkit | Generates OPORD PDFs on demand |

### Deployment

**Development:** Docker containers, local Kubernetes for testing

**Staging:** AWS ECS on EC2, RDS PostgreSQL, CloudFront CDN

**Production (Tier 1/2):** AWS Lambda for stateless processing, managed PostgreSQL, API Gateway

**Production (Tier 3/Enterprise):** On-premise Kubernetes cluster or AWS private cloud with VPC isolation, encryption at rest and in transit, and audit logging. The Phase 3 roadmap includes FedRAMP-aligned audit controls and IL4/IL5 pathway validation.

---

## 7. COMPETITIVE POSITIONING AND GOVERNMENT PATHWAY

### 7.1 Why This Platform Matters

The AI industry is converging on multi-agent architectures.
Every major lab has announced orchestration frameworks. But the industry assumes that more models equal better decisions. The field evidence in Section 1.1 already showed that assumption fails, and Paper 6 showed why: the differentiator is not capability, it is structure. This platform competes not on model capability but on decision quality. "More models" is a commodity. "Better decisions through doctrine" is differentiation.

### 7.2 DARPA CLARA Alignment

DARPA's Collaborative Learning for Resilient AI (CLARA) program[^5] explicitly solicits AI systems that improve decision-making under uncertainty through multi-model coordination. This platform is CLARA-native:

- **Resilience:** If one model fails, the others continue. No single point of failure.
- **Collaborative Learning:** Each decision feeds back into the lessons-learned registry, improving future decisions.
- **Doctrine-Structured:** MDMP provides the framework CLARA seeks.

The platform architecture aligns with DARPA, NSF (AI Institutes), and DoD (SOCOM agentic AI initiatives) program requirements, pending formal partnership negotiations.[^15]

### 7.3 DoD Deployment Pathway

**Immediate (6–12 months):**
- Deploy Tier 1 free access to ROTC programs nationwide (Army Cadet Command partnership)
- Pilot Tier 2 with SOCOM (JSOC planning cells, civil affairs planning)
- Conduct live exercise demonstrations (Fort Irwin, Fort Leavenworth, SOCOM Range)

**Medium Term (1–2 years):**
- Tier 3 deployment to Army Service Component Commands (ARFORGEN planning)
- Integration with the Army Battle Command System (ABCS) and higher-order MICC systems
- Certification as an approved planning tool per U.S. Army Field Manual 5-0 (Planning and Orders Production)

**Long Term (2–3 years):**
- Multi-national integration (NATO allies: UK, Canada, Germany, Poland)
- Agentic warfare experimentation (CSIS identified the Adaptive Staff as "most effective and resilient"[^6]; this platform implements that model)

---

## 8.
ROADMAP

### Phase 1: MVP (Months 1–3, April–June 2026)

**Deliverable:** Single-decision prototype, Claude-only, voice input, MDMP phases 1–5 (no orders production yet)

**Scope:**
- Voice-to-text pipeline (Whisper)
- Task parser (small Claude model)
- Single agent (Claude Opus as Commander, running all staff roles)
- Web UI for mission input and phase review
- JSON backend storage

**Success Metric:** Conduct 10 live decisions with ROTC cadets or military students. Measure decision quality vs. the baseline (solo manual planning).

**Estimated Cost:** $200K (engineering + API costs)

### Phase 2: Multi-Model Ensemble (Months 4–6, July–September 2026)

**Deliverable:** Multi-agent platform with a 3–4 model ensemble, full MDMP phases 1–7, JSON backend

**Scope:**
- Add Gemini 3, GPT-4o, and Grok integration
- Implement the role-bounded agent architecture
- Phase 7: Orders Production with Scribe
- Lessons Learned Registry (auto-extracted from decisions)
- Mobile web app (React Native)
- Basic analytics dashboard

**Success Metric:** Tier 2 private beta with 5 enterprise customers. Measure decision velocity and user satisfaction.

**Estimated Cost:** $500K (multi-model integration + mobile + database)

### Phase 3: Enterprise Deployment (Months 7–12, October 2026–March 2027)

**Deliverable:** Tier 3 enterprise platform with an on-premise option, custom MDMP templates, and outcome tracking

**Scope:**
- On-premise Kubernetes deployment
- FedRAMP certification roadmap development and IL4/IL5 compliance assessment
- Custom agent roles per organization
- Post-decision outcome tracking and feedback loop
- Integration with enterprise tools (Jira, Salesforce, SAP)
- Executive reporting and org-level decision analytics

**Success Metric:** Tier 3 pilot deployment with SOCOM or an Army Service Component Command. Measure decision outcome improvement vs. baseline.
**Estimated Cost:** $1.5M (security, compliance, custom integrations)

### Phase 4: Agentic Warfare (Months 13–24, April 2027–March 2028)

**Deliverable:** Extension for tactical/operational warfare planning (not autonomous weapons, but decision support for human commanders)

**Scope:**
- Integrate with tactical sensors (HUMINT, SIGINT, IMINT feeds)
- Real-time threat assessment
- Adaptive Staff model (CSIS-style distributed decision authority)
- Live exercise integration (Force-on-Force, Battle Command Training Center)

**Success Metric:** Field exercise with a live decision loop demonstrating a 3–4x faster decision cycle vs. a traditional staff.

**Estimated Cost:** $2M+ (R&D with DoD labs)

---

## 9. RISKS AND MITIGATIONS

### Risk 1: Model Capability Changes

**What:** New frontier models emerge with different APIs, capabilities, or costs, and the platform becomes obsolete.

**Mitigation:** An API abstraction layer (LangChain) makes model swapping a configuration change, not a code rewrite. The MDMP structure is model-agnostic.

### Risk 2: Integration Failure (Defense Contractors)

**What:** The platform integrates with Army systems (ABCS, MICC), but integration is delayed or technically infeasible.

**Mitigation:** Partner with an integration firm (Booz Allen, ManTech) early. Build API contracts before development starts.

### Risk 3: Data Privacy/Compliance

**What:** Decisions contain sensitive information. Government deployment requires security hardening beyond Tier 2 scope.

**Mitigation:** Tier 3 assumes on-premise deployment from day one. The customer owns all data. Nothing is transmitted to external clouds. Encryption at rest.

### Risk 4: User Adoption (Doctrine Resistance)

**What:** Users reject the MDMP structure as "too rigid" or as slowing down decisions.

**Mitigation:** User training emphasizes that doctrine accelerates decisions by 3–4x (the Paper 6 thesis). Demonstrate with live data from early pilots.
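Risk 1's mitigation rests on the "program to an interface, not an implementation" principle cited in footnote 21: the MDMP pipeline talks to a role-agnostic agent interface, and the role-to-model mapping lives in configuration. A minimal sketch; the `Agent` protocol, `EchoAgent` stand-in, and `STAFF_CONFIG` names are hypothetical illustrations, not the platform's code:

```python
# Model-abstraction sketch for Risk 1: swapping a frontier model becomes
# a configuration change, not a code rewrite. All names are illustrative.
from typing import Protocol

class Agent(Protocol):
    def analyze(self, role: str, briefing: str) -> str: ...

class EchoAgent:
    """Stand-in for a real model client (Claude, GPT-4o, Gemini, Grok)."""
    def __init__(self, model_name: str):
        self.model_name = model_name

    def analyze(self, role: str, briefing: str) -> str:
        return f"[{self.model_name} as {role}] analysis of: {briefing}"

# The role-to-model assignment lives in configuration; replacing a model
# touches this mapping only, and the pipeline never imports a vendor SDK.
STAFF_CONFIG = {"commander": "claude-opus", "s2": "gemini-3",
                "s3": "gpt-4o", "devils_advocate": "grok"}

def build_staff(config: dict[str, str]) -> dict[str, Agent]:
    """Instantiate one agent per staff role from the configured model names."""
    return {role: EchoAgent(model) for role, model in config.items()}

staff = build_staff(STAFF_CONFIG)
report = staff["s2"].analyze("S2 Intelligence Officer", "acquisition decision")
```

Retiring Gemini 3 for a successor model would then mean editing one entry in `STAFF_CONFIG` and supplying a client class that satisfies `Agent`; the MDMP phases, synthesis logic, and role boundaries are untouched.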
### Risk 5: Cost of the Multi-Model Ensemble

**What:** Querying 4 models simultaneously is expensive (roughly $0.06–0.12 per decision at ~$0.02 per 1K tokens per model). Customers balk.

**Mitigation:** Tier 1 defaults to a single model to keep costs low. Tier 2 makes the multi-model cost transparent and optional. Tier 3 budgets for enterprise-scale token spend.

---

## 10. CONCLUSION

Paper 6 proved that doctrine-structured multi-model ensembles produce better strategic decisions than a single AI model. This paper translates that proof-of-concept into a platform specification that can scale from a ROTC cadet learning decision-making to a Department of War planning cell deciding on a multi-billion-dollar agentic warfare program.

The platform is not generic. It is doctrine-first. The MDMP structure is the constraint that makes coordination work. Every architectural decision flows from this principle.

The competitive advantage is not model capability; that is a commodity every major lab will match. The advantage is structure. A team that runs its decisions through MDMP (with AI assistance) will make better decisions than a team running the same decision through a chatbot. The platform operationalizes this advantage.

The government pathway is clear. DARPA CLARA explicitly seeks this type of system. SOCOM is actively seeking agentic AI demonstrations. The Army has created a career field (AOC 49B) for AI-focused officers.[^7] The market exists. The need is documented.

The remaining question is not whether the platform is viable; Paper 6 answered that. The remaining question is execution: can we build it fast enough to capture the window?

---

## FOOTNOTES

[^1]: UC Berkeley EECS-2025-164: "From Local Coordination to System-Level Strategies: Designing Reliable, Societal-Scale Multi-Agent Autonomy Across Scales," Victoria Tuck, 2025. Identified failure modes across multiple categories in multi-agent systems with failure rates of 41-86.7%.
Available at: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-164.html [^2]: "Towards a Science of Scaling Agent Systems," Yubin Kim, Ken Gu, et al. (Google Research, MIT, Google DeepMind), 2025. arXiv:2512.08296. Found multi-agent systems degrade sequential task performance by 39-70% while improving parallel task performance by 80.9%. Available at: https://arxiv.org/abs/2512.08296 [^3]: Whisper V3 achieves Word Error Rate (WER) of approximately 2% (~98% accuracy) on speech segments 8 seconds or longer under standard acoustic conditions. Performance varies by language, accent, and acoustic environment. Source: OpenAI Whisper documentation and independent benchmarks. [^4]: OpenAI Whisper API transcription cost is $0.006 per minute of audio processed. Source: OpenAI API pricing, current as of March 2026. [^5]: DARPA's Collaborative Learning for Resilient AI (CLARA) program solicits AI systems that improve decision-making under uncertainty through multi-model coordination. Source: DARPA Programs. Available at: https://www.darpa.mil/research/programs/clara [^6]: The CSIS Adaptive Staff model identifies distributed decision authority as an effective and resilient organizational structure for complex operational environments. Source: "Rethinking the Napoleonic Staff," Center for Strategic and International Studies (CSIS). [^7]: The U.S. Army established Army Officer Classification (AOC) 49B as a career field for officers focused on AI strategy, doctrine, and integration within Army operations. Source: U.S. Army News Service. [^8]: FastAPI is a lightweight, async-first Python framework designed for building AI pipelines and microservice APIs. Documentation: https://fastapi.tiangolo.com/ [^9]: LangChain is an orchestration framework for integrating multiple AI models into sequential and parallel workflows. 
Documentation: https://python.langchain.com/ [^10]: WebRTC (Web Real-Time Communication) is a W3C standard enabling peer-to-peer audio and video streaming through web browsers. Specification: https://www.w3.org/TR/webrtc/ [^11]: The Federal Risk and Authorization Management Program (FedRAMP) provides a standardized security authorization process for cloud services serving U.S. federal agencies. Details: https://www.fedramp.gov/ [^12]: Integration failure rates for mid-market acquisitions (target valuations $20M-$500M) range from 40% to 60%, with common failure modes including cultural misalignment, technical debt incompatibility, and talent retention failure. [^13]: Paper 6 ("When the Cats Form a Team") conducted a doctrine-structured multi-model ensemble test using four frontier AI models assigned military staff roles. The ensemble surfaced six strategic insights the baseline missed, including two rated HIGH value. Source: Marshall, J. (2026). Herding Cats in the AI Age, Paper 6. [^14]: The MDMP structure ensures multi-agent coordination succeeds through defined phases, role assignment, synthesis procedures, and decision gates. Paper 6 demonstrated that without structure, four models produce chaos; with structure, they staff a decision. [^15]: SOCOM is actively exploring agentic AI demonstrations and adaptive planning architectures. The Department of War 2026 initiatives include funding for agentic AI experimentation. Formal partnership negotiations are pending. [^16]: Real-time transcription via Whisper API eliminates batch delays and enables live streaming of voice input directly into the MDMP task engine. [^17]: The "doctrine first, software second" principle prioritizes organizational structure and decision-making processes over technical implementation. This inversion makes structure the constraint that governs software architecture. 
[^18]: Role isolation where S2, S3, and Devil's Advocate each generate analysis independently before synthesis prevents groupthink and preserves analytical diversity critical to adversarial thinking and blind spot detection. [^19]: JSON-first backend design maintains technology independence: future migration to PostgreSQL or other databases remains a software problem, not an architecture problem, enabling graceful scaling without core redesign. [^20]: The tiered access model aligns pricing with decision scope and organizational complexity, ensuring accessibility for training environments while sustaining enterprise-grade security for DoD operations. [^21]: Gamma, E., Helm, R., Johnson, R., & Vlissides, J. (1994). *Design Patterns: Elements of Reusable Object-Oriented Software*. Addison-Wesley. The principle "program to an interface, not an implementation" (p. 18) is the foundational design pattern enabling the slot-based pluggable agent architecture. [^22]: Cemri, M., et al. "Multi-Agent Systems Failure Taxonomy (MAST)." UC Berkeley EECS-2025-164, NeurIPS 2025 Spotlight. Identifies "error propagation in sequential multi-agent workflows" as a primary failure mode class. The circuit breaker pattern addresses this by isolating failed API dependencies before their failures cascade downstream. [^23]: Rolled Throughput Yield (RTY) is a Lean Six Sigma metric: RTY = Π(FPY_i) across n process steps. For n=7 phases at 90% FPY each: RTY = 0.9⁷ = 47.8%. Source: Pyzdek, T., & Keller, P. (2014). *The Six Sigma Handbook*, 4th ed. McGraw-Hill. The force-advance mechanism trades RTY for decision velocity under time pressure — a deliberate operational tradeoff, not a quality failure. [^24]: Statistical Process Control (SPC) control charts — Shewhart X̄ and R charts, CUSUM, EWMA — distinguish common cause variation (within ±3σ of the process mean) from special cause variation (outside control limits or exhibiting non-random patterns). Source: Montgomery, D.C. (2020). 
*Introduction to Statistical Quality Control*, 8th ed. Wiley. Applied to decision quality metrics, control limits define what "normal analytical performance" looks like and flag when investigation is warranted. --- ## RELATED - [[Home|Herding Cats in the AI Age — Home]] - [[Paper-6-When-the-Cats-Form-a-Team|Paper 6: When the Cats Form a Team]] - [[Paper-2-The-Digital-Battle-Staff|Paper 2: The Digital Battle Staff]] - [[Paper-5-When-the-Cats-Talk-to-Each-Other|Paper 5: When the Cats Talk to Each Other]]