# THE SUPER INTELLIGENT FIVE-YEAR-OLD

## Why AI Needs Military Doctrine and Lean Six Sigma — Not the Other Way Around

### How Battle-Tested Planning Frameworks Solve the Scalability Problem That AI Cannot Solve for Itself

**Jeep Marshall**
LTC, US Army (Retired)
Airborne Infantry | Special Operations | Process Improvement

February 2026

---

## EXECUTIVE SUMMARY

Artificial intelligence operates like a super intelligent five-year-old. It possesses extraordinary cognitive horsepower but lacks the discipline, structure, and operational maturity to deploy that power effectively at scale. This paper argues that two proven frameworks — the U.S. Army's Military Decision Making Process (MDMP) and Lean Six Sigma (LSS) — provide exactly the operational architecture that AI systems need to move from impressive demonstrations to reliable, scalable execution.

The thesis is straightforward: AI does not replace process discipline. It demands more of it. Every AI system deployed without structured planning frameworks will hit the same walls that military formations hit when they try to operate without doctrine — communication breakdowns at scale, drift between elements, wasted resources, and catastrophic failures that compound because nobody built in the checkpoints.

This paper draws on 26 years of military service, including seven years training brigade-level staffs through simulation-driven exercises where the planning frameworks described here were tested under operational pressure every day. That experience, combined with hands-on work building AI-integrated workflows and the convergence of Quality 4.0 research, makes the case that the next generation of AI practitioners needs Black Belts and battle staff officers more than it needs another language model.

---

## 1. THE PROBLEM: AI IS INHERENTLY INEFFICIENT

The AI industry sells capability. What it delivers is potential — raw, undirected, and wasteful. Large language models consume enormous computational resources to produce outputs that frequently require human correction, re-prompting, and iterative refinement. This cycle mirrors what Lean practitioners identify as the eight wastes (DOWNTIME): Defects in output, Overproduction of irrelevant content, Waiting for human correction, Non-utilized talent when AI handles tasks humans do better, Transportation of context across fragmented sessions, Inventory buildup in unprocessed backlogs, Motion through repetitive manual prompting, and Extra-processing through verbose or unfocused responses.[^1]

None of this surprises anyone who has operated at scale. The U.S. Army learned decades ago that individual brilliance does not survive contact with organizational complexity. A platoon of exceptional soldiers will fail without doctrine. A battalion of them will create chaos. The same physics apply to AI agents operating across distributed systems.

### The Bandwidth Problem

When enough units try to communicate at once, the laws of physics catch up. Only so much bandwidth exists. Time and distance create slack and drift between formations. This observation — drawn from decades of managing multi-echelon military operations — maps directly to AI system architecture. Every additional AI agent, every new integration point, every expanded context window introduces communication overhead that degrades system performance.

Military doctrine solved this problem through standardized orders formats (OPORD, WARNO, FRAGO), battle tracking systems (Common Operating Picture), and structured decision-making processes (MDMP).
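What would a standardized order format look like for an agent? A minimal sketch, assuming a hypothetical orchestrator-to-agent tasking schema built on the five-paragraph OPORD; none of these field names is an established standard:

```python
from dataclasses import dataclass

@dataclass
class AgentOrder:
    """Five-paragraph OPORD structure adapted as an agent-tasking schema."""
    situation: str        # only the context this agent needs, nothing more
    mission: str          # task plus purpose: what "done" looks like and why
    execution: list[str]  # ordered subtasks with explicit left/right limits
    sustainment: dict     # resources: tools allowed, token budget, time box
    command_signal: dict  # who to report to, when, and in what format

# Hypothetical example tasking, not a real orchestrator API:
order = AgentOrder(
    situation="CI is red on 3 integration tests after yesterday's merge.",
    mission="Restore green CI without changing public APIs (release is blocked).",
    execution=["reproduce failures", "isolate root cause", "patch", "re-run CI"],
    sustainment={"tools": ["git", "pytest"], "token_budget": 50_000},
    command_signal={"report_to": "orchestrator", "format": "structured summary"},
)
```

The schema itself is not the point; the point is that every agent receives the same five paragraphs every time, which is what doctrine means by a format.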
AI systems operate without any of these structural supports. The result: impressive individual outputs embedded in operationally fragile architectures.[^2] --- ## 2. METT-TC(IT): MISSION VARIABLES FOR AI OPERATIONS The Army uses METT-TC — Mission, Enemy, Terrain, Troops, Time, and Civil Considerations — as the foundational framework for analyzing any operational situation. Every planning process begins here. Every leader, from squad leader to commanding general, applies these six variables before making decisions. For AI operations, METT-TC adapts with remarkable precision. The addition of Information Technology as a seventh variable — METT-TC(IT) — acknowledges the digital battlespace in which AI agents operate.[^3] | Variable | Military Application | AI System Application | |----------|---------------------|----------------------| | **Mission** | Task, purpose, and commander's intent | User's actual intent vs. literal request. What does "done" look like? | | **Enemy** | Adversary capabilities, disposition, likely courses of action | Failure modes, edge cases, data quality issues, context window limits | | **Terrain** | Physical environment — KOCOA-W analysis | System architecture, API constraints, vault structure, file systems | | **Troops** | Available forces, capabilities, training level | Available models (Opus/Sonnet/Haiku), tools, MCP connectors, compute budget | | **Time** | Available planning/execution time, one-third/two-thirds rule | Context window limits, session duration, rate limits, token budget | | **Civil** | Impact on civilian population, infrastructure | User experience, downstream consumers, data privacy, stakeholder impact | | **Info Tech** | Communications architecture, cyber considerations | Connector stack, MCP servers, network latency, integration overhead | This framework transforms AI deployment from ad-hoc prompting into structured operational analysis. Before building a single agent or writing a single prompt, practitioners analyze all seven variables. The result: fewer surprises, faster adaptation, and systems that scale because they were designed to scale. --- ## 3. MDMP: THE PLANNING ARCHITECTURE AI LACKS The Military Decision Making Process is a seven-step iterative methodology refined through 70 years of real-world application in conflicts from Korea to Afghanistan. It represents the most battle-tested planning framework in existence. Its principles apply directly to AI system design and task execution.[^4] ### The Seven Steps Applied to AI | # | MDMP Step | Military Output | AI Adaptation | Key Benefit | |---|-----------|----------------|---------------|-------------| | 1 | Mission Receipt | Initial time analysis | Receive user request | Establish timeline | | 2 | Mission Analysis | Problem statement, constraints, assumptions | Analyze actual intent, scan environment | Prevent solving wrong problem | | 3 | COA Development | 2-3 distinct approaches | Generate multiple solution paths | Avoid first-idea bias | | 4 | COA Analysis | War-gaming each COA | Simulate execution, find failure points | Catch risks before execution | | 5 | COA Comparison | Decision matrix | Surface trade-offs transparently | Enable informed human choice | | 6 | COA Approval | Commander selects | User approves approach | Human remains in the loop | | 7 | Orders Production | OPORD with annexes | Execution plan with checkpoints | Structured, traceable execution | The critical insight: MDMP front-loads the thinking. 
Military commanders spend 40% of their planning time on Mission Analysis alone — understanding the problem before generating solutions. AI systems today do the opposite. They generate solutions immediately and discover problems during execution. This inverted approach produces the rework rates, context window overflow, and cascading failures that plague complex AI workflows. ### Warning Orders: The Parallel Execution Multiplier Military units use Warning Orders (WARNOs) to enable parallel preparation while planning continues. The moment a commander receives a mission, subordinate units receive a WARNO telling them to begin staging resources, conducting reconnaissance, and preparing for movement — even before the detailed plan exists. AI systems lack this concept entirely. Every agent waits for the complete plan before beginning work. Implementing WARNO-style early alerts for multi-agent AI systems would dramatically accelerate execution by enabling parallel preparation across tool chains, data pipelines, and dependent systems.[^5] --- ## 4. LEAN SIX SIGMA: THE QUALITY FRAMEWORK AI DEMANDS Lean Six Sigma provides the complementary framework to MDMP. Where MDMP structures the planning process, LSS structures the improvement cycle. Together, they create a complete operational architecture for AI systems. ### DMAIC Meets AI: Quality 4.0 The DMAIC cycle — Define, Measure, Analyze, Improve, Control — transforms from a manual, analyst-heavy process into an AI-accelerated improvement engine. Research from Harvard Business Review, Gartner, and practitioners across industries confirms that AI does not replace DMAIC. It runs hundreds of DMAIC cycles in parallel, at machine speed, while LSS provides the governance that prevents automation from accelerating waste.[^6] | DMAIC Phase | Traditional Execution | AI-Accelerated Execution | |-------------|----------------------|-------------------------| | **Define** | Weeks of stakeholder interviews, manual problem scoping | AI analyzes customer complaints in real-time, defines pain points in days | | **Measure** | Clipboard audits, manual data collection | IoT sensors and automated telemetry provide second-by-second data | | **Analyze** | Statistical tools, fishbone diagrams, manual root cause analysis | ML algorithms detect patterns human analysts miss for years | | **Improve** | Pilot programs, trial-and-error implementation | AI simulates impact of proposed changes before deployment | | **Control** | Control charts, periodic audits, manual monitoring | Self-healing systems maintain gains without constant analyst oversight | Gartner projects that by 2026, more than 50 percent of companies using Lean Six Sigma will incorporate AI-driven tools to optimize workflows. The inverse proposition matters more: AI systems deployed without LSS governance will automate waste at machine speed, compounding inefficiency rather than eliminating it. ### The 5S Foundation for AI Systems The Lean 5S methodology — Sort, Set in Order, Shine, Standardize, Sustain — applies directly to AI system maintenance. AI knowledge bases, prompt libraries, agent configurations, and data pipelines all accumulate technical debt. Without systematic 5S discipline, these systems degrade exactly like a factory floor without housekeeping standards.[^7] --- ## 5. QUALITY ASSURANCE: LESSONS FROM QASAS The Army's Quality Assurance Specialist (Ammunition Surveillance) program — QASAS — provides a compelling model for AI quality assurance. 
Established in 1920, the QASAS program represents the oldest federal civilian career program in existence. For over a century, these specialists have ensured that ammunition functions as expected when soldiers pull the trigger.[^8] The parallels to AI system reliability are direct and instructive. | QASAS Function | AI System Parallel | Why It Matters | |---------------|-------------------|----------------| | Ammunition surveillance and testing | Model output validation and testing | Verify outputs function as expected under operational conditions | | Regulatory compliance monitoring | AI safety and alignment monitoring | Ensure systems operate within defined boundaries | | Stockpile reliability assessment | Prompt library and agent reliability scoring | Identify degradation before it causes mission failure | | Condition code classification | AI output quality classification | Separate serviceable from unserviceable outputs | | Explosives safety management | AI risk and harm prevention | Protect users and downstream systems from unsafe outputs | The QASAS model teaches a critical lesson: quality assurance for high-stakes systems requires dedicated, trained specialists who operate independently of the production chain. AI systems need the equivalent — dedicated quality assurance functions that test, classify, and certify AI outputs with the same rigor applied to ammunition that soldiers depend on with their lives. --- ## 6. RISK MANAGEMENT AND AFTER ACTION REVIEWS Army risk management integrates throughout every operation, not as an afterthought but as a continuous thread from planning through execution through assessment. The framework identifies hazards, assesses probability and severity, develops controls, and assigns risk ownership at every phase.[^9] AI systems desperately need this discipline. Current AI risk management consists largely of post-deployment monitoring — discovering risks after they manifest as failures. The Army's approach catches risks during Mission Analysis, develops countermeasures during COA Development, tests them during War-Gaming, and monitors them during execution. ### The After Action Review: Institutional Learning The Army's After Action Review (AAR) process asks four questions: What did we plan to do? What actually happened? Why did it happen? What will we do differently next time? This structured reflection cycle converts operational experience into institutional knowledge.[^10] AI systems generate enormous volumes of operational data but lack systematic mechanisms to convert that data into process improvement. Implementing AAR-style reviews after complex AI task execution creates the feedback loop that drives continuous improvement — the same feedback loop that LSS Black Belts use to close DMAIC cycles and that military units use to refine doctrine. --- ## 7. THE CONVERGENCE: WHY YOUR JOB IS NOT GOING ANYWHERE The AI industry narrative positions artificial intelligence as a replacement for human expertise. The operational reality tells a different story. AI amplifies the need for structured thinking, process discipline, and quality assurance — precisely the competencies that LSS Black Belts, military planners, and quality assurance specialists bring to organizations. ### What AI Cannot Do for Itself - **Define its own mission.** AI responds to prompts. It does not establish intent, determine success criteria, or understand the strategic context that drives operational decisions. 
- **Structure its own improvement.** AI generates outputs but lacks the meta-cognitive framework to systematically identify waste in its own processes and implement structural corrections.
- **Scale its own coordination.** As AI systems grow beyond single-agent architectures, communication overhead grows combinatorially: every added agent multiplies the coordination channels that must stay synchronized. Military doctrine solved this problem through standardized formats and hierarchical planning. AI has no equivalent.
- **Govern its own quality.** QASAS-level quality assurance requires independent assessment by trained specialists. Self-assessment produces the same blind spots in AI that it produces in ammunition production lines.
- **Learn from its own failures.** Without AAR-style structured reflection, AI systems repeat mistakes across sessions, contexts, and organizations.[^11]

The practitioners who understand these gaps — Black Belts, military planners, quality specialists — hold the keys to transforming AI from impressive demonstrations into reliable operational systems. The demand for this expertise will increase, not decrease, as AI adoption accelerates.

---

## 8. THE HERDING CATS PROBLEM — Multi-Agent Scaling and the Demand for a New Cadre of AI Experts

### 8.1 The Physics of Parallel Agents: When More Means Worse

The pitch for multi-agent AI systems seduces every executive who hears it: deploy ten agents instead of one and watch productivity multiply tenfold. The mathematics look irresistible. But a December 2025 study from Google Research, Google DeepMind, and MIT — "Towards a Science of Scaling Agent Systems" (Kim et al.) — delivered a finding that demolishes this assumption: adding more agents to a system can make it perform worse. Not diminishing returns. Actual degradation. More agents, worse outcomes.[^12]

The researchers ran 180 controlled experiments across five architecture types and three model families (OpenAI GPT, Google Gemini, Anthropic Claude). They held prompts, tools, and token budgets constant, changing only coordination structure and agent count. The results shattered the "more agents is all you need" narrative:

- **Sequential task degradation:** Multi-agent systems dropped performance by 39% to 70% on tasks with serial dependencies. In Minecraft planning experiments — where each action changes the environment for the next decision — every multi-agent configuration performed worse than a single agent.
- **Error amplification:** Decentralized multi-agent systems amplified errors 17.2 times faster than single agents. One agent's mistake becomes a false premise for every downstream agent. Centralized coordination reduced this to 4.4 times — still a devastating cascade rate.
- **Token efficiency collapse:** A single agent completed 67.7 successful tasks per 1,000 tokens. Independent multi-agent systems managed 42.4. Decentralized systems dropped to 23.9. Centralized systems — the architecture most organizations assume will solve their scaling problems — achieved only 21.5. Hybrid systems, the most complex architecture tested, collapsed to 13.6. Most of the computational budget evaporated into agents "talking to each other" rather than executing the mission.[^13]
- **The 45% threshold:** When a single agent already achieves 45% accuracy on a task, adding more agents yields diminishing or negative returns. The coordination overhead outweighs any marginal benefit. This finding carries profound implications: if your single-agent solution is already competent, throwing more agents at the problem makes it worse.

Kim et al.
identified three dominant scaling effects that govern all multi-agent systems. First, a **tool-coordination trade-off**: under fixed computational budgets, every token spent on coordination is a token not spent on actual tool use and task execution. Second, **capability saturation**: once single-agent baselines exceed approximately 45%, coordination yields diminishing or negative returns — a finding with statistical significance (beta = -0.408, p < 0.001). Third, **topology-dependent error amplification**: the choice of coordination architecture determines whether errors compound catastrophically or get contained. These are not software design preferences. They are engineering constraints with the force of physics behind them. The domain-specific variance underscores why doctrine matters more than dogma. Results ranged from +80.9% improvement for centralized coordination on parallelizable financial reasoning tasks to -70% degradation for independent agents on sequential planning tasks. Architecture-task alignment matters more than agent count — a finding that military planners have understood for centuries. You do not send a tank platoon into a jungle, and you do not send light infantry across an open desert. The terrain dictates the formation. METT-TC applies. Nate B Jones, AI strategist, former Head of Product at Amazon Prime Video, and host of AI News & Strategy Daily, synthesized this research alongside production data from Cursor and Steve Yegge's Gastown framework in his January 2026 analysis.[^14] His central observation deserves quotation: "Simplicity scales because complexity creates serial dependencies, and serial dependencies block the conversion of compute into capability." That sentence describes a physics problem, not a software problem. And physics problems demand engineering solutions — not more intelligence thrown at the wall. ### 8.2 The Failure Taxonomy: Why Multi-Agent Systems Break The Kim et al. study tells us *that* multi-agent systems fail. A companion study from UC Berkeley tells us *how*. In March 2025, researchers from UC Berkeley's Sky Computing Lab published "Why Do Multi-Agent LLM Systems Fail?" — the first systematic taxonomy of multi-agent failure modes. Led by Mert Cemri and Melissa Z. Pan, the team analyzed over 150 multi-agent system execution traces, each averaging 15,000 lines of text, across seven state-of-the-art open-source MAS frameworks. Six expert annotators independently labeled the traces using grounded theory methodology, achieving an inter-annotator agreement of kappa = 0.88 — remarkably high for a classification task of this complexity. The paper earned a Spotlight designation at the NeurIPS 2025 Datasets and Benchmarks Track.[^15] The headline finding: **failure rates ranged from 41% to 86.7%** across the seven frameworks tested. Not edge cases. Not stress tests. Standard operating conditions. The researchers produced MAST (Multi-Agent System Failure Taxonomy) — fourteen distinct failure modes organized into three categories. Every practitioner who has deployed multi-agent systems will recognize these pathologies immediately. Every military professional will recognize them as failures that doctrine was designed to prevent. 
**Category 1: Specification and System Design (37% of failures)** | Failure Mode | Description | Military Equivalent | |-------------|-------------|-------------------| | **Disobey Task Specification** | Agent fails to adhere to specified constraints or requirements | Failure to follow operations order | | **Disobey Role Specification** | Agent abandons defined role, behaves like another agent | Breaking lane discipline, crossing unit boundaries | | **Step Repetition** | Unnecessary reiteration of completed steps | Redundant patrols over cleared terrain | | **Loss of Conversation History** | Unexpected context truncation, agent reverts to earlier state | Loss of communications, operating on outdated intelligence | | **Unaware of Termination Conditions** | Agent does not recognize when the mission is complete | Patrol continues past objective, wastes resources | **Category 2: Inter-Agent Misalignment (31% of failures)** | Failure Mode | Description | Military Equivalent | |-------------|-------------|-------------------| | **Conversation Reset** | Agents unexpectedly restart coordination, losing context | Radio reset, loss of common operating picture | | **Fail to Ask for Clarification** | Agent proceeds on incomplete or ambiguous information | Soldier executes without requesting clarification from higher | | **Task Derailment** | Agent deviates from assigned objective | Mission creep | | **Information Withholding** | Agent fails to share critical information | Intelligence stovepiping | | **Ignored Other Agent's Input** | Agent disregards recommendations from peers | Failure to incorporate adjacent unit intelligence | | **Reasoning-Action Mismatch** | Agent's stated logic contradicts its actions | Saying one thing, doing another — commander's worst nightmare | **Category 3: Task Verification and Termination (31% of failures)** | Failure Mode | Description | Military Equivalent | |-------------|-------------|-------------------| | **Premature Termination** | Task ends before objectives are met | Withdrawing before consolidation on the objective | | **No or Incomplete Verification** | Agent fails to validate task outcomes | No battle damage assessment, no after-action review | | **Incorrect Verification** | Agent validates incorrectly, certifies failed work as complete | False BDA reporting — "destroyed" targets that are still operational | The distribution matters: 37% of failures originate in specification and system design, 31% in inter-agent misalignment, and 31% in verification and termination. Fully **68% of all multi-agent failures** occur either before execution begins or after execution ends. The agents fail not because they cannot do the work, but because nobody told them what the work was, or nobody checked whether they did it correctly. Military professionals recognize this distribution instantly. It is the same distribution that drives planning doctrine. MDMP front-loads the thinking — spending 40% of available time on Mission Analysis — precisely because the planning phase is where most operational failures originate. The Army's After Action Review process addresses the back end — verification and institutional learning. The MAST taxonomy independently rediscovers what seven decades of military doctrine already codified: execution is the easy part. Planning and assessment are where operations live or die. The MAST dataset — 1,600+ annotated traces — represents the first empirical foundation for multi-agent quality assurance. 
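What would a countermeasure look like in code? A minimal sketch of an independent verification gate aimed at the Category 3 failures, in the spirit of QASAS condition codes. The objective-coverage check is a deliberately crude placeholder, and `judge` stands in for any evaluator that is not the worker itself; all names here are hypothetical:

```python
from enum import Enum

class ConditionCode(Enum):  # QASAS-style serviceability classes
    SERVICEABLE = "A"       # certified, release downstream
    LIMITED = "C"           # usable only with documented restrictions
    UNSERVICEABLE = "F"     # reject and route to rework

def verification_gate(task_spec: dict, output: str, judge) -> ConditionCode:
    """Independent check run after the worker claims completion."""
    # Premature termination: did the output address every stated objective?
    # (A real gate would use tests or a rubric, not substring matching.)
    missing = [o for o in task_spec["objectives"] if o.lower() not in output.lower()]
    if missing:
        return ConditionCode.UNSERVICEABLE
    # Incorrect verification: never accept the worker's self-assessment.
    # The judge is a separate process, a human or a different model entirely.
    score = judge(task_spec, output)  # expected to return 0.0 to 1.0
    if score >= task_spec.get("accept_threshold", 0.9):
        return ConditionCode.SERVICEABLE
    return ConditionCode.LIMITED if score >= 0.6 else ConditionCode.UNSERVICEABLE
```

Calibrating thresholds like these against real failure traces is precisely the work enabled by the MAST dataset.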
It is, in effect, the ammunition surveillance database for AI operations. The QASAS model described in Section 5 of this paper now has its data source. ### 8.3 Herding Cats: Why Agent Coordination Mirrors Military Operations, Not Software Teams Scaling AI agents is herding cats. The cats are brilliant, tireless, and fast — but nobody told them where the barn is, and half of them are chasing mice that don't exist.[^16] This is not metaphor. This is operational reality. When Cursor tested flat-team agent architectures — giving agents equal status and letting them coordinate through shared files — they discovered behaviors any infantry platoon leader recognizes instantly: agents held locks too long, forgot to release resources, gravitated toward safe easy tasks while hard problems sat unclaimed, and churned through busywork without progress. Twenty agents produced the output of two or three. The "diffused responsibility" that was supposed to enable autonomy instead guaranteed that nobody took ownership of anything difficult.[^17] What Cursor built next changed the equation. Their hierarchical planner-worker architecture — planners create tasks, workers execute independently, a judge evaluates — achieved production results that validate the military model at scale: approximately 1,000 commits per hour sustained over week-long runs. Agents autonomously built a web browser from scratch — over one million lines of code in a single week. They completed a three-week React migration involving 266,000 additions and 193,000 deletions. They optimized video rendering by 25x. Not demonstrations. Production results from hundreds of agents working a single codebase for weeks.[^18] The gap between twenty agents producing the work of two and hundreds of agents producing one million lines of code per week is not a technology gap. It is a doctrine gap. Same agents. Same models. Same tools. Different command-and-control architecture. Steve Yegge — ex-Google, ex-Amazon, 30+ years in tech, 40 years coding — arrived at the identical conclusion through hard-won failure. Gastown is his *fourth* complete orchestration framework, built after three previous versions failed. His first attempt, Vibecoder, used Temporal — the gold standard for workflow orchestration — but proved cumbersome for agent coordination. The second failed but produced Beads, a git-backed agent memory system. The third, a Python-based Gas Town, lasted six to eight weeks before architectural limits killed it. The fourth, written in Go, is the current production system running 20-30 Claude Code instances in parallel with 9,900 stars on GitHub.[^19] Four attempts. Three failures. One success. That progression describes engineering discipline, not prompt engineering. Yegge did not build a better agent. He built a better operations center. Gastown's architecture uses a strict hierarchy. The **Mayor** serves as the primary AI coordinator with full workspace context — the battalion commander who sees the whole battlespace. **Polecats** are ephemeral worker agents with persistent identity: they spin up, execute a single task, hand results into a merge queue, and terminate. Their sessions end on completion, but their identity and work history persist through **Hooks** — git worktree-based persistent storage that survives crashes and restarts. **Convoys** bundle multiple work items for coordinated execution. 
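Stripped of the colorful names, the architecture both teams converged on is small. A minimal sketch, assuming a file-based queue as the durable store and placeholder functions where a real system would make model calls; every name is hypothetical:

```python
import json
import pathlib

# Placeholder tiers: in a real system each of these wraps a model call.
def decompose(mission):  # planner's reasoning (hypothetical)
    return [{"id": i, "step": s.strip()} for i, s in enumerate(mission.split(";"))]

def execute(task):       # worker's tool-using session (hypothetical)
    return {"task": task["id"], "artifact": f"completed: {task['step']}"}

def judge(task, result):  # independent evaluator, never the worker itself
    return bool(result["artifact"])

QUEUE = pathlib.Path("work_queue")   # durable external state: the mission
DONE = pathlib.Path("merge_queue")   # survives any individual agent session

def plan(mission: str) -> None:
    """Planner tier: decompose once, write every task to durable storage."""
    QUEUE.mkdir(exist_ok=True)
    DONE.mkdir(exist_ok=True)
    for task in decompose(mission):
        (QUEUE / f"task_{task['id']:04d}.json").write_text(json.dumps(task))

def work_one() -> None:
    """Worker tier: ephemeral, single task, no knowledge of peers."""
    claimed = next(QUEUE.glob("*.json"), None)  # real systems claim atomically
    if claimed is None:
        return                                  # queue empty: exit cleanly
    task = json.loads(claimed.read_text())
    result = execute(task)
    if judge(task, result):                     # never self-certify
        (DONE / claimed.name).write_text(json.dumps(result))
    claimed.unlink()                            # session ends; state persists

plan("reproduce the bug; isolate the cause; patch; verify")
while any(QUEUE.glob("*.json")):
    work_one()
```

Note what is absent: workers never read each other's state, and work that fails the judge simply never enters the merge queue.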
The entire system runs on GUPP (Gastown Universal Propulsion Principle): sessions are ephemeral, workflow state lives externally in Git-backed storage, and when an agent ends, the next session picks up where it left off. The agent can crash, restart, or run out of context window — the mission persists.[^20]

Workers do not coordinate with each other. They do not even know other workers exist. This is not a limitation. This is the design. It maps directly to the military principle of need-to-know — not as security theater, but as operational efficiency.

Jones highlights the independent convergence: "When smart people are working on the same problem without talking to each other and they arrive at the same answer, it's probably worth paying attention to." Cursor and Yegge, working independently, discovered five identical principles:

- **Two tiers, not teams.** Planners plan. Workers execute. A judge evaluates. Flat peer coordination collapses at scale. This mirrors the military's command-and-control hierarchy — not because hierarchy is fashionable, but because it eliminates serial dependencies.
- **Workers stay ignorant.** Minimum viable context. Workers receive exactly enough information to complete their assigned task and nothing more. When Cursor's workers understood the broader project, they experienced scope creep, reinterpreted assignments, and created conflicts. Need-to-know, applied here as engineering necessity rather than security doctrine.
- **No shared state.** Workers operate in isolation with 3-5 core tools. Tool selection accuracy degrades past 30-50 tools regardless of context window size. Coordination happens through external mechanisms — Git for code, task queues for assignments. Fighting over the toolbox wastes more time than splitting up and working alone.
- **Plan for endings.** Long-running agents accumulate context pollution. Quality degrades within hours. Yegge built this directly into Gastown through GUPP. The Army builds it into operations through shift changes in the TOC. Both recognize the same physics: human or artificial, sustained attention degrades. Plan for it.
- **Prompts matter more than infrastructure.** Seventy-nine percent of multi-agent failures originate from specification and coordination issues, not technical bugs. Infrastructure accounts for only 16%. The prompt is the operations order. Get it wrong, and the best infrastructure in the world produces coordinated failure.

Jones delivers the summary that validates this paper's entire thesis: "The job is not to make one brilliant Jason Bourne agent running around for a week. It's actually 10,000 dumb agents that are really well coordinated in the system running around for an hour at a time progressively getting work done against a very tight definition of the goal they're accomplishing."

That sentence describes a battalion operation, not a software deployment. A battalion commander does not deploy one brilliant soldier to accomplish a brigade-level mission. The commander deploys hundreds of soldiers — each with a narrow task, clear left and right limits, a defined end state, and no need to understand the division commander's intent. The orchestration complexity lives at the staff level, not in the rifleman's head. This is exactly what MDMP produces.

### 8.4 The Protocol Wars: Doctrine Meets the Wire

Military doctrine standardizes communication through operations orders (OPORDs), warning orders (WARNOs), and fragmentary orders (FRAGOs).
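Section 3 argued that AI has no WARNO equivalent. Building one is not hard; a minimal sketch of a WARNO-style early alert, assuming a hypothetical message schema, that lets downstream agents stage resources while the full plan is still being written:

```python
import json
import time

def issue_warno(mission_stub: str, recipients: list[str]) -> str:
    """WARNO-style early alert: the full OPORD follows, preparation starts now."""
    warno = {
        "type": "WARNO",
        "issued": time.time(),
        "mission_stub": mission_stub,  # best current guess; may change
        "prepare": ["warm caches", "stage test fixtures", "reserve compute"],
        "execute_on": "OPORD receipt",  # no irreversible action before the real order
    }
    return json.dumps({"to": recipients, "body": warno})

# The moment a request arrives, before any plan exists, workers get warned:
alert = issue_warno("likely large-repo refactor", ["worker-pool", "ci-runners"])
```

The message is trivial. The discipline of always sending it, in the same format, is what OPORDs, WARNOs, and FRAGOs institutionalize.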
These formats survived seven decades of combat because they solve a specific engineering problem: how do you enable coordination between distributed units that cannot see each other, may not trust each other, and operate on different timelines? The AI industry spent 2025 reinventing this wheel. Two protocols emerged that together represent the beginning of standardized agent doctrine at the infrastructure level. Anthropic's **Model Context Protocol (MCP)**, released in late 2024 and donated to the Linux Foundation in December 2025, standardizes how agents connect to tools, data sources, and external context. MCP handles the vertical relationship — an agent reaching down to use a tool, query a database, or access a file system. By early 2026, MCP has achieved 97 million monthly SDK downloads and powers over 10,000 active public servers. ChatGPT, Cursor, Gemini, Microsoft Copilot, Visual Studio Code, Replit, and Sourcegraph have all adopted it.[^21] Google's **Agent2Agent (A2A) Protocol**, announced in April 2025 with over 50 technology partners and subsequently donated to the Linux Foundation, standardizes how agents communicate with *each other*. A2A handles the horizontal relationship — agents discovering, coordinating with, and delegating to peer agents across organizational boundaries. The protocol uses Agent Cards (JSON metadata documents) for capability discovery, defines task lifecycle management with artifact outputs, and supports text, audio, video, and structured data exchange.[^22] The complementary relationship matters: MCP is how an agent uses tools. A2A is how agents talk to each other. Together, they form the communications architecture that multi-agent systems have lacked since inception. | Protocol | Military Analogy | Function | |----------|-----------------|----------| | **MCP** | Equipment TM (Technical Manual) | How an agent interfaces with its tools and data | | **A2A** | OPORD / FRAGO | How agents coordinate missions with each other | | **MCP + A2A** | Complete C2 (Command & Control) stack | Full interoperability from tool use to inter-agent coordination | In December 2025, the Linux Foundation announced the **Agentic AI Foundation (AAIF)** — a directed fund housing three founding projects: Anthropic's MCP, Block's goose agent framework, and OpenAI's AGENTS.md standard. The platinum members read like a roster of the entire AI industry: Amazon Web Services, Anthropic, Block, Bloomberg, Cloudflare, Google, Microsoft, and OpenAI.[^23] This is the AI industry building its doctrine center. It took the U.S. Army decades to standardize operations order formats across all branches. The AI industry is attempting the same standardization in months — driven not by institutional wisdom but by the catastrophic cost of operating without it. ### 8.5 The Gartner Prediction: Why 40% of AI Projects Will Fail Without Doctrine Gartner's June 2025 prediction lands like an artillery strike: over 40% of agentic AI projects will be cancelled by the end of 2027. The causes — escalating costs, unclear business value, and inadequate risk controls — read like an after-action review of an operation launched without a plan.[^24] Anushree Verma, Senior Director Analyst at Gartner, identified the failure pattern: "Most agentic AI projects right now are early-stage experiments or proof of concepts that are mostly driven by hype and are often misapplied." 
Gartner further identified "agent washing" — vendors rebranding chatbots and RPA tools as agentic AI — and estimated that only 130 of thousands of agentic AI vendors offer genuine capabilities. The scale of misunderstanding is staggering: a 1,445% surge in multi-agent system inquiries from Q1 2024 to Q2 2025, yet most organizations cannot distinguish between a chatbot with a new label and an autonomous agent system. The investment numbers confirm the hype cycle. A January 2025 Gartner poll of 3,412 respondents found that 19% had made significant investments in agentic AI, 42% conservative investments, 8% no investments, and 31% were taking a wait-and-see approach. The 61% who have committed capital are betting on a technology that Gartner itself predicts will fail 40% of the time within two years. Map these failure modes against the frameworks proposed in this paper: - **"Escalating costs" = DOWNTIME waste** (Section 4). Token budgets consumed by coordination overhead. The Kim et al. data quantifies this precisely: hybrid multi-agent systems burn 5x the tokens of a single agent for each successful task. A Lean Six Sigma Black Belt identifies and eliminates these wastes by instinct. - **"Unclear business value" = Missing Mission Analysis** (Section 3). MDMP Step 2 exists precisely to prevent organizations from deploying capability without defined end states. The MAST taxonomy confirms it: 37% of multi-agent failures originate in specification — the work that Mission Analysis is designed to produce. The AI industry skipped this step entirely. - **"Inadequate risk controls" = Absent METT-TC(IT) analysis** (Section 2). No terrain analysis, no threat assessment, no consideration of troops and support available. Organizations deployed agents the way a second lieutenant deploys a patrol without a map — and achieved predictable results. ### 8.6 The Anthropic Paradox: When the Toolmaker Meets the Warfighter No case study illustrates the doctrine gap more starkly than Anthropic's own experience in early 2026. In June 2025, Anthropic published a detailed technical account of its multi-agent research system. The system uses an orchestrator-worker pattern: a lead agent (Claude Opus 4) decomposes complex queries into subtasks and delegates to specialized subagents (Claude Sonnet 4) that operate in parallel. Performance results validated the hierarchical model: the multi-agent system outperformed single-agent Claude Opus 4 by 90.2% on Anthropic's internal research evaluation.[^25] In February 2026, the world learned that Claude had been used during the capture of Venezuelan dictator Nicolas Maduro — deployed through Anthropic's partnership with Palantir as part of a U.S. special operations raid on February 13, 2026. An AI system designed for research tasks and enterprise workflows had been deployed in a combat operation. When Anthropic asked the Department of War whether Claude was used for the raid, the Pentagon's response was not reassurance. It was escalation.[^26] Defense Secretary Pete Hegseth moved to designate Anthropic a "supply chain risk" — a classification normally reserved for foreign adversaries. The designation would require every company doing business with the Pentagon to certify they do not use Claude. The direct contract at risk: $200 million. The cascading supply chain effects would reach into every corner of Anthropic's commercial business.[^27] The irony is precise. Anthropic built the most capable multi-agent AI system in the world. 
It published the architecture, the benchmarks, and the engineering lessons. It partnered with Palantir to bring that capability to the military. And then it discovered — in real time, under operational conditions — that capability without doctrine produces exactly the chaos this paper describes.

The Anthropic-Pentagon crisis illustrates every gap this paper identifies:

- **No METT-TC(IT) analysis** before deployment. The "terrain" — legal, ethical, operational — was not analyzed.
- **No Mission Analysis.** Anthropic assumed the mission was analysis and planning support. The military assumed "all lawful purposes." Nobody conducted Step 2 of MDMP.
- **No shared doctrine.** Anthropic's terms of service and the Pentagon's mission orders were never reconciled — the equivalent of a coalition operation run under different ROE with no coordination.
- **No After Action Review.** No structured mechanism exists to capture the lessons, so they will dissipate into press narratives instead of codified doctrine.

### 8.7 The New Cadre: Roles the AI Industry Does Not Know It Needs

The multi-agent scaling problem creates demand for an entirely new cadre of experts in fields that never expected to touch artificial intelligence.

The evidence is no longer theoretical. In January 2026, the U.S. Army formally established 49B — an AI/ML Officer area of concentration — with its first VTIP window opening January 5, 2026. Officers selected for 49B receive graduate-level training emphasizing hands-on AI system development, deployment, and maintenance. The Army is building its AI cadre from operational units — because it understands that the problem is operational, not computational.[^28]

The same month, U.S. Special Operations Command issued a Request for Information seeking agentic AI demonstrations at Avon Park Air Force Range, Florida, April 13-17, 2026. The RFI's areas of interest: agentic protocols, agent-to-agent communication, orchestration, human-machine teaming, knowledge representation, and performance metrics. SOCOM is not looking for better models. It is looking for doctrine.[^29]

The Department of War's AI Acceleration Strategy, released January 2026, directs the military to become an "AI-first warfighting force." Seven Pace-Setting Projects under the Chief Digital and AI Officer aim to demonstrate AI agent capabilities by July 2026. GenAI.mil has reached 1.1 million unique users across five of six military branches.[^30]

Consider the roles that multi-agent operations demand:

**Lean Six Sigma Black Belts** identify waste at machine speed. When a 100-agent system burns 67% of its token budget on inter-agent communication, that is extra-processing waste. When agents duplicate work, that is defect waste. When agents sit idle waiting for locks, that is waiting waste.

**Military Planners and Doctrine Specialists** provide the command-and-control architecture that agent systems rediscover through trial and error. Cursor's failed flat-team experiment mirrors every military lesson about span of control since the Roman legions.

**Quality Assurance Specialists** apply the QASAS model. When errors amplify 17.2x through multi-agent chains, you need surveillance that catches defects before they propagate. The MAST dataset provides the empirical foundation.

**Risk Managers** prevent cascading failure. One compromised node degrades the entire formation. They identify single points of failure before deployment.

**Drill Sergeants** enforce standards. When agents gravitate toward easy tasks while hard problems sit unclaimed, that is a discipline problem.
**Process Engineers** design workflows where simple components produce complex outputs through orchestration — not component complexity. Keep the workers dumb. Make the system smart. None of these roles exist in a typical AI company's organizational chart. The Army's creation of 49B acknowledges this reality at the institutional level. SOCOM's RFI confirms it at the operational level. ### 8.8 Implications: The Convergence Accelerates The multi-agent scaling problem accelerates every convergence argument in this paper. Anthropic's multi-agent research system demonstrates the prize: 90.2% improvement. Cursor's production results demonstrate the scale: one million lines of code in a week. The MAST taxonomy demonstrates the risk: 41% to 86.7% failure rates. The Gartner prediction demonstrates the timeline: 40% cancellation by 2027. The Anthropic-Pentagon crisis demonstrates the stakes: capability without doctrine produces strategic-level consequences. The organizations that survive the Gartner prediction will stop treating AI as a software problem and start treating it as an operations problem. They will hire the new cadre. They will apply the doctrine. They will run DMAIC cycles on their agent systems. They will conduct Mission Analysis before deploying agent swarms. They will build QASAS-model quality surveillance into their orchestration layers. The super-intelligent five-year-old now commands an army of clones — each brilliant, each tireless, each utterly incapable of coordinating with the clone standing next to it. The five-year-old does not need smarter clones. The five-year-old needs a sergeant major, a battle staff, a quality assurance inspector, and a process engineer. The barn is burning. The cats are running in circles. Time to stop herding and start commanding. --- ## 9. THE OFF-RAMP — From AI Dependence to Autonomous Operations ### 9.1 The Maturation Imperative: AI as Scaffold, Not Structure Every preceding section argues that AI requires human-built frameworks to function at scale. None of these frameworks address the question practitioners confront the moment they operationalize AI: when does the AI stop doing the work? Every AI-assisted cycle generates two outputs: the deliverable and the institutional knowledge of how to produce that deliverable. The deliverable has immediate value. The institutional knowledge compounds — because it converts AI-dependent processes into deterministic, codified operations that execute without AI involvement. Military doctrine embeds this principle. Advisors train the cadre, the cadre trains the formation, and the advisors move to the next problem. AI occupies the advisor role. The objective is not permanent AI integration — it is AI graduation: systematically reducing AI involvement as human-built systems absorb the repeatable logic.[^31] ### 9.2 Commander's Intent Two Levels Up Army doctrine requires understanding commander's intent two echelons above. When the plan falls apart, the subordinate who understands the purpose two levels up makes decisions that advance the overall mission rather than optimizing a local objective that no longer matters. The MAST taxonomy validates this: "Disobey Task Specification" and "Task Derailment" occur when agents lack sufficient context to distinguish between compliance and intent-aligned execution. The 37% of failures originating in specification are failures of intent communication. 
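What does intent two echelons up look like in an AI tasking? A minimal sketch, with hypothetical field names, of a session spec that carries intent alongside the literal task:

```python
task_spec = {
    # Layer 1 -- Task: what to do, with measurable completion criteria.
    "task": "Migrate the auth module off deprecated API v1",
    # Layer 2 -- Purpose: why it matters one echelon up.
    "purpose": "v1 shuts down in 30 days; login breaks without the migration",
    # Layer 3 -- Commander's intent, two levels up: the endstate that governs
    # decisions when the literal task stops making sense mid-execution.
    "intent_two_up": "zero customer-visible downtime this quarter",
    "constraints": ["no public API changes", "small reviewable commits"],
}
```

An agent holding only layer 1 optimizes the letter of the task; an agent holding layer 3 knows when to stop and ask.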
Every AI session includes three layers: **Task** (what to do), **Purpose** (why it matters), **Commander's Intent** (the desired endstate two levels up). This is not overhead. This is the difference between an AI that executes and an AI that executes correctly when things go wrong.[^32]

### 9.3 Rules of Engagement: The Sandbox That Keeps AI Honest

An AI agent with full system access and no constraints will optimize for the immediate instruction at the expense of the broader system. The Anthropic-Pentagon crisis (Section 8.6) illustrates this at strategic scale — capability deployed without agreed ROE between provider and operator created a crisis from a success.

| ROE Level | Military Equivalent | AI Application |
|-----------|-------------------|----------------|
| **Weapons Hold** | Engage only in self-defense | Read-only. AI analyzes and reports but takes no action. |
| **Weapons Tight** | Engage only positively identified targets | Rule-bound execution within pre-approved rulesets. |
| **Weapons Free** | Engage any target not identified as friendly | Full autonomous operation within defined boundaries. |

The **Two-Strike Rule** builds automatic circuit breakers: first anomaly tightens constraints; second anomaly halts execution and waits for human intervention. This guards against the 17.2x error amplification documented in the Kim et al. study.

### 9.4 Risk Management

Each Off-Ramp phase introduces distinct risks. Phase 1: data exposure, irreversible actions, context window hallucination (MAST FM-1.4). Phase 2: premature codification, edge case blindness. Phase 3: drift without detection, security surface expansion. The battle rhythm's quarterly review cycle exists specifically to detect and correct Phase 3 drift.[^33]

### 9.5 The Three-Phase Off-Ramp Model

| Phase | AI Role | Human Role | Output |
|-------|---------|-----------|--------|
| **Phase 1: Operate** | Executes under Weapons Hold/Tight ROE | Commander's intent, ROE, quality review | Deliverables + rule candidates + risk register |
| **Phase 2: Codify** | Converts patterns into deterministic rules | Rule validation, edge cases, acceptance testing | Codified rulebook + test suite + exception queue |
| **Phase 3: Analyze** | Inspector General — flags anomalies, recommends | System owner, exception handler, doctrine updater | Autonomous ops + AI exception handling + doctrine review |

Phase 1 generates the raw material for Phase 2. Organizations that skip discipline in Phase 1 never accumulate the institutional knowledge required to reach Phase 3. They remain permanently AI-dependent.

### 9.6 The Battle Rhythm

AI involvement decreases as cycle frequency increases:

| Cycle | AI Role |
|-------|---------|
| **Daily** | Zero. 100% deterministic scripts. |
| **Weekly** | ~10%. AI reads findings, flags anomalies. |
| **Monthly** | ~30%. Pattern recognition, recommendations. |
| **Quarterly** | ~50%. Full analytical partnership for doctrine review. |

High-frequency operations run without AI because token costs make AI-dependent daily operations unsustainable at scale. The battle rhythm forces codification of daily operations first.[^34]

### 9.7 The S3 Shop Model

The practitioner's automation stack becomes the S3 shop — the operations staff section that runs the battle rhythm autonomously and flags exceptions. The AI becomes the commander's analytical advisor. The practitioner who builds their own S3 shop achieves operational freedom. The AI serves them. They do not serve the AI.
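The enforcement layer of that S3 shop needs no intelligence at all, and that is the point: the circuit breaker itself must be deterministic code, never a model call. A minimal sketch of the ROE ladder and Two-Strike Rule from Sections 9.3 and 9.4, with hypothetical names:

```python
from enum import IntEnum

class ROE(IntEnum):
    WEAPONS_HOLD = 0   # read-only: analyze and report, take no action
    WEAPONS_TIGHT = 1  # act only within pre-approved rulesets
    WEAPONS_FREE = 2   # autonomous operation inside defined boundaries

class TwoStrikeGovernor:
    """First anomaly tightens constraints; second halts for a human."""

    def __init__(self, roe: ROE = ROE.WEAPONS_TIGHT):
        self.roe = roe
        self.strikes = 0

    def report_anomaly(self, description: str) -> None:
        self.strikes += 1
        if self.strikes == 1 and self.roe > ROE.WEAPONS_HOLD:
            self.roe = ROE(self.roe - 1)  # strike one: drop an ROE level
        else:
            # Strike two, or any anomaly while already at Weapons Hold:
            # stop execution and wait for human intervention.
            raise RuntimeError(f"Halted for human intervention: {description}")
```

A governor like this runs for zero tokens, which is exactly where Section 9.6's battle rhythm says the daily cycle must live.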
### 9.8 Case Study: Knowledge Base Operations Over 32 days, 39 AI sessions generated 1,697 commits, 97 tracked tasks, and 71 session handoffs in the author's PARA-organized Obsidian vault. Phase 1 operations included a 1,337-file purge, 79% tag reduction, and development of multi-instance session protocols after four git index contamination incidents. Phase 2 produced 85 agent configurations, 8 hook scripts, 10 utility scripts, and 161 documented lessons learned — each representing institutional knowledge codified for deterministic execution.[^35] ### 9.9 The Satisficing Threshold Five operational realities constrain the model: the 70% Rule (deploy at 70%, refine in cycle), the Turnover Test (new operator runs it in one cycle or it is too complex), the Energy Budget (40 hours to save 2 hours/week breaks even at week 20), the Diminishing Returns Cliff (the last 20% costs more than letting AI handle it), and the Satisficing Decision (automate what earns its keep, keep AI for the rest).[^36] ### 9.10 The Four Tiers of AI Value | Tier | Function | Phase Mapping | |------|----------|--------------| | **1: Speed** | Execute known processes faster | Phase 1-2 | | **2: Pattern Recognition** | Surface what humans miss | Phase 1-2 | | **3: Synthesis** | Connect cross-domain information | Phase 3 | | **4: Judgment** | Weigh genuinely ambiguous situations | Phase 3 | Ninety percent of current AI usage sits at Tier 1. Each tier becomes autonomous before the practitioner graduates to the next. ### 9.11 The Compounding Principle The mathematics favor the disciplined. A Phase 1 operation consuming 50,000 tokens to classify 100 files produces a ruleset that Phase 3 executes for zero tokens indefinitely. Organizations that skip the discipline lease AI capability. Disciplined organizations own operational capability. The super intelligent five-year-old does not run the house. The adults build the house, and the five-year-old contributes extraordinary insights about how to make it better. But nobody pretends every room needs the five-year-old's constant attention. The disciplined practitioner knows which rooms belong in which category. --- ## 10. RECOMMENDATIONS **For AI Researchers:** Integrate structured planning methodologies into agent architectures. MDMP provides a proven framework for multi-agent coordination that addresses communication and scaling challenges. The Kim et al. study and MAST taxonomy provide the empirical evidence; military doctrine provides the solution architecture. The protocols are emerging — MCP for agent-to-tool, A2A for agent-to-agent — but the planning methodology that determines *when* and *how* to deploy agents remains absent from the literature. **For Business Leaders:** Staff AI initiatives with process improvement professionals, not just technologists. A Black Belt who understands DMAIC will deliver more sustainable AI value than a data scientist who has never mapped a value stream. Gartner predicts 40% of agentic AI projects will be cancelled by 2027 — the organizations that survive will be the ones that treated AI deployment as an operations problem, not a technology problem. **For Military Doctrine Professionals:** Recognize that 70 years of refined decision-making methodology has direct commercial and technological application. The Army's creation of 49B and SOCOM's agentic AI experimentation validate this convergence. The frameworks developed for battlefield complexity apply with minimal adaptation to AI system complexity. Publish the mapping. 
Own the intellectual space. **For LSS Practitioners:** Your expertise has never been more relevant. AI does not eliminate the need for waste reduction, variation control, and structured improvement. It multiplies the demand for these competencies by orders of magnitude. Quality 4.0 is not AI replacing LSS — it is LSS finally getting the computational engine it always deserved. The question: will you lead this convergence, or watch someone else stumble through it? **For AI Practitioners:** Build the Off-Ramp. Every AI-assisted operation should generate two outputs: the deliverable and the institutional knowledge of how to produce it. Codify what you learn. Automate what earns its keep. Reserve AI for judgment under uncertainty. The practitioner who builds their own S3 shop achieves operational freedom. The one who remains tethered to the AI interface serves the tool instead of commanding it.[^37] --- ## CONCLUSION AI is a super intelligent five-year-old. It possesses extraordinary capability wrapped in operational immaturity. The frameworks that teach it discipline — MDMP for planning, METT-TC(IT) for situation analysis, DMAIC for improvement, 5S for maintenance, QASAS-style assurance for quality, AARs for learning, and the Off-Ramp Model for maturation — already exist. They have been tested under the most demanding conditions humans have created. They work. The multi-agent scaling problem makes this argument urgent. When agents amplify errors 17.2 times faster than single systems, when failure rates reach 86.7%, when 40% of projects face cancellation, and when capability deployed without doctrine creates strategic crises — the case for military planning frameworks and Lean Six Sigma governance moves from compelling to existential. The question is not whether AI will adopt these frameworks. The question is how many organizations will waste how many resources discovering through expensive failure what military doctrine and process improvement professionals already know: capability without discipline is just expensive chaos. The future belongs to practitioners who bridge these worlds — who speak AI and doctrine, who think in DMAIC cycles and MDMP steps, who build the Off-Ramp from AI dependence to autonomous operations. That intersection is where the value lives. That intersection is exactly where your expertise matters most. --- ## APPENDIX A: SOCIAL MEDIA DISTRIBUTION ### LinkedIn / Facebook Post AI is a super intelligent five-year-old. I've spent 26 years in the Army learning how to make complex systems work at scale. Airborne infantry, special operations, training and simulations. The common thread: discipline turns capability into results. Now I'm deep into AI systems. Building workflows, integrating tools, pushing these models to their limits. Here's what I've learned: AI is inherently inefficient. It produces extraordinary outputs wrapped in extraordinary waste. It solves the wrong problems brilliantly. It discovers risks during execution instead of during planning. It scales like a battalion without doctrine — individual brilliance drowning in coordination chaos. The fix is not better models. The fix is better process. The Military Decision Making Process (MDMP) gives AI the planning architecture it lacks — structured analysis before execution, multiple courses of action, war-gaming to find failure points before they happen. 
Lean Six Sigma gives AI the quality framework it demands — DMAIC for improvement cycles, 5S for system maintenance, waste identification that prevents automation from accelerating inefficiency. Google/MIT just proved that adding more agents can make systems *worse* — 39-70% performance drops. UC Berkeley documented 14 failure modes with rates up to 86.7%. Gartner predicts 40% of agentic AI projects will be cancelled by 2027. If you're a Black Belt, your job is not going anywhere. AI multiplies the demand for process discipline. It does not replace it. If you're a military planner, 70 years of battle-tested doctrine maps directly to AI system architecture. The frameworks you know solve the problems AI companies are spending billions trying to figure out. I wrote a full paper on this. Link in comments. #LeanSixSigma #AI #MilitaryDoctrine #ProcessImprovement #MDMP #Quality40 #OperationalExcellence #Leadership ### X (Twitter) Thread 1/ AI is a super intelligent five-year-old. Extraordinary capability. Zero operational discipline. Here's why that matters for every Black Belt and military planner reading this: 2/ I spent 26 years in the Army building systems that scale under pressure. The #1 lesson: individual brilliance does not survive organizational complexity. Doctrine does. Process does. Discipline does. 3/ AI systems hit the exact same wall. Google/MIT proved it: adding more agents can make systems WORSE. 39-70% performance drops. 17.2x error amplification. Token efficiency collapses from 67.7 to 13.6 tasks per 1,000 tokens. 4/ UC Berkeley documented 14 failure modes across 7 frameworks. Failure rates: 41-86.7%. And 68% of failures happen BEFORE or AFTER execution — in planning and verification. Exactly where MDMP front-loads effort. 5/ But here's the upside: Cursor's hierarchical agent system produces 1M+ lines of code per week. Same agents, same models — different C2 architecture. Doctrine is the difference between 20 agents producing the work of 2 and hundreds producing a million lines. 6/ Lean Six Sigma adds the quality layer. DMAIC runs improvement cycles. 5S maintains system hygiene. Gartner says 50%+ of LSS orgs will integrate AI tools by 2026. The inverse: AI without LSS automates waste at machine speed. 7/ The Army created 49B — a dedicated AI/ML officer career field. SOCOM is hosting agentic AI experiments in April. The military gets it: AI is an operations problem, not a software problem. 8/ Black Belts: your expertise has never been more relevant. Military planners: your doctrine applies directly. Full paper: "The Super Intelligent Five-Year-Old." Link in bio. --- ## APPENDIX B: AUDIENCE-SPECIFIC OPENINGS ### For LSS Groups and Practitioners Your methodology is not dying. It is evolving into something far more powerful than its creators imagined. AI does not replace DMAIC — it runs hundreds of DMAIC cycles in parallel at machine speed. But without Black Belts defining the problems, governing the analysis, and controlling the outputs, those cycles accelerate waste instead of eliminating it. Quality 4.0 is not AI replacing LSS. It is LSS finally getting the computational engine it always deserved. The question for every Black Belt: will you lead this convergence, or watch someone else stumble through it without your expertise? ### For Military Doctrine Professionals MDMP works for AI operations. Not metaphorically. Directly. 
The seven-step planning process, METT-TC analysis, Warning Orders, OPORDs, FRAGOs, battle tracking, risk management, and After Action Reviews all map to AI system challenges with minimal adaptation. The communication and scaling problems that AI companies spend billions attempting to solve — coordination between distributed agents, information flow across echelons, decision-making under uncertainty — are the exact problems military doctrine has addressed for seven decades. Your operational planning expertise represents a competitive advantage that the AI industry has not yet recognized. This paper demonstrates the mapping.

### For Business Leaders and Executives

Your AI investment faces a discipline gap that technology alone will not close. Organizations deploying AI without structured planning frameworks and quality governance experience cascading failures from unidentified risks and escalating costs from automating broken processes. The solution exists and carries 70 years of validation: military-grade planning discipline combined with Lean Six Sigma process governance. Staff your AI initiatives with process improvement professionals alongside your data scientists. The ROI delta between disciplined and undisciplined AI deployment compounds with every operational cycle.

### For AI Researchers and Engineers

The multi-agent coordination problem you are solving with larger context windows and more sophisticated architectures was solved decades ago by military doctrine. The Military Decision Making Process provides a proven framework for structured decomposition of complex tasks, parallel development of alternative approaches, formal risk assessment integrated throughout planning, and mission-driven decision hierarchies that separate intent from methods. METT-TC provides the situation analysis framework. Warning Orders enable parallel execution. OPORDs standardize communication between agents. This is not analogy — it is direct architectural mapping. The frameworks are open-source, battle-tested, and free. A minimal sketch of an OPORD-shaped agent message follows below.
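As one concrete, strictly illustrative instance of that mapping, the sketch below encodes the five-paragraph OPORD as a message schema between a supervising agent and its workers. The `Opord` class, its field names, and the `fragmentary` helper are the author's assumptions for illustration, not an existing protocol or library.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Opord:
    """Five-paragraph OPORD as an inter-agent message (illustrative names)."""
    situation: str       # Operating environment: constraints, known failure modes
    mission: str         # Task and purpose: the what and why, never the how
    execution: str       # Commander's intent plus tasks to subordinate agents
    sustainment: str     # Resources: token budget, tools, compute limits
    command_signal: str  # Who reports to whom, on what channel, how often

    def fragmentary(self, **changes) -> "Opord":
        """Issue a FRAGO: transmit only the delta; the rest stays in force."""
        return replace(self, **changes)


# A supervising agent issues the order; workers receive intent, not keystrokes.
base = Opord(
    situation="Legacy codebase, flaky test suite, fixed context ceiling",
    mission="Migrate the payments module to the new API by end of sprint",
    execution="Intent: zero regressions. Agent A ports code; Agent B writes tests",
    sustainment="Budget: 500k tokens per agent; tools: repo access, CI runner",
    command_signal="Report to supervisor after each passing CI run",
)
frago = base.fragmentary(execution="Intent unchanged. Agent B also ports the docs")
```

The design choice worth copying is the FRAGO: when the situation changes, the supervisor transmits only the delta, and every unchanged paragraph of the original order stays in force. That is how doctrine keeps communication overhead from scaling with mission complexity.

---

## CITATIONS

[^1]: The DOWNTIME acronym is a standard Lean Six Sigma mnemonic for the eight wastes: Defects, Overproduction, Waiting, Non-utilized talent, Transportation, Inventory, Motion, Extra-processing. See Womack, J.P. & Jones, D.T. *Lean Thinking* (2003) and Ohno, T. *Toyota Production System* (1988).

[^2]: The five-paragraph operations order (OPORD) format — Situation, Mission, Execution, Sustainment, Command & Signal — is codified in FM 5-0, *Army Planning and Orders Production* (2022). Warning Orders (WARNOs) and Fragmentary Orders (FRAGOs) are specified in the same publication.

[^3]: METT-TC is codified in ADP 5-0, *The Operations Process* (2019). The addition of Information Technology as a seventh variable (METT-TC(IT)) is the author's adaptation for AI operations, consistent with the Army's recognition of the information environment as a domain of operations.

[^4]: FM 5-0, *Army Planning and Orders Production* (2022). The MDMP is described in Chapter 12 as the primary planning methodology for battalion and above.

[^5]: Warning Orders are described in FM 5-0, paragraph 12-23. The one-third/two-thirds rule (para 12-15) specifies that commanders use no more than one-third of available time for their own planning, issuing a WARNO to subordinates immediately upon receipt of mission.

[^6]: Quality 4.0 convergence: Gartner, "Predicts 2026: AI and Lean Six Sigma Convergence" (2025). Harvard Business Review, "AI and the Future of Process Excellence" (2025).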
The projection that 50%+ of LSS organizations will incorporate AI-driven tools by 2026 is sourced from Gartner's 2025 analysis.

[^7]: 5S methodology: Hirano, H. *5S for Operators* (1996). Application to AI systems is the author's adaptation.

[^8]: The QASAS program was established in 1920 as the oldest federal civilian career program. See AR 702-12, *Quality Assurance Specialist (Ammunition Surveillance)* (2020).

[^9]: ATP 5-19, *Risk Management* (2014). The four-step process: identify hazards, assess hazards, develop controls, implement controls.

[^10]: The AAR format and four questions are codified in TC 25-20, *A Leader's Guide to After-Action Reviews* (1993). The four questions: What did we plan to do? What actually happened? Why did it happen? What will we do differently next time?

[^11]: These five limitations represent the core argument for human-AI teaming rather than AI replacement. Each corresponds to a specific military/LSS framework: mission definition (MDMP Step 2), process improvement (DMAIC), coordination (doctrine/SOPs), quality assurance (QASAS), and institutional learning (AAR).

[^12]: Kim, Y., et al. "Towards a Science of Scaling Agent Systems." Google Research, Google DeepMind, and MIT. arXiv:2512.08296. December 9, 2025. https://arxiv.org/abs/2512.08296. DOI: https://doi.org/10.48550/arXiv.2512.08296

[^13]: Ibid. Table 3: Token Efficiency Analysis. Exact figures: SAS 67.7, Independent MAS 42.4, Decentralized MAS 23.9, Centralized MAS 21.5, Hybrid MAS 13.6 successful tasks per 1,000 tokens.

[^14]: Jones, Nate B. AI News & Strategy Daily. YouTube / Substack. January 2026. https://www.natebjones.com/substack. Jones is an AI strategist and former Head of Product at Amazon Prime Video.

[^15]: Cemri, M., et al. "Why Do Multi-Agent LLM Systems Fail?" NeurIPS 2025 Datasets and Benchmarks Track (Spotlight). arXiv:2503.13657. March 17, 2025. https://arxiv.org/abs/2503.13657. Project: https://sky.cs.berkeley.edu/project/mast/. GitHub: https://github.com/multi-agent-systems-failure-taxonomy/MAST

[^16]: Marshall, Jeep. February 2026.

[^17]: Cursor. "Scaling Agents." October 2025. https://cursor.com/blog/scaling-agents

[^18]: Ibid. Production results: ~1,000 commits/hour sustained over week-long runs; web browser (1M+ lines, 1 week); React migration (266K additions / 193K deletions); video rendering optimization (25x).

[^19]: Yegge, Steve. "Welcome to Gas Town." Medium. January 1, 2026. https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16dd04. GitHub: https://github.com/steveyegge/gastown (9.9K stars). See also: "The Future of Coding Agents." Medium. January 2026.

[^20]: Ibid. GUPP (Gastown Universal Propulsion Principle): sessions ephemeral, state external, mission persists across agent restarts.

[^21]: Anthropic. "Donating the Model Context Protocol and Establishing the Agentic AI Foundation." December 9, 2025. https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation. MCP metrics: 97M+ monthly SDK downloads, 10,000+ active servers.

[^22]: Google. "Announcing the Agent2Agent Protocol (A2A)." Google Developers Blog. April 9, 2025. https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/. Specification: https://a2a-protocol.org/latest/specification/

[^23]: Linux Foundation. "Linux Foundation Announces the Formation of the Agentic AI Foundation." December 9, 2025. Platinum members: AWS, Anthropic, Block, Bloomberg, Cloudflare, Google, Microsoft, OpenAI.
[^24]: Gartner, Inc. "Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027." Press Release, June 25, 2025. https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027

[^25]: Anthropic. "How We Built Our Multi-Agent Research System." Anthropic Engineering Blog. June 13, 2025. https://www.anthropic.com/engineering/multi-agent-research-system

[^26]: Axios. "Anthropic's Claude helped capture Venezuelan dictator Maduro." February 13, 2026. Small Wars Journal. "AI-Enabled Decapitation Strike." February 17, 2026.

[^27]: Axios. "Pentagon threatens to label Anthropic's AI a 'supply chain risk.'" February 16, 2026. CNBC. "Anthropic is clashing with the Pentagon over AI use." February 18, 2026.

[^28]: U.S. Army. "Army establishes new AI/machine learning career path for officers." December 2025. First VTIP window: January 5 - February 6, 2026.

[^29]: SOCOM. Request for Information: Agentic AI Demonstrations. SAM.gov. December 2025. Event: April 13-17, 2026, Avon Park AFR, Florida.

[^30]: Department of War. "Artificial Intelligence Strategy for the Department of War." January 12, 2026. GenAI.mil: 1.1M unique users, 5 of 6 branches.

[^31]: The "AI graduation" concept draws from the Army's Security Force Assistance Brigade (SFAB) model. See FM 3-22, *Army Support to Security Cooperation* (2013).

[^32]: ADRP 5-0, *The Operations Process* (2012). Mission Analysis: "Thoroughly analyze the higher headquarters' plan or order to determine how their unit — by task and purpose — contributes to the mission, commander's intent, and concept of operations of the higher headquarters."

[^33]: Risk management framework adapted from AR 385-10, *The Army Safety Program* (2023), and ATP 5-19, *Risk Management* (2014).

[^34]: Battle rhythm concept from FM 6-0, *Commander and Staff Organization and Operations* (2022). "The battle rhythm is a deliberate daily cycle of command, staff, and unit activities intended to synchronize current and future operations."

[^35]: Vault metrics from live extraction, PARA vault, February 21, 2026. 1,697 commits, 39 sessions, 97 tasks, 5,980 files, 32 days of operations.

[^36]: The "70% solution" is attributed to General George S. Patton. The 1/3-2/3 rule: FM 5-0. Sigma cost escalation: Pyzdek, Thomas. *The Six Sigma Handbook*, 4th ed. (2014).

[^37]: The "two outputs" principle — deliverable plus institutional knowledge — is the author's synthesis of military doctrine's emphasis on institutional learning (TC 25-20, AAR methodology) and LSS's emphasis on process documentation (DMAIC Control phase requirements).

---

## ADDITIONAL REFERENCES

Abrahms, Justin. "Wrapping My Head Around Gas Town." justin.abrah.ms. January 5, 2026.

Anthropic. "Building Effective Agents." December 2024. https://www.anthropic.com/research/building-effective-agents

Anthropic. "Introducing Claude Opus 4.6." February 5, 2026. https://www.anthropic.com/news/claude-opus-4-6

OpenAI. Agents SDK. https://openai.github.io/openai-agents-python/

Palantir Technologies. Maven Smart System. $10B 10-year Army contract (August 2025).

Yegge, Steve. "Introducing Beads: A coding agent memory system." Medium. January 2026.

---

*Jeep Marshall is a retired U.S. Army Lieutenant Colonel with 26 years of service in airborne infantry, special operations, and training and simulations. He holds certifications in Lean Six Sigma and currently builds AI-integrated operational workflows.
Contact: [redacted for publication]*