Markdown as a Specification Language for Agentic Workflows

“The hottest new programming language is English.” This claim, popularized by Andrej Karpathy in January 2023 and echoed by Jensen Huang and others, captures a real shift in how software gets built. AI coding agents can now accept natural language instructions and produce working code. But the conclusion that unstructured English is sufficient as a specification language does not follow from this observation.

This article argues that markdown, not freeform natural language, is the practical specification language for AI-assisted development. Markdown provides just enough structure to reduce ambiguity while remaining readable by both humans and machines. It is also language-agnostic, meaning developers can write specifications in their native language within a consistent structural framework.

Software Versions

# Date (UTC)
$ date -u "+%Y-%m-%d %H:%M:%S +0000"
2026-02-08 01:42:16 +0000

# OS and Version
$ uname -vm
Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6000 arm64

$ sw_vers
ProductName:		macOS
ProductVersion:		14.6.1
BuildVersion:		23G93

# Hardware Information
$ system_profiler SPHardwareDataType | sed -n '8,10p'
      Chip: Apple M1 Max
      Total Number of Cores: 10 (8 performance and 2 efficiency)
      Memory: 32 GB

# Shell and Version
$ echo "${SHELL}"
/bin/bash

$ "${SHELL}" --version | head -n 1
GNU bash, version 3.2.57(1)-release (arm64-apple-darwin23)

# Claude Code Installation Versions
$ claude --version
2.1.37 (Claude Code)

The Claim

The idea that natural language will replace programming languages has a longer history than the current AI boom suggests. Computer scientist Alan Perlis observed decades ago that when someone says “I want a programming language in which I need only say what I wish done,” the appropriate response is to “give him a lollipop.” COBOL was marketed on a similar promise in the 1960s.

The modern version of this claim gained traction in January 2023 when Andrej Karpathy tweeted that “the hottest new programming language is English.” The tweet was viewed nearly four million times. Jensen Huang reinforced the message at the World Government Summit in February 2024, stating that “the programming language is human” and that it was “no longer necessary” to learn to code. He repeated the theme at London Tech Week in June 2025. Matt Welsh argued in Communications of the ACM that “the conventional idea of ‘writing a program’ is headed for extinction” and that most software would be “replaced by AI systems that are trained rather than programmed.”

These statements contain a kernel of truth. AI agents have made it possible for people without programming experience to produce working software by describing what they want in plain language. Marc Brooker, an engineer at Amazon Web Services, argued in December 2025 that “the future of programming looks like the past of programming: a natural language conversation, a feedback loop, with the occasional descent into mathematical precision.” Brooker’s framing is more nuanced than the “English is programming” claim. He positions specification, not English itself, as the future of programming, with natural language as one tool among several.

But the leap from “AI understands English” to “English is a good specification language” conflates capability with suitability.

The Problem with Unstructured English

A LessWrong analysis titled “English is a Terrible Programming Language” identifies six properties that a good specification language should have. It should be objective, explicit, unambiguous, relatively static, internally consistent, and robust. English fails on every count. It is subjective, implicit, ambiguous, constantly evolving, contradictory, and structurally inconsistent.

Consider a practical example. A developer tells an AI agent in plain English to “add a delete button to the user profile that asks for confirmation.” This instruction is ambiguous on several axes. Where on the profile should the button appear? What does “confirmation” mean? A browser dialog, a modal, a separate page? What happens after deletion? What about error handling? Authorization?

The developer might know the answers to all of these questions, but by leaving them implicit, they shift the burden of interpretation to the AI agent. The agent will fill in the gaps with its own assumptions, which may or may not match the developer’s intent. This is the same problem that requirements engineers have faced with natural language specifications for decades. AI does not solve the ambiguity problem. It hides it.

Practitioners who use unstructured English prompts in agentic workflows often describe a pattern of correction loops. The agent produces something close to what was intended, the developer adjusts with another English prompt, the agent revises, and the cycle continues. This is sometimes called “vibe coding,” a term coined by Andrej Karpathy in February 2025 to describe a workflow where the developer “fully gives in to the vibes” and lets the AI produce code based on informal descriptions. Vibe coding works well for prototypes and disposable scripts. It becomes unreliable for production systems where correctness, reproducibility, and auditability matter.

Markdown as the Middle Ground

Markdown occupies a useful position between unstructured prose and formal specification languages. It provides structural primitives that reduce ambiguity without requiring the developer to learn a new language.

Headers create hierarchy, organizing specifications into sections and subsections.
Lists create enumeration, forcing the developer to break requirements into discrete items.
Code blocks separate natural language from technical artifacts like commands, file paths, and configuration.
Tables structure data into rows and columns, useful for parameter definitions and comparison matrices.
Bold and italic markers draw attention to constraints and key terms.
Reference links create traceability between claims and their sources.

These features are modest compared to a formal language, but they impose just enough structure to address the most common sources of ambiguity in natural language specifications. The developer must organize their thoughts into sections. Requirements must be itemized. Technical details must be separated from prose descriptions.
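As a rough illustration of what that structure buys, the sketch below itemizes the delete-button request from the earlier example and maps each header to the list items beneath it. The spec text, section names, and the `outline` helper are hypothetical and chosen only for illustration. The parser is deliberately minimal: it uses only ATX-style `#` headers and simple bullets, and it does not skip fenced code blocks.

```python
import re

# Hypothetical spec text; section names and requirements are illustrative only.
SPEC = """\
# Delete Account Button

## Placement
- Show the button at the bottom of the profile settings page.

## Confirmation
- Open a modal dialog before deleting.
- Require the user to type their username.

## Errors
- Show an inline message if the request fails.
"""

HEADER = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")   # ATX header: one to six '#'
ITEM = re.compile(r"^\s*[-*+]\s+(.*\S)\s*$")      # bullet list item


def outline(markdown: str) -> dict[str, list[str]]:
    """Map each header title to the list items that appear beneath it."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(2)
            sections[current] = []
            continue
        item = ITEM.match(line)
        if item and current is not None:
            sections[current].append(item.group(1))
    return sections


sections = outline(SPEC)
for title, items in sections.items():
    print(f"{title}: {len(items)} requirement(s)")
```

Each question that the one-line English request left open now has a named section, and a missing answer shows up as an empty section rather than as a silent assumption.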

Markdown is also the de facto standard for AI agent configuration files. CLAUDE.md, AGENTS.md, .cursorrules, SKILL.md, and specification files used by GitHub Spec Kit, Amazon Kiro, and JetBrains Junie are all markdown or markdown-adjacent formats. This is probably not a coincidence. Language model training data includes large volumes of markdown from GitHub repositories, documentation sites, and technical blogs, so models are likely to have strong priors for interpreting markdown structure.

Code Blocks as Formal Specification

Code blocks are the most powerful structural primitive that markdown offers for specification purposes. Where headers and lists organize intent, code blocks allow the specification author to embed formally precise artifacts directly within a natural language document.

A fenced code block with a language identifier tells both the human reader and the AI agent exactly what kind of artifact is being specified.

echo "${SHELL}"

The language identifier is not decorative. It signals to the AI agent that the enclosed content should be interpreted as shell script, not as prose or pseudocode. This eliminates an entire class of ambiguity that plagues unstructured English specifications. When a developer writes “run the startup script,” the agent must guess what that means. When a developer includes a code block tagged sh, the intent is unambiguous.

Markdown supports nested code blocks through escalating fence delimiters. When a specification needs to communicate markdown itself, a quadruple-backtick fence wraps the inner triple-backtick content.

```sh
echo "${SHELL}"
```

This nesting capability is essential for specifications that describe templates, documentation formats, or protocol files. The bidirectional communication protocol described in the previous article, for example, uses quadruple-backtick fences to include the full text of COMMUNICATION.md within the article. In rare cases where a specification must describe nested markdown that itself contains code blocks, a quintuple-backtick fence provides the necessary depth.
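A tool that generates such documents has to pick the fence length itself. The sketch below shows one conservative way to do it: choose a fence one backtick longer than the longest backtick run in the content. The helper names `fence_for` and `wrap` are hypothetical. The rule is stricter than necessary, since it also counts inline backticks, but it guarantees that no inner fence can close the outer block.

```python
import re

BACKTICK = "`"


def fence_for(content: str, minimum: int = 3) -> str:
    """Return a backtick fence longer than any backtick run inside content."""
    runs = re.findall(r"`+", content)
    longest = max((len(run) for run in runs), default=0)
    return BACKTICK * max(minimum, longest + 1)


def wrap(content: str, info: str = "") -> str:
    """Wrap content in a fenced block that fences inside content cannot close."""
    fence = fence_for(content)
    return f"{fence}{info}\n{content}\n{fence}"


inner = wrap('echo "${SHELL}"', "sh")   # no backticks inside: three suffice
outer = wrap(inner, "markdown")          # contains a three-backtick fence: four
deeper = wrap(outer, "markdown")         # contains a four-backtick fence: five

print(deeper)
```

Computing the length from the content, rather than hard-coding four backticks, keeps the output valid at any nesting depth.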

The practical effect is that markdown code blocks create a formal sublanguage within an otherwise informal document. The natural language prose around the code blocks provides context, rationale, and intent. The code blocks provide the precise, machine-interpretable specification. This combination of informal context and formal content is the pattern that literate programming pioneered in the 1980s, now adapted for human-AI collaboration.

Code blocks also address the reproducibility gap that undermines unstructured English specifications. A specification that says “install the dependencies and run the tests” leaves the exact commands to the agent’s interpretation. A specification that includes the following commands removes that guesswork.

npm install
npm test

The agent can execute the commands exactly as specified. The developer can verify that the specification is correct by running the same commands. This verifiability is a property that unstructured English cannot provide.
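To make this concrete, here is a minimal sketch of how a tool on the agent side might recover those commands from a specification. It is an assumption about one possible approach, not a description of how any particular agent works. The spec text around the two npm commands and the `code_blocks` helper are hypothetical. The code only extracts the commands and does not run them.

```python
import re

FENCE = "`" * 3  # built programmatically so the sample spec can carry a fence

# Hypothetical spec; only the two npm commands come from the article.
SPEC = "\n".join([
    "## Verify",
    "",
    "Install the dependencies and run the tests.",
    "",
    FENCE + "sh",
    "npm install",
    "npm test",
    FENCE,
])

OPEN = re.compile(r"^(`{3,})\s*([\w+-]*)\s*$")  # opening fence plus language tag


def code_blocks(markdown: str) -> list[tuple[str, str]]:
    """Return (language, body) pairs for each backtick-fenced block."""
    blocks = []
    fence = None          # the opening fence while inside a block, else None
    lang = ""
    body: list[str] = []
    for line in markdown.splitlines():
        if fence is None:
            opened = OPEN.match(line)
            if opened:
                fence, lang, body = opened.group(1), opened.group(2), []
        elif line.startswith(fence) and set(line.strip()) == {"`"}:
            # A closing fence is backticks only, at least as long as the opener.
            blocks.append((lang, "\n".join(body)))
            fence = None
        else:
            body.append(line)
    return blocks


commands = [body.splitlines() for lang, body in code_blocks(SPEC) if lang == "sh"]
print(commands)
```

Filtering on the language tag is what makes the identifier useful: prose and blocks in other languages are ignored, and the shell commands come back exactly as written.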

Spec-Driven Development

The emergence of spec-driven development formalizes the use of markdown as a specification language. GitHub’s Spec Kit, introduced in 2025, places a markdown specification document at the center of the engineering process. The spec drives implementation through four phases.

Specify the requirements in a markdown document.
Plan the implementation by generating a design document from the spec.
Tasks break the plan into discrete, verifiable work items.
Implement the code, guided by the spec and task list.

Thoughtworks identified spec-driven development as one of the key new engineering practices of 2025. Red Hat and JetBrains published guides on integrating spec-driven workflows with their respective AI coding tools. Addy Osmani documented practical guidelines for writing specifications that AI agents can reliably execute.

Birgitta Böckeler, writing for Martin Fowler’s site, compared the SDD tools Kiro, Spec Kit, and Tessl, noting that spec-driven development means writing a “spec” before writing code with AI, a “documentation first” approach.

The approach has drawn criticism. Marmelab argued that SDD “revives the old idea of heavy documentation before coding, an echo of the Waterfall era,” and risks “burying agility under layers of Markdown.” Scott Logic described “a sea of markdown documents, long agent run-times and unexpected friction,” concluding that “the fastest path is still iterative prompting and review, not industrialised specification pipelines.” The concern of “spec rot,” where the specification drifts out of sync with the code, is real and mirrors problems with traditional requirements documents.

These criticisms are valid for heavyweight specification pipelines. But they do not invalidate markdown as a specification format. They argue against a particular workflow that generates cascading markdown documents through automated pipelines. The key insight of spec-driven development is that the specification is a living document. It is updated as the developer and agent make decisions, discover constraints, or change direction. This is fundamentally different from the “write English, get code” model. The specification is not a one-shot prompt. It is an evolving contract between the human and the AI.

The Native Language Advantage

A significant advantage of markdown over both unstructured English and formal specification languages is that markdown structure is language-agnostic. The structural primitives of headers, lists, tables, and code blocks work identically regardless of whether the prose is written in English, Japanese, German, or Portuguese.

This matters because the claim that “English is the new programming language” implicitly marginalizes developers who are not native English speakers. A developer in Tokyo or São Paulo can write markdown specifications in their native language with the same structural benefits. Modern language models understand dozens of languages. The structure constrains ambiguity regardless of which human language fills the sections.

Consider a bilingual specification where the section headers and structural elements are in the developer’s working language, while code blocks and technical identifiers remain in English because the underlying programming ecosystem requires it. Markdown accommodates this naturally. Unstructured English does not.

Format Comparison

Research on prompt formatting provides empirical support for the value of structure. A 2024 study found that GPT-3.5-turbo performance varied by up to 40% depending on the prompt template used for a code translation task. Markdown-style formatting outperformed both JSON and XML by 18% for creative tasks. Larger models like GPT-4 were more robust to format variation, but still performed measurably better with structured inputs.

The research also found that format preference is model-specific. Claude performs better with structured formats like XML and markdown. GPT models are more flexible but still benefit from consistent formatting. The practical takeaway is that a consistent structure tends to outperform unstructured prose for specification tasks, although the best format depends on the model.

Format	Strengths	Weaknesses
Plain English	Maximum flexibility, no learning curve	Ambiguous, not version-controllable as structured data, no hierarchy
Markdown	Readable, structured, version-controllable, language-agnostic	Not formally verifiable, still allows ambiguity within sections
YAML/JSON	Machine-parseable, schema-validatable	Less human-readable, verbose for prose, poor for mixed content
Domain-Specific Languages	Maximum precision, formally verifiable	High barrier to entry, not readable by non-specialists

Pros and Cons

The case for markdown as a specification language is strong but not without limitations.

Advantages

Structured enough to reduce ambiguity. Headers, lists, and tables force the developer to organize their intent.
Readable by both humans and machines. Markdown renders cleanly in editors, web browsers, and terminals. Language models parse it reliably.
Version-controllable. Markdown files produce clean diffs in git, making specification changes auditable.
Language-agnostic. Developers can write in their native language within a consistent structural framework.
Lightweight. No tooling required beyond a text editor. No build step, no compilation, no runtime.
Ecosystem adoption. CLAUDE.md, AGENTS.md, SKILL.md, and spec-driven development tools all use markdown.
Living document friendly. Markdown files are easy to update incrementally as requirements evolve.

Limitations

Not formally verifiable. Markdown provides structure but not a type system or grammar. It cannot catch logical contradictions in specifications.
Still subject to natural language ambiguity. Within any given section, the prose remains natural language. Markdown reduces ambiguity at the structural level but does not eliminate it at the semantic level.
No enforcement mechanism. Nothing prevents a developer from writing a poorly structured markdown file. The discipline must come from conventions and templates, not the format itself.
Learning curve for non-technical stakeholders. While markdown is simpler than YAML or JSON, it is not universally known outside of technical communities.

Context Engineering

Anthropic’s research on context engineering provides additional justification for structured markdown specifications. As context windows grow, a phenomenon called “context rot” emerges. Every token added to the context window competes for the model’s attention. Stuffing a hundred thousand tokens of unstructured history into the window causes the model’s ability to reason about what actually matters to degrade.

Structured markdown addresses context rot by organizing information into scannable sections. An AI agent processing a markdown specification can navigate to the relevant section using headers rather than scanning the entire document linearly. The CLAUDE.md approach used by Claude Code employs this principle explicitly. Project-level instructions are structured in markdown and dropped into context at session start, while more detailed information is retrieved just-in-time through file system navigation.

Anthropic’s Agent Skills specification, published as an open standard in December 2025, extends this principle further. Each skill is a folder with a SKILL.md file containing YAML frontmatter and markdown instructions. When a request matches a skill’s domain, the agent loads only the relevant skill. Anthropic calls this “progressive disclosure,” and it is fundamentally a strategy for managing context through structured markdown documents.
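The loading pattern can be sketched in a few lines. The example below is a simplification under stated assumptions: the SKILL.md content is hypothetical, the `split_frontmatter` helper is invented for illustration, and the parser handles only flat `key: value` lines rather than full YAML. A real implementation would use a proper YAML parser.

```python
# Hypothetical SKILL.md content; the values are assumptions for illustration.
SKILL_MD = """\
---
name: release-notes
description: Draft release notes from merged pull requests.
---

# Release Notes

1. Collect merged pull requests since the last tag.
2. Group them by label.
"""


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a document into flat key/value frontmatter and the markdown body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text          # no frontmatter: the whole text is the body
    try:
        end = lines.index("---", 1)
    except ValueError:
        return {}, text          # unterminated frontmatter: treat as body
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


meta, body = split_frontmatter(SKILL_MD)

# Only the short metadata needs to stay in context up front.
print(meta["name"], "-", meta["description"])
```

The split is what enables progressive disclosure in this sketch: the short metadata can be kept in context for every skill, while each markdown body is read only when a request matches its description.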

Conclusion

The claim that “English is the new programming language” is a useful provocation but a poor specification strategy. Unstructured natural language is ambiguous, implicit, and difficult to version control. It works for casual interactions with AI assistants but breaks down for multi-session projects where correctness, reproducibility, and traceability matter.

Markdown is not a formal specification language. It does not replace programming languages any more than blueprints replace construction. But it provides just enough structure to bridge the gap between human intent and machine execution. It is readable, version-controllable, language-agnostic, and already the de facto standard for AI agent configuration and specification files.

The more precise claim is this. The specification language for agentic workflows is structured markdown in the developer’s native language. The structure reduces ambiguity. The native language maximizes expressiveness. The combination is more effective than either unstructured English or a formal language alone.

Further Reading

Spec-Driven Development with AI by GitHub, which introduces specification-first development using markdown and documents the four-phase Specify, Plan, Tasks, Implement workflow.
How to Write a Good Spec for AI Agents by Addy Osmani, with practical guidelines for writing specifications that AI coding agents can reliably execute.
Does Prompt Formatting Have Any Impact on LLM Performance?, a 2024 preprint on arXiv, providing empirical evidence on how structured formatting affects language model output quality.
English is a Terrible Programming Language on LessWrong, which analyzes the properties that a good specification language should have and explains why natural language falls short.
Effective Context Engineering for AI Agents by Anthropic, covering compaction, structured note-taking, and progressive disclosure for maintaining context in long-running agent sessions.