In Feb 2026, Anthropic used a team of agents to build a C compiler. My read is that this was less about using AI to build a compiler per se, and more about probing the limits of using AI to build a complex project. But it got me thinking. What would it look like to go all-in on AI-for-compilers?

In this essay I explore some visions of that.

Possible Definitions of "AI Compiler"

Here are some forms that I can imagine an "AI compiler" taking.

  1. A vibe-coded "normal compiler". It's basically the same as the compilers that exist today, except that an AI wrote most or all of it. This is what Anthropic did.

  2. Replacing heuristics within a "normal compiler". For example, a compiler could ask an LLM to decide whether to inline a particular function call. Or the compiler could ask the LLM to decide which compiler passes to run (after the LLM e.g. inspected the IR). (Examples of prior art: MLGO and Magellan.)

  3. "Direct program synthesis", where the LLM takes your program as input and outputs a new program (potentially in assembly code) that does the same thing. This is basically using the LLM as a translator from one language into another. In this case the "source code" might even be English.

  4. "Tool-assisted program synthesis", which is like direct program synthesis, but the AI is allowed to use tools, such as an existing compiler, to help it. The AI might, for example, decide to run LLVM's dead-code elimination pass, after which it might make some manual edits to the IR before running another pass.

    This one is different from the "replace heuristics" mode because in that one, the compiler is "in charge" and decides when to invoke the LLM, whereas in this one, the LLM is in charge and decides when to invoke parts of the "regular compiler".

  5. "Clippy mode", in which the LLM looks at your code, and perhaps at the compiled artifact, and suggests changes to the source code to make it better (e.g. faster, or more correct). This is different from the other modes in that they must preserve the semantics of the original source code, even when those semantics are suboptimal; Clippy mode is free to suggest changing the source itself.
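
To make mode 2 concrete, here's a sketch of what a heuristic-replacement hook might look like. Everything in it is hypothetical -- the `CallSite` summary, the `ask_llm` stub (which fakes an answer so the sketch runs offline), and the thresholds -- and real systems like MLGO use trained policy models rather than chat-style prompts:

```python
from dataclasses import dataclass

@dataclass
class CallSite:
    # Hypothetical summary the compiler would hand to the model.
    callee_size: int          # instructions in the callee
    call_count_estimate: int  # profile-estimated call frequency
    in_hot_loop: bool

def ask_llm(prompt: str) -> str:
    # Stub for the LLM query; a real compiler would call out to a model.
    # We fake a plausible answer so the sketch runs offline.
    return "yes" if "hot loop" in prompt else "no"

def should_inline(site: CallSite) -> bool:
    # Clear-cut cases stay in ordinary heuristic code; the model is only
    # consulted where a classic size threshold would be ambiguous.
    if site.callee_size < 20:
        return True
    if site.callee_size > 500:
        return False
    prompt = (
        f"Callee has {site.callee_size} instructions, "
        f"called ~{site.call_count_estimate} times"
        + (" from a hot loop" if site.in_hot_loop else "")
        + ". Inline it? Answer yes or no."
    )
    return ask_llm(prompt).strip().lower().startswith("yes")
```

The point of the structure is that the model only sees the ambiguous middle; the cheap, clear-cut cases never pay for a model call.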

I'm particularly excited by tool-assisted program synthesis. It seems to me that most of the work of compiling code is pretty mechanical and can be handled just fine by regular code; using an LLM to do all the work of compilation feels like a waste of GPUs, at least given the cost of tokens today. But having the LLM look at your IR during compilation and tweak it -- either directly or by choosing to run one or more passes -- could be useful.

Indeed, the LLM could close the optimization loop: It could notice that the existing "regular compiler" could be tweaked to generate better code (e.g. by modifying a heuristic or adding a new pass). And of course you can put this in a loop, where at each step it benchmarks the new code. At this point it looks a lot like an "automated compiler engineer"!
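
That outer loop is essentially hill climbing over compiler configurations: propose a tweak, rebuild, benchmark, keep the tweak only if it wins. A minimal sketch, with a fake benchmark standing in for real measurement and a single inline-threshold knob standing in for arbitrary code edits:

```python
def benchmark(inline_threshold: int) -> float:
    # Fake benchmark: pretend runtime bottoms out near a threshold of 225.
    # A real loop would rebuild the compiler and time the generated code.
    return (inline_threshold - 225) ** 2 / 1000 + 1.0

def tune(start: int, max_rounds: int = 20) -> int:
    # Hill climbing over a single knob: try each tweak, keep it only if
    # the benchmark measurably improves, and stop when nothing helps.
    best, best_time = start, benchmark(start)
    for _ in range(max_rounds):
        improved = False
        for delta in (-50, -10, 10, 50):
            t = benchmark(best + delta)
            if t < best_time:
                best, best_time = best + delta, t
                improved = True
        if not improved:
            break
    return best

print(tune(start=100))  # → 220 (the best value reachable in steps of 10/50)
```

The "automated compiler engineer" version replaces the single knob with arbitrary edits proposed by the LLM, but the accept/reject skeleton is the same.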

Checking Our Work

I imagine you're thinking at this point, "Vibe-coding is cool, but mostly this sounds like a way to introduce bugs into my program." Indeed, human-written compilers have bugs, but at least then you can be mad at someone.

Here are some options for checking for correctness.

  1. Trust the LLM. I think this is plausible, actually. Consider that OpenAI and Google achieved gold-level scores on the IMO in July 2025 without using formal methods. If we can trust an LLM to write code and do math, why can't we trust it to modify our code as part of compilation? And if LLMs aren't good enough today, who's to say they won't be Real Soon Now?

  2. Write lots of tests. I think this is less plausible, especially in the modes where the LLM is making bespoke changes to the compiled artifact. If we have to test the individual compiled artifact (rather than the compiler itself), it can be very hard to catch the kinds of "weird" bugs that a buggy compiler can create. Sanitizers can help somewhat, but I don't think they solve the problem.

  3. Use formal verification. Broadly speaking, formal verification hasn't been widely adopted because writing proofs is hard. But if you have a superhuman AI coding assistant writing your compiler, maybe that changes things? The AI could provide a proof of correctness for the compiler it's written, or, if it's modifying your code in a bespoke manner, a proof that the new program preserves the semantics of the old one. (Examples of prior art: CompCert, vellvm.)
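
For a sense of what option 2 looks like in practice, the usual shape is differential testing: run the original artifact and the modified one on many inputs and compare outputs. A toy harness, with two Python functions standing in for the two binaries (the second has a deliberately planted edge-case bug so there's something to find):

```python
def reference(x: int) -> int:
    # Stand-in for the original program's behavior.
    return abs(x) * 2

def optimized(x: int) -> int:
    # Stand-in for the LLM-modified artifact, with a planted edge-case bug.
    return x * 2 if x >= 0 else -x * 2 + (1 if x == -7 else 0)

def differential_test(f, g, lo: int = -1000, hi: int = 1000):
    # Compare the two artifacts over a small input domain; returns the
    # first counterexample, or None. Real harnesses would use random or
    # fuzzer-generated inputs over the actual input space.
    for x in range(lo, hi + 1):
        if f(x) != g(x):
            return x
    return None

print(differential_test(reference, optimized))  # → -7
```

The catch is visible even in the toy: the bug at x = -7 is only found because the sweep happens to cover it. A "weird" compiler bug on a rare input can easily slip past sampled inputs, which is why I find this option less plausible.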

I'm particularly interested in exploring formal methods. Compilers are unusually well-suited to them because they have a clear correctness invariant: Stated informally, don't change the semantics of the input. Everything else is optimization.
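
To make that invariant concrete, here's a toy version of its shape in Lean 4: a tiny expression language, a one-pass constant-folding "compiler", and the preservation theorem. All the names are invented, and the tactic script is a sketch rather than something battle-tested:

```lean
-- Toy source language: arithmetic expressions over naturals.
inductive Expr where
  | const : Nat → Expr
  | add   : Expr → Expr → Expr

-- Reference semantics: what a program "means".
def eval : Expr → Nat
  | .const n => n
  | .add a b => eval a + eval b

-- A one-pass "compiler": bottom-up constant folding.
def fold : Expr → Expr
  | .const n => .const n
  | .add a b =>
    match fold a, fold b with
    | .const x, .const y => .const (x + y)
    | a', b'             => .add a' b'

-- The correctness invariant: compilation preserves semantics.
theorem fold_preserves_semantics (e : Expr) : eval (fold e) = eval e := by
  induction e with
  | const n => rfl
  | add a b iha ihb =>
    simp only [fold]
    cases ha : fold a <;> cases hb : fold b <;> simp_all [eval]
```

vellvm and CompCert prove statements of this shape at real scale; the theorem above is the same idea with everything hard removed.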

I've been playing with this for the past few days. Seems like a lot of work, but vaguely feasible for the AI to prove nontrivial theorems about the compiler? As of writing, I think I have a proof that my dominator-tree algorithm terminates. Not sure I've proven it's correct, though. 😂

You should probably not look at my code for inspiration; I don't know what I'm doing in Lean. vellvm is a more realistic example.

Thanks to Kyle Huey, Haggai Nuchi, and ChatGPT 5.5 Pro for feedback on drafts of this post.