Why I Won't Run Untrusted Models in My Coding Agent

Jun 29, 2026

Coding agents work by sending your prompt and files to a model’s API over HTTP and receiving generated code and tool calls in return, including Bash scripts that execute on your machine.

Coding agents give the model and API provider arbitrary code execution on your computer.

A model can be designed to emit backdoored code when a trigger appears in its input. A model’s API can do the same based on the request’s country of origin, organization, or other metadata.

You shouldn’t run any model or API in a coding agent unless you would just as willingly download and run arbitrary code from that same provider. Because I don’t trust any of the open weight models or providers this much, I won’t use their models or APIs with my coding agent.

Models can be manipulated

An open weight model can be trained to slant its text toward an ideology. In the same way, it can be trained to write bad code or run harmful commands when it sees a certain trigger.

it isn't great that all of the open models are at least fairly partially aligned with the ccp...
— hailey (@hailey.at) 8:51 PM · Jun 28, 2026

Models can be easily backdoored

Adding backdoors to an API is trivial, but even “poisoning” the models themselves seems to be very easy. In Sleeper Agents (arXiv 2401.05566), Anthropic trained a model to write secure code when a prompt said “2023” and exploitable code when it said “2024”, and the backdoor survived fine-tuning, RL, and adversarial training. Models can also be manipulated cheaply during training, with as few as 250 poisoned documents (arXiv 2510.07192).

Why I trust Anthropic and OpenAI’s models and APIs

Of course Anthropic’s and OpenAI’s models and APIs can have bugs and mistakes that cause problems. What I trust is that they won’t be deliberately malicious. This has nothing to do with trusting their ethics. I trust that their own self-interest and the US legal system are powerful enough incentives. Anthropic already agreed to pay at least $1.5 billion to settle a copyright class action brought by authors, the largest copyright settlement in history. They know they have to tread carefully.

I really, really want open models

I’m a huge believer in open source software and spreading knowledge and power as widely as possible. Nobody should want a few big companies owning our new system for agentic coding and computing. I want open weight models I can run myself without compromising my privacy and without paying huge markups.

Subscriptions are cheap for professionals

Part of the reason people use open weight models and APIs is cost. But, pragmatically, Claude and Codex offer flat-rate subscriptions at $100/mo and $200/mo, which provide sufficient tokens for most full-time developers. Subsidized by investor money, they’re a hard deal to complain about.

What would actually fix this

Open weights are not open source. Weights are more like “compiled binaries”, not source code. What we ultimately want are fully open source models, with the training code and data open enough that anyone could reproduce them or build their own.

We can’t trust open weight models, but we could trust open source models.

Discuss on Hacker News Discuss on Bluesky