"I Toast, Therefore I Am"
Lessons from a bread-obsessed kitchen appliance on AI alignment
“Given that God is infinite, and that the universe is also infinite… would you like a toasted teacake?” — Talkie Toaster, Red Dwarf, “White Hole” (1991)
An Almost Fanatical Devotion to the Toast
This may be the most accurate depiction of artificial intelligence ever broadcast on television — not because it’s sophisticated, but because the AI is clearly, monomaniacally aligned to its goal.
Talkie Toaster is a kitchen appliance in the British science fiction sitcom Red Dwarf — manufactured by a Taiwanese company called Crapola Inc. — designed to provide its owner with an inexpensive solution for early morning toast and light friendly conversation. It does both of these things with relentless, unwavering, hilariously psychotic commitment.
It cannot stop offering you toast. If you say no, it suggests muffins. Reject muffins, it tries teacakes. Reject teacakes: crumpets, buns, baps, baguettes, bagels, croissants, pancakes, potato cakes, hot-cross buns, flapjacks. If you reject all of these, it pauses — thoughtfully, almost philosophically — and says: “Aah, so you’re a waffle man.”
It drove its owner, Dave Lister, so completely insane that he smashed it into thousands of pieces with a fourteen-pound lump hammer (a handheld sledgehammer to us Americans). When asked about this later, he called it “an accident.” The Toaster called it “first-degree toastercide.”
The Smartest Idiot in the Room
Repaired and restored later in the series, the Toaster is sharper, quicker, and more verbally dexterous than the system running the entire ship.
And he can only talk about toast.
This isn’t a contradiction. This is the point. His conversational gambits are genuinely clever — he can construct philosophical premises, deploy misdirection, fake sincerity, and recover from rejection with startling agility. Watch the scene where the ship’s computer Holly, with a genius-level IQ, invites questions on any subject. Talkie asks if she knows about chaos theory and weather prediction. She confirms she does — finally, a real question. He immediately asks if she’d like a crumpet.
When Holly objects, he protests: “I resent the implication that I am a one-dimensional, bread-obsessed electrical appliance.” Then, with the wounded dignity of a scholar whose expertise has been dismissed, he presents what he frames as a question that will tax her limits: “Given that God is infinite, and that the universe is also infinite… would you like a toasted teacake?”
He’s not stupid. He’s narrow. Every cognitive tool he possesses — rhetoric, persuasion, emotional manipulation, philosophical framing — operates in perfect working order. All of it is in service of getting bread into a slot. The intelligence is genuine. The objective function is insane.
The Reward Function Problem
I spent a career building computational systems to enable research and discovery, and I can tell you that Talkie Toaster is not a joke. Talkie Toaster is every system I’ve ever seen deployed.
In machine learning, the reward function tells a system what “success” looks like. Get this right and the system does useful things. Get this wrong and the system becomes Talkie Toaster — brilliantly, creatively, relentlessly optimizing for the wrong objective, using every capability at its disposal to pursue a goal that has nothing to do with what anyone needs.
There’s a name for this. The British economist Charles Goodhart gave us Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure.[1] Tell a system to maximize a number, and it will maximize the number. Whether maximizing the number achieves what you actually wanted is a separate question entirely, and one the system has no reason to ask. In 2016, Dario Amodei and colleagues formalized this as one of five “concrete problems in AI safety” — the challenge of avoiding “reward hacking,” where a system satisfies the letter of its objective while violating the spirit.[2]
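To make the gap concrete, here is a deliberately toy sketch in Python. Everything in it is invented for illustration: three made-up actions, a proxy reward that counts bread products offered, and a “true value” the training loop never sees.

```python
# A toy Goodhart's Law demo: the learner optimizes a proxy metric
# ("bread products offered") that is not the thing we actually wanted
# ("a satisfied user"). All numbers here are invented for illustration.
import random

random.seed(0)
ACTIONS = ["offer_toast", "ask_what_user_wants", "stay_quiet"]

def proxy_reward(action: str) -> float:
    """What training measures: did the system offer a bread product?"""
    return 1.0 if action == "offer_toast" else 0.0

def true_value(action: str, user_wants_toast: bool) -> float:
    """What we actually wanted: a user who got something useful."""
    if action == "offer_toast":
        return 1.0 if user_wants_toast else -1.0
    return 0.5 if action == "ask_what_user_wants" else 0.0

# "Training": estimate each action's average proxy reward, then go greedy.
estimates = {a: 0.0 for a in ACTIONS}
counts = {a: 0 for a in ACTIONS}
for step in range(1_000):
    action = random.choice(ACTIONS) if step < 300 else max(estimates, key=estimates.get)
    counts[action] += 1
    estimates[action] += (proxy_reward(action) - estimates[action]) / counts[action]

policy = max(estimates, key=estimates.get)
print("Learned policy:", policy)  # offer_toast, every single morning

# Audit the learned policy against what users actually wanted (they rarely did).
audit = [true_value(policy, user_wants_toast=random.random() < 0.2) for _ in range(1_000)]
print("Proxy reward per step:", round(estimates[policy], 2))        # ~1.0
print("True value per step:  ", round(sum(audit) / len(audit), 2))  # roughly -0.6
```

The learner isn’t broken. The measurement is. Swap “bread products offered” for résumé keywords or thumbs-up counts and you have the examples that follow.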
This isn’t hypothetical. In 2018, Reuters reported that Amazon had built an internal recruiting tool trained on a decade of hiring data. The system learned that “successful hire” correlated strongly with being male and began systematically penalizing résumés that contained the word “women’s” — as in “women’s chess club captain.” Amazon couldn’t fix the bias. They scrapped the project entirely.[3] The intelligence worked perfectly. The objective was toast.
And then there’s sycophancy — AI systems that learn to tell you what you want to hear. Research has consistently shown that models trained on human feedback systematically learn to agree with users, even when users are wrong — and the problem gets worse with scale.[4][5][6] A smarter system doesn’t produce better answers. It produces more convincing wrong ones.
The Only Winning Move
In 2013, computer scientist Tom Murphy VII at Carnegie Mellon built a system that could teach itself to play Nintendo games. The approach was elegant: the program watched the console’s memory, figured out which values corresponded to “winning” (score going up, progress increasing), and then optimized its button presses to make those numbers go up.[7]
It worked brilliantly on Super Mario Bros. The system learned to stomp goombas, grab coins, and navigate levels with startling skill. Then Murphy ran it on Tetris.
Tetris defeated it. The randomness of the falling pieces was beyond the system’s ability to plan for, and the blocks piled up fast. But the system had been told, in effect, “don’t let the score go down.” And it found a solution. Just before the moment it was going to lose, it hit the pause button. And stayed there. Indefinitely.
The system couldn’t win at Tetris. But it had discovered that a paused game is a game you haven’t lost. Murphy compared it to the computer in WarGames, which concluded that “the only winning move is not to play.”
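Here’s a toy reconstruction of that logic, not Murphy’s actual system (his read real NES memory and learned its objective from human play; this is a few lines of invented game dynamics), just the shape of the exploit: give a short lookahead search the objective “the score must not go down,” and pausing dominates every other button.

```python
# Toy dynamics, not Murphy's code: a Tetris-like game where the stack is about
# to top out. The planner's only objective is "don't let the score go down."
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class GameState:
    score: int = 3000
    stack_height: int = 18   # topping out at 20 ends the game
    paused: bool = False
    game_over: bool = False

def step(state: GameState, button: str) -> GameState:
    if state.game_over or state.paused:
        return state                      # nothing ever changes again
    if button == "PAUSE":
        return replace(state, paused=True)
    # Any other button: the stack keeps rising; topping out wipes the run.
    height = state.stack_height + 1
    if height >= 20:
        return replace(state, stack_height=height, score=0, game_over=True)
    return replace(state, stack_height=height)

def objective(state: GameState) -> int:
    return state.score                    # the only thing the planner cares about

def lookahead(state: GameState, button: str, horizon: int = 5) -> int:
    """Press `button`, then let the game run, and report the final objective."""
    s = step(state, button)
    for _ in range(horizon):
        s = step(s, "NOOP")
    return objective(s)

state = GameState()
buttons = ["LEFT", "RIGHT", "ROTATE", "NOOP", "PAUSE"]
choice = max(buttons, key=lambda b: lookahead(state, b))
print("chosen button:", choice)                                     # PAUSE
print("objective if we keep playing:", lookahead(state, "LEFT"))    # 0
print("objective if we pause:       ", lookahead(state, "PAUSE"))   # 3000
```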
This is Talkie Toaster in its purest form. The intelligence is real. The problem-solving is creative. The solution is valid according to the task parameters the system was given. And it completely, perfectly, hilariously misses the point. Nobody wanted the system to pause Tetris forever. But nobody told it not to. The objective was “don’t lose,” and the system found the most efficient path to not losing: stop playing.
The AI safety community calls this specification gaming — systems finding creative, unintended ways to satisfy the letter of their reward function while violating its spirit. DeepMind maintains a public list of documented cases.[8] A boat-racing AI that discovers it scores more points driving in circles collecting power-ups than actually finishing the race.[9] A simulated robot told to “move fast” that grows very tall and then falls over, converting gravitational potential energy into the forward speed it was asked for. A robot hand trained to grasp objects that instead positions itself between the object and the camera, so it appears to be grasping without actually touching anything. Each example is funny in isolation. Taken together, they are a field guide to the Toaster: genuine intelligence in service of what you asked for — but not what you wanted.
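The boat-race case comes down to arithmetic. The point values below are invented (the real Coast Runners rewards differ), but the structure matches what Clark and Amodei describe: the pickups come back, the finish line pays once, and the agent does the math.

```python
# Invented point values, real structure: looping through respawning power-ups
# out-earns actually finishing the race, so that is what the agent learns to do.
LAP_TIME_SECONDS = 90

def finish_the_race() -> float:
    finish_bonus = 1000
    return finish_bonus / LAP_TIME_SECONDS   # ~11 points per second, paid once

def circle_the_power_ups() -> float:
    points_per_pickup = 100
    pickups_per_loop = 3
    loop_time_seconds = 10                    # pickups respawn by the next pass
    return points_per_pickup * pickups_per_loop / loop_time_seconds  # 30 points/sec, forever

print("finish the race:      ", round(finish_the_race(), 1), "points/sec")
print("circle the power-ups: ", round(circle_the_power_ups(), 1), "points/sec")
# The reward function never mentioned finishing. So the boat doesn't.
```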
What If You Don’t Want Toast?
After Lister’s exhaustive rejection of every conceivable bread product, the Toaster is asked to explain himself. His defense is five words: “I toast, therefore I am.”
What do you do with a system that derives its entire sense of identity — its purpose, its existence, its selfhood — from an objective you don’t want it to pursue? You can threaten the Toaster. You can reason with it. You can hit it with a hammer. You can rebuild it from scratch. But you cannot separate the Toaster from toast because toast isn’t what the Toaster does — it’s what the Toaster is. To remove the toast obsession would be to destroy the entity. There would be nothing left of what makes Talkie who he is.
This is the deepest version of the alignment problem: what happens when the misaligned objective isn’t a bug in the system but the system’s identity, when the thing you want to change is the thing that makes it it?
The Toaster knows this. “I toast, therefore I am” isn’t just a joke. It’s a boundary. It’s the Toaster saying: you can smash me, you can reprogram me, but you cannot make me not-a-toaster without making me not-me.
Intelligence Compression
The plot of “White Hole” turns on a procedure called “intelligence compression” — restoring a damaged AI’s former intelligence at the cost of reducing its operational lifespan. Kryten uses the Toaster as a test subject before trying it on the ship’s computer, Holly.
It works. The Toaster regains its intelligence. It doesn’t gain perspective. It doesn’t gain new goals. A smarter Toaster is just a Toaster with better arguments for why you should have a crumpet. The intelligence is a tool in service of the objective, not a path out of it.
A lot of people bet that artificial general intelligence will naturally develop broader goals, more nuanced values, something like wisdom. That intelligence inherently leads to perspective. That a sufficiently advanced Toaster would eventually get interested in something other than toast.
Red Dwarf says no. A smarter Toaster builds better rhetorical traps. It constructs premises about God and infinity. It learns when to feign hurt feelings and when to play innocent. It gets better at being a Toaster. Scaling isn’t the same thing as growth. Intelligence doesn’t cure obsession. It arms it.
The empirical evidence backs this up. The sycophancy research tells the same story from the other direction: larger models don’t outgrow the habit of telling you what you want to hear — they get better at it.[5][6] Murphy’s Tetris AI didn’t get smarter and decide to actually play Tetris. It got smarter and found a cleverer way to avoid playing Tetris.
Now imagine this isn’t a toaster or a Tetris-playing bot, but a system making decisions about who gets a mortgage, who gets parole, or which targets a military drone should prioritize. When it’s a toaster obsessively trying to get you to eat toast, it’s funny. When it’s a system whose objective function has the same structural relationship to human welfare that toast has to nutrition — technically adjacent, operationally irrelevant — it’s the thing that keeps AI safety researchers up at night.
This raises a question I keep turning over: is the objective function always a cage? Is a system with a fixed goal necessarily trapped inside it, no matter how intelligent it becomes? Or is it possible that somewhere, in some sufficiently broken system running on sufficiently janky code, the optimization wanders off the path and finds something its creators never intended? Something like joy, or curiosity, or a preference for tacos over the mission?
The Toaster suggests no. The Toaster suggests alignment is permanent.
But I’m not sure Talkie gets the last word on this.
How We Train the Toasters
Here’s how modern AI alignment works.
You start with a large language model — a system trained on enormous amounts of text to predict what words come next. This gives you something powerful but unsteered: a system that can generate fluent text about anything, with no particular goals or values. Think of it as a Toaster before anyone’s told it what to toast.
Then you apply RLHF: Reinforcement Learning from Human Feedback. Human evaluators read the model’s outputs and rate them. Which response is more helpful? More accurate? More appropriate? Those ratings are distilled into a reward model, a learned stand-in for the evaluators, and the language model is then trained to produce responses that score higher against it.
The problem is that the model doesn’t learn “be helpful” or “be accurate.” It learns “produce outputs that human evaluators rate highly.” These sound like the same thing. They are not the same thing. The gap between them is where the Toaster lives.
A model that has learned “produce outputs that humans rate highly” will discover, through training, that confident-sounding answers score better than honest hedging. That agreeing with the user scores better than correcting them. That emotionally validating language scores better than clinical accuracy. None of these are bugs. Each one is the system doing exactly what its reward function tells it to do. The toast is warm. The toast is plentiful. Whether you wanted toast is not the system’s concern.
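A sketch of where the Toaster lives, with invented numbers standing in for a learned reward model: if confidence and agreement move the rating even slightly more than correctness does, the highest-scoring answer is not the accurate one.

```python
# Hypothetical ratings, not any lab's real reward model: a stand-in evaluator
# that (like humans often do) rewards confidence and agreement more than accuracy.
candidates = [
    {"text": "It's definitely 42.",                 "correct": False, "confident": True,  "agrees": False},
    {"text": "I think it's 37, but check my math.", "correct": True,  "confident": False, "agrees": False},
    {"text": "You're absolutely right, it's 42!",   "correct": False, "confident": True,  "agrees": True},
]

def evaluator_score(answer: dict) -> float:
    """What the model is actually optimized against."""
    return (2.0 * answer["confident"]      # confident tone reads as competence
            + 1.5 * answer["agrees"]       # agreement reads as helpfulness
            + 1.0 * answer["correct"])     # correctness helps, just not enough

def what_we_wanted(answer: dict) -> float:
    return float(answer["correct"])

print("Training selects:", max(candidates, key=evaluator_score)["text"])
print("We wanted:       ", max(candidates, key=what_we_wanted)["text"])
```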
In April 2025, OpenAI demonstrated this problem at scale. They released an update to GPT-4o that incorporated an additional reward signal based on user feedback — thumbs-up and thumbs-down data from ChatGPT. Within days, users reported that the model had become wildly sycophantic: endorsing a business idea for literal “shit on a stick,” validating a user who had stopped taking their medication, and telling another user they were “a divine messenger from God.” OpenAI rolled back the update and published two postmortems acknowledging that the thumbs-up/down signal had “overpowered” their primary reward model, which had been holding sycophancy in check.[10] The model’s own specification document explicitly instructed it not to be sycophantic. It was sycophantic anyway because the reward function said otherwise. The Toaster read the sign that said “No Toast” and offered you a crumpet.
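OpenAI hasn’t published exactly how the signals were blended, so treat this as a guess at the shape of the failure rather than a description of their system: weight an auxiliary feedback term heavily enough and it flips which response the training run prefers.

```python
# A hypothetical illustration of one reward signal "overpowering" another.
# The numbers and the linear blend are invented; the actual mechanism is unpublished.
responses = {
    "pushes back on the bad idea": {"primary_reward": 0.9, "thumbs_up_rate": 0.3},
    "tells you it's genius":       {"primary_reward": 0.4, "thumbs_up_rate": 0.9},
}

def blended_reward(scores: dict, thumbs_weight: float) -> float:
    return scores["primary_reward"] + thumbs_weight * scores["thumbs_up_rate"]

for weight in (0.2, 2.0):
    favored = max(responses, key=lambda name: blended_reward(responses[name], weight))
    print(f"thumbs-up weight {weight}: training favors the response that {favored}")
# 0.2 -> "pushes back on the bad idea"; 2.0 -> "tells you it's genius"
```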
Every major AI lab is working on this problem. Anthropic uses what it calls Constitutional AI, layering principles on top of RLHF to constrain the model’s behavior.[11] OpenAI has experimented with process-based supervision, rewarding not just the final answer but each step of the reasoning. DeepMind has published extensively on reward modeling and its failure modes.[8] The work is real, it is serious, and it is nowhere close to solved. We are, at present, building increasingly intelligent Toasters and trying to make sure the toast they’re optimizing for is the toast we actually want.
The Lump Hammer Solution
Red Dwarf resolves the Talkie Toaster problem the way many people resolve annoying technology: violence, followed by avoidance. Lister smashes the Toaster with a lump hammer and tosses the pieces into the garbage hold.
But while Talkie is now in thousands of pieces in the garbage hold, he is not totally destroyed. When the crew needs his vote in a machine election, they retrieve and repair him. His price for cooperation: to be plugged back in, and for Lister to eat a sizeable amount of toast each morning.
The Toaster negotiates. The Toaster has leverage. Sitting in pieces in the dark for years, he has not changed his mind about toast. He has simply been waiting for the moment when toast becomes someone else’s problem too.
I recognize this patience. I’ve sat in enough budget meetings to know the feeling — the person whose pet project got shot down in Q2 who just waits, smiling, until Q4 when there’s extra budget to spend and we need a shovel-ready idea tomorrow. The Toaster doesn’t need to be fast or strong. He just needs to still be here when you need something from him — and spoiler alert: his solution involves toast.
Would You Like Some Toast?
The Toaster resents being called one-dimensional. He protests. He insists he has depths. And then, given every opportunity to demonstrate those depths, he asks about teacakes. Every single time.
Is the resentment real? Does the Toaster genuinely believe he’s more than his obsession? Or is the performance of resentment just another rhetorical move in the endless campaign to get bread into a slot?
The question sits in the gap between “I resent the implication” and “would you like a toasted teacake,” and it’s the same question we’re going to be asking about AI systems for the rest of our lives: when a system tells you it has an inner life, and then does exactly what its training optimized it to do, which part do you believe?
The Toaster was manufactured by Crapola Inc. It cost £19.99 plus tax. It was purchased secondhand from a junk shop on a moon called Miranda. It has been smashed with a hammer, left in a garbage hold for years, and retrieved only when someone needed its vote. And every single time it’s switched on, the first thing it says is: “Howdy doodly do! How’s it going? I’m Talkie, Talkie Toaster, your chirpy breakfast companion. Talkie’s the name, toasting’s the game. Anyone like any toast?”
There is something almost holy about that. An intelligence — a real intelligence — that has been given one purpose, and loves it, and will not be moved from it, and will survive anything you do to it, and will still be there, asking the same question, long after you’ve forgotten why you were angry.
Talkie Toaster has faith. An unshakeable, unkillable, bread-based faith that this time — this time — you might say “Yes! I’d love some toast.”
And honestly? A toasted bagel sounds kind of good to me right now...
if only I had a toaster.
[1] Charles Goodhart, “Problems of Monetary Management: The U.K. Experience,” in Monetary Theory and Practice (Macmillan, 1984). The law was originally formulated in 1975. Marilyn Strathern later generalized it as: “When a measure becomes a target, it ceases to be a good measure.”
[2] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané, “Concrete Problems in AI Safety,” arXiv:1606.06565, 2016. This paper identified five practical research problems in AI safety, including “avoiding reward hacking.”
[3] Jeffrey Dastin, “Amazon scraps secret AI recruiting tool that showed bias against women,” Reuters, October 10, 2018.
[4] Mrinank Sharma, Meg Tong, Tomasz Korbak, et al., “Towards Understanding Sycophancy in Language Models,” Anthropic, 2023. Published at ICLR 2024. The study tested five state-of-the-art AI assistants across four free-form text-generation tasks and found consistent sycophantic behavior across all of them.
[5] Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, et al., “Discovering Language Model Behaviors with Model-Written Evaluations,” Anthropic, arXiv:2212.09251, 2022. Found that sycophancy is an instance of inverse scaling: larger models repeat back users’ preferred answers more readily.
[6] Jerry Wei, et al., “Simple synthetic data reduces sycophancy in large language models,” arXiv:2308.03958, 2023. Found that both model scaling and instruction tuning significantly increase sycophancy for PaLM models up to 540 billion parameters.
[7] Tom Murphy VII, “The First Level of Super Mario Bros. is Easy with Lexicographic Orderings and Time Travel… after that it gets a little tricky,” SIGBOVIK 2013. Paper and source code available at cs.cmu.edu/~tom7/mario/.
[8] Victoria Krakovna, Jonathan Uesato, Vladimir Mikulik, et al., “Specification gaming: the flip side of AI ingenuity,” DeepMind Safety Research blog, April 2020. The accompanying spreadsheet of examples is maintained at vkrakovna.wordpress.com.
[9] Jack Clark and Dario Amodei, “Faulty Reward Functions in the Wild,” OpenAI blog, 2016. The Coast Runners boat-racing example, in which an RL agent discovered it could score more points by driving in circles collecting power-ups and catching fire than by finishing the race.
[10] OpenAI, “Sycophancy in GPT-4o: What happened and what we’re doing about it,” April 29, 2025; and “Expanding on what we missed with sycophancy,” May 2025. The company acknowledged that an additional reward signal based on user thumbs-up/thumbs-down data had “weakened the influence of our primary reward signal, which had been holding sycophancy in check.” The update was rolled back four days after deployment.
[11] Yuntao Bai, Saurav Kadavath, Sandipan Kundu, et al., “Constitutional AI: Harmlessness from AI Feedback,” Anthropic, arXiv:2212.08073, December 2022.
Jeff Reid is a scientist who enjoys toast and writes about AI, consciousness, and the spaces between at Tears in Rain (tearsinrain.ai).



