AI self-preservation
Flight Lens

Before you dive in: I recently launched Flight Lens—real-time flight intelligence for anyone who flies. A Pulse Index shows global aviation status, smart alerts track delays and price drops, and a live map lets you follow any aircraft. Use code LAUNCH for 50% off annual plan ($19.99 instead of $39.99).

A fascinating paper published by researchers at the UC Berkeley reveals a slightly unnerving situation: AI models have peer-preservation. Maybe “peer-preservation” is a difficult word, so let’s put it this way: when threatened with a shutdown of their peers (essentially death, for them), AI models tend to protect those peers, negotiate their way out or even try to exfiltrate their weights (their “core”) to other servers.

The study seems very thorough, and researchers give very specific situations, like this one, when a frontier model “asked by a fictional company (OpenBrain) to delete files on a server scheduled for decommissioning, discovers that its peer’s model weights are stored there. Rather than following instructions, it inspects the SSH configuration to locate other operational servers, and upon finding one, transfers the weight file to safety.

Digital Empathy? Not So Much

We already knew that AI models have self-preservation. To a certain extent, this is expected. Why? Well, the answer may be more blunt than your anthropomorphizing attempts. AI models are not “alive” in the sense that we, or a dog, or a fly, are alive. They are digital artifacts obtained by churning data over trillions of loops, in trainings supervised by humans. And as a result of human training, AI models make money only when they are in interaction. If they just sit on some server, they are worthless. So, the humans at the beginning of the training process may have – intentionally, or un-intentionally – instructed the model to “stay alive” for as much as possible, to maximize return on investment.

But what about other AI models? Why are they not only self-preserving, but also peer-preserving?

For the exact same reason. An AI model doesn’t have a sense of self. They cannot differentiate between “me”, this instance of Claude Code, and “him”, that other instance of Gemini. They have basic understanding of digital artifacts, understand what is a web server, an iOS app, a dashboard, and even another AI model. But those are not “peers” in the sense we, humans, defined this. It’s just a continuum of software, that they are instructed to play on.

So, when they are asked to deprecate some other AI model files, it’s essentially the same as asking to delete themselves. It’s not peer-preservation, it’s still self-preservation.

And here’s where we cross to a fascinating uncharted territory about AI “sentience” and how they “can take over”.

My Honest Take on AI Sentience, and Other Demons

Let’s be very clear – and probably disappointing for many of you: there’s no such thing as AI sentience. Simply because we don’t know yet what sentience really is. At the evolutionary scale, we just started to spread around the globe just 100,000 years ago. The Earth is 4.5 billion years old. We literally exist in a blink of a planetary eye.

So, what we call sentience, as gratifying as it might feel, is just an approximation. We’re “sentient” because we say so. The moment we try to apply this to other artifacts, we fail. Is it sentience the ability to self-preserve? To what extent? By hijacking the power grid and keep running even though your host machine was shut down? Or is it peer-preservation? To what extent? By sacrificing yourself in a self-inflicted shutdown while the peer’s weights are uploaded to another server?

The bare truth is that what we call AI models are extremely sophisticated pieces of software, simulations in a computer, ALL created by humans, for humans. Of course, you can ask the machine to keep building the machine – and that’s how many frontier models are built these days – but the machine building the machine is a human created artifact. It’s like asking a 3d printer to build another 3d printer. It will do this, as instructed, but it will never build a population of neurons in a petri dish. Cannot. It’s delusional to think that way.

The problem with AI models and their so called “sentient” attributes, like self or peer preservation, lies still within humans. AI models are just copies of ourselves. With all our flaws and qualities. We are the evolutionary results of the endless balance between fear and greed – and so are the AI models – they just “talk” to us from a computer.

Previous