Analysis

AI alignment and Physical AI safety are different problems

May 26, 2026 · 2 min read · Mati Melchior

Two problems get conflated in almost every AI safety discussion. They shouldn't be.

Alignment asks: does the AI want what we want? Does its objective function match human intent? If you tell it to "minimize harm," does it interpret harm the way you do? Alignment is about the fidelity between what the human intended and what the system optimizes for. Techniques such as RLHF, constitutional AI, and reward modeling exist to close this gap.

Physical safety asks a different question: what can the AI actually do in the physical world — and what hard limits prevent damage regardless of intent? This is the domain of functional safety engineering: hardware interlocks, force limiters, safety-rated speed bounds, and independent monitoring channels that operate below the software layer.

These are not the same problem. And treating them as interchangeable creates blind spots.

Consider two scenarios. In the first, an AI controlling a robotic arm is misaligned — it optimizes for the wrong objective. But the arm has hardware-enforced torque limits, speed caps, and an independent hardware safety layer that cuts power if force exceeds a threshold. The AI does the wrong thing, but the damage is bounded. The outcome is survivable.
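The first scenario can be sketched in a few lines. This is a software model for illustration only: in a real system the envelope lives in safety-rated hardware, not in Python. The names `SafetyEnvelope`, `Command`, and `apply_envelope`, and all limit values, are hypothetical and not taken from any standard or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyEnvelope:
    """Hypothetical limits; the values are illustrative only."""
    max_torque_nm: float = 5.0
    max_speed_rad_s: float = 1.0
    force_trip_n: float = 50.0


@dataclass(frozen=True)
class Command:
    torque_nm: float
    speed_rad_s: float


def clamp(value: float, limit: float) -> float:
    """Clamp value to the range [-limit, +limit]."""
    return max(-limit, min(limit, value))


def apply_envelope(cmd: Command, measured_force_n: float, env: SafetyEnvelope):
    """Return (bounded command, power_enabled).

    The safety layer knows nothing about the controller's objective.
    It only sees the requested command and the measured force.
    """
    if abs(measured_force_n) > env.force_trip_n:
        # Fail-safe state: cut power and zero the command.
        return Command(0.0, 0.0), False
    bounded = Command(
        clamp(cmd.torque_nm, env.max_torque_nm),
        clamp(cmd.speed_rad_s, env.max_speed_rad_s),
    )
    return bounded, True


env = SafetyEnvelope()

# A misaligned controller asks for far more than the envelope allows.
bounded, powered = apply_envelope(Command(80.0, -12.0), measured_force_n=10.0, env=env)
print(bounded, powered)

# Same request, but the force sensor reads above the trip threshold.
tripped, powered_after_trip = apply_envelope(Command(80.0, -12.0), measured_force_n=75.0, env=env)
print(tripped, powered_after_trip)
```

The envelope never asks why the controller requested 80 N·m. It bounds the request either way, which is why the damage stays bounded even when the objective is wrong.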

In the second scenario, the AI is perfectly aligned: it understands exactly what the human wants. But a software bug corrupts its control signal. Or a cosmic ray flips a bit in memory. Or an adversary injects a malicious command through a network connection. Without hardware-enforced bounds, nothing physically limits what the actuators do in the moment of failure, no matter how well aligned the AI is. The outcome is unpredictable.
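To make the bit-flip case concrete, here is a minimal sketch that assumes the torque command is stored as a 64-bit IEEE 754 float. The helper names `flip_bit` and `hardware_clamp` and the 5 N·m limit are hypothetical, and the clamp again only stands in for a limit that would be enforced in hardware.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant) of a float64's IEEE 754 encoding."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def hardware_clamp(value: float, limit: float) -> float:
    """Stand-in for a hardware torque limit: clamp to [-limit, +limit]."""
    return max(-limit, min(limit, value))


intended_torque_nm = 2.0

# Flipping bit 55 (a bit in the exponent field) turns 2.0 into 512.0.
corrupted_torque_nm = flip_bit(intended_torque_nm, 55)
print(corrupted_torque_nm)

# With an assumed 5 N·m limit, the corrupted command is bounded again.
bounded_torque_nm = hardware_clamp(corrupted_torque_nm, 5.0)
print(bounded_torque_nm)
```

The controller's intent was correct the whole time. One flipped bit multiplied the command by 256, and only the limit outside the control software brought it back within bounds.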

The distinction matters because the two problems require different expertise. Alignment researchers work on objective functions, reward models, interpretability, and value learning. Physical safety engineers work on redundancy architectures, diagnostic coverage, fail-safe states, and hardware fault tolerance. Both disciplines are essential. Neither one subsumes the other.

The AI safety conversation today is dominated by alignment. The physical safety conversation barely exists in the AI community. In traditional safety-critical industries such as aviation, nuclear, and rail, this conversation has been happening for decades under functional safety standards: IEC 61508 and its sector-specific derivatives in nuclear and rail, and aviation's own standards such as DO-178C. Physical AI needs both conversations, running in parallel, with engineers who understand that solving alignment does not automatically solve the problem of a robot arm that can crush a human hand.

Mati Melchior · Physical AI Safety Researcher

Related analysis

Fewer than 30 of 123 Physical AI companies are certifying. January 2027 is 7 months away.

2006 vs 2026 — everything changed except the safety layer