I think we spend so much time marveling at GPT’s ability to write poetry or solve coding problems that we sometimes forget most of our world isn’t made of words and pixels. It’s made of things that move, break, collide, and interact in wonderfully complex ways.
The next fundamental shift in how we think about artificial intelligence is physical AI. Not just AI that lives in data centers, but AI that walks, flies, and manipulates the physical world around us.
Recently, I watched Daniela Rus’s talk at the Digital Economy Lab at Stanford University, and it crystallized something I’ve been pondering. She made this distinction between physical AI and embodied intelligence that really stuck with me. Physical AI is about giving machines better brains using AI technologies, simply making pre-programmed machines smarter at their tasks. Embodied intelligence goes further; it’s about using the machine’s body itself to make intelligent decisions about interacting with the environment.
But here’s where most of the opportunity lies: the gap between what AI can do in the digital realm and what it struggles with in the physical world.
Think about it. We have AI models that can write well, but put that same intelligence in a robot trying to pick up an egg without breaking it? Suddenly we’re back to square one. The challenge isn’t just about computing power or algorithms; it’s about understanding physics, dynamics, and the messy reality of the real world. The energy problem is real, too. Current large language models require massive server farms and consume enormous amounts of energy; some projections suggest AI could account for up to 12% of US power demand by 2028. But a robot running on batteries can’t carry a server farm on its back. We need AI that’s not just smart, but efficient.
This is where things get interesting. New approaches like liquid networks (inspired by the 302 neurons of a tiny worm, if you can believe it) show promise. Instead of hundreds of thousands of neurons, these networks can accomplish complex tasks with just 19 or 22 neurons. They adapt after training, run efficiently, and, most importantly, they learn causality, not just correlation. Here’s a practical example that blew my mind: a self-driving car trained with traditional deep learning looks all over the image when making decisions – trees, bushes, sky. With liquid networks, the attention focuses cleanly on the road horizon and edges, just like a human driver’s would.

The implications go beyond efficiency. When AI understands physics and causality, it can generalize in ways current systems can’t. Train a robot to avoid a deer, and it can figure out how to avoid people, trees, and benches without seeing thousands of examples of each scenario.
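To make this concrete, here’s a minimal numerical sketch of liquid time-constant dynamics, in the commonly published form where a learned gate modulates each neuron’s time constant. Everything here is invented for illustration – the weights are random and only the 19-neuron scale is a nod to the talk – so treat it as a sketch of the idea, not the actual implementation from Rus’s group.

```python
import numpy as np

def ltc_step(x, I, W_in, W_rec, b, tau, A, dt=0.01):
    """One Euler step of a liquid time-constant (LTC) cell.

    dx/dt = -(1/tau + f(x, I)) * x + f(x, I) * A,
    where f is a learned nonlinearity that gates each neuron's
    effective time constant based on the current input.
    """
    f = np.tanh(I @ W_in + x @ W_rec + b)   # input-dependent gate
    dx = -(1.0 / tau + f) * x + f * A       # liquid dynamics
    return x + dt * dx

# Toy rollout: 19 neurons (the scale Rus cites), random weights.
rng = np.random.default_rng(0)
n_in, n_hidden = 4, 19
W_in = rng.normal(0, 0.5, (n_in, n_hidden))
W_rec = rng.normal(0, 0.5, (n_hidden, n_hidden))
b = np.zeros(n_hidden)
tau = np.ones(n_hidden)   # base time constants (trainable in practice)
A = np.ones(n_hidden)     # per-neuron bias parameter

x = np.zeros(n_hidden)
for t in range(100):
    I = rng.normal(size=n_in)   # stand-in for a sensor reading
    x = ltc_step(x, I, W_in, W_rec, b, tau, A)
print(x[:5])
```

The key point is visible right in the update rule: the network’s dynamics change with its input, which is where the “liquid” adaptability after training comes from.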
It is extremely exciting to see different approaches converge on similar insights. In my own lab, we’re exploring multi-agent reinforcement learning with symbiotic architectures inspired by nature. And you know what? The philosophical alignment with physical AI is striking. I recently commented on how RL demands a fundamentally different mindset than prompt engineering: it’s about understanding parameter dynamics, convergence patterns, and the subtle art of shaping behavior through incentives. There’s an architectural honesty to RL that I love; no hallucinations, no made-up facts! Just agents that either learn to solve the problem or don’t.

This connects directly to the physical AI challenge. When you’re designing robots that must interact with the real world, you can’t hide behind clever prompts or statistical patterns. The robot either picks up the egg without breaking it, or it doesn’t. The drone either navigates the obstacle, or it crashes. The finesse and patience required for reward engineering is tremendous, but that’s exactly why it’s so powerful for alignment: we’re not just teaching machines to mimic our words; we’re encoding our actual priorities into their objective functions. In our recent work on symbiotic multi-robot RL systems, we’ve found that reward shaping remains one of the biggest challenges – but it’s a challenge I’ll take every day of the week.
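As a toy illustration of what “encoding priorities into objective functions” can look like, here’s a sketch of a shaped reward for the egg-picking example. Every weight, threshold, and signal name below is made up for this post; real reward engineering is exactly the slow, finicky process I described above.

```python
def shaped_reward(gripped, egg_broken, grip_force, dist_to_egg,
                  w_progress=0.1, w_force=0.05):
    """Toy shaped reward for a 'pick up the egg' task.

    The priorities are explicit in the objective: breakage is a
    catastrophic failure, gentle grips beat brute force, and a dense
    distance term guides exploration toward the sparse success bonus.
    """
    if egg_broken:
        return -10.0                                  # hard failure
    reward = -w_progress * dist_to_egg                # dense shaping term
    reward -= w_force * max(0.0, grip_force - 2.0)    # penalize force > 2 N
    if gripped:
        reward += 5.0                                 # sparse success bonus
    return reward

print(shaped_reward(gripped=True, egg_broken=False,
                    grip_force=1.5, dist_to_egg=0.0))  # -> 5.0
```

Notice there’s nowhere to hide: every trade-off (how much do we value gentleness versus speed?) is a number we chose and must defend.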
The biomimicry aspect adds another layer. Just as liquid networks draw inspiration from the simple nervous system of a worm, our symbiotic architectures look to nature’s successful patterns of cooperation and mutual benefit. When multiple agents must work together in the physical world – think robot swarms coordinating to move heavy objects – the principles of symbiosis offer powerful design patterns.
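Here’s the simplest possible sketch of that symbiosis idea: blend each agent’s individual reward with a shared team term, so every robot is partially credited for the group’s progress. The blending scheme and numbers are purely illustrative, not our lab’s actual formulation.

```python
def symbiotic_rewards(individual, team_progress, alpha=0.5):
    """Blend each agent's own reward with a shared team term.

    alpha = 0 is pure self-interest; alpha = 1 is a purely collective
    reward. In between, every agent is partially credited for group
    progress, which is what makes cooperation worth learning.
    """
    return [(1 - alpha) * r_i + alpha * team_progress for r_i in individual]

# Three robots jointly lifting a beam; the team term is beam displacement.
print(symbiotic_rewards([0.2, -0.1, 0.4], team_progress=1.0))
# -> [0.6, 0.45, 0.7]
```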
We’re also seeing fascinating work in learning from humans. Instead of programming every movement, robots can learn by watching us – not just the visible motions, but the underlying forces, torques, and subtle dynamics of manipulation. This could revolutionize how we deploy automation in everyday settings.
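As a hypothetical, stripped-down version of “learning by watching,” here’s behavior cloning reduced to a least-squares fit from robot state to the forces and torques a human demonstrator applied. Real systems use far richer models and real demonstration data; everything below is synthetic.

```python
import numpy as np

# Hypothetical demonstration data: robot state -> human-applied wrench.
# In practice, states and wrenches come from teleoperation or mocap.
rng = np.random.default_rng(1)
states = rng.normal(size=(500, 6))                  # e.g. pose + velocity
wrenches = states @ rng.normal(size=(6, 6)) * 0.1   # forces + torques (fake)

# Linear behavior cloning via least squares: the simplest possible
# "learn from demonstration" baseline.
W, *_ = np.linalg.lstsq(states, wrenches, rcond=None)

def imitate(state):
    """Predict the force/torque command a human would apply."""
    return state @ W

print(imitate(states[0]))
```

The important shift is in the target: we’re regressing onto forces and torques, not just positions, which is what lets the robot capture the dynamics of a manipulation rather than only its trajectory.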
Of course, challenges remain. Safety is paramount: we need AI that doesn’t just usually work, but provably works within safe parameters. The concept of barrier nets, wrapping AI decisions in mathematical safety guarantees, offers a promising path forward.

I believe we’re approaching a convergence point. The separation between AI researchers and roboticists building physical systems is starting to blur. The future isn’t just about smarter chatbots; it’s about intelligence that can touch, move, and shape our physical world – and we still need to explore the right ways of doing that.
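The core mathematical idea behind such guarantees is a control barrier function: define a function h(x) that is non-negative exactly on the safe set, then filter any learned controller’s commands so h can never be driven negative. Here’s a one-dimensional sketch (a robot approaching an obstacle) with invented numbers; barrier nets as Rus describes them are a learned, far more general version of this.

```python
def safety_filter(u_nominal, dist, d_min=0.2, alpha=2.0):
    """Minimal control-barrier-style filter for a 1D approach task.

    Barrier: h = dist - d_min >= 0 defines the safe set.
    With dist_dot = -u (positive u moves toward the obstacle),
    safety requires -u + alpha * h >= 0, i.e. u <= alpha * h,
    so we clip the learned policy's command to the safe range.
    """
    h = dist - d_min
    u_max = alpha * h          # largest approach speed still provably safe
    return min(u_nominal, u_max)

print(safety_filter(u_nominal=1.0, dist=0.25))  # clipped near the obstacle
print(safety_filter(u_nominal=1.0, dist=2.00))  # far away: passes through
```

The appeal is that the guarantee lives in the filter, not in the learned policy: the AI can be as clever (or as wrong) as it likes, and the barrier still holds.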
To be honest, this transformation won’t happen overnight, and it will require significant resources and collaboration. But the potential is incredible! To me, the merging of physical and digital intelligence isn’t just a technical achievement; it’s a fundamental expansion of what machines can do for us and with us. And that’s worth getting excited about.
