Google I/O 26: Welcome to the Age of Glorified AI Interns

The Deepdive

Join Allan and Ida as they dive deep into the world of tech, unpacking the latest trends, innovations, and disruptions in an engaging, thought-provoking conversation. Whether you’re a tech enthusiast or just curious about how technology shapes our world, The Deepdive is your go-to podcast for insightful analysis and passionate discussion.

Tune in for fresh perspectives, dynamic debates, and the tech talk you didn’t know you needed!

Read the companion article on https://medium.com/@allanandida


May 20, 2026 • Allan & Ida • Season 3 • Episode 61 • 20:23

Google just spent an hour telling us AI agents will run our lives, and somehow it still sounds like a very overqualified intern with push notifications.


At Google I/O 26, Gemini 3.5, Omni and Spark were pitched as the start of an “agentic era” where background bots quietly schedule your day, rewrite your inbox and shop on your behalf while you do more “important” things. In this episode, we unpack what Google actually shipped – from Daily Brief, Universal Cart and voice‑driven Gmail to Android XR glasses – and ask whether these agents are true coworkers or just glorified task rabbits wrapped in trillion‑token branding. Along the way, we look at what this means for users, developers and the open AI ecosystem when one company decides its interns now live inside your calendar, browser and credit card.

Leave your thoughts in the comments and subscribe for more tech updates and reviews.

A Number That Breaks Your Brain

Ida 0:05

I want you to just picture something for a second. You're standing there, maybe you've got your morning coffee in hand, and someone casually drops this number on you: 3.2 quadrillion.

Allan 0:17

I mean, it genuinely sounds like a number a little kid invents, right? Like when they're losing a playground argument: "Well, I have 3.2 quadrillion invisible force fields."

Ida 0:26

Yes. I thought the exact same thing, but this isn't a playground argument. This is, you know, Google CEO Sundar Pichai standing under the bright lights at the 2026 I/O keynote. And that number, 3.2 quadrillion, is the number of tokens that Google's AI models are processing every single month. A token is essentially the basic unit a model reads and writes: a word or a fragment of a word.

Allan 0:51

We really have to put that in historical context to understand the sheer violence of that scale. Just 12 months prior, that number was 480 trillion.

Ida 0:59

Which we thought was huge.

Allan 1:00

Exactly. We thought 480 trillion was this incomprehensible deluge of computation. But we are looking at a roughly seven-fold jump in a single year. It represents a physical infrastructure shift that is, frankly, difficult to even map in your head: the server farms, the cooling, the silicon required to churn through quadrillions of tokens asynchronously. It's a completely new paradigm of compute.
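For a sense of scale, here is a back-of-the-envelope check in Python that uses only the two monthly totals quoted in the episode. The 30-day month used for the per-second rate is an assumption; the year-over-year ratio itself comes out closer to 6.7x than a full seven-fold.

```python
# Monthly token volumes quoted in the episode.
PREVIOUS_MONTHLY_TOKENS = 480 * 10**12  # 480 trillion, twelve months earlier
CURRENT_MONTHLY_TOKENS = 32 * 10**14    # 3.2 quadrillion, announced at I/O 26
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumption: a 30-day month

factor = CURRENT_MONTHLY_TOKENS / PREVIOUS_MONTHLY_TOKENS
per_second = CURRENT_MONTHLY_TOKENS / SECONDS_PER_MONTH

print(f"year-over-year growth: {factor:.2f}x")           # year-over-year growth: 6.67x
print(f"average throughput: {per_second:.3g} tokens/s")  # average throughput: 1.23e+09 tokens/s
```

In other words, the quoted figure averages out to more than a billion tokens every second.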

Ida 1:24

So we are drowning in computation. And the natural question you have to ask is, well, what are we actually doing with it?

Allan 1:29

Right. Are we solving the mysteries of the universe?

Ida 1:32

I mean, during the keynote, they briefly touched on predicting category five hurricanes and, you know, modeling complex proteins. But when you look at the actual consumer demos for this new agentic Gemini era, it's completely different.

Allan 1:46

It really is.

Ida 1:47

We are primarily using this god-like frontier intelligence to email dog kennels and organize neighborhood block parties.

Allan 1:53

The juxtaposition is wild. And that's exactly our mission for you on this deep dive today. We are exploring the architecture and the implications of this newly inaugurated agentic era.

Ida 2:03

Right.

Why Agentic AI Changes Everything

Allan 2:04

The core theme that emerged from every single demo is that Google has essentially decided human decision making, even the tiny micro decisions, is just too exhausting.

Ida 2:14

Yeah, we're just done making choices.

Allan 2:15

Exactly. They are building an infrastructure designed to let you completely outsource your cognitive load to the cloud. We are shifting from asking AI to generate text to letting AI execute complex multi-step actions on our behalf.

Ida 2:32

Okay, let's unpack this, because if we are burning through quadrillions of tokens, the architecture here has to be fundamentally different than just, like, a chatbot on my phone.

Gemini Spark Lives In The Cloud

Allan 2:41

Completely different, yeah.

Ida 2:42

And that brings us to the star of their show, Gemini Spark. They are billing it as a 24-7 personal AI agent. But what actually makes it agentic? I mean, why couldn't my phone's local processor just handle a to-do list?

Allan 2:57

Well, your phone's processor is brilliant for immediate localized tasks. But Spark doesn't live on your phone. It runs on dedicated virtual machines in Google Cloud.

Ida 3:06

Okay, so it's entirely off-device.

Allan 3:08

Right. And think of a virtual machine not as physical hardware, but as a simulated, self-contained computer running within Google's massive server farms. By decoupling the agent from your local hardware, it gains persistent statefulness.

Ida 3:21

Meaning it doesn't sleep just because my phone battery died?

Allan 3:24

Precisely. You can assign Spark a complex multi-day task, completely close your laptop, and just walk away. The virtual machine keeps the agent active in the background. It's constantly monitoring data streams, waiting for conditions to be met, and executing actions asynchronously.

Ida 3:42

Which they demonstrated with perhaps the most painfully relatable modern chore in existence: planning a neighborhood block party.

The Block Party That Runs Itself

Allan 3:49

Oh, that demo, yes.

Ida 3:50

In the live demo, the presenter essentially brain-dumped a vague mandate onto Spark. They didn't code a workflow; they just told it to handle the party. And the AI autonomously spun up a live Google Sheet to track incoming RSVPs.

Allan 4:04

But it didn't just passively read the sheet. It used a mechanism called retrieval-augmented generation to monitor incoming Gmail replies.

Ida 4:12

Right. It actually read the emails.

Allan 4:14

Yes. And it extracted the unstructured intent of each neighbor's email, whether it was a yes, a no, or a "maybe, if I can find a babysitter," and then it dynamically updated the structured rows and columns of the spreadsheet.
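As a rough illustration of the extraction step described here, turning a free-text reply into a structured yes/no/maybe row, the Python sketch below uses simple keyword rules as a stand-in for the model. It is not retrieval-augmented generation and not how Spark actually works; the cue lists, `RsvpRow`, `classify_reply`, and `update_sheet` are hypothetical names for illustration, and a plain dict stands in for the Google Sheet.

```python
from dataclasses import dataclass


@dataclass
class RsvpRow:
    """One structured spreadsheet row: who replied and what they meant."""
    neighbor: str
    status: str  # "yes", "no", or "maybe"


# Hypothetical keyword cues standing in for the language model's judgment.
MAYBE_CUES = ("maybe", "might be", "if i can", "not sure")
NO_CUES = ("can't make it", "cannot make it", "won't be able", "no thanks")
YES_CUES = ("count us in", "we'll be there", "see you there", "yes")


def classify_reply(body: str) -> str:
    """Map a free-text email reply to a coarse RSVP status."""
    text = body.lower()
    # Hedged answers are checked first, so "yes, if I can find a sitter" is a maybe.
    if any(cue in text for cue in MAYBE_CUES):
        return "maybe"
    if any(cue in text for cue in NO_CUES):
        return "no"
    if any(cue in text for cue in YES_CUES):
        return "yes"
    return "maybe"  # unclear replies are treated as undecided


def update_sheet(sheet: dict[str, RsvpRow], sender: str, body: str) -> None:
    """Insert or overwrite the sender's row with their latest answer."""
    sheet[sender] = RsvpRow(neighbor=sender, status=classify_reply(body))


sheet: dict[str, RsvpRow] = {}
update_sheet(sheet, "Dana", "Count us in! Bringing lemonade.")
update_sheet(sheet, "Raj", "Sorry, we can't make it this time.")
update_sheet(sheet, "Priya", "Maybe, if I can find a babysitter.")
print({name: row.status for name, row in sheet.items()})
# {'Dana': 'yes', 'Raj': 'no', 'Priya': 'maybe'}
```

Keying rows by sender means a neighbor who changes their answer later simply overwrites their earlier row, which matches the "updated dynamically" behavior described in the demo.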

Ida 4:25

It then drafted polite reminder emails to the neighbors who hadn't responded yet. And it even generated a slide deck to hype up a bounce house for the kids.

Allan 4:33

The bounce house?

Ida 4:34

Yes. But the detail that absolutely broke me, the one that really highlights the reasoning capability, was the HOA rule.

Allan 4:41

The agent's situational awareness there was incredible.

Ida 4:44

Unbelievable. The AI autonomously dug into the user's Google Drive, searched through a massive folder of boring documents, found the homeowners association bylaws, and read them.

Allan 4:55

And then stopped them from breaking the rule.

Ida 4:57

Exactly. It actively alerted the user that they were legally prohibited from setting up the bounce house in the cul-de-sac before Friday afternoon.

Allan 5:06

What does this say about us as a society? We used to worry AI would steal our jobs; now we're begging it to navigate the passive-aggressive politics of our homeowners association.

Ida 5:15

It's the ultimate fantasy of avoiding social friction. You are literally outsourcing the anxiety of neighborhood politics to a virtual machine.

Allan 5:23

It really is.

Ida 5:24

And Google is weaving this lack of friction deep into the operating system level, which we saw with the new Gemini app for Mac.

Voice Control Meets Messy Desktops

Ida 5:29

The voice command demo for the Mac integration completely changes how we interact with local files. The presenter is just looking at their messy desktop.

Allan 5:38

Like most of our desktops.

Ida 5:40

Right, exactly. They use their mouse to highlight a bunch of random, unstructured files: a couple of PDFs, some images of vet invoices. They hold down a function key and just start talking naturally.

Allan 5:52

They asked Gemini to draft an introductory email to a dog kennel for their two dogs. One was named Hank.

Ida 5:58

And the other was named Lou Cinnamon, which is objectively a top-tier name for a dog.

Allan 6:02

Truly the best name.

Ida 6:03

But beyond the names, think about the mechanics of what the AI just did. It took static image files and messy PDFs, ran a localized vision model to extract the text, and used semantic reasoning to understand what was a vaccine date versus what was a random phone number.

Allan 6:19

And then it just built that table.

Ida 6:20

Yeah, it structured all of Hank and Lou Cinnamon's allergies into a perfectly formatted HTML table inside an outgoing email draft.

Allan 6:29

All in seconds. That process is completely eradicated.

Ida 6:37

It's just gone.

Allan 6:39

But to accomplish these seamless behind-the-scenes tasks on a larger scale, the AI has to do more than just read files. It is fundamentally changing the fabric of the internet itself.

Ida 6:50

Oh, for sure.

Allan 6:51

We are moving away from finding information to demanding that the web recompile itself to serve our immediate needs.

Search Stops Linking And Starts Building

Ida 6:58

This was the moment in the keynote where I realized the internet as we know it, you know, the classic ten blue links on a search page, is dead.

Allan 7:06

Completely dead.

Ida 7:07

The traditional search box is being replaced by an intelligent search box. You aren't just typing text anymore, you are dropping in files, videos, or even open Chrome tabs as direct inputs.

Allan 7:18

Google is calling the underlying engine for this Antigravity 2.0. It's a developer tool powered by their Gemini 3.5 Flash model, and it introduces agentic coding directly into the consumer search experience.

Ida 7:31

So it's coding for you.

Allan 7:33

Exactly. When you ask a complex question now, the search engine doesn't go look for a webpage that holds the answer. It acts like a senior software engineer and literally builds a custom application, complete with a user interface right there in the results.

Ida 7:45

The astrophysics example they showed perfectly illustrates this. A user asks how binary black holes create gravitational waves.

Allan 7:53

A pretty heavy question.

Ida 7:54

Yeah. In the old internet, you get a Wikipedia link. In the Antigravity 2.0 internet, search dynamically codes and deploys a fully interactive, custom 3D simulation of black holes spiraling into each other. You have actual sliders to adjust the mass and the orbital separation.

Allan 8:12

Let's just break down the technical marvel of what is happening under the hood there. The AI isn't pulling a pre-made widget from a library.

Ida 8:19

Right. It's building it from scratch.

Allan 8:21

The Gemini model is interpreting your prompt, planning a software architecture, writing the underlying physics logic, coding the front-end graphical interface, compiling it, and deploying it inside a secure sandbox container in your browser.

Ida 8:34

In milliseconds.

Allan 8:36

It is writing bespoke software for your highly specific question in milliseconds. It's unbelievable.

Ida 8:41

And it does this for deeply personal queries, too. Another demo showed someone planning a weekend trip for their family. Search used something called Personal Intelligence to securely parse data from the user's Gmail and Calendar.

Allan 8:54

And then it coded a persistent mini app for the weekend.

Ida 8:56

Right. It wasn't a static itinerary, it was a custom dashboard tracking live restaurant reservations. It even integrated chess tutorials and activities because it deduced from earlier emails that the oldest kid was learning to play chess.

Allan 9:10

Building a stateful, personalized application on the fly is impressive enough. But to truly prove the raw, unadulterated power of Antigravity 2.0 and the Gemini 3.5 Flash model, the Google engineering team did something completely unhinged during the developer portion of the keynote.

Ida 9:28

Oh, the absolute most absurd tech flex I have ever witnessed.

Allan 9:31

Just so over the top.

Ida 9:32

They wanted to prove how good these multi-agent systems are at coding.

Ninety-Three Agents Code An OS

Ida 9:33

So they unleashed 93 autonomous AI subagents and gave them a single prompt: build a fully functional computer operating system entirely from scratch.

Allan 9:45

We need to be clear about the difficulty of that prompt. Building an operating system is an incredibly complex orchestration of memory management, file systems, kernel architecture, and hardware abstraction.

Ida 9:58

It's not a weekend project.

Allan 9:59

No, it usually takes a team of human engineers months, if not years.

Ida 10:04

And these 93 agents worked in parallel for 12 hours. The architecture of this is mind-blowing. Think of it like a highly specialized automated construction crew.

Allan 10:14

Oh, that's a good way to put it.

Ida 10:15

You have an architect agent designing the kernel, a plumber agent handling the memory allocation, and a supervisor agent checking for merge conflicts in the code. They are constantly talking to each other, writing code, testing it, failing, and iterating.

Allan 10:30

And the numbers are staggering.

Ida 10:32

They processed 2.6 billion tokens in that 12-hour window. They wrote every single line of code, and the total compute cost was less than $1,000 in API credits.
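Those figures support some quick arithmetic. The Python sketch below assumes the entire sub-$1,000 bill was token charges at one blended rate (real APIs usually price input and output tokens differently), so the result is only an upper bound on the average price per million tokens.

```python
# Figures quoted for the 93-agent operating-system run.
TOTAL_TOKENS = 2_600_000_000  # 2.6 billion tokens
COST_CEILING_USD = 1_000      # "less than $1,000" in API credits
AGENTS = 93
HOURS = 12

# Assumption: the whole bill was token charges at one blended rate.
max_usd_per_million = COST_CEILING_USD / (TOTAL_TOKENS / 1_000_000)
agent_hours = AGENTS * HOURS
tokens_per_agent_hour = TOTAL_TOKENS / agent_hours

print(f"at most ${max_usd_per_million:.3f} per million tokens")
# at most $0.385 per million tokens
print(f"{agent_hours} agent-hours, ~{tokens_per_agent_hour:,.0f} tokens each")
# 1116 agent-hours, ~2,329,749 tokens each
```

Put differently, each of the 93 agents would have averaged roughly 2.3 million tokens per hour of wall-clock time.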

Allan 10:43

And the ultimate payoff for this monumental feat of multi-agent software engineering?

Ida 10:47

They booted up this bespoke AI-generated operating system live on stage just to play the classic 1993 video game Doom. This is simultaneously impressive and completely ridiculous. It's like asking a librarian to help you find a book, and instead, they build a custom printing press in front of you.

Allan 11:07

It is the ultimate flex of raw capability.

Ida 11:10

It proves that you essentially drop items into this persistent cart from wherever you happen to be browsing.

A Universal Cart With Real Reasoning

Ida 11:16

Once an item is in there, the cart becomes this tireless background bargain hunter. It never sleeps. It continuously queries merchant APIs to track price histories, monitor for restocks, and hunt for flash deals.

Allan 11:30

But the critical evolution here is that the cart acts as a highly analytical financial chaperone. It possesses semantic reasoning about the physical properties of the products themselves.

Ida 11:41

The PC building example they used to demonstrate this was brilliant. Let's say you are building a custom gaming rig. You add a high-end processor to your Universal Cart from a tech blog. But three days ago, you had added a specific motherboard from a YouTube review. The AI cart proactively flags a warning: it recognizes that the processor requires a different physical socket type than the motherboard you already selected.

Allan 12:05

That is so helpful.

Ida 12:07

It stops you from buying incompatible parts and dynamically suggests alternatives that fit your exact build.
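The compatibility check itself is simple to sketch once product data is structured. This is a hypothetical Python model of the idea, not Google's implementation: `Part`, `socket_conflicts`, `suggest_boards`, and the product names are invented for illustration, while AM5 and LGA1700 are simply examples of real, mutually incompatible CPU socket types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    kind: str    # "cpu" or "motherboard"
    socket: str  # physical CPU socket, e.g. "AM5" or "LGA1700"


def socket_conflicts(cart: list[Part]) -> list[tuple[Part, Part]]:
    """Return every (cpu, motherboard) pair in the cart whose sockets differ."""
    cpus = [p for p in cart if p.kind == "cpu"]
    boards = [p for p in cart if p.kind == "motherboard"]
    return [(c, b) for c in cpus for b in boards if c.socket != b.socket]


def suggest_boards(cpu: Part, catalog: list[Part]) -> list[Part]:
    """List catalog motherboards whose socket physically fits the CPU."""
    return [p for p in catalog if p.kind == "motherboard" and p.socket == cpu.socket]


# Hypothetical cart: a board saved days ago, plus a CPU added today.
cart = [Part("Board A", "motherboard", "LGA1700"), Part("CPU X", "cpu", "AM5")]
catalog = [Part("Board B", "motherboard", "AM5"), Part("Board C", "motherboard", "LGA1700")]

for cpu, board in socket_conflicts(cart):
    alternatives = [b.name for b in suggest_boards(cpu, catalog)]
    print(f"{cpu.name} ({cpu.socket}) won't fit {board.name} ({board.socket}); try {alternatives}")
# CPU X (AM5) won't fit Board A (LGA1700); try ['Board B']
```

The hard part in a real cart is not this comparison but reliably extracting a normalized socket field from product pages scattered across blogs and video reviews.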

Allan 12:12

It understands the spatial and physical compatibility constraints of consumer goods. And to cross the final hurdle, making the actual purchasing completely seamless, Google introduced the Agent Payments Protocol, or AP2.

Ida 12:26

The AP2 protocol is where we cross firmly into sci-fi territory. This protocol allows your AI agent to securely execute financial transactions on your behalf without you ever clicking a checkout button.

Allan 12:39

Which sounds terrifying.

Ida 12:40

My immediate thought was how do I know it won't just empty my bank account on random gadgets?

Allan 12:45

Because AP2 isn't just handing your credit card number to a chatbot. It utilizes cryptographic tokenized parameters. You set strict, immutable boundaries. For instance, you tell the agent you only want a specific brand of monitor and your hard spend limit is $400.

Ida 12:59

So it's locked in.

Allan 13:00

Right. The agent generates a single-use token bound by those exact smart-contract-like constraints. When it finds a deal that matches, it initiates a handshake with the merchant processor. If the merchant tries to overcharge, say $401, or swap the brand, the transaction mathematically fails. It creates a tamper-proof digital paper trail linking your constraints, the merchant, and the payment processor, so there are no disputes.
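To make the "constraints bound into a single-use token" idea concrete, here is a minimal Python sketch using an HMAC signature from the standard library. It does not reproduce AP2's actual message formats, cryptography, or APIs; `issue_mandate`, `authorize`, the shared provider key, and the in-memory nonce set are hypothetical simplifications chosen to show why a $401 charge or a brand swap fails verification.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical signing key held by the user's payment provider. A shared
# secret keeps the sketch short; it is not how AP2 actually signs anything.
PROVIDER_KEY = secrets.token_bytes(32)
_used_nonces: set[str] = set()


def _sign(body: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the constraints."""
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(PROVIDER_KEY, payload, hashlib.sha256).hexdigest()


def issue_mandate(brand: str, max_cents: int) -> dict:
    """Create a single-use, signed spending constraint for the agent."""
    body = {"brand": brand, "max_cents": max_cents, "nonce": secrets.token_hex(16)}
    return {**body, "sig": _sign(body)}


def authorize(mandate: dict, charge_brand: str, charge_cents: int) -> bool:
    """Approve a charge only if it fits an untampered, unused mandate."""
    body = {k: v for k, v in mandate.items() if k != "sig"}
    if not hmac.compare_digest(_sign(body), mandate.get("sig", "")):
        return False  # constraints were edited after signing
    if body["nonce"] in _used_nonces:
        return False  # token already spent
    if charge_brand != body["brand"] or charge_cents > body["max_cents"]:
        return False  # wrong brand or over the hard cap
    _used_nonces.add(body["nonce"])
    return True


mandate = issue_mandate("ExampleBrand", 40_000)    # $400.00 hard cap
print(authorize(mandate, "ExampleBrand", 40_100))  # False: $401 is over the cap
print(authorize(mandate, "OtherBrand", 39_900))    # False: brand swapped
print(authorize(mandate, "ExampleBrand", 39_900))  # True
print(authorize(mandate, "ExampleBrand", 39_900))  # False: single use
```

A production design would more likely use public-key signatures, so merchants can verify the constraints without ever holding the signing key; the shared secret here only keeps the example short.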

Ida 13:24

Okay, but here's the thing. The AI is smart enough to know you're buying the wrong PC parts, but presumably polite enough not to mention you don't actually need another gaming rig. How long until my AI is just negotiating with your AI to buy things neither of us wants?

Allan 13:40

It's a highly valid concern regarding induced demand. We are handing these systems an unprecedented level of agency over our wallets.

Ida 13:47

We really are.

Allan 13:48

But Google's core bet is that the sheer convenience, the complete removal of the exhaustion of cross-referencing motherboard compatibility sheets and hunting for discount codes, will easily override our skepticism.

Ida 14:00

So far, this automated, frictionless existence has been trapped behind glass. We are looking at laptops, phones, and desktop monitors.

Smart Glasses That Translate The World

Ida 14:06

But Google wants to take this environmental intelligence and integrate it directly onto our faces, which brings us to the hardware: the new Android XR smart glasses.

Allan 14:17

Historically, smart glasses have faced a massive uphill battle regarding public perception.

Ida 14:22

Because nobody wants to walk down the street looking like a rogue Borg drone with a glowing camera strapped to their temple. The cyborg aesthetic is deeply off-putting.

Allan 14:32

Google is hyper-aware of that stigma, which is why their hardware strategy this time is entirely dependent on partnerships with top-tier fashion eyewear brands. They handed the external chassis design over to Warby Parker and Gentle Monster.

Ida 14:46

So they actually look like high-end, stylish fashion accessories. You wouldn't know they were tech devices at a glance. And to further reduce the friction of adoption, the initial rollout this fall only features audio glasses. There is no visual display, no augmented reality holograms projecting into your retinas.

Allan 15:03

They feature high-resolution onboard cameras so the multimodal AI can process your visual environment, but the output is delivered via directional spatial audio.

Ida 15:12

It's just sound.

Allan 15:13

It is literally Gemini whispering directly into your ear, providing constant hands-free context about the world around you without ever demanding you look down at a screen.

Ida 15:23

The live, hands-on test of the spatial audio was staggering. A journalist used the XR glasses to facilitate a highly complex three-way conversation in a crowded room. You had a Spanish speaker, a Serbian speaker, and an English speaker.

Allan 15:38

All talking at once.

Ida 15:39

Yeah. The glasses seamlessly captured the audio, processed the translation via the cloud, and whispered the real-time English translation into the journalist's ear.

Allan 15:48

The technical hurdle there isn't just translation, it's acoustic isolation. The AI demonstrated intense situational awareness. By utilizing vocal footprint isolation and directional mics, the agent adeptly ignored the background chatter of other English speakers in the room.

Ida 16:05

That's the crazy part to me.

Allan 16:06

It knew exactly which voices were part of the targeted conversation and dynamically filtered out the ambient noise.

Ida 16:11

It's essentially the Babel fish from The Hitchhiker's Guide to the Galaxy, but styled by a high-end Korean fashion house. And the glasses aren't just for passive observation. In another onstage demo, they showed a woman using the glasses to navigate the physical world and trigger digital actions simultaneously. She asks Gemini for walking directions to a local coffee shop.

Allan 16:33

And here's where the API integrations really shine. While she's walking, the glasses, which are wirelessly tethered to her phone, autonomously utilize deep links to open the DoorDash app sitting dormant in her pocket.

Ida 16:47

The AI navigates the hidden architecture of the phone app she isn't even looking at and places her usual order for a nitro cold brew so that the transaction is completed and the coffee is waiting on the counter the second she arrives.

Allan 17:00

It is the ultimate manifestation of their goal: removing the friction of interacting with both digital interfaces and physical environments. But of course, because this is a Silicon Valley tech demo, they couldn't resist throwing in something profoundly bizarre.

Ida 17:13

Oh, the blimp. I almost forgot about the blimp. So she gets her nitro cold brew, looks out at the live audience in the auditorium, and asks Gemini to use the Nano Banana image model.

Allan 17:23

Naturally.

Ida 17:24

The onboard cameras on the glasses take a photo of the crowd. The AI processes the image, hallucinates a massive cartoon blimp floating in the sky above the audience, and instantly sends that newly generated image to her paired smartwatch. Wait, it gets better. We finally have high-end Korean fashion glasses that can instantly translate Serbian, and we use them to hallucinate cartoon blimps. I love that this exists, but also why?

Allan 17:50

Because they possess the compute power to do it. And they want developers to know the image models can run with near zero latency based on live camera feeds.

Ida 17:58

I guess that makes sense.

Allan 17:59

But your question of why actually perfectly highlights the core tension of this entire deep dive.

When Friction Disappears, What’s Left

Allan 18:05

If we zoom out and synthesize the sheer scale of everything we've unpacked today, we have to circle back to that breathtaking number from the intro: 3.2 quadrillion tokens.

Ida 18:14

3.2 quadrillion units of thought, of logical deduction, of multi-agent coordination, churning away in the cloud every single month.

Allan 18:24

We are harnessing an unprecedented scale of computational physics to completely remove the friction from the most mundane, deeply human tasks. We are stepping into a paradigm where software engineers don't write code, the agents do.

Ida 18:39

Right.

Allan 18:39

The shopping cart catches your physical compatibility mistakes. Your glasses translate the world and buy your coffee before your brain even fully registers the desire.

Ida 18:49

It's an invisible, hyper-competent infrastructure. We have unlocked the power to simulate the gravitational waves of binary black holes, and we are aggressively deploying it to figure out if our neighbor's kid's allergy allows for peanut butter at the weekend block party. It is gloriously, wonderfully absurd.

Allan 19:05

It is absurd, but it is undeniably beautiful in its utility. We are stepping away from the keyboard entirely and letting the machines negotiate with the machines.

Ida 19:13

Which leaves us with a lingering provocative thought for you to mull over as you go about your day. If the AI handles our scheduling, our shopping, our HOA bylaws, and our coffee runs, what happens to the social friction that actually forces us to connect?

Allan 19:28

That is the real question.

Ida 19:29

Think about it. The messy friction of navigating someone else's schedule, deciphering a confusing menu in a foreign language, or accidentally buying the wrong PC part and having to ask a friend for help? Those tiny points of friction are often where serendipity and human empathy happen.

Allan 19:45

Yeah, that's where life happens.

Ida 19:46

If machines are doing all the talking, all the organizing, and all the apologizing for us, do we risk letting our own capacity for grace and human-to-human relationship building atrophy? When the friction is gone, do we lose the spark that makes the interactions meaningful in the first place?

Allan 20:03

It is a profound question. If the agents are perfect, we lose the beautiful vulnerability of making mistakes together.

Ida 20:10

It's definitely a lot to think about. But for now, we'll let you get back to your frictionless reality. Hopefully, without any cartoon blimps blocking your view. Thanks for joining us on this deep dive tailored just for you. Until next time.

Allan

Host

Ida

Host