The Deepdive

Google I/O 26: Welcome to the Age of Glorified AI Interns

Allan & Ida Season 3 Episode 61

Google just spent an hour telling us AI agents will run our lives, and somehow it still sounds like a very overqualified intern with push notifications.

Read the companion article on https://medium.com/@allanandida

At Google I/O 26, Gemini 3.5, Omni and Spark were pitched as the start of an “agentic era” where background bots quietly schedule your day, rewrite your inbox and shop on your behalf while you do more “important” things. In this episode, we unpack what Google actually shipped – from Daily Brief, Universal Cart and voice‑driven Gmail to Android XR glasses – and ask whether these agents are true coworkers or just glorified task rabbits wrapped in trillion‑token branding. Along the way, we look at what this means for users, developers and the open AI ecosystem when one company decides its interns now live inside your calendar, browser and credit card.

Leave your thoughts in the comments and subscribe for more tech updates and reviews.

A Number That Breaks Your Brain

Ida

I want you to just um picture something for a second. You're standing there, maybe you've got your morning coffee in hand, and someone casually drops this number on you. 3.2 quadrillion.

Allan

I mean, it genuinely sounds like a number a little kid invents, right? Like when they're losing a playground argument. Well, I have 3.2 quadrillion invisible force fields.

Ida

Yes. I thought the exact same thing, but this isn't a playground argument. This is, you know, Google CEO Sundar Pichai standing under the bright lights at the 2026 I/O keynote. Oh yeah. And that number, 3.2 quadrillion, is the number of data tokens that Google's AI models are processing every single month. And a token is essentially the fundamental building block of a problem being solved.

Allan

We really have to put that in historical context to understand the sheer violence of that scale. Just 12 months prior, that number was 480 trillion.

Ida

Which we thought was huge.

Allan

Exactly. We thought 480 trillion was this incomprehensible deluge of computation. But we are looking at a seven-fold jump in a single year. It represents a physical infrastructure shift that is, frankly, difficult to even map in your head. The server farms, the cooling, the silicon required to churn through quadrillions of tokens asynchronously. It's a completely new paradigm of compute.
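A quick back-of-the-envelope check on those two figures, as a minimal Python sketch. Only the two token counts come from the numbers quoted in the conversation; the 30-day month and all function names are assumptions made for illustration.

```python
# Monthly token volumes as quoted in the episode.
TOKENS_PRIOR_YEAR = 480 * 10**12   # 480 trillion
TOKENS_THIS_YEAR = 32 * 10**14     # 3.2 quadrillion

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month


def growth_factor(now: int, prior: int) -> float:
    """Year-over-year multiple between two monthly token counts."""
    return now / prior


def tokens_per_second(monthly_tokens: int) -> float:
    """Average sustained throughput implied by a monthly total."""
    return monthly_tokens / SECONDS_PER_MONTH


if __name__ == "__main__":
    print(f"growth: {growth_factor(TOKENS_THIS_YEAR, TOKENS_PRIOR_YEAR):.2f}x")
    print(f"throughput: {tokens_per_second(TOKENS_THIS_YEAR):.3g} tokens/s")
```

The ratio works out to roughly 6.7, so "seven-fold" is a fair rounding, and the monthly total implies on the order of a billion tokens every second on average.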

Ida

So we are drowning in computation. And the natural question you have to ask is, well, what are we actually doing with it?

Allan

Right. Are we solving the mysteries of the universe?

Ida

I mean, during the keynote, they briefly touched on predicting category five hurricanes and, you know, modeling complex proteins. But when you look at the actual consumer demos for this new agentic Gemini era, it's completely different.

Allan

It really is.

Ida

We are primarily using this god-like frontier intelligence to email dog kennels and organize neighborhood block parties.

Allan

The juxtaposition is wild. And that's exactly our mission for you on this deep dive today. We are exploring the architecture and the implications of this newly inaugurated agentic era.

Ida

Right.

Why Agentic AI Changes Everything

Allan

The core theme that emerged from every single demo is that Google has essentially decided human decision making, even the tiny micro decisions, is just too exhausting.

Ida

Yeah, we're just done making choices.

Allan

Exactly. They are building an infrastructure designed to let you completely outsource your cognitive load to the cloud. We are shifting from asking AI to generate text to letting AI execute complex multi-step actions on our behalf.

Gemini Spark Lives In The Cloud

Ida

Okay, let's unpack this, because if we are burning through quadrillions of tokens, the architecture here has to be fundamentally different than just like a chatbot on my phone.

Allan

Completely different, yeah.

Ida

And that brings us to the star of their show, Gemini Spark. They are billing it as a 24-7 personal AI agent. But what actually makes it agentic? I mean, why couldn't my phone's local processor just handle a to-do list?

Allan

Well, your phone's processor is brilliant for immediate localized tasks. But Spark doesn't live on your phone. It runs on dedicated virtual machines in Google Cloud.

Ida

Okay, so it's entirely off-device.

Allan

Right. And think of a virtual machine not as physical hardware, but as a simulated, self-contained computer running within Google's massive server farms. By decoupling the agent from your local hardware, it gains persistent statefulness.

Ida

Meaning it doesn't sleep just because my phone battery died?

Allan

Precisely. You can assign Spark a complex multi-day task, completely close your laptop, and just walk away. The virtual machine keeps the agent active in the background. That's wild. It's constantly monitoring data streams, waiting for conditions to be met, and executing actions asynchronously.
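The "wait for a condition, then act" pattern described here can be sketched in a few lines of Python. This is only an illustration of asynchronous polling in general, not how Spark is built; `run_when`, its parameters, and the polling interval are all hypothetical.

```python
import asyncio
from typing import Callable


async def run_when(
    condition: Callable[[], bool],
    action: Callable[[], None],
    poll_seconds: float = 0.01,
    max_polls: int = 1000,
) -> bool:
    """Poll `condition` until it is true, then run `action` once.

    Returns True if the action ran, False if polling gave up first.
    """
    for _ in range(max_polls):
        if condition():
            action()
            return True
        # Yield control so other tasks can run while this one waits.
        await asyncio.sleep(poll_seconds)
    return False


async def demo() -> list:
    state = {"rsvps": 0}
    log = []
    task = asyncio.create_task(
        run_when(lambda: state["rsvps"] >= 3, lambda: log.append("send summary"))
    )
    await asyncio.sleep(0.03)   # the "user" is away; the task keeps waiting
    state["rsvps"] = 3          # the condition is eventually met
    await task
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A hosted agent would presumably react to events rather than poll in a loop, but the shape is the same: the task outlives the moment it was assigned and fires when its condition holds.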

The Block Party That Runs Itself

Ida

Which they demonstrated with perhaps the most painfully relatable modern chore in existence, planning a neighborhood block party.

Allan

Oh, that demo, yes.

Ida

In the live demo, the presenter essentially brain-dumped a vague mandate onto Spark. They didn't code a workflow, they just told it to handle the party. And the AI autonomously spun up a live Google Sheet to track incoming RSVPs.

Allan

But it didn't just passively read the sheet. It used a mechanism called retrieval augmented generation to monitor incoming Gmail replies.

Ida

Right. It actually read the emails.

Allan

Yes. And it extracted the unstructured intent of the neighbor's email, whether they were a yes, a no, or a maybe if I could find a babysitter, and then it updated the structured rows and columns of the spreadsheet dynamically.
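To make the "unstructured email in, structured row out" step concrete, here is a minimal Python sketch. It assumes a crude keyword matcher in place of a language model and a plain dictionary in place of a spreadsheet; the function names, status labels, and sample replies are all invented for illustration.

```python
import re

# Checked in this order so that hedged replies are not misread as a firm yes.
MAYBE = re.compile(r"\b(maybe|might|not sure|if (i|we) can)\b", re.IGNORECASE)
NO = re.compile(r"\b(no|can't|cannot|won't|unable)\b", re.IGNORECASE)
YES = re.compile(r"\b(yes|count (me|us) in|(we|i)'ll be there)\b", re.IGNORECASE)


def classify_rsvp(body: str) -> str:
    """Map free-form reply text to 'maybe', 'no', 'yes', or 'unclear'."""
    if MAYBE.search(body):
        return "maybe"
    if NO.search(body):
        return "no"
    if YES.search(body):
        return "yes"
    return "unclear"


def update_sheet(sheet: dict, sender: str, body: str) -> None:
    """Write the extracted status into the row for this sender."""
    sheet[sender] = classify_rsvp(body)


def needs_reminder(sheet: dict) -> list:
    """Neighbors whose row is still 'pending' have not replied yet."""
    return sorted(name for name, status in sheet.items() if status == "pending")


if __name__ == "__main__":
    sheet = {"Ana": "pending", "Ben": "pending", "Cy": "pending", "Di": "pending"}
    update_sheet(sheet, "Ana", "Yes, we'll be there!")
    update_sheet(sheet, "Ben", "Sorry, we can't make it this year.")
    update_sheet(sheet, "Cy", "Maybe, if we can find a babysitter.")
    print(sheet, needs_reminder(sheet))
```

The keyword rules are the weakest part of the sketch and the part a language model would replace; the surrounding bookkeeping, updating rows and listing who still needs a reminder, is ordinary code.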

Ida

It then drafted polite reminder emails to the neighbors who hadn't responded yet. And it even generated a slide deck to hype up a bounce house for the kids.

Allan

The bounce house?

Ida

Yes. But the detail that absolutely broke me, the one that really highlights the reasoning capability, was the HOA rule.

Allan

The agent's situational awareness there was incredible.

Ida

Unbelievable. The AI autonomously dug into the user's Google Drive, searched through a massive folder of boring documents, found the homeowners association bylaws, and read them.

Allan

And then stopped them from breaking the rule.

Ida

Exactly. It actively alerted the user that they were legally prohibited from setting up the bounce house in the cul-de-sac before Friday afternoon.

Allan

What does this say about us as a society? We used to worry AI would steal our jobs, now we're begging it to navigate the passive aggressive politics of our homeowners association.

Ida

It's the ultimate fantasy of avoiding social friction. You are literally outsourcing the anxiety of neighborhood politics to a virtual machine.

Allan

It really is.

Voice Control Meets Messy Desktops

Ida

And Google is weaving this lack of friction deep into the operating system level, which we saw with the new Gemini app for Mac. The voice command demo for the Mac integration completely changes how we interact with local files. The presenter is just looking at their messy desktop.

Allan

Like most of our desktops.

Ida

Right, exactly. They use their mouse to highlight a bunch of random unstructured files, a couple of PDFs, some images of vet invoices. They hold down a function key and just start talking naturally.

Allan

They asked Gemini to draft an introductory email to a dog kennel for their two dogs. One was named Hank.

Ida

And the other was named Lou Cinnamon, which is objectively a top-tier name for a dog.

Allan

Truly the best name.

Ida

But beyond the names, think about the mechanics of what the AI just did. It took static image files and messy PDFs, ran a localized vision model to extract the text, and used semantic reasoning to understand what was a vaccine date versus what was a random phone number.

Allan

And then it just built that table.

Ida

Yeah, it structured all of Hank and Lou Cinnamon's allergies into a perfectly formatted HTML table inside an outgoing email draft.

Allan

All in seconds. That process is completely eradicated.

Ida

It's just gone.

Allan

But to accomplish these seamless behind-the-scenes tasks on a larger scale, the AI has to do more than just read files. It is fundamentally changing the fabric of the internet itself.

Ida

Oh, for sure.

Search Stops Linking And Starts Building

Allan

We are moving away from finding information to demanding the web recompile itself to serve our immediate needs.

Ida

This was the moment in the keynote where I realized the internet as we know it, you know, the classic 10 blue links on a search page, is dead.

Allan

Completely dead.

Ida

The traditional search box is being replaced by an intelligent search box. You aren't just typing text anymore, you are dropping in files, videos, or even open Chrome tabs as direct inputs.

Allan

Google is calling the underlying engine for this Antigravity 2.0. It's a developer tool powered by their Gemini 3.5 Flash model, and it introduces agentic coding directly into the consumer search experience.

Ida

So it's coding for you.

Allan

Exactly. When you ask a complex question now, the search engine doesn't go look for a webpage that holds the answer. It acts like a senior software engineer and literally builds a custom application, complete with a user interface right there in the results.

Ida

The astrophysics example they showed perfectly illustrates this. A user asks how binary black holes create gravitational waves.

Allan

A pretty heavy question.

Ida

Yeah. In the old internet, you get a Wikipedia link. In the Antigravity 2.0 internet, search dynamically codes and deploys a fully interactive, custom 3D simulation of black holes spiraling into each other. You have actual sliders to adjust the mass and the orbital separation.

Allan

Let's just break down the technical marvel of what is happening under the hood there. The AI isn't pulling a pre-made widget from a library.

Ida

Right. It's building it from scratch.

Allan

The Gemini model is interpreting your prompt, planning a software architecture, writing the underlying physics logic, coding the front-end graphical interface, compiling it, and deploying it inside a secure sandbox container on your browser.

Ida

In milliseconds.

Allan

It is writing bespoke software for your highly specific question in milliseconds. It's unbelievable.

Ida

And it does this for deeply personal queries, too. Another demo showed someone planning a weekend trip for their family. Search utilized something called personal intelligence to securely parse data from the user's Gmail and calendar.

Allan

And then it coded a persistent mini app for the weekend.

Ida

Right. It wasn't a static itinerary, it was a custom dashboard tracking live restaurant reservations. It even integrated chess tutorials and activities because it deduced from earlier emails that the oldest kid was learning to play chess.

Allan

Building a stateful personalized application on the fly is impressive enough. But to truly prove the raw, unadulterated power of Antigravity 2.0 and the Gemini 3.5 Flash model, the Google engineering team did something completely unhinged during the developer portion of the keynote.

Ida

Oh, the absolute most absurd tech flex I have ever witnessed.

Allan

Just so over the top.

Ninety-Three Agents Code An OS

Ida

They wanted to prove how good these multi-agent systems are at coding. So they unleashed 93 autonomous AI subagents and gave them a single prompt: build a fully functional computer operating system entirely from scratch.

Allan

We need to be clear about the difficulty of that prompt. Building an operating system is an incredibly complex orchestration of memory management, file systems, kernel architecture, and hardware abstraction.

Ida

It's not a weekend project.

Allan

No, it usually takes a team of human engineers months, if not years.

Ida

And these 93 agents worked in parallel for 12 hours. The architecture of this is mind-blowing. Think of it like a highly specialized automated construction crew.

Allan

Oh, that's a good way to put it.

Ida

You have an architect agent designing the kernel, a plumber agent handling the memory allocation, and a supervisor agent checking for merge conflicts in the code. They are constantly talking to each other, writing code, testing it, failing, and iterating.

Allan

And the numbers are staggering.

Ida

They processed 2.6 billion tokens in that 12-hour window. They wrote every single line of code, and the total compute cost was less than $1,000 in API credits.
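Those figures are easier to weigh once they are broken down per agent. The Python sketch below does that arithmetic, assuming the work was split evenly across all 93 agents and treating "less than $1,000" as an upper bound; both assumptions and all names are illustrative, since the demo's actual distribution of work was not described.

```python
# Figures quoted in the episode for the operating-system demo.
AGENTS = 93
HOURS = 12
TOTAL_TOKENS = 2_600_000_000
COST_CEILING_USD = 1_000  # "less than $1,000", so this is an upper bound


def tokens_per_agent(total: int, agents: int) -> float:
    """Average tokens handled by each agent, assuming an even split."""
    return total / agents


def tokens_per_agent_second(total: int, agents: int, hours: int) -> float:
    """Average per-agent throughput over the whole run."""
    return total / (agents * hours * 3600)


def max_usd_per_million_tokens(cost_ceiling: float, total: int) -> float:
    """Upper bound on the blended price per million tokens."""
    return cost_ceiling / (total / 1_000_000)


if __name__ == "__main__":
    print(f"{tokens_per_agent(TOTAL_TOKENS, AGENTS):,.0f} tokens per agent")
    print(f"{tokens_per_agent_second(TOTAL_TOKENS, AGENTS, HOURS):.0f} tokens/s per agent")
    print(f"<= ${max_usd_per_million_tokens(COST_CEILING_USD, TOTAL_TOKENS):.2f} per million tokens")
```

Under those assumptions each agent averages just under 28 million tokens, a few hundred tokens per second, at a blended price below about 40 cents per million tokens.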

Allan

And the ultimate payoff for this monumental feat of multi-agent software engineering?

Ida

They booted up this bespoke AI-generated operating system live on stage just to play the classic 1993 video game Doom. This is simultaneously impressive and completely ridiculous. It's like asking a librarian to help you find a book, and instead, they build a custom printing press in front of you.

Allan

It is the ultimate flex of raw capability.

Ida

It proves that you essentially drop items into this persistent cart from wherever you happen to be browsing.

A Universal Cart With Real Reasoning

Ida

Once an item is in there, the cart becomes this tireless background bargain hunter. It never sleeps. Exactly. It continuously queries merchant APIs to track price histories, monitor for restocks, and hunt for flash deals.

Allan

But the critical evolution here is that the cart acts as a highly analytical financial chaperone. It possesses semantic reasoning about the physical properties of the products themselves.

Ida

The PC building example they used to demonstrate this was brilliant. Let's say you are building a custom gaming rig. You add a high-end processor to your universal cart from a tech blog. Okay. But three days ago, you had added a specific motherboard from a YouTube review. The AI cart proactively flags a warning. It recognizes that the processor requires a different physical socket type than the motherboard you already selected.

Allan

That is so helpful.

Ida

It stops you from buying incompatible parts and dynamically suggests alternatives that fit your exact bill.
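The socket check in that example boils down to comparing one attribute across two items in the cart. A minimal Python sketch, assuming each part already carries a structured socket field; the `Part` type, the part names, and the catalog are hypothetical, and how the real cart derives such fields was not explained.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Part:
    name: str
    kind: str    # "cpu" or "motherboard"
    socket: str  # e.g. "AM5" or "LGA1700"


def socket_conflicts(cart: list) -> list:
    """Return (cpu, motherboard) name pairs whose sockets do not match."""
    cpus = [p for p in cart if p.kind == "cpu"]
    boards = [p for p in cart if p.kind == "motherboard"]
    return [(c.name, b.name) for c, b in product(cpus, boards) if c.socket != b.socket]


def compatible_cpus(board: Part, catalog: list) -> list:
    """Suggest catalog processors that fit the board already in the cart."""
    return [p.name for p in catalog if p.kind == "cpu" and p.socket == board.socket]


if __name__ == "__main__":
    board = Part("Example Board X", "motherboard", "AM5")
    cpu = Part("Example CPU 1", "cpu", "LGA1700")
    catalog = [cpu, Part("Example CPU 2", "cpu", "AM5")]
    print(socket_conflicts([board, cpu]))   # the mismatch that gets flagged
    print(compatible_cpus(board, catalog))  # alternatives that would fit
```

The comparison itself is trivial; the hard part in practice is getting a reliable socket value out of a blog post or a video review in the first place.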

Allan

It understands the spatial and physical compatibility constraints of consumer goods. And to cross the final hurdle, making the actual purchasing completely seamless, Google introduced the agent payments protocol, or AP2.

Ida

The AP2 protocol is where we cross firmly into sci-fi territory. This protocol allows your AI agent to securely execute financial transactions on your behalf without you ever clicking a checkout button.

Allan

Which sounds terrifying.

Ida

My immediate thought was how do I know it won't just empty my bank account on random gadgets?

Allan

Because AP2 isn't just handing your credit card number to a chatbot. It utilizes cryptographic tokenized parameters. You set strict, immutable boundaries. For instance, you tell the agent you only want a specific brand of monitor and your hard spend limit is $400.

Ida

So it's locked in.

Allan

Right. The agent generates a single-use token bound by those exact smart contract-like constraints. When it finds a deal that matches, it initiates a handshake with the merchant processor. And if they try to overcharge, if the merchant tries to charge $401 or swap the brand, the transaction mathematically fails. It creates a tamper-proof digital paper trail that links your constraints, the merchant, and the payment processor so there are no disputes.
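The idea of a single-use, constraint-bound token can be illustrated with a short Python sketch. To be clear, this is not the AP2 specification: the `Mandate` and `Processor` names, the fields, and the use of a shared HMAC key are assumptions chosen to keep the example small, and a real protocol would more likely rely on public-key signatures.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Mandate:
    brand: str       # the only brand the agent may buy
    max_cents: int   # hard spend limit, in cents
    nonce: str       # unique id so the mandate can be used only once


def sign(mandate: Mandate, key: bytes) -> str:
    """Bind the constraints to a signature so they cannot be edited later."""
    payload = json.dumps(asdict(mandate), sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


class Processor:
    """Hypothetical payment processor that enforces signed mandates."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._used = set()

    def authorize(self, mandate: Mandate, signature: str, brand: str, cents: int) -> bool:
        if not hmac.compare_digest(sign(mandate, self._key), signature):
            return False  # constraints were altered after signing
        if mandate.nonce in self._used:
            return False  # single-use: replays are rejected
        if brand != mandate.brand or cents > mandate.max_cents:
            return False  # wrong brand or over the limit
        self._used.add(mandate.nonce)
        return True


if __name__ == "__main__":
    key = b"demo-key"
    mandate = Mandate(brand="ExampleBrand", max_cents=40_000, nonce="order-1")
    sig = sign(mandate, key)
    processor = Processor(key)
    print(processor.authorize(mandate, sig, "ExampleBrand", 40_100))  # over limit
    print(processor.authorize(mandate, sig, "ExampleBrand", 39_900))  # within limit
```

With a $400 limit expressed as 40,000 cents, a $401 charge is rejected by a plain comparison, and the signature check is what stops anyone from quietly raising the limit afterwards.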

Ida

Okay, but here's the thing. The AI is smart enough to know you're buying the wrong PC parts, but presumably polite enough not to mention you don't actually need another gaming rig. How long until my AI is just negotiating with your AI to buy things neither of us wants?

Allan

It's a highly valid concern regarding induced demand. We are handing these systems an unprecedented level of agency over our wallets.

Ida

We really are.

Allan

But Google's core bet is that the sheer convenience, the complete removal of the physical exhaustion of cross-referencing motherboard compatibility sheets and hunting for discount codes, will easily override our skepticism.

Smart Glasses That Translate The World

Ida

So far, this automated, frictionless existence has been trapped behind glass. We are looking at laptops, phones, and desktop monitors. But Google wants to take this environmental intelligence and integrate it directly onto our faces, which brings us to the hardware: the new Android XR smart glasses.

Allan

Historically, smart glasses have faced a massive uphill battle regarding public perception.

Ida

Because nobody wants to walk down the street looking like a rogue Borg drone with a glowing camera strapped to their temple. The cyborg aesthetic is deeply off-putting.

Allan

Google is hyper-aware of that stigma, which is why their hardware strategy this time is entirely dependent on partnerships with top-tier fashion eyewear brands. They handed the external chassis design over to Warby Parker and Gentle Monster.

Ida

So they actually look like high-end stylish fashion accessories. You wouldn't know they were tech devices at a glance. Not at all. And to further reduce the friction of adoption, the initial rollout this fall only features audio glasses. There is no visual display, no augmented reality holograms projecting into your retinas.

Allan

They feature high-resolution onboard cameras so the multimodal AI can process your visual environment, but the output is delivered via directional spatial audio.

Ida

It's just sound.

Allan

It is literally Gemini whispering directly into your ear, providing constant hands-free context about the world around you without ever demanding you look down at a screen.

Ida

The live, hands-on test of the spatial audio was staggering. A journalist used the XR glasses to facilitate a highly complex three-way conversation in a crowded room. You had a Spanish speaker, a Serbian speaker, and an English speaker.

Allan

All talking at once.

Ida

Yeah. The glasses seamlessly captured the audio, processed the translation via the cloud, and whispered the real-time English translation into the journalist's ear.

Allan

The technical hurdle there isn't just translation, it's acoustic isolation. The AI demonstrated intense situational awareness. By utilizing vocal footprint isolation and directional mics, the agent adeptly ignored the background chatter of other English speakers in the room.

Ida

That's the crazy part to me.

Allan

It knew exactly which voices were part of the targeted conversation and dynamically filtered out the ambient noise.

Ida

It's essentially the Babel fish from The Hitchhiker's Guide to the Galaxy, but styled by a high-end Korean fashion house. And the glasses aren't just for passive observation. In another onstage demo, they showed a woman using the glasses to navigate the physical world and trigger digital actions simultaneously. She asks Gemini for walking directions to a local coffee shop.

Allan

And here's where the API integrations really shine. While she's walking, the glasses, which are wirelessly tethered to her phone, autonomously utilize deep links to open the DoorDash app sitting dormant in her pocket.

Ida

The AI navigates the hidden architecture of the phone app she isn't even looking at and places her usual order for a nitro cold brew so that the transaction is completed and the coffee is waiting on the counter the second she arrives.

Allan

It is the ultimate manifestation of their goal: removing the friction of interacting with both digital interfaces and physical environments. But of course, because this is a Silicon Valley tech demo, they couldn't resist throwing in something profoundly bizarre.

Ida

Oh, the blimp. I almost forgot about the blimp. So she gets her nitro cold brew, looks out at the live audience in the auditorium, and asks Gemini to use the Nano Banana image model.

Allan

Naturally.

Ida

The onboard cameras on the glasses take a photo of the crowd. The AI processes the image, hallucinates a massive cartoon blimp floating in the sky above the audience, and instantly sends that newly generated image to her paired smartwatch. Wait, it gets better. We finally have high-end Korean fashion glasses that can instantly translate Serbian, and we use them to hallucinate cartoon blimps. I love that this exists, but also why?

Allan

Because they possess the compute power to do it. And they want developers to know the image models can run with near zero latency based on live camera feeds.

Ida

I guess that makes sense.

When Friction Disappears, What’s Left

Allan

But your question of why actually perfectly highlights the core tension of this entire deep dive. If we zoom out and synthesize the sheer scale of everything we've unpacked today, we have to circle back to that breathtaking number from the intro. 3.2 quadrillion tokens.

Ida

3.2 quadrillion units of thought, of logical deduction, of multi-agent coordination, churning away in the cloud every single month.

Allan

We are harnessing an unprecedented scale of computational physics to completely remove the friction from the most mundane, deeply human tasks. We are stepping into a paradigm where software engineers don't write code, the agents do.

Ida

Right.

Allan

The shopping cart catches your physical compatibility mistakes. Your glasses translate the world and buy your coffee before your brain even fully registers the desire.

Ida

It's an invisible, hyper-competent infrastructure. We have unlocked the power to simulate the gravitational waves of binary black holes, and we are aggressively deploying it to figure out if our neighbor's kid's allergy allows for peanut butter at the weekend block party. It is gloriously, wonderfully absurd.

Allan

It is absurd, but it is undeniably beautiful in its utility. We are stepping away from the keyboard entirely and letting the machines negotiate with the machines.

Ida

Which leaves us with a lingering provocative thought for you to mull over as you go about your day. If the AI handles our scheduling, our shopping, our HOA bylaws, and our coffee runs, what happens to the social friction that actually forces us to connect?

Allan

That is the real question.

Ida

Think about it. The messy friction of navigating someone else's schedule, deciphering a confusing menu in a foreign language, or accidentally buying the wrong PC part and having to ask a friend for help. Those tiny points of friction are often where serendipity and human empathy happen.

Allan

Yeah, that's where life happens.

Ida

If machines are doing all the talking, all the organizing, and all the apologizing for us, do we risk letting our own capacity for grace and human-to-human relationship building atrophy? When the friction is gone, do we lose the spark that makes the interactions meaningful in the first place?

Allan

It is a profound question. If the agents are perfect, we lose the beautiful vulnerability of making mistakes together.

Ida

It's definitely a lot to think about. But for now, we'll let you get back to your frictionless reality. Hopefully, without any cartoon blimps blocking your view. Thanks for joining us on this deep dive tailored just for you. Until next time.