The Deepdive
Join Allan and Ida as they dive deep into the world of tech, unpacking the latest trends, innovations, and disruptions in an engaging, thought-provoking conversation. Whether you’re a tech enthusiast or just curious about how technology shapes our world, The Deepdive is your go-to podcast for insightful analysis and passionate discussion.
Tune in for fresh perspectives, dynamic debates, and the tech talk you didn’t know you needed!
Perplexity AI And The Hidden Data Pipeline
You type a sensitive question into an AI search box and feel the same relief as whispering into a private confessional. Now imagine learning that the “confessional” may be wired to the biggest ad networks on earth. That’s the unsettling thread we pull today as we unpack a series of major legal filings aimed at Perplexity AI, including privacy class actions, a copyright mega-suit that reaches across the generative AI industry, and Amazon’s federal injunction over autonomous browsing.
We walk through the core privacy allegations in plain language: tracking pixels, third-party analytics scripts, and forensic-style request logs that purportedly show chat text and AI responses leaving a user’s device. We also dig into the psychology of “incognito mode” and why a privacy toggle can feel protective while the underlying data architecture still routes information outward. Along the way, we ask what it means if intimate queries about money, health, relationships, or legal fears become raw material for targeted advertising profiles.
Then we shift to agentic AI with Perplexity’s Comet, where the stakes move from speech to action. Amazon’s injunction forces a sharp question: even if you give an AI agent your credentials and consent, can a platform still ban that agent and treat continued access as unauthorized under the Computer Fraud and Abuse Act? Finally, we connect the dots to the copyright wars, shadow libraries, BitTorrent downloads, stealth crawlers, and retrieval augmented generation, all pointing to a single pattern: boundary-breaking data acquisition as the default fuel for AI capabilities.
If this raised your eyebrows, subscribe for more deep dives, share this episode with a friend who uses AI for sensitive questions, and leave a review. What’s your line: what should never be collected or automated by a chatbot?
Allan: So, uh, I want you to imagine something for a second. Just picture you're walking into this, like, totally soundproof confessional booth.
Ida: Okay, setting the scene. I like it.
Allan: Right. It's totally dark, completely private, and you are there to ask your absolute deepest, most sensitive, maybe, you know, slightly embarrassing questions.
Ida: We all have them.
Allan: Exactly. So you sit down, you whisper your secrets into the dark, and you get this immense sense of relief.
Ida: Sounds nice, honestly.
Allan: It does. But then, uh, you walk outside, you look up, and you realize that booth had a hidden microphone wired directly to a giant glowing billboard in the middle of Times Square.
Ida: Oh no!
Allan: Yeah, broadcasting literally every word you just said, right next to a targeted ad for, like, whatever you just confessed to.
Ida: It is a genuinely terrifying visual.
Allan: Yeah.
Ida: And yet, based on the massive stack of legal filings we are unpacking today, it is uncomfortably close to how the artificial intelligence industry is currently operating behind closed doors.
Allan: Welcome to today's deep dive. Our mission today is to look at a whole series of recent, honestly bombshell lawsuits aimed squarely at major AI companies.
Ida: But with a very specific, intense focus on Perplexity AI.
Allan: Yes, exactly. We are looking at a 135-page privacy class action lawsuit, a massive, sweeping copyright infringement case, and, uh, a federal injunction from Amazon.
Ida: Which is a heavy lineup, but our goal here isn't just to, you know, list off court cases like a textbook.
Allan: Right. Nobody wants that. We want to look under the hood at the mechanics of these systems and understand how the AI sausage is actually made.
Ida: And the contrast here is what makes this so fascinating to me, because AI search engines, and Perplexity in particular, have so aggressively pitched themselves to you, the user, as the sleek, clean alternative.
Allan: Yeah, the privacy-conscious anti-Google, basically.
Ida: Exactly. They sell this pristine interface. They sell the idea of a direct answer without the tracking. But, uh, the stakes we're talking about today aren't just a technical glitch or a vaguely worded privacy policy.
Allan: No, it's way bigger than that.
Ida: Right. We are looking at a fundamental shift in the digital economy, where the raw material powering these multi-billion-dollar valuations appears to be composed of our leaked anxieties.
Allan: And massive automated heists of intellectual property. Which, you know, if you have ever sat alone in your room late at night and asked an AI a sensitive medical question...
Ida: Or asked it to explain a complicated financial decision you were too embarrassed to ask a human about.
Allan: Yes. Then this deep dive directly impacts you. So we have to start with the privacy mirage, because this is the stuff that hits the user first.
Ida: Let's look at the class action lawsuits. Specifically, there's Doe v. Perplexity, which was filed recently by a man in Utah, and a preceding case, Meyer v. Perplexity, out of California.
Allan: And the core of these lawsuits revolves around the alleged use of deeply embedded tracking software.
Ida: Right, and for anyone who works in digital marketing, the technology at the center of this really isn't new.
Allan: No, it's super common.
Ida: Exactly. The plaintiffs allege that the second you log on to Perplexity's interface, the platform is quietly executing tracking code, specifically things like the Facebook Pixel and Google Analytics, directly on your device.
Allan: So, uh, for those who might not spend their days deep in ad tech, a pixel is essentially an invisible one-by-one image embedded in the website's code.
Ida: Yep. Just hiding there.
Allan: And when your browser loads that invisible image, it triggers a server call to a third party. In this case, Meta or Google.
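[Editor's note: the mechanism Allan describes, an invisible image whose real payload is its query string, can be sketched in a few lines of Python. The endpoint, parameter names, and cookie format below are illustrative stand-ins, not Meta's actual pixel schema.]

```python
from urllib.parse import urlencode

def build_pixel_url(base, pixel_id, event, page_url, fbp_cookie):
    """Build the GET request a browser fires when it loads a 1x1 tracking image.

    The "image" is never really looked at; the query string carries the payload.
    All parameter names here are illustrative, not Meta's real schema.
    """
    params = {
        "id": pixel_id,     # which advertiser's pixel this is
        "ev": event,        # event name, e.g. "PageView"
        "dl": page_url,     # the page (and query) the user was on
        "fbp": fbp_cookie,  # browser ID cookie, e.g. "fb.1.<timestamp>.<random>"
    }
    return f"{base}?{urlencode(params)}"

url = build_pixel_url(
    "https://tracker.example/tr",
    "123456",
    "PageView",
    "https://ai-search.example/?q=my+private+question",
    "fb.1.1700000000.987654321",
)
print(url)
```

Loading the pixel is just this GET request in disguise: the sensitive part is not the image, it is everything packed into the URL.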
Ida: Which is standard practice if you're a retail site trying to track whether somebody bought a pair of shoes.
Allan: Sure. But finding one inside an AI search engine that literally markets itself as an anonymous alternative to Google? That's wild.
Ida: It really is.
Allan: And the plaintiffs actually included these specific HTTP request logs in the lawsuit. They showed data strings like, uh, _fbp, fb.1, followed by a timestamp and an ID.
Ida: They documented the exact digital fingerprints of the data leaving the building. And the volume and specificity of the data in those request logs is where this goes from, you know, a standard privacy concern to a massive breach of trust.
Allan: Because it's not just basic metadata, right?
Ida: No, not at all. It's not just what browser you are using or your IP address. The trackers are allegedly capturing the full text of your chat.
Allan: Wait, the full text?
Ida: Verbatim. They are transmitting your exact search queries and the AI's direct responses straight to Meta and Google servers. And they're tying it all together with personal identifiers like your email address and your Facebook ID.
Allan: Wow. Which means your most sensitive, completely unfiltered thoughts are just flying out the back door.
Ida: Exactly.
Allan: Like the Utah man in the lawsuit, John Doe. He was asking Perplexity about his tax obligations, details about his family finances.
Ida: I think there were questions about Roth IRA conversions and potential cannabis investments in there, too.
Allan: Right. And the filings mention other users asking about deeply personal medical symptoms, relationship advice, private legal questions.
Ida: Stuff you wouldn't want your closest friends to know, let alone the world's largest advertising brokers.
Allan: Yeah, it completely shatters the illusion of the AI as this impartial, isolated oracle. You're treating the machine like a private diary, but the data pipeline is treating you like a commodity.
Ida: And Meta and Google can harvest that verbatim text, pair it with your real identity using that Facebook ID, and use it to build hyper-targeted, incredibly intimate advertising profiles.
Allan: Okay, but here's the thing. Perplexity is trying to compete with Google. Like, their entire brand identity, their pitch to investors, is literally "we are the Google killer."
Ida: Right.
Allan: So why on earth would they be secretly hard-coding Google Analytics into their front end and feeding Meta their own users' highly sensitive data? That seems like handing your entire playbook to the opposing team.
Ida: Well, it comes down to the brutal reality of scaling a tech platform. This data isn't just digital oil, right? It's a topographical map of human vulnerability and user behavior.
Allan: Ah, I see where you're going with this.
Ida: Yeah. Perplexity, as a rapidly growing startup, needs enterprise-grade analytics to understand how people are interacting with its site, where they drop off, and how to acquire new users cheaply.
Allan: And Meta and Google provide the most powerful radar equipment in the world to do that.
Ida: Exactly. And they provide it essentially for free, so long as they get a copy of the map in return.
Allan: Wow. So Perplexity gets the growth metrics it desperately needs to show investors, and Meta and Google get your secrets to fuel their ad networks.
Incognito Mode That Still Leaks
Ida: The cognitive dissonance is staggering. You accept Perplexity's terms of service thinking you are in a walled garden, but these third-party trackers are quietly harvesting everything in the garden in the background.
Allan: Wait, it gets better. And by better, I mean significantly worse. Let's talk about incognito mode.
Ida: Oh boy, yes.
Allan: Because Perplexity heavily markets this feature where you can toggle a switch to create anonymous threads that don't save to your history. It implicitly promises the user a safe space for those really sensitive questions.
Ida: But according to the technical forensic analysis presented in the lawsuit, the data transmission to Meta and Google happens even when users activate that incognito mode.
Allan: You've got to be kidding me.
Ida: I wish I was. Toggling the switch on the user interface might stop the chat from showing up in your personal history log on the screen, but it allegedly doesn't stop the underlying code from executing those server calls to the third-party trackers.
Allan: It's like putting on a fake mustache to go to a bar while simultaneously handing the bartender your real driver's license, your Social Security card, and your personal diary.
Ida: That is the perfect analogy. You feel completely hidden because you flipped a switch on the screen, but the data architecture underneath doesn't care about your fake mustache.
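[Editor's note: the failure mode alleged here, a privacy toggle that gates the UI but not the analytics call, can be modeled in a few lines. This is a hypothetical sketch of the architecture described in the suit, not Perplexity's actual code.]

```python
# A toy model of the alleged failure mode: the "incognito" flag gates what
# the *UI* stores, while the analytics call fires regardless.
# Everything here is hypothetical, not any company's real implementation.

history = []      # what the user sees in their thread list
tracker_log = []  # what a third-party analytics endpoint receives

def send_to_analytics(query):
    tracker_log.append(query)  # fires on every request; no flag is checked

def handle_query(query, incognito=False):
    send_to_analytics(query)   # note: runs before any privacy check
    if not incognito:
        history.append(query)  # the toggle only controls this branch
    return f"answer to: {query}"

handle_query("roth ira conversion rules", incognito=True)
handle_query("best hiking boots", incognito=False)

print(history)      # only the non-incognito query is visible to the user
print(tracker_log)  # both queries reached the tracker
```

The user inspects `history`, sees nothing sensitive, and feels safe; the leak lives one layer down.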
Allan: Now, to be fair to the companies involved, Perplexity has stated they haven't been served with this newest March 2026 suit yet and can't verify the technical claims.
Ida: Right. And Meta has pointed to a policy that explicitly forbids advertisers from sending them sensitive personal information.
Allan: But if the forensic logs in this lawsuit actually hold up in court, it exposes a massive contradiction between the marketing of AI privacy and the mechanical reality of how these platforms actually operate.
Amazon Injunction And AI Agents
Ida: Yeah. So Perplexity is allegedly perfectly happy to leave your back door wide open to Meta and Google. But when it comes to other platforms' boundaries, they suddenly have a very different operational philosophy.
Allan: Which is a perfect segue, because this brings us to the injunction from Amazon, where we move from Perplexity leaking your secrets out to Perplexity aggressively breaking into places it shouldn't go.
Ida: Right. And this injunction in Amazon v. Perplexity is fascinating because it forces us to deal with the legal reality of agentic AI.
Allan: Yeah. For anyone who hasn't been following the absolute bleeding edge of the industry, we're moving past passive AI, you know, where you ask a question and it just generates a block of text.
Ida: We are entering the era of agents. Perplexity has this AI-powered browser feature called Comet, which is designed to autonomously carry out tasks on your behalf across the web.
Allan: So it doesn't just tell you to buy something, it literally goes and tries to buy it for you.
Ida: Exactly. And Comet was allegedly accessing users' password-protected Amazon accounts to execute tasks.
Allan: Wait, how was it getting in?
Ida: Well, it was disguising itself as a standard web browser, bypassing Amazon's security measures, and navigating the internal architecture of accounts.
Allan: And Perplexity's defense in court was essentially built on user consent, right? They argued, "We are only doing this because the user explicitly told us to."
Ida: Yep. Their stance was: the user handed us their login credentials and asked us to perform a specific task on their behalf. We have permission.
Allan: And honestly, my first instinct is to completely agree with Perplexity's defense there.
Ida: Really?
Allan: Yeah. Look, if I give my best friend the keys to my apartment to water my plants while I'm out of town, my landlord cannot call the police and have my friend arrested for trespassing.
Ida: I mean, logically that makes sense.
Allan: Right. I gave them explicit permission to be there. Why is a digital agent acting on my behalf treated any differently?
Ida: Well, that is the exact tension U.S. District Judge Maxine M. Chesney had to untangle, and her ruling sets a massive precedent. She drew a sharp legal line between user permission and platform authorization.
Allan: Okay, break that down for me.
Ida: Yes, you gave Perplexity the digital keys to your Amazon account. But Amazon, acting as the landlord of the digital building, sent Perplexity a formal cease-and-desist letter. They essentially said: we do not care if our tenant invited you in; you, the automated robot, are explicitly banned from the premises.
Allan: So the platform's terms of service absolutely override my personal consent regarding my own account data?
Ida: Under the Computer Fraud and Abuse Act, or the CFAA, yes.
Allan: The CFAA. That's a federal anti-hacking law from, like, the 1980s, right?
Ida: Exactly. Famously inspired in part by the movie WarGames. And applying a 40-year-old law to an autonomous web agent is tricky, but the judge relied on precedent showing that once a platform explicitly revokes authorization, continued access becomes a federal offense. The platform owns the infrastructure. Amazon even proved a cognizable loss under the CFAA by showing they had to spend over $5,000 just deploying engineers to develop technical countermeasures to detect and block Comet's stealthy, disguised activity on their servers.
Allan: We are entering this completely bizarre legal gray area. These highly advanced AI agents are acting as our personal proxies, but legally they are essentially trespassing on corporate property to do our bidding.
Ida: And the judge completely dismissed Perplexity's argument that stopping this behavior would stifle tech innovation.
Allan: Yeah, she ruled that the public interest in preventing unauthorized access to private computer systems massively outweighs a startup's desire to maintain a first-mover advantage in the AI shopping space.
The Copyright Book Heist Pipeline
Ida: Which creates a fascinating contradiction. The AI industry wants frictionless access to everything to make these agents work, but the internet is fundamentally made of private, gated communities. And if we look at how these models got smart enough to act as agents in the first place, that complete disregard for digital boundaries is actually the foundational engineering principle of the entire industry.
Allan: Which brings us to the great AI book heist.
Ida: The heist, yes.
Allan: We have this sweeping copyright lawsuit filed against basically every major AI player by a group of prominent authors. One of the lead plaintiffs is John Carreyrou.
Ida: The investigative journalist who famously wrote Bad Blood, detailing the massive fraud at Theranos.
Allan: Yeah. And I love that this exists, but also, why? Like, if you are a tech company orchestrating a massive, legally dubious data sweep, maybe don't steal the life's work of the specific journalist who specializes in taking down billion-dollar tech frauds.
Ida: Strategically, perhaps not the best target to agitate.
Allan: Seriously.
Ida: But this lawsuit meticulously outlines how these models didn't just casually scrape publicly available blogs; they intentionally targeted shadow libraries to acquire massive volumes of copyrighted text.
Allan: We're talking about sites like LibGen, Z-Library, and Bibliotik.
Ida: They ingested datasets with names like Books3, which contains hundreds of thousands of pirated books.
Allan: And the infrastructure of piracy is remarkably resilient, right? Like, when the FBI seizes the domains for a site like Z-Library, the pirate communities instantly spin up decentralized mirrors.
Ida: Like the Pirate Library Mirror, or PiLiMi.
Allan: Which is such a funny name. PiLiMi sounds like a cute startup, but it's basically a hydra-like repository of stolen intellectual property.
Ida: Exactly. And the mechanism of how tech companies acquired this data is crucial. The lawsuit details how they utilized BitTorrent to download these massive libraries.
Allan: Right. And for anyone who remembers the early-2000s internet, when you torrent a file, you aren't just downloading it from one central server.
Ida: No, you are pulling tiny pieces of the file from thousands of other computers on the peer-to-peer network.
Allan: But the software is designed so that while you are downloading, which is called leeching, you are simultaneously uploading those pieces to other users, which is seeding.
Ida: Which means they weren't just passively downloading pirated books.
Allan: They were actively distributing them.
Ida: Yes. The mechanical act of acquiring the data meant these tech companies were actively distributing pirated materials back into the network.
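[Editor's note: why downloading via BitTorrent implies distributing, as the hosts describe, falls out of the protocol's design. The sketch below is a toy swarm model, not a real client; trackers, piece selection, and tit-for-tat are all omitted.]

```python
# A toy model of why torrenting implies distribution: as soon as a peer
# holds a piece, the swarm can fetch that piece from it.
# Simplified sketch of swarm behavior, not a real BitTorrent implementation.

class Peer:
    def __init__(self, name, pieces=None):
        self.name = name
        self.pieces = set(pieces or [])
        self.uploaded = 0  # pieces this peer has served to others

    def request_piece(self, other, piece):
        """Download one piece from another peer (leeching)."""
        if piece in other.pieces:
            self.pieces.add(piece)
            other.uploaded += 1  # the sender is seeding, by construction

seeder = Peer("original_seeder", pieces=range(4))
lab = Peer("training_lab")  # the downloader
bystander = Peer("other_peer")

# The lab downloads the whole 4-piece file from the seeder...
for piece in range(4):
    lab.request_piece(seeder, piece)

# ...and another peer immediately fetches pieces from the lab.
bystander.request_piece(lab, 0)
bystander.request_piece(lab, 1)

print(lab.uploaded)  # the "downloader" has itself distributed 2 pieces
```

The downloader never opts in to distributing; participating in the swarm is the act of distributing.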
Allan: That is wild.
Ida: Every step of the large language model training pipeline requires making unauthorized copies. You copy it to download it, you copy it to pre-process the text, you copy it to tokenize it, and you copy it again and again as you run it through the neural network to adjust the weights.
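[Editor's note: Ida's point about copies can be made concrete with a schematic tally: every stage of the pipeline materializes its own reproduction of the text. The stage names and epoch count below are illustrative, not any lab's actual pipeline.]

```python
# Each stage of the pipeline Ida lists holds another copy of the work.
# A schematic tally, assuming three training passes over the text.

copies = []

def stage(name, data):
    copies.append(name)  # record that this stage materialized a reproduction
    return data

raw = stage("download", "the full text of a copyrighted book ...")
cleaned = stage("preprocess", raw.lower())
tokens = stage("tokenize", cleaned.split())
for epoch in range(3):  # each training pass re-reads the tokenized text
    batch = stage(f"training epoch {epoch}", tokens)

print(len(copies))  # 6 distinct reproductions of one book, for three epochs
```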
Allan: So the obvious question here is: why risk this level of federal copyright exposure? Why not just stick to scraping Wikipedia, Reddit, and public-domain text?
Ida: Because long-form prose is the absolute gold standard for teaching an artificial intelligence how to think.
Allan: Okay, how so?
Ida: Well, if you want an AI to understand complex logical reasoning, how to structure a persuasive argument, how narrative flows across hundreds of pages, or how syntax and rhythm operate at a high level, you need books.
Allan: You need the structured thought of professional authors.
Ida: Exactly. The lawsuit actually quotes an Anthropic co-founder essentially admitting that to create a model with truly advanced generative capabilities, you cannot rely on internet chatter. You need the entire text of diverse, professionally edited books.
Allan: And instead of paying the creators for that immense value, the industry just took it.
Ida: They just took it.
Allan: And the lengths they went to to take it are incredible. There were these investigations by Cloudflare and Wired, which are cited in these filings, and they found that Perplexity was deploying what they call stealth crawlers.
Ida: Right. So when you build a website, you can put up a digital "do not enter" sign called a robots.txt file.
Allan: It's a standard internet protocol designed to politely tell automated web crawlers not to scrape your site.
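[Editor's note: the "do not enter" sign can be checked with Python's standard-library robots.txt parser. The crawler names and rules below are hypothetical, and the key point is that the protocol is purely advisory: it only works if the crawler bothers to consult it.]

```python
from urllib import robotparser

# robots.txt is advisory: a polite crawler calls can_fetch() before scraping;
# a stealth crawler simply never asks. Crawler names here are illustrative.

rules = robotparser.RobotFileParser()
rules.parse([
    "User-agent: PerplexityBot",  # the site bans the declared AI crawler...
    "Disallow: /",
    "User-agent: *",              # ...but allows everyone else
    "Allow: /",
])

print(rules.can_fetch("PerplexityBot", "https://example.com/article"))  # False
print(rules.can_fetch("Mozilla/5.0", "https://example.com/article"))    # True
```

Nothing enforces the False: a crawler that skips the check, or answers to a name not on the sign, scrapes anyway. Which is exactly the next move the hosts describe.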
Ida: Exactly. And Cloudflare's digital forensics team discovered that Perplexity's crawlers were allegedly ignoring those protocols entirely.
Allan: Just blowing right past the signs.
Ida: But even more aggressively, they were deploying undeclared automated agents that were actively spoofing their identity. The code was written to impersonate the Google Chrome browser used by a normal human.
Allan: They put the fake mustache on the robot.
Ida: They did.
Allan: They are writing code to dynamically change their user-agent strings, routing their traffic through different IP addresses, and actively deceiving the host servers, all to sneak past firewalls and scrape content they don't want to pay for.
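[Editor's note: the identity spoofing described here works because the User-Agent header is self-reported. Below is a toy server-side filter and the disguise that defeats it; the blocklist and header values are illustrative assumptions, not any site's real defenses.]

```python
# User-agent strings are self-reported: a client can claim to be anything.
# A toy server-side bot filter, and the spoofed header that walks past it.

BLOCKED_AGENTS = ("PerplexityBot", "python-requests", "curl")  # hypothetical list

def server_allows(headers):
    """Naive filter: block any request whose UA admits to being a bot."""
    agent = headers.get("User-Agent", "")
    return not any(bot in agent for bot in BLOCKED_AGENTS)

honest_headers = {"User-Agent": "PerplexityBot/1.0"}
spoofed_headers = {
    # A Chrome-on-Windows UA string, pasted in to impersonate a human browser
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/120.0.0.0 Safari/537.36",
}

print(server_allows(honest_headers))   # False: the declared bot is blocked
print(server_allows(spoofed_headers))  # True: the disguise gets through
```

This is why real bot detection has moved to fingerprinting and behavioral signals; header checks alone stop only the crawlers that tell the truth.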
Ida: And this leads to why the authors are so furious about the RAG system, retrieval-augmented generation.
Allan: Yeah, explain RAG, because this mechanism is crucial to understand here.
Ida: Right, because it operates differently from just training the model. During training, the AI reads the book to learn how to speak, embedding the patterns into its weights.
Allan: Okay, so that's the foundation.
Ida: Yes. But RAG allows the AI to act like it's taking an open-book test. When a user asks a question, the AI searches a massive external vector database in real time, retrieves the specific information, and patches it directly into the answer it generates for you.
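[Editor's note: the open-book-test behavior Ida describes can be sketched with a toy retriever. Real RAG systems use vector embeddings and a generation model; here, simple word-overlap scoring and a string template stand in for both, and the stored passages are invented.]

```python
# A minimal sketch of retrieval-augmented generation: score stored passages
# against the query, then splice the best match into the response.
# Word overlap stands in for real vector similarity; passages are invented.

def score(query, passage):
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p)  # crude relevance: shared words

library = [  # in a real system, this is the external vector database
    "Chapter 4 covers the faked blood test demonstrations at Theranos.",
    "A recipe for sourdough starter maintenance.",
    "Chapter 9 details the whistleblowers inside the company.",
]

def answer(query):
    best = max(library, key=lambda passage: score(query, passage))
    # The retrieved text is patched directly into the generated answer.
    return f"Based on the source: {best}"

print(answer("what happened with the blood test demonstrations"))
```

The authors' objection is about what sits in `library`: if the passages were pulled from a pirated copy, every answer is built on it at query time, not just at training time.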
Allan: Which means it's not just using Carreyrou's book to learn grammar, it's using a pirated copy of his book as a reference library to bypass his sales.
Ida: Precisely. The complaint points out that when asked about Carreyrou's work, Perplexity could spit out highly specific chapter-by-chapter summaries, pulling the thematic sequencing directly from the pirated text it was querying in real time.
Allan: So it is functioning as a direct, free substitute for the author's paid, copyrighted work. That is just a staggering appropriation of value.
Ida: It really is. But, you know, if we zoom out, this isn't just a Perplexity problem. This aggressive data acquisition strategy is the original sin of the entire generative AI arms race.
Allan: Oh, for sure. The copyright suit names Anthropic, Google, Meta, and OpenAI. Literally every single major player is accused of making massive copies of pirated works to build their multi-billion-dollar ecosystems.
Ida: Anthropic allegedly used a dataset called the Pile, Google used the C4 dataset, Meta used Books3 for its Llama models, and OpenAI used LibGen for the GPT series.
Allan: But out of this entire mountain of legal filings, I have to say my absolute favorite detail, the cherry on top of this massive digital heist, is xAI's Grok model.
Ida: Oh, yes.
Allan: Grok essentially took the witness stand and enthusiastically testified against its own creators. This is simultaneously impressive and completely ridiculous.
Ida: It is genuinely one of the most remarkable self-owns in recent tech history, because Grok was specifically designed by Elon Musk's xAI to be rebellious, right? It's supposed to be anti-censorship, answering questions without the typical corporate guardrails that companies like OpenAI put on their models.
Allan: And it turns out it was entirely too honest. The lawsuit details these chat logs where a user simply asks Grok how it manages to know the contents of so many obscure books.
Ida: And what did it say?
Allan: Grok candidly replies, "I basically vacuumed up whatever was out there, and LibGen has been one of the biggest 'whatever was out there' troves for years." It just straight up confesses to the crime.
Ida: It goes into so much detail. Grok told the user, "When I seem to know obscure academic monographs, out-of-print novels, or textbooks that normally cost $200, there's a decent chance some of that knowledge traces back to files that originally lived on LibGen."
Allan: Unbelievable.
Ida: It literally understood its own training provenance and cheerfully explained the mechanics of the shadow library to the user.
Allan: Of course it did. They spent billions of dollars to build a superintelligent AI that is literally too smart and too structurally honest to keep its own creators' legal secrets.
Ida: It perfectly highlights the reckless, breakneck speed of this entire industry. They deployed these stealth crawlers to scrape absolutely everything they could find, blindly vacuuming up the internet to win the capability arms race, just assuming they could litigate the consequences later.
Allan: Shoot first, ask permission in court later.
Ida: Exactly. And to be clear here, these stealth crawlers are completely ideology-blind. Whether it's vacuuming up right-wing manifestos or left-wing political theory from these shadow libraries, the algorithm genuinely doesn't care.
Allan: Right. It's completely impartial.
Ida: Yeah, it's not taking a stance; it's just blindly consuming massive amounts of data to build its vocabulary and predictive capabilities.
The Bigger Pattern And Liability
Allan: The technology has no morality; it just has an insatiable appetite. Which honestly brings us to the bigger picture. Let's zoom out. All of these lawsuits, the privacy breaches with the tracking pixels, the autonomous trespassing on Amazon's servers, the massive copyright infringement via BitTorrent, they are all symptoms of the exact same underlying engineering philosophy: move fast, break the digital boundaries, and take the data. And when you read through the stark realities of these legal filings, you see that the foundation of the utopia these companies promise is currently built on a bedrock of pirated books, ignored digital property rights, and the surreptitious monetization of our most private queries.
Ida: Absolutely.
Allan: So what does this say about us as a society? I mean, we want the convenience so badly. We want the magic answer engine to solve our problems instantly.
Ida: We are so eager for the technology to work that we are willing to treat a corporate chatbot like a private confessional.
Allan: Completely forgetting that the priest on the other side of the screen is backed by the world's largest, most aggressive data brokers, who are logging every single word.
Ida: We are actively trading our privacy and the financial livelihoods of human creators for the ability to get a five-second summary of a book we didn't want to buy or an answer to a question we were too lazy to research ourselves.
Allan: Genuinely unsettling thought to leave you with today. We've talked about how these AI agents are starting to take actions on our behalf, using our credentials to navigate platforms like Amazon. And we've seen how they brazenly ignore the legal boundaries of the platforms they interact with. So what happens when something inevitably goes wrong?
Ida: Oh, that's a scary thought.
Allan: If an AI agent like Perplexity's Comet, operating under your name, using your passwords, and executing a prompt you gave it, bypasses a firewall, violates the Computer Fraud and Abuse Act, or accidentally commits financial fraud while trying to optimize a purchase for you...
Ida: Who holds the legal liability?
Allan: Exactly. If the robot commits a federal crime because you asked it to water your digital plants, is the tech company responsible, or are you the one going to jail?
Ida: That is the multi-billion-dollar legal question that nobody in Silicon Valley seems to have an answer for yet.
Allan: We are building autonomous proxies without understanding the rules of engagement. So the next time you step into that digital confessional booth, or hand over your passwords to a helpful AI agent, just remember to look up.
Ida: Always look up.
Allan: Because that Times Square billboard is always on, the microphone is definitely recording, and you might just be the one held responsible for whatever the machine decides to do next.