“Is this a bot?” has been a useful question for decades. But in the age of AI, it may no longer be the right one.
The recent tension between Cloudflare and Perplexity illustrates why. Cloudflare accused Perplexity of using undeclared crawlers to bypass robots.txt, while Perplexity argued that its AI agents act on behalf of users, not as autonomous bots. If you need to catch up, the two main sources are Cloudflare’s accusation and Perplexity’s rebuttal.
The difference lies in how each side defines agency, and in the framing that follows from that definition.
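For context, robots.txt is the plain-text file a site publishes to tell crawlers which paths they may fetch. A minimal illustrative file might look like the sketch below; the specific rules are invented for this example, and PerplexityBot is simply Perplexity’s publicly declared crawler token. Cloudflare’s claim, in essence, was that traffic arrived under names not declared this way.

```
# Illustrative robots.txt, not taken from any real site:
# allow general crawling, but disallow one named AI crawler.
User-agent: *
Allow: /

User-agent: PerplexityBot
Disallow: /
```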
A Better Question: Who Is the Agent Acting For?
If an AI assistant retrieves content because I asked it to, and uses that content to produce an answer for me, it’s not acting independently or without oversight. It’s acting on my behalf. It’s an extension of me and my intent.
That distinction matters. Crawlers index. Scrapers harvest. A real-time AI agent responds to intent and may or may not be actively monitored or steered by a human.
The Old Model May Not Fit
Web infrastructure still treats interaction as binary: human or automated, browser or bot. AI agents don’t fit neatly into either category.
Many now:
- Act on explicit human intent
- Operate in real time
- Fetch targeted results rather than scrape broadly
- Even control browsers: clicking, navigating, and submitting forms
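To make that last point concrete, here is a minimal sketch of delegated browsing using Playwright. The URL and selectors are hypothetical placeholders; the point is that the agent drives a real browser, so its traffic looks like ordinary browsing.

```typescript
// Minimal sketch: an AI agent driving a real browser on a user's behalf.
// Assumes Playwright is installed; the URL and selectors are hypothetical.
import { chromium } from "playwright";

async function fetchOrderStatus(orderId: string): Promise<string> {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Every request below carries the browser's ordinary User-Agent string.
  await page.goto("https://shop.example.com/orders");
  await page.fill("#order-id", orderId);    // fill a form, like a person would
  await page.click("button[type=submit]");  // click, like a person would
  const status = await page.textContent(".order-status");

  await browser.close();
  return status ?? "unknown";
}
```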
When an agent acts through my browser, what is the user agent?
We’ve all seen a cartoon or movie robot typing on a keyboard; who is the user agent in those cases?
The User Agent Is Now a Layered Construct
Technically, a user agent is defined as “a computer program representing a person,” a definition that covers browsers, bots, and scrapers alike. But it raises a crucial question: which person is being represented?
MDN says:
A typical user agent string looks like this: “Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0”.
This tells us the technical details (Firefox 124.0 on Windows 10) but reveals nothing about intent or agency. That same string could represent me browsing manually, an AI agent acting on my behalf, or a scraper running autonomously.
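To see how little the string proves, consider that any HTTP client can send it. A short sketch, assuming Node 18+ (which ships fetch built in) and a hypothetical URL:

```typescript
// The exact same User-Agent string, sent by a plain script rather than Firefox.
// Node 18+ has fetch built in; the URL is a hypothetical placeholder.
const UA =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0";

const res = await fetch("https://example.com/article", {
  headers: { "User-Agent": UA },
});
console.log(res.status); // the server sees "Firefox 124.0 on Windows 10" either way
```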
The term user agent, once a simple browser identifier, now spans multiple layers. Each layer reveals different aspects of who or what is behind a web request:
| Layer | Description |
|---|---|
| Surface | The technical User-Agent string (e.g., Chrome on macOS) |
| Execution | What is controlling the interaction (a person, a script, an AI agent) |
| Intent | Who initiated the action, and why |
| Identity | Whether the agent is acting transparently, and whose identity it presents: the program’s, the platform’s, or the user’s |
And this is just one framing. With an alternate framing, we might have different layers entirely.
A browser request no longer guarantees a person is behind it. Understanding these layers is key to understanding the interaction.
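One way to internalize the layering is to model it. Here is a sketch of the table above as a TypeScript type; the field names and values are my own invention for illustration, not any standard.

```typescript
// A hypothetical model of the four layers; names are illustrative, not a standard.
type Controller = "person" | "script" | "ai-agent";

interface RequestAgency {
  surface: string;        // the raw User-Agent string
  execution: Controller;  // what is actually driving the interaction
  intent: {               // who initiated the action, and why
    initiator: "user" | "operator" | "autonomous";
    purpose: string;
  };
  identity: {             // transparency: acting openly, and on whose behalf
    declared: boolean;
    onBehalfOf?: string;
  };
}

// The same Surface layer can sit on top of very different lower layers:
const manualBrowsing: RequestAgency = {
  surface: "Mozilla/5.0 ... Firefox/124.0",
  execution: "person",
  intent: { initiator: "user", purpose: "reading an article" },
  identity: { declared: true, onBehalfOf: "the user themselves" },
};

const delegatedAgent: RequestAgency = {
  surface: "Mozilla/5.0 ... Firefox/124.0", // identical string
  execution: "ai-agent",
  intent: { initiator: "user", purpose: "answering a question I asked" },
  identity: { declared: false, onBehalfOf: "me, via an assistant" },
};
```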
The Blurring Boundaries
Imagine I ask an AI assistant to help me navigate a website. It opens my browser, clicks buttons, fills forms, and gathers the specific detail I asked for.
The request came from my browser, using my session… but the agent did the work. Is that me? A bot? Or something in between?
Delegation Isn’t Always Harmless
The same mechanisms that make agents useful could also:
- Scrape prohibited content
- Overwhelm APIs, intentionally or not
- Interact with services in ways their owners haven’t planned for
Not all delegation is benign. This is just one reason why intent matters.
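For the API point specifically, the difference between useful and harmful delegation can be as simple as pacing. A minimal sketch of a self-throttling fetch loop; the one-second spacing is an arbitrary illustrative choice, not a standard.

```typescript
// A well-behaved agent paces its requests instead of hammering an API.
// The 1-second spacing is an arbitrary illustrative choice.
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function politeFetchAll(urls: string[]): Promise<Response[]> {
  const results: Response[] = [];
  for (const url of urls) {
    results.push(await fetch(url)); // sequential, never concurrent
    await sleep(1000);              // leave breathing room between requests
  }
  return results;
}
```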
A Larger Tension
This dispute is one example of a bigger conflict:
AI companies need broad access to training data and real-time content. Website operators want to control how their content is accessed, used, and monetized. Users want AI agents that work effectively. Copyright, licensing, and trust all sit in the middle.
When people first put things online for anyone to access, they weren’t planning for the exact future we’re in now.
For smaller publishers, journalists, and independent creators, this isn’t abstract. The web is a complex ecosystem where content access, attribution, and agency are intertwined.
What This Means Going Forward
We’re applying old rules to interaction models that now have more layers than they once did, and that will keep evolving.
The question can’t be limited to whether something is a bot; it encompasses much more. It’s:
Who is the agent acting for? In what context? And how should that be revealed or concealed?
Do we need new standards for this? Or can existing frameworks adapt? Should attribution be mandatory, voluntary, or contextual?
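If attribution did become a convention, one can imagine it living in the request itself. A purely hypothetical sketch: these header names are invented for illustration and are not part of any real standard.

```typescript
// Hypothetical: an agent declaring its layers explicitly. These header names
// are invented for illustration; no such standard exists today.
await fetch("https://example.com/article", {
  headers: {
    "User-Agent": "ExampleAssistant/1.0",       // Surface: what is sending this
    "X-Execution": "ai-agent",                  // Execution: what drives it
    "X-Initiated-By": "user-request",           // Intent: who asked, and why
    "X-On-Behalf-Of": "authenticated-end-user", // Identity: for whom
  },
});
```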
The Cloudflare-Perplexity dispute won’t be the last. As AI agents become more capable, these attribution questions will only multiply.
An Invitation, Not a Verdict
This is a shift in framing:
- From binary labels to layered understanding
- From blanket suspicion to contextual interpretation
- From old enforcement models to new conversations about delegation and trust
This debate is just one of many we’re likely to see as technology continues to evolve. The question of agency attribution isn’t going away. How we answer it will shape what kinds of AI interactions we enable or prevent.