At the IETF 124 meeting in Montréal, I enjoyed quality time in a very small, very crowded side room filled with an unusually diverse mix of people: browser architects, policy specialists, working group chairs, privacy researchers, content creators, and assorted observers who simply care about the future of the web.
The session, titled Preserving the Open Web, was convened by Mark Nottingham and David Schinazi—people you should follow if you want to understand how technical and policy communities make sense of the Internet’s future.
A week later, at the W3C TPAC meetings in Kobe, Japan, I ended up in an almost identical conversation, hosted again by Mark and David, this time fortunately in a somewhat larger room. That discussion brought in new faces, new community norms, and a different governance culture, but in both discussions we landed almost immediately on the same question:
What exactly are we trying to preserve when we talk about “the open web”?
For that matter, what is the open web? The phrase appears everywhere—policy documents, standards charters, conference talks, hallway discussions—yet when communities sit down to define it, they get stuck. In Montréal and Kobe, the lack of a shared definition proved to be a practical obstacle. New specifications are being written, new automation patterns are emerging, new economic pressures are forming, and without clarity about what “open” means, even identifying the problem becomes difficult.
This confusion isn’t new. The web has been wrestling with the meaning of “open” for decades.
A Digital Identity Digest
Robots, Humans, and the Edges of the Open Web
You can Subscribe and Listen to the Podcast on Apple Podcasts, or wherever you listen to Podcasts.
And be sure to leave me a Rating and Review!
How the web began, and why openness mattered
The earliest version of the web was profoundly human in scale and intent. Individuals wrote HTML pages, uploaded them to servers, and connected them with links. Publishing was permissionless. Reading was unrestricted. No identity was required. No subscription was expected. You didn’t need anyone’s approval to build a new tool, host a new site, or modify your browser.
The Web Foundation’s history of the web describes this period as a deliberate act of democratization. Tim Berners-Lee’s original design was intentionally simple, intentionally interoperable, and intentionally open-ended. Anyone should be able to create. Anyone should be able to link. Anyone should be able to access information using tools of their own choosing.
That was the first meaning of “open web”: a world where humans could publish, and humans could read, without needing to ask permission.
Then the robots arrived.
Robots.txt and the first negotiation with machines
In 1994, Martijn Koster created a lightweight mechanism for telling automated crawlers where they should and shouldn’t go: robots.txt. It was never a law; it was a social protocol. A well-behaved robot would read the file and obey it. A rogue one would ignore it and reveal itself by doing so.
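For readers who have never looked inside one, here is a minimal example of the kind of file Koster proposed. The paths and the bot name are placeholders of my own, but the directives are the same robots.txt syntax still in use today:

```
# robots.txt served at the root of a site
User-agent: *          # applies to every crawler
Disallow: /drafts/     # please stay out of this path

User-agent: BadBot     # a specific (hypothetical) crawler
Disallow: /            # asked not to crawl anything at all
```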
That tiny file represented the web’s first attempt to articulate boundaries to non-human agents. It formalized a basic idea: openness for humans does not automatically imply openness for robots.
Even back then, openness carried nuance, and it was definitely not the last time the web community tried to define it.
The question keeps returning
One of the earliest modern attempts that I found to define the open web came in 2010, when Tantek Çelik wrote a piece simply titled “What is the Open Web?”. His framing emphasized characteristics rather than purity tests: anyone can publish; anyone can read; anyone can build tools; anyone can link; and interoperability creates more value than enclosure. It is striking how relevant those ideas remain, fifteen years later. These debates aren’t symptoms of crisis; they’re part of the web’s DNA.
The web has always needed periodic recalibration. It has always relied on communities negotiating what matters as technology, economics, and usage patterns change around them. As innovation happens, needs, wants, and desires for the web change.
And now, automation has forced us into another round of recalibration.
Automation came faster than consensus
The modern successors to robots.txt are now emerging in the IETF. One current effort, the AI Preferences Working Group (aipref), aims to provide a structured way for websites to express their preferences around AI training and automated data reuse. It’s an updated version of the old robots.txt promise: here is what we intend; please respect it.
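The vocabulary and the mechanisms for carrying it are still being debated in the working group, so treat the following as a purely hypothetical sketch of what such a preference declaration might eventually look like, not the actual aipref syntax; the header name and the usage tokens are my own placeholders.

```
# Hypothetical only: a response header expressing a site's usage preferences
Content-Usage: ai-training=n, search-indexing=y
```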
But the scale is different. Search crawlers indexed pages so humans could find them. AI crawlers ingest pages so models can incorporate them. The stakes—legal, economic, creative, and infrastructural—are much higher.
Another effort, the newly chartered WebBotAuth Working Group (webbotauth), is tackling the question of whether and how bots should authenticate themselves. The first meeting at IETF 124 made clear how tangled this space has become. Participants disagreed on what kinds of bots should be differentiated, what behavior should be encouraged or discouraged, and whether authentication is even a meaningful tool for managing the diversity of actors involved. The conversation grew complex (and heated) enough that the chairs questioned whether the group had been chartered before there was sufficient consensus to proceed.
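To make the idea concrete: some of the proposals in this space build on HTTP Message Signatures (RFC 9421), with a way for a bot to point at its operator’s published keys. The sketch below is my own illustration under that assumption; beyond the RFC 9421 Signature-Input and Signature fields, the header names and all of the values are illustrative rather than anything the group has settled on.

```
GET /articles/openness HTTP/1.1
Host: example.com
Signature-Agent: "https://bot-operator.example/.well-known/keys"
Signature-Input: sig1=("@authority" "signature-agent");created=1700000000;keyid="key-1"
Signature: sig1=:...base64-encoded signature value...:
```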
None of this represents failure. It represents something more fundamental:
We do not share a common mental model of what “open” should mean in a web increasingly mediated by automated agents.
And this lack of clarity surfaced again—vividly—one week later in Kobe.
What the TPAC meeting added to the picture
The TPAC session began with a sentiment familiar to anyone who has been online for a while: one of the great gifts of the web is that it democratized information. Anyone could learn. Anyone could publish. Anyone could discover.
But then came the question that cut to the heart of the matter: Are we still living that reality today?
Participants pointed to shifts that complicate the old assumptions—paywalls, subscription bundles, identity gates, regional restrictions, content mediation, and, increasingly, AI agents that read but do not credit or compensate. Some sites once built for humans now pivot toward serving data brokers and automated extractors as their primary “audience.” Others, in response, block AI crawlers entirely. New economic pressures lead to new incentives, sometimes at odds with the early vision of openness.
From that starting point, several deeper themes emerged.
Openness is not, and has never been, binary
One of the most constructive insights from the TPAC discussion was the idea that “open web” should not be treated as a binary distinction. It’s a spectrum with many dimensions: price, friction, format, identity requirements, device accessibility, geographic availability, and more. Moving an article behind a paywall reduces openness in one dimension but doesn’t necessarily negate it in others. Requiring an email address adds friction but might preserve other characteristics of openness.
Trying to force the entire concept into a single yes/no definition obscures more than it reveals. It also leads to unproductive arguments, as different communities emphasize different attributes.
Recognizing openness as a spectrum helps explain why reaching consensus is so hard and why it may be unnecessary.
Motivations for publishing matter more than we think
Another thread that ran through the TPAC meeting was the simple observation that people publish content for very different reasons. Some publish for reputation, some to support a community, some for revenue, and some because knowledge-sharing feels inherently worthwhile. Those motivations shape how creators think about openness.
AI complicates this because it changes the relationship between creator intention and audience experience. If the audience receives information exclusively through AI summaries, the creator’s intended context or narrative can vanish. An article written to persuade, illuminate, or provoke thought may be reduced to a neutral paragraph. A tutorial crafted to help a community may be absorbed into a model with no attribution or path back to the original.
This isn’t just a business problem. It’s a meaning problem. And it affects how people think about openness.
The web is a commons, and commons require boundaries
At TPAC, someone invoked Elinor Ostrom’s research on commons governance (here’s a video if you’re not familiar with that work): a healthy commons always has boundaries. Not barriers that prevent participation, but boundaries that help define acceptable use and promote sustainability.
That framing resonated well with many in the room. It helped reconcile something that often feels contradictory: promoting openness while still respecting limits. The original web was open because it was simple, not because it was boundary-free. Sharing norms emerged, often informally, that enabled sustainable growth.
AI-Pref and WebBotAuth are modern attempts to articulate boundaries appropriate for an era of large-scale automation. They are not restrictions on openness; they are acknowledgments that openness without norms is not sustainable. Now we just need to figure out what the new norms are in this brave new world.
We’re debating in the absence of shared data
Despite strong opinions across both meetings, participants kept returning to a sobering point: we don’t actually know how open the web is today. We lack consistent, shared metrics. We cannot easily measure the reach of automated agents, the compliance rates for directives, or the accessibility of content across regions and devices.
Chrome’s CrUX dataset (the Chrome UX Report), Cloudflare Radar, Common Crawl, and other sources offer partial insights, but there is no coherent measurement framework. This makes it difficult to evaluate whether openness is expanding, contracting, or simply changing form.
Without data, standards communities are arguing from instinct. And instinct is not enough for the scale of decisions now at stake.
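As a small illustration of how far we are from a real framework, here is a sketch of roughly the best an individual can do today: poll a handful of sites’ robots.txt files and see whether a known AI crawler is disallowed. The domain list is a placeholder, and this measures stated preferences only, not whether any crawler actually honors them.

```python
from urllib.robotparser import RobotFileParser

# Placeholder domains; a real survey would need a representative sample.
SITES = ["https://example.com", "https://example.org"]
AI_AGENT = "GPTBot"  # one well-known AI crawler user agent

for site in SITES:
    parser = RobotFileParser()
    parser.set_url(site.rstrip("/") + "/robots.txt")
    try:
        parser.read()  # fetch and parse the site's robots.txt
    except OSError:
        print(f"{site}: robots.txt unavailable")
        continue
    allowed = parser.can_fetch(AI_AGENT, site + "/")
    print(f"{site}: {AI_AGENT} is {'allowed' if allowed else 'disallowed'} at the root")
```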
Tradeoffs shape the web’s future
Another candid recognition from TPAC was that the web’s standards bodies cannot mandate behavior. They cannot force AI crawlers to comply. They cannot dictate which business models will succeed. They cannot enforce universal client behavior or constrain every browser’s design.
In other words: governance of the open web has always been voluntary, distributed, and rooted in negotiation.
The most meaningful contribution these communities can make is not to define one perfect answer, but to design spaces where tradeoffs are legible and navigable: spaces where different actors—creators, users, agents, platforms, governments—can negotiate outcomes without collapsing the web’s interoperability.
Toward a set of OpenWeb Values
Given the diversity of use cases, business models, motivations, and technical architectures involved, the chances of arriving at a single definition of “open web” are slim. But what Montréal and Kobe made clear is that communities might agree on values, even when they cannot agree on definitions.
The values that surfaced repeatedly included:
Access, understood as meaningful availability rather than unrestricted availability.
Attribution, not only as a legal requirement but as a way of preserving the creator–audience relationship.
Consent, recognizing that creators need ways to express boundaries in an ecosystem increasingly mediated by automation.
Continuity, ensuring that the web remains socially, economically, and technically sustainable for both creators and readers.
These values echo what Tantek articulated in 2010 and what the Web Foundation cites in its historic framing of the web. They are principles rather than prescriptions, and they reflect the idea that openness is something we cultivate, not something we merely declare.
And in parallel, these values mirror the OpenStand Principles, which articulated how open standards themselves should be developed. I wrote about this a few months ago in “Rethinking Digital Identity: What ARE Open Standards?” The fact that “open standard” means different things to different standards communities underscores that multiplicity of definitions does not invalidate a concept; it simply reflects the complexity of global ecosystems.
The same can be true for the open web. It doesn’t need a singular definition, but maybe it does need a set of clear principles so we know what we are trying to protect.
Stewardship rather than preservation
This is why the phrase preserving the open web makes me a little uncomfortable. Preservation implies keeping something unchanged. But the web has never been static. It has always evolved through tension: between innovation and stability, between access and control, between human users and automated agents, between altruistic publication and economic incentive.
The Web Foundation’s history makes this clear, as does my own experience over the last thirty years (good gravy, has it really been that long?). The web survived because communities continued to adapt it. It grew because people kept showing up to argue, refine, and redesign. The conversations in Montréal and Kobe sit squarely in that tradition.
So perhaps the goal isn’t preservation at all. Perhaps it’s stewardship.
Stewardship acknowledges that the web has many purposes. It acknowledges that no single actor can dictate its direction. It acknowledges that openness requires boundaries, and boundaries require negotiation. And it acknowledges that tradeoffs are inevitable—but that shared values can guide how we navigate them.
Mark and David’s side meetings exist because a community still cares enough to have these conversations. The contentious first meeting of WebBotAuth was not a setback; it was a reminder of how difficult and necessary this work is. The TPAC discussions reinforced that, even in moments of disagreement, people are committed to understanding what should matter next.
If that isn’t the definition of an open web, it may be the best evidence we have that the open web still exists.
To Be Continued
The question “What is the open web?” is older than it appears. It surfaced in 1994 with robots.txt. It resurfaced in 2010 in Tantek’s writing. It has re-emerged now in the era of AI and large-scale automation. And it will likely surface again.
The real work is identifying what we value—access, attribution, consent, continuity—and ensuring that the next generation of tools, standards, and norms keeps those values alive.
If the conversations in Montréal and Kobe are any indication, people still care enough to argue, refine, and rebuild. And perhaps that, more than anything, is what will keep the web open.
If you’d like to read the notes that Mark and I took during the events, they are available here.
If you’d rather track the blog than the podcast, I have an option for you! Subscribe to get a notification when new blog posts go live. No spam, just announcements of new posts. [Subscribe here]
Transcript
00:00:30
Welcome back, everybody. This story begins in a very small, very crowded side room at IETF Meeting 124 in Montreal. It happened fairly recently, and it set the tone for a surprisingly deep conversation.
00:00:43
Picture this: browser architects, policy specialists, working group chairs, privacy researchers, content creators, and a handful of curious observers — all packed together, all invested in understanding where the web is headed.
00:00:57
The session was titled Preserving the Open Web. It was convened by Mark Nottingham and David Schinazi — two people worth following if you want to understand how technical and policy perspectives meet to shape the future of the Internet.
00:01:10
A week later, at the W3C TPAC meeting in Kobe, Japan, I found myself in almost exactly the same conversation.
00:01:22
Once again, Mark and David convened the group to compare perspectives across different standards organizations.
00:01:28
They asked the same questions. The only real difference was the slightly larger room — and with it, new faces, new cultural norms, and a different governing style for the standards bodies.
00:01:41
But in both meetings, we landed almost immediately on the same question:
00:01:43
What exactly are we trying to preserve when we talk about the “open web”?
00:01:53
The phrase is everywhere. It appears in policy documents, standards charters, keynotes, and hallway conversations. Yet when you ask a room to define it, things get fuzzy very quickly. And that fuzziness isn’t academic — it matters.
00:02:36
Without clarity about what “open” means, identifying the actual problem becomes far more difficult as automation patterns shift and economic pressures evolve.
A Look Back
00:02:46
This isn’t a new dilemma. The web has been wrestling with the meaning of “open” for decades.
00:03:09
In the earliest days, everything about the web was profoundly human-scaled. People wrote HTML by hand. They published content to servers they controlled. They linked to one another freely. Publishing required no permission.
00:03:18
If you had an idea, a keyboard, a computer, and an Internet connection, you could build something.
00:03:26
The Web Foundation describes these early design choices as a deliberate act of democratization.
00:03:26–00:03:40
Tim Berners-Lee wanted anyone to create, anyone to link, and anyone to read — all using tools of their choosing. This spirit defined the earliest sense of an “open web”:
Humans publish
Humans read
Everything else stays out of the way
00:03:40
Dun dun dun.
00:03:42
Then the robots arrived.
Enter Automation
00:03:44
In 1994, Martijn Koster proposed robots.txt, a simple file that told automated crawlers where they were and were not welcome.
00:04:14
It wasn’t a law. It was a social protocol. Well-behaved crawlers respected it. Bad actors ignored it and revealed themselves by doing so.
00:04:25
That tiny file introduced a big shift: openness for humans didn’t automatically mean openness for machines.
00:04:30
Even then, openness carried nuance.
Returning to Old Questions
00:04:35
While researching this post, I found that one of the earliest attempts to define the open web came from Tantek Çelik in 2010.
00:05:07
His framing focused on characteristics — not purity tests:
Anyone can publish
Anyone can read
Anyone can build tools
Anyone can link
Interoperability creates more value than enclosure
00:05:16
Fifteen years later, it’s still uncannily relevant. And, amusingly, Tantek was in the room again at TPAC while we revisited the same conversation.
00:05:21
I can only imagine the déjà vu.
The Web Evolves
00:05:21–00:05:44
The web has always needed recalibration. As technology evolves, expectations shift — and so does the need to renegotiate what “open” should mean.
00:05:44
Automation and AI have pushed us into the next round of that negotiation.
00:05:51
Today’s successors to robots.txt are emerging in the IETF.
New Efforts in Standards Bodies
00:05:57
One is the AI Preferences Working Group, commonly known as AI-Pref.
00:06:03
They’re trying to create a structured way for websites to express preferences about AI training and automated data reuse.
00:06:11
Think of it as trying to define the language that might appear in a future robots-style file — same spirit, far higher stakes.
00:06:23
Why does this matter?
00:06:23–00:06:49
Traditional search crawlers index the web to help humans find things.
AI crawlers ingest the web so their models can absorb things.
This changes the stakes dramatically — legally, economically, creatively, and infrastructurally.
00:06:49
Another initiative in the IETF is the WebBotAuth Working Group, which explores whether bots should authenticate themselves — and how.
00:06:59
Their early meetings focused on defining the scope of the problem. IETF124 was their first major gathering, and it highlighted how tangled this space really is.
00:07:29
With several hundred people in the room, discussions grew heated enough that the chairs questioned whether the group had been chartered prematurely.
00:07:42
Is that a failure? Maybe. Or maybe it reflects something deeper: we don’t share a mental model for what “open” should mean in a web mediated by automated agents.
Persistent Tensions
00:07:57
That same lack of clarity surfaced again at TPAC in Kobe.
00:08:08
The discussion began with a familiar sentiment: the web democratized information. It gave anyone with a computer and Internet access the ability to learn, publish, and discover.
00:08:35
But is that still true today?
Modern web realities include:
Paywalls
Subscription bundles
Identity gates
Regional restrictions
Content mediation
AI agents that read without attribution or compensation
00:08:35–00:09:07
Some sites now serve data brokers. Others try to block AI crawlers entirely. Economic pressures and incentives have shifted — not always in ways aligned with early ideals of openness.
Openness as a Spectrum
00:09:07
At TPAC, one of the most useful insights was this: the open web isn’t a switch. It’s not a binary. It’s a spectrum.
00:09:33
Different dimensions define openness:
Price
Friction
Format
Accessibility
Identity requirements
Device compatibility
Geography
00:09:39
A paywall changes openness on one axis but not all. An email gate adds friction but doesn’t automatically “close” content.
00:09:52
The binary mindset has never reflected reality — which explains why consensus is so elusive.
00:10:05
Seeing openness as a spectrum creates room for nuance and suggests that agreement may not even be necessary.
Motivations for Publishing
00:10:17
Another important thread: people publish for many reasons.
00:10:29–00:10:43
Some publish for reputation.
Some publish for community.
Some for income.
Some for the joy of sharing knowledge.
00:10:47
These motivations shape how people feel about openness.
00:10:58
AI complicates things. When readers experience your work only through an AI-generated summary, the context and tone you cared about may be lost.
00:10:58–00:11:20
A persuasive piece may be flattened into neutrality.
A community-oriented tutorial may be absorbed into a model with no link back to you.
This isn’t only an economic problem — it’s also a meaning problem.
Boundaries and the Commons
00:11:20
Elinor Ostrom’s work on governing shared resources came up. One of her core principles: shared resources need boundaries.
00:11:51
Not restrictive walls — but clear expectations for use. This framing resonated deeply and helped reconcile the tension between openness and limits.
00:12:01
The web has never been boundary-less. It worked because shared norms — formal and informal — made sustainable use possible.
00:12:06
AI-Pref and WebBotAuth aren’t restrictions on openness. They’re attempts to articulate healthy boundaries for a new era.
00:12:14
But agreeing on those boundaries is the hard part.
The Measurement Gap
00:12:18
To understand the scale of the problem, we need measurement. But we can’t measure what we haven’t defined.
00:12:32
We lack shared metrics for openness. We don’t know:
How open the web currently is
How often automated agents obey directives
How frequently data is reused against publishers’ intentions
How accessibility shifts across devices and regions
00:12:47
Datasets like Chrome UX Report, Cloudflare Radar, and Common Crawl offer fragments — but no coherent measurement framework.
00:13:06
Without data, we argue from instinct rather than insight.
What Standards Bodies Can — and Cannot — Do
00:13:17
Another reality: standards bodies cannot mandate behavior.
00:13:34
They can’t force AI crawlers to respect preferences, dictate business models, control browser design, or enforce predictable client behavior.
00:13:45
Robots.txt has always been voluntary. It’s always been negotiation-based, not coercion-based.
00:13:56
The best contribution standards bodies can make is designing systems where trade-offs are visible and actors can negotiate without breaking interoperability.
00:14:02
It’s not glamorous work — but it’s necessary.
Shared Values
00:14:10
A single definition of the open web is unlikely.
00:14:17
But both Montreal and Kobe revealed alignment around a few core values:
00:14:21–00:14:46
Access — not unlimited, but meaningful
Attribution — to preserve the creator-audience relationship
Consent — to express boundaries in an automated ecosystem
Continuity — to ensure sustainability socially, economically, and technically
00:14:46–00:14:58
These values echo Tantek’s 2010 framing, the Web Foundation’s historical narrative, and the OpenStand principles.
00:14:58
They’re not definitions — they’re guideposts.
Stewardship, Not Preservation
00:15:09
This is why preserving the open web may not be the right phrasing. Preservation implies stasis. But the web has never been static.
00:15:37
It has always evolved through tension — innovation vs. stability, openness vs. control, humans vs. automation, altruism vs. economic pressure.
00:15:40
Thirty years. Good gravy.
00:15:40–00:16:10
The web endured because communities adapted, argued, refined, and rebuilt.
The Montreal and Kobe conversations fit squarely into that tradition.
So perhaps the goal isn’t preservation — it’s stewardship.
Stewardship acknowledges:
The web serves many purposes
No single actor controls it
Openness requires boundaries
Boundaries require negotiation
00:16:10
Trade-offs aren’t failures. They’re part of the ecosystem.
Looking Ahead
00:16:30
Mark and David’s side meetings exist because people care enough to do this work.
00:16:30–00:16:41
The contentious WebBotAuth meeting wasn’t a setback — it was a reminder that the toughest conversations are the ones that matter.
00:16:41
TPAC showed that even without agreement, people are still trying to understand what comes next.
00:16:35–00:16:41
If that isn’t evidence that the open web still exists, I’m not sure what is.
00:16:41–00:17:18
This conversation is far from over.
It began with robots.txt in 1994.
It showed up in Tantek’s writing in 2010.
It’s appearing today in heated debates at standards meetings.
And it will almost certainly surface again — because the real work isn’t in defining “open.”
It’s in articulating what we value:
Access
Attribution
Consent
Continuity
00:17:18
And ensuring our tools and standards reflect those values.
Final Thoughts
00:17:18–00:17:26
If the discussions in Montreal and Kobe are any indication, people still care. They still show up. They still argue, revise, and rebuild.
00:17:26
And maybe that, more than anything else, is what will keep the web open.
00:17:30
Thanks for your time. Drop into the written post if you’re looking for the links mentioned today, and see you next week.
00:17:44
And that’s it for this week’s episode of The Digital Identity Digest. If this helped clarify things — or at least made them more interesting — share it with a friend or colleague and connect with me on LinkedIn @hlflanagan.
If you enjoyed the show, please subscribe and leave a rating or review on Apple Podcasts or wherever you listen. You can also find the written post at sphericalcowconsulting.com.
Stay curious, stay engaged, and let’s keep these conversations going.