Beyond the web

June 7 2024

How does AI shape the future of the browser?

One view is that we’ll have a browser “copilot” that fills out forms, summarizes page content, and remembers stuff we’ve seen before. Presumably, it would also be general purpose like ChatGPT or Claude. In all likelihood, the assistant would just be ChatGPT or Claude.

The browser is probably going to be a battleground for assistant products. There’s no reason to go to chat.openai.com if an equally useful assistant is right there in your browser window. For this reason, I suspect OpenAI and/or Anthropic will end up building or buying their own browser product (perhaps the one from New York). As I've written about before, this is Google's battle to lose.

In any case, I find the vision of a built-in browser assistant pretty underwhelming. It’s fine and good, and I’ll use the heck out of it. But I can’t shake the feeling that we have the potential to build something significantly better than this. Something that goes beyond chatbots, semantic search, and browser RPA.

Generative web

I think we should reimagine the web browser as a tool for generating and transforming web content.

In the last few months, I've seen a few projects that hint at this future:

Websim uses the conceit of a browser UI to inspire users to browse the web as it’s imagined by Claude.
Perplexity has launched Pages, which lets anyone create Wikipedia-style guides, reminiscent of Hrishi Olickel’s work on Lumentis.
Max Krieger’s work on Rabbitholes [1]: “introducing delve: a ChatGPT interface for going down rabbit holes 👉 delve.a9.io” (Max Krieger, @maxkriegers, May 26, 2024)

Each of these shows the promise of generated web content. In the last paradigm of the web, the fundamental challenge was finding the content users wanted. In the new paradigm, users don’t need to search for content; they manifest it. Sometimes that generated content comes purely from the model; other times, the model remixes and transforms what already exists into a superior form.

What I find most fascinating is the use of generative web content to satisfy complex reasoning tasks. These workflows follow a consistent grammar. Drawing on a particular body of content (a set of documents, search results, or web history) and a task description, an assistant generates rich web content.

Some examples of the sorts of interactions I have in mind:

Convert a PDF into a slide deck efficiently summarizing the main ideas, a set of search results into a spreadsheet, etc.
Generate a custom catalog based on the backpacks I’ve been looking at recently.
Generate a document that outlines the key ideas of the articles I’ve read recently.
What are all the arXiv papers I’ve looked at in the past week? Can you organize them by category?

A key attribute of these workflows is that they’re iterative. The user can perform a single operation (“what are all the backpacks I’ve looked at”) and then iterate toward what they want (“remove all backpacks over $200” and “generate a catalog”). This could all happen in a single request or over a sequence of requests.

Beyond the web

The more AI superpowers we bake into the browser, the more likely it becomes that the browser eats up workflows that have nothing to do with the web. A browser that is good at viewing, transforming, and generating many forms of content (hypertext among them) would be a general-purpose application. We’d use it for traditional web content, but also for reading ebooks and PDFs, consuming media stored on our local filesystem, and so on. The generative browser is really just an AI workspace.

I wonder if incumbent browsers will be able to pursue this vision. One argument against incumbents is their attachment to existing UX patterns. It’s conceivable that there’s a sort of UX counterpositioning; the incumbents can’t properly embrace these new capabilities without making radical UX changes that would alienate their existing audience. The radical transformation required to turn the browser into an AI workspace may be the kind of bold product bet that incumbents can’t afford to make.

Notes

[1] Rabbitholes are a spiritual successor to hyperlinks. Like hyperlinks, rabbitholes connect documents to other documents. But while hyperlinks are predefined connections to documents that already exist, rabbitholes are defined by the end user at runtime, anchored to any arbitrary selection of content, and prompt the model to generate content on the fly.

[2] There are several other capabilities I'm excited by (yes, agent-oriented automation is one of them). One worth mentioning here is built-in content recommendation. In the late '90s, Netscape experimented with the concept of “smart browsing”: the browser would intelligently suggest other content the user might be interested in. This idea is worth revisiting in light of modern capabilities. Semantic search lets content itself serve as the query. An AI workspace should draw implicit connections between what the user is looking at and other documents, whether private or publicly accessible on the web.

https://x.com/pzakin/status/1767992558284845239

https://x.com/pzakin/status/1770556461506236722

https://x.com/pzakin/status/1758274746159124876

https://pzakin.substack.com/p/chatgpt-google-and-the-war-for-the

https://www.kunle.app/august-2022-networked-notebooks.html

https://www.geoffreylitt.com/2023/03/25/llm-end-user-programming
