robots.txt, llms.txt, and the Future of Web Permissions
The web is gaining new permission signals. Here is how classic crawler rules and AI-era policy files can work together.
robots.txt remains important
The robots.txt file is simple, familiar, and easy for crawlers to discover at a well-known location. Its restraint is part of its value. But modern uses of content increasingly demand more nuance than allowing or disallowing a user agent by URL path.
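For concreteness, here is what that allow-or-disallow model looks like in practice: a minimal robots.txt checked with Python's standard urllib.robotparser. The paths and the ExampleBot user agent are placeholders.

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt: per-user-agent, path-prefix allow/disallow rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Allow: /docs/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The protocol answers exactly one question: may this agent fetch this URL?
print(parser.can_fetch("ExampleBot", "https://example.com/docs/intro"))   # True
print(parser.can_fetch("ExampleBot", "https://example.com/drafts/post"))  # False
```

Everything beyond that binary fetch decision, such as what a fetched page may be used for, falls outside the protocol.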
llms.txt points toward curated context
The llms.txt convention reflects a practical need: site owners want to tell language models where the most useful context lives. That can improve answers and reduce noisy crawling, especially for documentation-heavy sites.
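The proposal at llmstxt.org specifies the file as markdown: an H1 with the site or project name, a blockquote summary, and H2 sections listing links with one-line descriptions, plus an Optional section a model may skip when context is tight. The sketch below follows that shape; every name and URL is invented.

```markdown
# Example Docs

> Developer documentation for the Example API.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): install and make a first request
- [API reference](https://example.com/docs/api.md): endpoints, parameters, errors

## Optional

- [Changelog](https://example.com/changelog.md): release history
```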
The missing layer is consent history
Permission signals become more useful when they capture intent (what a given use is for), content categories, who owns the policy, and how it has changed over time. A crawler that can see that training access to a section was revoked last quarter, and by whom, can act on that history rather than on a single snapshot. The future likely includes multiple files and APIs working together, with consent policies acting as the source of truth.
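No standard for such a consent policy exists yet, so the JSON below is purely illustrative: every field name (owner, purposes, history, and so on) is an assumption about how intent, categories, ownership, and change history might be recorded together.

```json
{
  "policy_version": "2025-01-15",
  "owner": "publisher@example.com",
  "rules": [
    {
      "path_prefix": "/docs/",
      "categories": ["documentation"],
      "purposes": {
        "search-indexing": "allow",
        "ai-inference": "allow",
        "ai-training": "deny"
      }
    }
  ],
  "history": [
    { "version": "2024-09-01", "change": "initial policy" },
    { "version": "2025-01-15", "change": "denied ai-training for /docs/" }
  ]
}
```

The point is not this exact schema but the division of labor: per-purpose decisions, ownership, and an auditable history live in one machine-readable place that simpler files like robots.txt and llms.txt can defer to.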