
llms.txt: the proposed standard helping AI systems parse your website content

A proposed file format aims to change how LLMs parse website content. Created by data scientist Jeremy Howard, llms.txt provides AI systems with structured Markdown summaries of site content. The standard remains unproven, with no formal adoption from major LLM providers yet.

What it is

llms.txt is a proposed standard that sits in your root directory (like robots.txt) and helps large language models understand your site's content structure. The file uses Markdown to provide curated summaries of key pages, topics, and resources.

Howard, a co-founder of Answer.AI, proposed the format to address a specific problem: LLMs struggle to parse modern websites loaded with JavaScript, ads, and complex navigation, and they often surface outdated forum posts over canonical documentation.

The business case

With an estimated 7.5 billion monthly queries across ChatGPT, Claude, and Gemini, how your content appears in AI-generated answers now matters. llms.txt is part of what's being called generative engine optimization (GEO), the AI-era parallel to traditional SEO.

The format is simple: a site description, key page links with descriptions, and topic lists. Most implementations stay under 500 lines. Optional companion formats include llms-full.txt (the complete site content in a single Markdown file) and .md variants of HTML pages.
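A minimal file following the proposed structure might look like the sketch below; the site name, URLs, and descriptions are hypothetical, not taken from any real implementation:

```markdown
# Example Docs

> Example Docs is the documentation site for a hypothetical developer product.

## Documentation

- [Quickstart](https://example.com/quickstart.md): Set up a project in five minutes
- [API Reference](https://example.com/api.md): Endpoints, parameters, and response formats

## Optional

- [Changelog](https://example.com/changelog.md): Release history and deprecations
```

The H1 names the site, the blockquote gives a one-line summary, and each H2 section groups related links with short descriptions an LLM can skim.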

What's unclear

This is a proposed standard, not a formal one. No major LLM provider has publicly committed to prioritizing llms.txt in its crawling behavior. And unlike robots.txt, which major search engines have honored for decades, llms.txt has no track record of compliance: AI systems may crawl beyond its suggestions, and nothing in the format can stop them.

Adoption requires ongoing maintenance, creating another file to keep current alongside sitemaps and robots.txt. For resource-constrained teams, the ROI depends entirely on whether LLM providers actually use it.

Implementation

The barrier to entry is low: create the file, add your content hierarchy, and deploy it to your site's root directory. Early implementers report the process takes 15-20 minutes. The format supports standard Markdown with H1 titles, H2 sections, and bulleted lists.
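Before deploying, it is worth sanity-checking the file's structure. The sketch below is not official tooling; the checks are assumptions derived from the conventions described above (a single H1 title, H2 sections, bulleted link lists, under 500 lines):

```python
def check_llms_txt(text: str) -> list[str]:
    """Flag common structural problems in a candidate llms.txt file.

    These checks mirror the conventions described in the article;
    they are an informal sketch, not an official validator.
    """
    problems = []
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append("file should open with a single H1 title")
    if not any(ln.startswith("## ") for ln in lines):
        problems.append("no H2 sections found")
    if not any(ln.startswith("- [") for ln in lines):
        problems.append("no bulleted Markdown links found")
    if len(lines) > 500:
        problems.append("over 500 lines; consider trimming")
    return problems

sample = """# Example Site

> One-line summary of the site.

## Docs

- [Quickstart](https://example.com/quickstart.md): Getting started guide
"""
print(check_llms_txt(sample))  # → []
```

A wrapper could fetch `https://yoursite.com/llms.txt` and run these checks in CI, so the file does not drift out of date alongside sitemaps and robots.txt.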

Whether it becomes standard practice depends on what happens next. If OpenAI, Anthropic, and Google formally integrate llms.txt into their crawling pipelines, expect rapid adoption. Without that signal, this stays in the "interesting experiment" category.

Worth watching: whether major SaaS platforms and cloud providers implement it. That would indicate enterprise confidence in the standard's staying power.