AI Readiness Checklist: 14 Things Every Website Needs in 2026

AI agents are already deciding who gets recommended, who gets cited, and who gets ignored. When someone asks ChatGPT for a tool recommendation, when Perplexity synthesizes an answer about your industry, when a Claude-powered agent does research on behalf of a buyer — those systems are making decisions based on signals your website either has or does not have.

The good news? Most of these signals are technical, implementable in an afternoon, and your competitors have not done them yet.

This is the complete 14-point checklist. Go through it item by item, check off what you have, and fix what you do not.

The Foundation Layer (Items 1–4)

These are the basics. If you are missing any of these, do them first — everything else builds on top.

1. llms.txt File

What it is: A plain-text Markdown file at yourdomain.com/llms.txt that acts as a structured directory for AI agents — telling them what your site is, who it is for, and where your most important content lives.

Why it matters: AI agents are noisy-signal-averse. A typical webpage has navigation, footers, cookie banners, and ads cluttering the actual content. llms.txt gives AI agents a clean, unambiguous map of your site.

What to check:

  • File exists at https://yourdomain.com/llms.txt
  • Served as text/plain (not HTML)
  • Contains an H1 with your site name
  • Contains a > blockquote description (the most-read section)
  • Links to your 10–20 most important pages with descriptions
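A minimal llms.txt that passes these checks might look like the sketch below ("Acme" and all URLs are placeholder values, not real endpoints):

```markdown
# Acme

> Acme is a collaborative whiteboard tool for remote engineering teams.

## Docs

- [Getting Started](https://acme.example.com/docs/start): Install and create your first board in 5 minutes
- [API Reference](https://acme.example.com/docs/api): REST endpoints, authentication, and rate limits

## Product

- [Pricing](https://acme.example.com/pricing): Plans, per-seat costs, and free tier limits
```

The H1 names the site, the blockquote is the one-sentence summary agents read first, and each link carries a short description so an agent knows whether to follow it.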

2. robots.txt — Not Blocking AI Agents

What it is: The classic robots.txt file that tells crawlers what they can and cannot access.

Why it matters: Many sites have overly broad Disallow rules that accidentally block AI agent crawlers. Others have added blanket blocks for GPTBot or other AI crawlers without thinking through the consequences.

What to check:

  • robots.txt exists at yourdomain.com/robots.txt
  • You are not accidentally disallowing pages you want AI agents to see
  • You have made intentional decisions about which AI crawlers to allow

Common AI crawler user agents in 2026:

User-agent: GPTBot           # OpenAI
User-agent: ClaudeBot        # Anthropic
User-agent: PerplexityBot    # Perplexity
User-agent: Googlebot        # Google Search (also feeds AI Overviews)
User-agent: Google-Extended  # Google's opt-out token for AI training (Gemini)
User-agent: anthropic-ai     # Anthropic alternative
User-agent: cohere-ai        # Cohere
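A robots.txt that makes these decisions explicit could look like this sketch (the /admin/ path is a placeholder for whatever you actually want hidden):

```
# Explicitly allow the AI crawlers you want reading your site
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

# Default rule for everyone else
User-agent: *
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
```

The point is intent: each AI crawler gets a deliberate allow or disallow rather than inheriting a blanket rule written years ago.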

3. XML Sitemap (Up to Date)

What it is: A machine-readable map of all your public pages, submitted to search engines — and also used by AI crawlers to discover content.

Why it matters: AI agents that crawl the web often start from sitemaps. An outdated or missing sitemap means they might miss your best content entirely.

What to check:

  • Sitemap exists at yourdomain.com/sitemap.xml
  • Referenced in robots.txt
  • All important pages are included
  • <lastmod> dates are accurate
  • No 404 URLs included
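For reference, a minimal valid sitemap.xml looks like this (URLs and dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourdomain.com/</loc>
    <lastmod>2026-01-15</lastmod>
  </url>
  <url>
    <loc>https://yourdomain.com/pricing</loc>
    <lastmod>2026-01-10</lastmod>
  </url>
</urlset>
```

Keep the `<lastmod>` dates honest — crawlers use them to decide what to re-fetch, and stale dates train them to ignore the field.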

4. Valid, Semantic HTML Structure

What it is: Using the correct HTML elements for their intended purpose — <nav> for navigation, <main> for main content, <article> for articles.

Why it matters: AI agents often parse HTML without rendering JavaScript. They rely on semantic markup to distinguish your actual content from navigation chrome. If everything is a <div>, they are guessing.

What to check:

  • Main page content is wrapped in <main>
  • Articles/posts use <article>
  • Navigation uses <nav>
  • Headings are hierarchical (one <h1>, then <h2>, then <h3>)
  • Lists use <ul> / <ol> — not divs styled to look like lists
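Put together, the checks above produce a page skeleton like this:

```html
<body>
  <nav>
    <!-- site navigation links -->
  </nav>
  <main>
    <article>
      <h1>Post title</h1>
      <p>Intro paragraph.</p>
      <h2>First section</h2>
      <ul>
        <li>A real list item, in a real list element</li>
      </ul>
    </article>
  </main>
  <footer>
    <!-- footer links, legal -->
  </footer>
</body>
```

A crawler that never runs your JavaScript can still tell exactly which part of this page is the content and which part is chrome.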

The Structured Data Layer (Items 5–8)

Structured data is how you communicate what kind of thing your content is. AI agents use this extensively.

5. JSON-LD Structured Data — Organization

What it is: A JSON-LD block in your <head> that declares your organization's identity — name, URL, logo, social profiles, contact info.

Why it matters: When an AI agent is asked “who is [Company]?” or “what does [Company] do?”, this is the authoritative source it reaches for.
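A minimal Organization block, using placeholder values for a hypothetical "Acme":

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Acme",
  "url": "https://acme.example.com",
  "logo": "https://acme.example.com/logo.png",
  "sameAs": [
    "https://x.com/acme",
    "https://www.linkedin.com/company/acme"
  ],
  "contactPoint": {
    "@type": "ContactPoint",
    "email": "hello@acme.example.com",
    "contactType": "customer support"
  }
}
</script>
```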

6. JSON-LD Structured Data — Product or SoftwareApplication

What it is: Schema markup that tells AI agents your product's name, category, pricing, and features in a machine-readable format.

Why it matters: AI shopping agents, recommendation engines, and research assistants specifically look for Product and SoftwareApplication schema to populate answers about what products are available and what they cost.
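For a SaaS product, a SoftwareApplication block might look like this sketch (names and prices are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Acme",
  "applicationCategory": "BusinessApplication",
  "operatingSystem": "Web",
  "offers": {
    "@type": "Offer",
    "price": "29.00",
    "priceCurrency": "USD"
  }
}
</script>
```

Physical products would use the Product type with the same Offer pattern for pricing.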

7. JSON-LD on Blog Posts — Article Schema

What it is: Structured data on each blog post declaring the author, publish date, headline, and content type.

Why it matters: When AI agents cite sources or pull content into answers, Article schema helps them attribute content correctly and assess freshness.
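A representative Article block on a blog post (headline, author, and dates are illustrative):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How We Cut Onboarding Time in Half",
  "datePublished": "2026-01-05",
  "dateModified": "2026-01-12",
  "author": {
    "@type": "Person",
    "name": "Jane Doe"
  }
}
</script>
```

The dateModified field is what lets an agent judge freshness, so update it when you update the post.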

8. FAQ Schema on Key Pages

What it is: Structured FAQ markup that explicitly presents question-and-answer pairs from your content.

Why it matters: FAQ schema maps almost directly to how AI assistants respond to queries. A well-structured FAQ page often gets its content pulled verbatim into AI-generated answers.
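One question-and-answer pair in FAQPage markup looks like this (the question and answer are placeholder content):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "Does Acme have a free plan?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Yes. The free plan includes up to 3 boards and 5 collaborators."
    }
  }]
}
</script>
```

Each additional question is another object in the mainEntity array. Keep the answer text identical to what appears on the visible page.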

The Discoverability Layer (Items 9–11)

9. OpenGraph Tags (og: meta tags)

What it is: Meta tags in your <head> that define how your page appears when shared or previewed — title, description, image, URL.

Why it matters: Many AI agents and browser tools use OG tags as a fallback when parsing page metadata. Missing or incorrect OG tags mean AI tools may pull wrong titles or descriptions for your pages.
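The core set of OG tags for a homepage, with placeholder values:

```html
<meta property="og:title" content="Acme — Collaborative Whiteboards for Remote Teams">
<meta property="og:description" content="Real-time whiteboards with live cursors, built for engineering teams.">
<meta property="og:image" content="https://acme.example.com/og-image.png">
<meta property="og:url" content="https://acme.example.com/">
<meta property="og:type" content="website">
```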

10. Canonical URLs

What it is: A <link rel="canonical"> tag on every page that declares the “official” URL for that content.

Why it matters: Duplicate content confuses AI indexers just like it confuses Google. If your content is accessible at multiple URLs, canonical tags tell crawlers which version is authoritative.
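For example, if the same pricing content is reachable at several URLs, one tag on each copy settles the question:

```html
<!-- Same content served at:                          -->
<!--   https://acme.example.com/pricing               -->
<!--   https://acme.example.com/pricing?ref=nav       -->
<!-- The canonical tag names the authoritative one:   -->
<link rel="canonical" href="https://acme.example.com/pricing">
```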

11. Machine-Readable Pricing Page

What it is: A pricing page that uses clean semantic HTML, includes specific numbers, and has structured data — not just JavaScript-rendered cards with vague pricing language.

Why it matters: AI agents asked “how much does X cost?” look for pricing pages. If yours is JS-only, missing numbers, or uses language like “contact us for pricing” where you could be specific, you are invisible to AI price comparisons.

What to check:

  • Pricing is in plain HTML (not just JS-rendered)
  • Specific dollar amounts are present in the page text
  • Plan names and features are in readable list format
  • Has SoftwareApplication or PriceSpecification schema
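A pricing page's structured data can list every plan as an Offer under the product — a sketch with placeholder plan names and prices:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Acme",
  "offers": [
    { "@type": "Offer", "name": "Starter",  "price": "0",     "priceCurrency": "USD" },
    { "@type": "Offer", "name": "Pro",      "price": "29.00", "priceCurrency": "USD" },
    { "@type": "Offer", "name": "Business", "price": "79.00", "priceCurrency": "USD" }
  ]
}
</script>
```

Pair this with the same numbers in plain HTML on the page itself, so the visible content and the structured data agree.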

The Content Quality Layer (Items 12–14)

12. Descriptive Image Alt Text

What it is: A meaningful alt attribute on every image, describing what the image shows.

Why it matters: Multi-modal AI agents increasingly “see” images on web pages. But even text-only AI crawlers use alt text as a signal about what is on the page. “screenshot.png” tells an AI nothing. “Screenshot of the Acme dashboard showing 3 active users with live cursor positions” is useful.
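The difference in markup is one attribute:

```html
<!-- Tells an AI nothing: -->
<img src="screenshot.png" alt="screenshot">

<!-- A useful signal: -->
<img src="screenshot.png"
     alt="Screenshot of the Acme dashboard showing 3 active users with live cursor positions">
```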

13. Clear, Unambiguous Page Titles and Meta Descriptions

What it is: Unique, descriptive <title> and <meta name="description"> tags on every page.

Why it matters: These are among the first signals AI agents read. Vague titles like “Home | Acme” or “Docs” leave AI crawlers to guess what the page is about. Specific titles help AI agents index and route your content correctly.
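Before and after, using placeholder copy:

```html
<!-- Vague: -->
<title>Home | Acme</title>

<!-- Specific: -->
<title>Acme — Collaborative Whiteboards for Remote Engineering Teams</title>
<meta name="description"
      content="Acme gives remote engineering teams real-time whiteboards with live cursors. Free plan available; paid plans from $29/month.">
```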

14. llms-full.txt (The Content Dump)

What it is: A companion to llms.txt that contains the full text of your most important content, pre-processed into clean Markdown.

Why it matters: Some AI systems prefer to ingest a single clean document over crawling dozens of pages. For documentation sites and content-heavy sites, llms-full.txt can dramatically increase the quality with which AI agents understand your content.

What to check:

  • File exists at yourdomain.com/llms-full.txt
  • Content is clean Markdown with no HTML cruft
  • Each section is clearly labeled with its source URL
  • Updated when major content changes
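There is no single mandated format, but a structure that satisfies the checks above might look like this sketch (all content and URLs are placeholders):

```markdown
# Acme — Full Content

## Getting Started
Source: https://acme.example.com/docs/start

Install the CLI, sign in, and create your first board. Boards sync in real time
across every collaborator's browser.

## Pricing
Source: https://acme.example.com/pricing

Starter: $0/month. Pro: $29 per user per month. Business: $79 per user per month.
```

The per-section Source lines matter: they let an AI system that ingests the whole file still cite the specific page a fact came from.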

Your Score

Count how many items you checked off:

  • 14/14 — You are fully AI-ready. You are capturing traffic others are leaving on the table.
  • 10–13 — Strong foundation. A few afternoon fixes from fully optimized.
  • 6–9 — You are being partially understood. Fix the structured data items first.
  • 0–5 — Significant opportunity. Start with llms.txt and robots.txt today.

Don't want to audit manually?

AgentReady automates this entire checklist. Paste your URL, get a scored report in seconds, and see exactly which of these 14 items you are passing and failing — with specific fix instructions for each one. Free to scan. Takes 30 seconds.

Run Your Free AI Readiness Scan