AI Readiness Checklist: 14 Things Every Website Needs in 2026
AI readiness · checklist · llms.txt · structured data · MCP
March 19, 2026
# AI Readiness Checklist: 14 Things Every Website Needs in 2026
AI agents are already deciding who gets recommended, who gets cited, and who gets
ignored. When someone asks ChatGPT for a tool recommendation, when Perplexity
synthesizes an answer about your industry, when a Claude-powered agent does research on
behalf of a buyer — those systems are making decisions based on signals your
website either has or does not have.
The good news? Most of these signals are technical, implementable in an afternoon, and
your competitors have not done them yet.
This is the complete 14-point checklist. Go through it item by item, check off what you
have, and fix what you do not.
## The Foundation Layer (Items 1–4)
These are the basics. If you are missing any of these, do them first — everything
else builds on top.
### 1. llms.txt File
**What it is:** A plain-text Markdown file at
`yourdomain.com/llms.txt`
that acts as a structured directory for AI agents — telling them what your site
is, who it is for, and where your most important content lives.
**Why it matters:** AI agents are averse to noisy signals. A
typical webpage buries the actual content under navigation, footers, cookie banners, and
ads.
`llms.txt` gives AI
agents a clean, unambiguous map of your site.
**What to check:**
- File exists at `https://yourdomain.com/llms.txt`
- Served as `text/plain` (not HTML)
- Contains an H1 with your site name
- Contains a `>` blockquote description (the most-read section)
- Links to your 10–20 most important pages with descriptions
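A minimal `llms.txt` for a hypothetical product site might look like this (the company name, URLs, and descriptions are all illustrative):

```markdown
# Acme

> Acme is a real-time collaboration dashboard for remote teams.

## Docs

- [Quickstart](https://acme.example/docs/quickstart): Install and create your first project in 5 minutes
- [API Reference](https://acme.example/docs/api): REST endpoints and authentication

## Product

- [Pricing](https://acme.example/pricing): Plans, limits, and feature comparison
```

The blockquote under the H1 does the heaviest lifting: it is the one-sentence summary an agent is most likely to quote.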
### 2. robots.txt — Not Blocking AI Agents
**What it is:** The classic
`robots.txt` file that
tells crawlers what they can and cannot access.
**Why it matters:** Many sites have overly broad
`Disallow` rules that
accidentally block AI agent crawlers. Others have added blanket blocks for GPTBot or
other AI crawlers without thinking through the consequences.
**What to check:**
- `robots.txt` exists at `yourdomain.com/robots.txt`
- You are not accidentally disallowing pages you want AI agents to see
- You have made intentional decisions about which AI crawlers to allow
**Common AI crawler user agents in 2026:**
```
User-agent: GPTBot          # OpenAI
User-agent: ClaudeBot       # Anthropic
User-agent: PerplexityBot   # Perplexity
User-agent: Googlebot       # Google (also used for AI Overviews)
User-agent: anthropic-ai    # Anthropic (alternative)
User-agent: cohere-ai       # Cohere
```
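Putting those intentional decisions into practice, a `robots.txt` that explicitly welcomes AI crawlers while keeping private paths off-limits could be sketched like this (the disallowed paths are illustrative):

```
# Explicitly allow the AI crawlers you want
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

# Default rules for everyone else
User-agent: *
Disallow: /admin/
Disallow: /checkout/

Sitemap: https://yourdomain.com/sitemap.xml
```

The `Sitemap:` line doubles as the reference required in item 3.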
### 3. XML Sitemap (Up to Date)
**What it is:** A machine-readable map of all your public
pages, submitted to search engines — and also used by AI crawlers to discover
content.
**Why it matters:** AI agents that crawl the web often start
from sitemaps. An outdated or missing sitemap means they might miss your best content
entirely.
**What to check:**
- Sitemap exists at `yourdomain.com/sitemap.xml`
- Referenced in `robots.txt`
- All important pages are included
- `<lastmod>` dates are accurate
- No 404 URLs included
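For reference, the standard sitemap format is small; a two-page sketch (URLs and dates are placeholders) looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourdomain.com/</loc>
    <lastmod>2026-03-01</lastmod>
  </url>
  <url>
    <loc>https://yourdomain.com/pricing</loc>
    <lastmod>2026-02-14</lastmod>
  </url>
</urlset>
```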
### 4. Valid, Semantic HTML Structure
**What it is:** Using the correct HTML elements for their
intended purpose —
`<nav>` for
navigation,
`<main>` for main
content,
`<article>` for
articles.
**Why it matters:** AI agents often parse HTML without
rendering JavaScript. They rely on semantic markup to distinguish your actual content
from navigation chrome. If everything is a
`<div>`, they are
guessing.
**What to check:**
- Main page content is wrapped in `<main>`
- Articles/posts use `<article>`
- Navigation uses `<nav>`
- Headings are hierarchical (one `<h1>`, then `<h2>`, then `<h3>`)
- Lists use `<ul>` / `<ol>` — not divs styled to look like lists
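The checklist above amounts to a page skeleton roughly like this:

```html
<body>
  <nav><!-- site navigation, clearly separated from content --></nav>
  <main>
    <article>
      <h1>Post title</h1>
      <h2>Section heading</h2>
      <ul>
        <li>A real list item, not a styled div</li>
      </ul>
    </article>
  </main>
  <footer><!-- footer chrome --></footer>
</body>
```

An agent parsing this without rendering can extract the article and discard the chrome with zero guesswork.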
## The Structured Data Layer (Items 5–8)
Structured data is how you communicate what kind of thing your content is. AI agents use
this extensively.
### 5. JSON-LD Structured Data — Organization
**What it is:** A JSON-LD block in your
`<head>` that
declares your organization's identity — name, URL, logo, social profiles,
contact info.
**Why it matters:** When an AI agent is asked “who is
[Company]?” or “what does [Company] do?”, this is the authoritative
source it reaches for.
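A minimal Organization block, using the hypothetical "Acme" from earlier (all values are placeholders to replace with your own):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Acme",
  "url": "https://acme.example",
  "logo": "https://acme.example/logo.png",
  "sameAs": [
    "https://twitter.com/acme",
    "https://www.linkedin.com/company/acme"
  ]
}
</script>
```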
### 6. JSON-LD Structured Data — Product or SoftwareApplication
**What it is:** Schema markup that tells AI agents your
product's name, category, pricing, and features in a machine-readable format.
**Why it matters:** AI shopping agents, recommendation
engines, and research assistants specifically look for Product and
SoftwareApplication schema to populate answers about what products are available and
what they cost.
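A SoftwareApplication sketch with pricing attached (category, OS, and price are illustrative):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Acme",
  "applicationCategory": "BusinessApplication",
  "operatingSystem": "Web",
  "offers": {
    "@type": "Offer",
    "price": "29.00",
    "priceCurrency": "USD"
  }
}
</script>
```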
### 7. JSON-LD on Blog Posts — Article Schema
**What it is:** Structured data on each blog post declaring
the author, publish date, headline, and content type.
**Why it matters:** When AI agents cite sources or pull
content into answers, Article schema helps them attribute content correctly and assess
freshness.
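On a post like this one, an Article block could look like the following (the author name is a placeholder):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "AI Readiness Checklist: 14 Things Every Website Needs in 2026",
  "datePublished": "2026-03-19",
  "dateModified": "2026-03-19",
  "author": { "@type": "Person", "name": "Jane Doe" }
}
</script>
```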
### 8. FAQ Schema on Key Pages
**What it is:** Structured FAQ markup that explicitly
presents question-and-answer pairs from your content.
**Why it matters:** FAQ schema maps almost directly to how AI
assistants respond to queries. A well-structured FAQ page often gets its content pulled
verbatim into AI-generated answers.
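Each question-and-answer pair becomes one entry in a FAQPage block; a single-question sketch:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is llms.txt?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "A plain-text Markdown file that gives AI agents a structured directory of your site."
    }
  }]
}
</script>
```

Note the answer text is exactly the kind of self-contained sentence an assistant can lift into a response.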
## The Discoverability Layer (Items 9–11)
### 9. OpenGraph Tags (og: meta tags)
**What it is:** Meta tags in your
`<head>` that
define how your page appears when shared or previewed — title, description,
image, URL.
**Why it matters:** Many AI agents and browser tools use OG
tags as a fallback when parsing page metadata. Missing or incorrect OG tags mean AI
tools may pull wrong titles or descriptions for your pages.
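The core set of OG tags for a post like this (image and URL paths are illustrative):

```html
<meta property="og:title" content="AI Readiness Checklist: 14 Things Every Website Needs in 2026" />
<meta property="og:description" content="The 14-point checklist for making your website legible to AI agents." />
<meta property="og:image" content="https://yourdomain.com/og/ai-readiness.png" />
<meta property="og:url" content="https://yourdomain.com/blog/ai-readiness-checklist" />
<meta property="og:type" content="article" />
```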
### 10. Canonical URLs
**What it is:** A
`<link rel="canonical">`
tag on every page that declares the “official” URL for that content.
**Why it matters:** Duplicate content confuses AI indexers
just like it confuses Google. If your content is accessible at multiple URLs, canonical
tags tell crawlers which version is authoritative.
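The tag itself is one line; every variant of the page (www/non-www, trailing slash, tracking parameters) should carry the same canonical href:

```html
<link rel="canonical" href="https://yourdomain.com/blog/ai-readiness-checklist" />
```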
### 11. Machine-Readable Pricing Page
**What it is:** A pricing page that uses clean semantic HTML,
includes specific numbers, and has structured data — not just JavaScript-rendered
cards with vague pricing language.
**Why it matters:** AI agents asked “how much does X
cost?” look for pricing pages. If yours is JS-only, missing numbers, or uses
language like “contact us for pricing” where you could be specific, you are
invisible to AI price comparisons.
**What to check:**
- Pricing is in plain HTML (not just JS-rendered)
- Specific dollar amounts are present in the page text
- Plan names and features are in readable list format
- Has SoftwareApplication or PriceSpecification schema
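One way to satisfy the schema item is to list each plan as an Offer (plan names and prices here are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Acme",
  "offers": [
    { "@type": "Offer", "name": "Starter", "price": "0", "priceCurrency": "USD" },
    { "@type": "Offer", "name": "Pro", "price": "29.00", "priceCurrency": "USD" }
  ]
}
</script>
```

This gives a price-comparison agent concrete numbers even if it never executes your pricing-card JavaScript.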
## The Content Quality Layer (Items 12–14)
### 12. Descriptive Image Alt Text
**What it is:** Meaningful
`alt` attribute on
every image that describes what the image shows.
**Why it matters:** Multi-modal AI agents increasingly
“see” images on web pages. But even text-only AI crawlers use alt text as a
signal about what is on the page. “screenshot.png” tells an AI nothing.
“Screenshot of the Acme dashboard showing 3 active users with live cursor
positions” is useful.
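Side by side, the difference is stark:

```html
<!-- Tells an AI nothing -->
<img src="screenshot.png" alt="screenshot" />

<!-- Useful signal -->
<img src="screenshot.png"
     alt="Screenshot of the Acme dashboard showing 3 active users with live cursor positions" />
```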
### 13. Clear, Unambiguous Page Titles and Meta Descriptions
**What it is:** Unique, descriptive
`<title>` and
`<meta name="description">`
tags on every page.
**Why it matters:** These are among the first signals AI
agents read. Vague titles like “Home | Acme” or “Docs” leave
AI crawlers to guess what the page is about. Specific titles help AI agents index and
route your content correctly.
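A before/after sketch using the hypothetical Acme product (the descriptive copy is illustrative):

```html
<!-- Vague -->
<title>Home | Acme</title>

<!-- Specific -->
<title>Acme | Real-Time Collaboration Dashboard for Remote Teams</title>
<meta name="description"
      content="Acme gives remote teams a shared dashboard with live cursors, presence, and instant sync." />
```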
### 14. llms-full.txt (The Content Dump)
**What it is:** A companion to
`llms.txt` that contains
the full text of your most important content, pre-processed into clean Markdown.
**Why it matters:** Some AI systems prefer to ingest a single
clean document over crawling dozens of pages. For documentation sites and content-heavy
sites,
`llms-full.txt` can
dramatically increase the quality with which AI agents understand your content.
**What to check:**
- File exists at `yourdomain.com/llms-full.txt`
- Content is clean Markdown with no HTML cruft
- Each section is clearly labeled with its source URL
- Updated when major content changes
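Structurally, an `llms-full.txt` satisfying that checklist might look like this (content and URLs are illustrative):

```markdown
# Acme: Full Content

## Quickstart
Source: https://acme.example/docs/quickstart

Install the CLI, authenticate, and create your first project. The default
configuration works for teams of up to 50 users.

## Pricing
Source: https://acme.example/pricing

Starter is free for up to 3 users. Pro is $29 per user per month and adds
unlimited history and SSO.
```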
## Your Score
Count how many items you checked off:
- **14/14** — You are fully AI-ready. You are capturing traffic others are leaving on the table.
- **10–13** — Strong foundation. A few afternoon fixes from fully optimized.
- **6–9** — You are being partially understood. Fix the structured data items first.
- **0–5** — Significant opportunity. Start with llms.txt and robots.txt today.
Don't want to audit manually?
AgentReady automates this entire checklist. Paste your URL, get a scored report in
seconds, and see exactly which of these 14 items you are passing and failing —
with specific fix instructions for each one. Free to scan. Takes 30 seconds.
[Run Your Free AI Readiness Scan](/)

[← Back to Blog](/blog)