
Understanding AI Bot Types (Retrieval, Indexer, Training, Uncategorized)

Definitions and guidance for the AI crawler types you see in your Agent Traffic dashboard


Last updated: October 22, 2025

AI bots are the automated crawlers and fetchers that platforms such as ChatGPT, Google Gemini, Perplexity, and Meta AI use to access web content. Knowing which types to allow or block helps ensure your brand's content can be seen, cited, and represented accurately in AI-generated answers.

Below is a breakdown of the four main bot types Scrunch monitors and how they affect AI visibility.


Quick Reference Table

| Bot Type | Purpose | Common Examples | Recommended Action | Impact on AI Visibility |
|---|---|---|---|---|
| Retrieval | Fetches pages in real time when users ask questions. | ChatGPT-User, PerplexityBot, meta-externalagent | ✅ Allow | Enables live citations and referral traffic. |
| Indexer | Crawls pages for AI search indexes. | OAI-SearchBot, Googlebot, Google-Extended | ✅ Allow | Ensures long-term discoverability in AI search. |
| Training | Collects content for model training (not used in results). | GPTBot, ClaudeBot, CCBot | 🚫 Safe to block | No effect on citations or visibility. |
| Uncategorized | Mixed-purpose or experimental bots. | Some Meta AI bots, impersonator bots | ⚠️ Monitor | Variable; verify in Agent Traffic. |
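As a rough illustration of how the table maps onto raw traffic, the sketch below tags a User-Agent header with a category by looking for the example tokens listed above. The mapping and the function name `classify_user_agent` are hypothetical, a simplification copied from the table rather than Scrunch's own classification logic. User-Agent headers are self-reported, so a match is a label to investigate, not proof that the request came from that vendor.

```python
# Hypothetical token-to-category mapping copied from the quick reference table.
# Order matters: the first token found in the header decides the category.
BOT_CATEGORIES: list[tuple[str, str]] = [
    ("ChatGPT-User", "Retrieval"),
    ("PerplexityBot", "Retrieval"),
    ("meta-externalagent", "Retrieval"),
    ("OAI-SearchBot", "Indexer"),
    ("Googlebot", "Indexer"),
    ("GPTBot", "Training"),
    ("ClaudeBot", "Training"),
    ("CCBot", "Training"),
]


def classify_user_agent(user_agent: str) -> str:
    """Return a bot category for a User-Agent header (case-insensitive substring match)."""
    ua = user_agent.lower()
    for token, category in BOT_CATEGORIES:
        if token.lower() in ua:
            return category
    # Anything unrecognised is left for manual review, like the table's Uncategorized row.
    return "Uncategorized"


if __name__ == "__main__":
    # Illustrative header values, not exact vendor strings.
    for ua in [
        "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)",
        "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)",
        "SomeNewAgent/0.1",
    ]:
        print(f"{classify_user_agent(ua):<14} {ua}")
```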

1. Retrieval Bots (Allow)

Purpose:
Fetch live content from your site in real time when users ask questions in AI chat or search tools.

Examples:
ChatGPT-User, PerplexityBot, meta-externalagent

Why it matters:
These bots power real-time citations and visibility in AI answers. When someone asks ChatGPT or Perplexity a question, these bots fetch the relevant pages from your site at that moment.

Recommendation:
Allow these bots in your robots.txt and firewall settings. They are essential for being cited and driving traffic from AI platforms.
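In robots.txt, a crawler that finds a group naming it obeys that group instead of the catch-all `*` group, so an explicit group for the retrieval bots keeps them allowed even when the default rules are restrictive. The sketch below builds such a group from the example tokens above and checks it with Python's standard-library `urllib.robotparser`, used here only as a stand-in; each crawler's own parser may treat edge cases differently. The helper `allow_group` and the sample site policy are hypothetical, and firewall rules are a separate setting that robots.txt does not control.

```python
from urllib.robotparser import RobotFileParser

# Example retrieval-bot tokens from the section above; adjust to the agents you see.
RETRIEVAL_BOTS = ["ChatGPT-User", "PerplexityBot", "meta-externalagent"]


def allow_group(agents: list[str]) -> str:
    """Build one robots.txt group that allows the given agents on every path."""
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append("Allow: /")
    return "\n".join(lines)


# A hypothetical site that blocks unknown crawlers by default.
robots_txt = "\n\n".join([
    allow_group(RETRIEVAL_BOTS),
    "User-agent: *\nDisallow: /",
])

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

if __name__ == "__main__":
    print(robots_txt, end="\n\n")
    for agent in RETRIEVAL_BOTS + ["UnknownScraper"]:
        allowed = parser.can_fetch(agent, "https://example.com/pricing")
        print(f"{agent:<20} {'allowed' if allowed else 'blocked'}")
```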


2. Indexer Bots (Allow)

Purpose:
Crawl and store your pages periodically to build searchable indexes used by AI systems.

Examples:
OAI-SearchBot, Googlebot, Google-Extended

Why it matters:
Indexer bots help your pages appear in AI search results and Google AI Overviews even when no user question triggers a live fetch of your site.

Recommendation:
Allow these bots to maintain long-term discoverability and AI visibility.
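Because indexers revisit pages on their own schedule rather than once per question, one way to see how often they return is to count their requests per day in your server's access log. The sketch below assumes the common Apache/Nginx "combined" log format, in which the User-Agent is the last quoted field; the token list, the function name `indexer_hits_per_day`, and the sample log lines are hypothetical, and matching on the User-Agent alone will also count impersonators.

```python
import re
from collections import Counter

# Indexer tokens from the section above (assumed list; extend as needed).
INDEXER_TOKENS = ["OAI-SearchBot", "Googlebot"]

# Combined format: ... [22/Oct/2025:08:01:12 +0000] "GET / HTTP/1.1" 200 512 "referer" "user-agent"
LOG_LINE = re.compile(r'\[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"')


def indexer_hits_per_day(lines: list[str]) -> Counter:
    """Count requests per (day, token) for log lines whose User-Agent contains an indexer token."""
    hits: Counter = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in combined format
        ua = match.group("ua").lower()
        for token in INDEXER_TOKENS:
            if token.lower() in ua:
                day = match.group("time").split(":", 1)[0]  # e.g. "22/Oct/2025"
                hits[(day, token)] += 1
                break
    return hits


# Made-up log lines for illustration.
SAMPLE_LOG = [
    '203.0.113.5 - - [22/Oct/2025:08:01:12 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '203.0.113.5 - - [22/Oct/2025:19:44:03 +0000] "GET /blog HTTP/1.1" 200 8410 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '198.51.100.7 - - [23/Oct/2025:02:10:55 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '192.0.2.9 - - [23/Oct/2025:03:00:00 +0000] "GET / HTTP/1.1" 200 2048 "-" "curl/8.4.0"',
]

if __name__ == "__main__":
    for (day, token), count in indexer_hits_per_day(SAMPLE_LOG).items():
        print(day, token, count)
```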


3. Training Bots (Optional / Safe to Block)

Purpose:
Collect large volumes of web data for training future AI models.

Examples:
GPTBot, ClaudeBot, CCBot

Why it matters:
These bots do not affect real-time visibility or citations. Blocking them will not prevent your brand from appearing in AI results.

Recommendation:
🚫 Safe to block. Allow only if you’re comfortable with your content being used for AI model training.
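If you decide to block training bots, a robots.txt group that names only those agents leaves retrieval and indexer bots untouched, because each crawler obeys the group that names it (or the `*` group if none does). The sketch below writes a `Disallow: /` group for the example training tokens and confirms with Python's `urllib.robotparser` (a stand-in for each crawler's own parser) that OpenAI's indexer and retrieval agents are still allowed. Keep in mind that robots.txt is advisory: it only stops crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Example training-bot tokens from the section above.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "CCBot"]

# One group blocks the training bots; the catch-all group allows everyone else.
robots_txt = "\n".join(
    [f"User-agent: {agent}" for agent in TRAINING_BOTS]
    + ["Disallow: /", "", "User-agent: *", "Allow: /"]
)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

if __name__ == "__main__":
    print(robots_txt, end="\n\n")
    for agent in TRAINING_BOTS + ["OAI-SearchBot", "ChatGPT-User"]:
        verdict = "allowed" if parser.can_fetch(agent, "https://example.com/docs") else "blocked"
        print(f"{agent:<14} {verdict}")
```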


4. Uncategorized Bots (Monitor)

Purpose:
Bots with mixed or unclear functions. Some perform both retrieval and training, and others may not behave as their public documentation describes.

Examples:
Certain Meta AI bots, impersonator bots, or experimental crawlers.

Why it matters:
Behavior varies by platform. Some may represent legitimate new AI agents; others may scrape content without attribution.

Recommendation:
⚠️ Monitor case by case using Scrunch’s Agent Traffic tab to confirm legitimacy before allowing or blocking.
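One common way to confirm that a bot is who it claims to be is a reverse-then-forward DNS check: resolve the requesting IP to a hostname, check that the hostname sits under the vendor's domain, then resolve that hostname forward and confirm it maps back to the same IP. Google documents this procedure for verifying Googlebot; other vendors may publish IP ranges instead, so the domain suffixes are a parameter you fill in from each vendor's documentation. The function `verify_bot_ip` is hypothetical, it handles IPv4 only for brevity, and its resolvers are injectable so the logic can run offline; the sample records below are made up.

```python
import socket
from typing import Callable, Iterable

ReverseLookup = Callable[[str], str]
ForwardLookup = Callable[[str], list[str]]


def _reverse(ip: str) -> str:
    # socket.gethostbyaddr returns (hostname, aliases, addresses)
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> list[str]:
    # socket.gethostbyname_ex returns (hostname, aliases, IPv4 addresses)
    return socket.gethostbyname_ex(hostname)[2]


def verify_bot_ip(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: ReverseLookup = _reverse,
    forward: ForwardLookup = _forward,
) -> bool:
    """True if ip reverse-resolves into an allowed domain and that name resolves back to ip."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record: treat as unverified
    # Suffixes should start with "." so "fakegooglebot.com" does not match ".googlebot.com".
    if not any(hostname.endswith(suffix) for suffix in allowed_suffixes):
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False


if __name__ == "__main__":
    # Made-up DNS records so the example runs without network access.
    ptr = {"192.0.2.10": "crawl-192-0-2-10.googlebot.com", "192.0.2.20": "scraper.example.net"}
    a_records = {"crawl-192-0-2-10.googlebot.com": ["192.0.2.10"]}
    for ip in ptr:
        ok = verify_bot_ip(ip, (".googlebot.com", ".google.com"),
                           reverse=ptr.__getitem__, forward=lambda h: a_records.get(h, []))
        print(ip, "verified" if ok else "unverified")
```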


Next Steps

  • Review your robots.txt and firewall settings to confirm retrieval and indexer bots are allowlisted.

  • Use the Scrunch Site Audit and Agent Traffic tabs to detect blocked or frequently crawled bots.

  • Visit our Guide to AI User Agents for full user agent strings and setup instructions.
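For the robots.txt review, a quick local check is to feed your current file to a parser and list any retrieval or indexer agents that cannot fetch a sample URL. This sketch uses Python's `urllib.robotparser` as an approximation of how crawlers read the file; the agent list, the function name `blocked_agents`, and the sample robots.txt are illustrative. A clean result says nothing about firewall or CDN rules, which need checking separately.

```python
from typing import Iterable
from urllib.robotparser import RobotFileParser

# Retrieval and indexer tokens from the quick reference table (adjust as needed).
AGENTS_TO_KEEP = ("ChatGPT-User", "PerplexityBot", "meta-externalagent", "OAI-SearchBot", "Googlebot")


def blocked_agents(robots_txt: str, url: str, agents: Iterable[str] = AGENTS_TO_KEEP) -> list[str]:
    """Return the agents that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Illustrative file with an accidental blanket block on one retrieval bot.
    sample = "User-agent: PerplexityBot\nDisallow: /\n\nUser-agent: *\nDisallow: /admin/\n"
    print(blocked_agents(sample, "https://example.com/blog/post"))
```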
