Robots.txt Generator


Build a modern robots.txt file using standard User-agent, Allow, Disallow and Sitemap rules. Robots.txt is public guidance for crawlers, not a security control.

Site and Sitemap

Use one absolute sitemap URL per line. Sitemap directives are not tied to a single user-agent group.
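As a rough sketch of what the Sitemap field turns into, the Python function below rejects relative URLs and emits one Sitemap line per URL. Its name, `sitemap_lines`, is hypothetical and not the tool's code. Sitemap lines don't belong to any user-agent group, so they can sit anywhere in the file. Appending them at the end is simply a convention assumed here.

```python
from urllib.parse import urlparse


def sitemap_lines(urls: list[str]) -> list[str]:
    """Return one 'Sitemap:' directive per URL; reject non-absolute URLs."""
    lines = []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines from the input box
        parts = urlparse(url)
        # Sitemap URLs must be absolute, e.g. https://www.example.com/sitemap.xml
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"Sitemap URL must be absolute: {url!r}")
        lines.append(f"Sitemap: {url}")
    return lines


print("\n".join(sitemap_lines([
    "https://www.example.com/sitemap.xml",
    "https://www.example.com/news-sitemap.xml",
])))
```

A relative entry such as /sitemap.xml raises an error instead of producing a directive that crawlers would ignore.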

Default Access

Crawl delay (optional): Google ignores Crawl-delay, but some crawlers may honor it.

Paths

One path per line. Paths should start with /. You can use * and $ for modern crawler matching.
Allow paths: use these for files inside a disallowed folder that crawlers should still fetch.

Search Crawlers


AI Crawlers

Use this to express content-use preferences to crawlers that honor robots.txt. Server-side controls are still needed for enforcement.
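Here is a minimal Python sketch of an AI-crawler opt-out. It builds one Disallow-all group per user-agent token and checks the result with the standard library's `urllib.robotparser`. The token list (GPTBot, CCBot, Google-Extended) is an example, not a complete or guaranteed-current list, and the `ai_opt_out_groups` helper is hypothetical. Check each vendor's documentation for the tokens it honors.

```python
from urllib.robotparser import RobotFileParser

# Example tokens only; vendors can add or rename them.
AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended"]


def ai_opt_out_groups(agents: list[str]) -> str:
    """Build one group per AI user-agent token that disallows the whole site."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


robots_txt = "User-agent: *\nDisallow: /staging/\n\n" + ai_opt_out_groups(AI_CRAWLERS)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("GPTBot", "https://www.example.com/blog/"))     # False
print(parser.can_fetch("Googlebot", "https://www.example.com/blog/"))  # True
```

As the surrounding text says, this only expresses a preference; crawlers that ignore robots.txt are unaffected.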

Custom Groups

Do not put private URLs in robots.txt as a security measure. Anyone can read the file. Use authentication, noindex headers/tags, or server rules when access or indexing must be controlled.

About Robots.txt Generator

Build a clean robots.txt that asks search engines not to crawl non-public or low-value directories (like /staging/ or /internal-docs/) while keeping your important pages crawlable.
Real-world scenario: Your dev site keeps showing up in search results, creating duplicate content issues. You generate a robots.txt that blocks /dev/ and adds a group that disallows the GPTBot crawler. Compliant crawlers stop fetching /dev/, but pages that are already indexed can linger in results. To get them removed, add noindex and leave them crawlable, or put the dev site behind a password.
How to use it:

    Select which user-agents you want to apply rules to (Googlebot, Bingbot, all bots, etc.).

    Specify allowed directories and disallowed directories/paths.

    (Optional) Set a crawl delay.

    Click “Generate robots.txt”.

    Copy the output and save it as robots.txt in your website’s root folder.

FAQ

Q: What exactly does a robots.txt file do?
A: It’s a plain‑text file placed in your website’s root directory that tells search engine crawlers (and other well‑behaved bots) which parts of your site they may request and which parts they should stay out of. It’s a “keep out” sign, not a locked door – it relies on voluntary compliance.
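To see how a compliant crawler reads the file, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The rules and URLs are made-up examples. This parser does plain prefix matching and does not understand the `*` and `$` wildcards, so treat it as an approximation of what major search engines do.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /staging/
Disallow: /internal-search/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for url in ("https://www.example.com/",
            "https://www.example.com/staging/new-home.html",
            "https://www.example.com/internal-search/?q=shoes"):
    # can_fetch() answers: may this user-agent request this URL?
    print(url, parser.can_fetch("AnyBot", url))
```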

Q: Will a robots.txt file improve my SEO rankings?
A: Indirectly, yes. By blocking low‑value or duplicate pages (like staging sites, internal search results, or PDF print versions), you let crawlers spend more of their crawl budget on your important pages, which matters most on large sites. It doesn’t directly boost rankings, but it keeps crawlers from wasting time on junk that could dilute your site’s perceived quality.

Q: Can I use robots.txt to hide private content?
A: No. Robots.txt is a public file. Anyone can view it by visiting yourdomain.com/robots.txt. It only asks polite bots not to crawl; malicious bots, users, or anyone with a direct link can still access the pages. For true privacy, use password protection or other server-side access controls. To keep a page out of search results, use a noindex meta tag or HTTP header.

Q: What’s the difference between Disallow and Noindex?
A: Disallow in robots.txt stops crawling, but the page can still appear in search results if it’s linked from elsewhere. Noindex (a meta tag or HTTP header) prevents the page from appearing in search results, even if it’s crawled. For pages already indexed that you want removed, you must allow crawling but add noindex – otherwise Googlebot can’t see the noindex tag because it’s blocked.
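Noindex can be sent as a meta tag or as an `X-Robots-Tag` HTTP header. The header form is handy for PDFs and other non-HTML files. The sketch below is a minimal WSGI middleware, written in Python with only the standard library. It adds `X-Robots-Tag: noindex` to every response under a path prefix. The `/dev/` prefix and the `noindex_middleware` and `site` names are illustrative assumptions. Crawlers only see this header if robots.txt does not block the URL.

```python
def noindex_middleware(app, prefix="/dev/"):
    """Wrap a WSGI app so responses under `prefix` carry X-Robots-Tag: noindex."""
    def wrapped(environ, start_response):
        def add_header(status, headers, exc_info=None):
            # Only paths under the prefix get the noindex header.
            if environ.get("PATH_INFO", "").startswith(prefix):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)
        return app(environ, add_header)
    return wrapped


def site(environ, start_response):
    """Stand-in for your real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


app = noindex_middleware(site)
```

To try it locally, serve `app` with `wsgiref.simple_server.make_server` and inspect the response headers for a URL under /dev/.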

Q: Is one robots.txt enough for all search engines?
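A: You can create rules for all bots using User-agent: *, and you can also target specific bots (like Googlebot, Bingbot, GPTBot) with separate groups. A crawler follows only the most specific group that matches its name. If you add a Googlebot group, Googlebot ignores the User-agent: * rules, so repeat any shared rules in each group. This tool lets you select which user‑agents to apply rules to.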
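The sketch below shows the group-selection behavior, using Python's `urllib.robotparser` with example rules. Bingbot has no group of its own, so it falls back to `User-agent: *`. Googlebot uses only its own group and therefore never sees the `/cart/` rule. The Python parser picks the first group whose token appears in the crawler's name. That is a simplification of Google's most-specific-match rule, but it gives the same answer here.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /staging/
Disallow: /cart/

User-agent: Googlebot
Disallow: /staging/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

url = "https://www.example.com/cart/checkout"
print(parser.can_fetch("Bingbot", url))    # False: falls back to the * group
print(parser.can_fetch("Googlebot", url))  # True: Googlebot uses only its own group
```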

Q: What is a crawl delay and do I need it?
A: It tells bots to wait a certain number of seconds between page requests. Google ignores Crawl-delay, while some other crawlers, such as Bingbot, honor it. It’s mainly useful for slowing down aggressive bots from smaller engines or scrapers so they don’t overload your server.
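Crawl-delay is only a request; each crawler decides whether to honor it. Here is a small Python sketch with example rules. It uses `urllib.robotparser` to read the value that a cooperative crawler would wait between requests. The `SmallEngineBot` name is made up.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Crawl-delay: 10
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Seconds a crawler that honors Crawl-delay should wait between requests
# ("SmallEngineBot" is a hypothetical crawler name).
delay = parser.crawl_delay("SmallEngineBot")
print(delay)  # 10

# Earliest start times (in seconds) for three consecutive requests.
start_times = [i * delay for i in range(3)]
print(start_times)  # [0, 10, 20]
```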

Q: How do I know which directories to block?
A: Typical candidates are /wp-admin/, /staging/, /internal/, /search/, /tag/, /author/, /cdn-cgi/, /feed/, and anything with session IDs. If a folder doesn’t provide unique, valuable content to search users, consider blocking it.

Q: Can I test my robots.txt before uploading?
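A: Yes, but not in Google Search Console, which only reads your live file. Its robots.txt report (under “Settings” → “robots.txt”) shows the version Google last fetched and any parsing errors. The URL Inspection tool shows whether a live URL is blocked. The older robots.txt Tester, which accepted pasted rules, has been retired. To test a draft, check your key URLs against it with a robots.txt parser before uploading, then confirm in Search Console afterwards. Always test before you deploy a new file.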

Q: Can this tool edit an existing robots.txt?
A: No. It generates a fresh file based on your inputs. You must manually merge it with your current rules if you’re updating. Always keep a backup of your existing file.


Detailed How‑to Guide

Step 1: Planning – Decide What to Block and Allow

Before you type anything, map out your site structure. Create two lists:

  • Directories/pages to Disallow: Staging sites, admin panels, internal search result pages, thank‑you pages, printer‑friendly versions, PDF archives, duplicate category filters, and any thin‑content sections that consume crawl budget. Don’t block the CSS and JavaScript files (including AJAX scripts) that your pages need; Google fetches them to render your pages.

  • Important directories to specifically Allow: If you block a broad folder but want to keep a specific sub‑folder crawlable, you’ll need an Allow rule. For example, block /docs/ but allow /docs/public/.

Also decide if you want different rules for different bots. For most sites, User-agent: * (all bots) is sufficient. Advanced users may want to block AI scrapers like GPTBot or CCBot while allowing Googlebot.

Step 2: Open the Tool

Navigate to https://webmastertools.seowolf.org/robots-txt-generator.

Step 3: Select User‑Agents

You’ll see a list of user‑agents (Googlebot, Bingbot, Yahoo Slurp, etc.) or an “All Bots” option.

  • If you want the same rules for everyone, select “All Bots” (*). This is recommended unless you have a specific reason to treat bots differently.

  • If you need different rules for different bots, give each bot its own group (a User-agent line followed by its rules), or run the generator once per bot and merge the groups into a single robots.txt file.

Step 4: Define Allowed Directories

In the “Allow” section, list the paths you want to explicitly permit – especially if a broader disallow might catch them. For each path, enter the relative URL path starting with /.
Example:

```text
/public/
/assets/
```

Most sites don’t need many Allow rules; they are exceptions to Disallow rules.
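The sketch below checks the /docs/ and /docs/public/ example from Step 1 with Python's `urllib.robotparser`. One caveat: that parser applies the first matching rule in file order, whereas Google uses the most specific (longest) matching rule regardless of order. Listing the Allow line before the broader Disallow, as below, gives the same answer under both approaches.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /docs/public/
Disallow: /docs/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("AnyBot", "https://www.example.com/docs/public/guide.html"))  # True
print(parser.can_fetch("AnyBot", "https://www.example.com/docs/internal.html"))      # False
```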

Step 5: Define Disallowed Directories

In the “Disallow” section, enter the paths you want to block. Use one path per line. The most common patterns:

  • Whole directory: /wp-admin/ (blocks everything inside wp-admin)

  • Specific file: /secret-page.html

  • All files with a parameter: /*?* (blocks all URLs containing ?, i.e., dynamic parameters)

  • Disallow everything except what’s allowed: Disallow: / paired with Allow rules above.
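Python's standard robots.txt parser ignores `*` and `$`, so here is a small, self-contained sketch of the matching rules that Google documents and RFC 9309 describes:

- `*` matches any run of characters.
- A trailing `$` anchors the end of the URL path (including any query string).
- The longest matching rule wins.
- Allow wins a tie.

The function names are hypothetical. The sketch skips details such as percent-encoding normalization.

```python
import re


def rule_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything, then turn each escaped '*' back into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """rules: (directive, pattern) pairs from one group, e.g. ("Disallow", "/*?")."""
    best_len, allowed = -1, True  # no matching rule means the URL is allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty rule value matches nothing
        if rule_regex(pattern).match(path):
            length = len(pattern)
            is_allow = directive.lower() == "allow"
            # Longer (more specific) rules win; on a tie, Allow wins.
            if length > best_len or (length == best_len and is_allow):
                best_len, allowed = length, is_allow
    return allowed


rules = [("Disallow", "/*?*"), ("Allow", "/products?page=*"), ("Disallow", "/*.pdf$")]
print(is_allowed(rules, "/products?sort=price"))   # False
print(is_allowed(rules, "/products?page=2"))       # True
print(is_allowed(rules, "/files/report.pdf"))      # False
print(is_allowed(rules, "/files/report.pdf?v=1"))  # False (matched by /*?* instead)
```

The same function covers the "Disallow: / plus Allow exceptions" pattern. A longer Allow rule such as /public/ outranks the one-character Disallow: /.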

Example for a typical WordPress site:

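```text
/wp-admin/
/tag/
/author/
/search/
/trackback/
/xmlrpc.php
```

Avoid blocking /wp-includes/ or any other folder that holds the CSS and JavaScript your pages need; Google fetches those files to render your pages.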

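Crucial: Rules are prefix matches. A trailing slash limits a rule to everything inside that directory. Without it, the rule matches every path that starts with those characters, so Disallow: /private also blocks /private-offers.html. To match one exact URL, end the rule with $ (for example, /secret-page.html$). Be precise.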
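Robots.txt rules are prefix matches, so the quickest way to see what a trailing slash changes is to test both forms. This sketch uses Python's `urllib.robotparser`, which implements plain prefix matching. The `blocked` helper and the paths are examples.

```python
from urllib.robotparser import RobotFileParser


def blocked(rule: str, path: str) -> bool:
    """Return True if `Disallow: <rule>` blocks `path` for a generic crawler."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule}"])
    return not parser.can_fetch("AnyBot", "https://www.example.com" + path)


for path in ("/private/notes.html", "/private-offers.html", "/private"):
    print(f"{path:22} /private/ -> {blocked('/private/', path)}   "
          f"/private -> {blocked('/private', path)}")
```

Disallow: /private/ blocks only /private/notes.html. Disallow: /private blocks all three paths.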

Step 6: Set Crawl Delay (Optional)

If you want to limit how fast bots request pages, enter a number in seconds (e.g., 10). Leave blank or set to 0 if you don’t need it. Remember, Googlebot ignores this, so it’s mainly for niche bots.

Step 7: Generate the File

Click “Generate robots.txt”. The tool will display the final plain‑text content of your robots.txt file. It should look something like:

```text
User-agent: *
Allow: /public/
Disallow: /wp-admin/
Disallow: /staging/
Crawl-delay: 10
```
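The generator's internals aren't documented here, but the output above follows a simple pattern. As an illustrative sketch only, this Python function assembles the same text from the form fields. The `build_robots_txt` function and its parameters are hypothetical, not the tool's code.

```python
def build_robots_txt(user_agent="*", allow=(), disallow=(), crawl_delay=None, sitemaps=()):
    """Assemble one robots.txt group plus optional Sitemap lines."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    if crawl_delay:  # blank or 0 means "no Crawl-delay line"
        lines.append(f"Crawl-delay: {crawl_delay}")
    if sitemaps:
        lines.append("")  # blank line before the group-independent Sitemap lines
        lines += [f"Sitemap: {url}" for url in sitemaps]
    return "\n".join(lines) + "\n"


print(build_robots_txt(allow=["/public/"],
                       disallow=["/wp-admin/", "/staging/"],
                       crawl_delay=10), end="")
```

Called with those arguments, it prints exactly the five lines shown in the example output.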

Step 8: Review and Test

  • Read every line. Make sure no critical public page is accidentally blocked (for example, don’t block / unless you want to stop crawlers from fetching the entire site).

  • Copy the generated text and save it locally as robots.txt.

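  • Test a few key URLs (homepage, top blog posts, a blocked directory) against the new rules and confirm each one is allowed or blocked as expected. Google Search Console’s robots.txt report (your property → Settings → robots.txt) only shows the file Google has already fetched, and the old robots.txt Tester that accepted pasted rules has been retired. Test the draft locally first. After uploading, use the report and the URL Inspection tool to confirm how Google sees the live file.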
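Before uploading, you can also run a quick local check. This is a minimal Python sketch built on `urllib.robotparser`. List the URLs you care about with the result you expect, and the hypothetical `check` helper reports any mismatch. The rules and URLs are examples. This parser uses first-match prefix rules and ignores `*` and `$`, so treat it as a sanity check rather than an exact model of Googlebot.

```python
from urllib.robotparser import RobotFileParser

draft = """\
User-agent: *
Allow: /public/
Disallow: /wp-admin/
Disallow: /staging/
"""

# URL -> should a generic crawler be allowed to fetch it?
expectations = {
    "https://www.example.com/": True,
    "https://www.example.com/blog/top-post/": True,
    "https://www.example.com/public/logo.png": True,
    "https://www.example.com/staging/index.html": False,
    "https://www.example.com/wp-admin/options.php": False,
}


def check(robots_txt, expected, agent="Googlebot"):
    """Return the URLs whose allowed/blocked status differs from expectations."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url, want in expected.items()
            if parser.can_fetch(agent, url) != want]


problems = check(draft, expectations)
print("All good" if not problems else f"Unexpected results: {problems}")
```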

Step 9: Upload to Your Server

  1. Connect to your website via FTP, cPanel File Manager, or your hosting dashboard.

  2. Navigate to the root directory (usually public_html, www, or htdocs).

  3. Look for an existing robots.txt – if present, download a backup first.

  4. Upload the new robots.txt file, overwriting the old one.

  5. Confirm by visiting https://www.yourdomain.com/robots.txt in a browser. You should see the exact text you generated.
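Crawlers only look for robots.txt at the root of each host. A file applies only to the protocol, host and port it was served from. As a small illustration, Python's `urllib.parse` can derive the one location a crawler will check for any page URL. The `robots_url` helper is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/blog/post?id=7"))  # https://www.example.com/robots.txt
print(robots_url("https://shop.example.com:8443/cart/"))      # https://shop.example.com:8443/robots.txt
```

So a file uploaded to a subfolder, such as /blog/robots.txt, is ignored. Separate hosts like www.example.com and shop.example.com each need their own file.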

Step 10: Monitor in Search Console

Google generally caches robots.txt for up to 24 hours, so about a day after uploading, return to Google Search Console’s robots.txt report. Check for:

  • Errors: Syntax mistakes or unreachable file.

  • Last crawled date: Confirm Google has fetched the new version.

  • Page indexing: Over the following days, watch whether important pages are crawled more often and whether blocked sections stop being crawled.

Ongoing Maintenance

  • When you restructure your site, add new sections, or move away from staging environments, update your robots.txt accordingly.

  • Re‑generate and test whenever you launch a new part of your funnel that shouldn’t appear in search. 

By following this guide, you’ll create a clean, effective robots.txt that protects crawl budget, steers compliant crawlers away from low-value pages, and keeps search engines focused on your money pages.



Recommended tools: Seowolf's XML Sitemap Generator | Seowolf's Htaccess Redirect Generator | Seowolf's Search Engine Spider Simulator | Seowolf's Google Index Checker

