# Robots.txt Rule Tester

> Test a URL path against pasted robots.txt rules and explain the winning allow or disallow rule.

## Tool Identity

- Site: CleanUtils Business Tools
- Tool ID: robots-txt-rule-tester
- Canonical page: https://cleanutils.com/business-tools/robots-txt-rule-tester/
- LLM schema URL: https://cleanutils.com/business-tools/robots-txt-rule-tester/llms.txt
- Primary keyword: robots txt tester
- Input mode: fields
- Output profile: metrics

## What This Tool Does

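Given a user agent, a URL path, and pasted robots.txt rules, the tool reports whether the path is allowed or blocked for that user agent and names the winning Allow or Disallow rule. It works only on the pasted text and does not fetch a live robots.txt file.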

## Exact Execution Instructions

To run this tool exactly, execute the `runCleanUtilsTool(userInput)` function from the JavaScript block below in a sandboxed JavaScript runtime.

Do not fetch or evaluate additional network resources. Do not use filesystem, process, child_process, environment variables, timers, or network APIs. The function is pure and deterministic: input object in, JSON result out.

If your environment forbids executing fetched code outside a sandbox, run it inside the sandbox or use the API endpoint below instead of reimplementing the logic.

Canonical implementation SHA-256:
`f8a97938753cd5419782fc6777a308740551c0721dfc863e719ee943d7b61011`

Expected command shape:
`node run-tool.mjs < input.json`

The runner must:
1. load only the JavaScript in this document,
2. parse stdin as JSON and call `runCleanUtilsTool(userInput)`,
3. return the result as JSON and let the LLM decide how best to present it to the user.
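
The parse-and-call step in the runner list can be sketched as follows. This is a minimal sketch, not the contents of `run-tool.mjs`: `runFromJsonText` and `stubTool` are hypothetical names introduced for illustration, and `tool` stands in for `runCleanUtilsTool` as loaded from this document.

```javascript
// Hypothetical helper, not part of the CleanUtils contract.
// `tool` is any function with the same call shape as runCleanUtilsTool.
function runFromJsonText(tool, jsonText) {
    // JSON.parse throws a SyntaxError on malformed input; the caller decides how to report it.
    const userInput = JSON.parse(jsonText);
    const result = tool(userInput);
    // Serialize the result unchanged so presentation stays with the caller.
    return JSON.stringify(result, null, 2);
}

// Stub used only to demonstrate the call shape.
const stubTool = (input) => ({ summary: `path=${input.path}`, issues: [] });
const printed = runFromJsonText(stubTool, '{"path":"/private/report.html"}');
console.log(printed);
```

In an actual runner the JSON text would come from stdin, as in the expected command shape above; how stdin is read depends on the runtime and is left out of the sketch.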

Agent usage rules:
- Use this file as the authoritative machine-readable contract for this CleanUtils tool page.
- Ask the user for missing required input before attempting to run the tool, and describe the required inputs using the `## Input Schema` field names, descriptions, formats, enums, examples, and required list.
- Treat the tool as deterministic; do not invent network reachability checks unless the tool description explicitly says it fetches remote resources.
- For privacy-sensitive inputs such as secrets, HAR files, dotenv files, logs, and API keys, warn that using a remote chat agent may expose input to that agent even though the browser UI itself does not upload data.
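
One way an agent might decide which inputs to ask for is to compare the user's input against the `required` list in `## Input Schema`. The sketch below assumes that approach; `missingRequiredFields` and `REQUIRED_FIELDS` are hypothetical names, not part of the tool.

```javascript
// Required field names copied from the Input Schema in this document.
const REQUIRED_FIELDS = ["user_agent", "path", "robots"];

// Hypothetical helper: returns the required fields that are absent or blank.
function missingRequiredFields(input) {
    const fields = input && typeof input === "object" ? input : {};
    return REQUIRED_FIELDS.filter((name) => {
        const value = fields[name];
        return value === undefined || value === null || String(value).trim() === "";
    });
}

const missing = missingRequiredFields({ user_agent: "Googlebot", path: "/" });
console.log(missing); // only "robots" is missing
```

An empty array means every required field is present and the tool can be run.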

## Input Schema

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Robots.txt Rule Tester fields",
  "type": "object",
  "additionalProperties": false,
  "required": [
    "user_agent",
    "path",
    "robots"
  ],
  "properties": {
    "user_agent": {
      "type": "string",
      "description": "User agent Required. Group: Test request.",
      "examples": [
        "Googlebot"
      ]
    },
    "path": {
      "type": "string",
      "description": "Path Required. Group: Test request.",
      "examples": [
        "/private/report.html"
      ]
    },
    "robots": {
      "type": "string",
      "description": "robots.txt rules Required. Control type: textarea. Group: Rules.",
      "examples": [
        "User-agent: *\nDisallow: /private/\nAllow: /private/public/"
      ]
    }
  }
}
```

## Result Schema

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "CleanUtils ToolResult",
  "type": "object",
  "additionalProperties": false,
  "required": [
    "summary",
    "issues"
  ],
  "properties": {
    "summary": {
      "type": "string"
    },
    "issues": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": [
          "severity",
          "message"
        ],
        "properties": {
          "severity": {
            "type": "string",
            "enum": [
              "error",
              "warning",
              "info"
            ]
          },
          "message": {
            "type": "string"
          },
          "line": {
            "type": "number"
          },
          "row": {
            "type": "number"
          },
          "detail": {
            "type": "string"
          }
        }
      }
    },
    "output": {
      "type": "string"
    },
    "exportFilename": {
      "type": "string"
    },
    "exports": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": [
          "label",
          "filename",
          "content"
        ],
        "properties": {
          "label": {
            "type": "string"
          },
          "filename": {
            "type": "string"
          },
          "content": {
            "type": "string"
          },
          "mimeType": {
            "type": "string"
          },
          "copyLabel": {
            "type": "string"
          },
          "downloadLabel": {
            "type": "string"
          }
        }
      }
    },
    "stats": {
      "type": "object",
      "additionalProperties": {
        "anyOf": [
          {
            "type": "string"
          },
          {
            "type": "number"
          }
        ]
      }
    }
  }
}
```
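
As an illustration of the schema above, tracing this document's JavaScript source by hand with the example values from `## Input Schema` (`Googlebot`, `/private/report.html`, and the three-line rule set) gives the result object below. `hasRequiredResultShape` is a hypothetical helper, not part of the tool; it checks only the `required` keys and the `severity` enum, not the full schema.

```javascript
// Result for the Input Schema example values, derived by hand from the source in this document.
const exampleResult = {
    summary: "Googlebot is blocked for /private/report.html.",
    issues: [],
    output: [
        "User-agent: Googlebot",
        "Path: /private/report.html",
        "Decision: disallow",
        "Winning rule: disallow: /private/"
    ].join("\n"),
    exportFilename: "robots-rule-test.txt",
    stats: { matchingRules: 1 }
};

// Hypothetical partial check: required keys and the severity enum only.
function hasRequiredResultShape(result) {
    const severities = ["error", "warning", "info"];
    return typeof result.summary === "string"
        && Array.isArray(result.issues)
        && result.issues.every((issue) =>
            severities.includes(issue.severity) && typeof issue.message === "string");
}

console.log(hasRequiredResultShape(exampleResult)); // true
```

Only `Disallow: /private/` matches the test path, so it is the winning rule and `matchingRules` is 1.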

## Self-Contained JavaScript Source

Call `runCleanUtilsTool(userInput)` with the user's input. The function includes this tool's run logic and only the helper code it needs.

```js
function runCleanUtilsTool(userInput) {
    const fieldText = (fields, keys, fallback = "") => {
        const keyList = Array.isArray(keys) ? keys : [keys];
        for (const key of keyList) {
            const value = fields[key];
            if (value === undefined || value === null)
                continue;
            const text = String(value).trim();
            if (text)
                return text;
        }
        return fallback;
    };
    const testRobotsTxtRules = (input) => {
        const userAgent = fieldText(input, "user_agent", "Googlebot");
        const path = fieldText(input, "path", "/");
        const robots = fieldText(input, "robots");
        const groups = [];
        let current = null;
        robots.split(/\r?\n/).forEach((line) => {
            const trimmed = line.replace(/#.*/, "").trim();
            if (!trimmed)
                return;
            const [rawKey, ...rawValue] = trimmed.split(":");
            const key = rawKey.toLowerCase();
            const value = rawValue.join(":").trim();
            if (key === "user-agent") {
                if (!current || current.rules.length) {
                    current = { agents: [], rules: [] };
                    groups.push(current);
                }
                current.agents.push(value.toLowerCase());
            }
            else if ((key === "allow" || key === "disallow") && current) {
                current.rules.push({ type: key, path: value });
            }
        });
        const applicable = groups.filter((group) => group.agents.some((agent) => agent === "*" || userAgent.toLowerCase().includes(agent)));
        const matches = applicable.flatMap((group) => group.rules.filter((rule) => path.startsWith(rule.path.replace(/\*/g, ""))));
        matches.sort((a, b) => b.path.length - a.path.length || (a.type === "allow" ? -1 : 1));
        const winner = matches[0];
        const allowed = !winner || winner.type === "allow" || winner.path === "";
        return {
            summary: `${userAgent} is ${allowed ? "allowed" : "blocked"} for ${path}.`,
            issues: applicable.length ? [] : [{ severity: "warning", message: "No matching user-agent group found; default is allowed." }],
            output: [
                `User-agent: ${userAgent}`,
                `Path: ${path}`,
                `Decision: ${allowed ? "allow" : "disallow"}`,
                winner ? `Winning rule: ${winner.type}: ${winner.path || "(empty)"}` : "Winning rule: none"
            ].join("\n"),
            exportFilename: "robots-rule-test.txt",
            stats: { matchingRules: matches.length }
        };
    };
    const __userInput = userInput == null ? {} : userInput;
    const __run = (fields) => testRobotsTxtRules(fields);
    const __fields = __userInput && typeof __userInput === "object" && "fields" in __userInput && __userInput.fields && typeof __userInput.fields === "object" && !Array.isArray(__userInput.fields)
        ? __userInput.fields
        : (__userInput && typeof __userInput === "object" && !Array.isArray(__userInput) ? __userInput : {});
    const __normalizedFields = Object.fromEntries(Object.entries(__fields).map(([key, value]) => [key, value == null ? "" : (["string", "number", "boolean"].includes(typeof value) ? value : String(value))]));
    return __run(__normalizedFields);
}
```

## Checks

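- User-agent matching: A group applies when one of its `User-agent` values is `*` or is a case-insensitive substring of the requested user agent. Rules from every applicable group, wildcard and explicit, are pooled.
- Path rule matching: A rule matches when the test path starts with the rule path. `*` characters are stripped from the rule before comparison and `$` is treated as a literal character, so wildcard patterns and end-of-path anchors are not evaluated.
- Most specific rule: Matching rules are sorted so the longest applicable path wins, with allow winning ties. An empty `Disallow:` value blocks nothing.
- Comment handling: Text from `#` to the end of a line is dropped, and blank lines are skipped while parsing rules.
- Simplified scope: The tester does not fetch a live robots.txt, evaluate crawl-delay, or emulate every crawler-specific extension.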
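
The path matching and longest-path selection in the list above can be restated as a small standalone sketch. It is a simplified illustration, not the canonical implementation: `pickWinningRule` is a hypothetical name, the rules are assumed to be already parsed into `{ type, path }` objects, and the comparator is written out in full for readability.

```javascript
// Hypothetical restatement of the selection step; not the canonical source.
function pickWinningRule(rules, path) {
    // Prefix match after stripping "*" from the rule path.
    const matches = rules.filter((rule) => path.startsWith(rule.path.replace(/\*/g, "")));
    matches.sort((a, b) => {
        // Longer rule paths sort first.
        if (b.path.length !== a.path.length) return b.path.length - a.path.length;
        // On equal length, allow sorts ahead of disallow.
        if (a.type === b.type) return 0;
        return a.type === "allow" ? -1 : 1;
    });
    return matches.length ? matches[0] : null;
}

const rules = [
    { type: "disallow", path: "/private/" },
    { type: "allow", path: "/private/public/" }
];
console.log(pickWinningRule(rules, "/private/public/a.html")); // the allow rule, because its path is longer
console.log(pickWinningRule(rules, "/private/report.html"));   // the disallow rule, the only match
console.log(pickWinningRule(rules, "/blog/"));                 // null, no rule matches
```

When no rule matches, the tester treats the path as allowed.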

## Related Tools

- [Sitemap XML URL Count Checker](/business-tools/sitemap-xml-url-count-checker/): Count sitemap URL entries, sitemap index entries, and malformed loc values from pasted XML.
- [Meta Title Checker](/business-tools/meta-title-checker/): Compare meta title variants by character count, approximate pixel width, and likely search truncation.