# Robots.txt Rule Tester

> Test a URL path against pasted robots.txt rules and explain the winning allow or disallow rule.

## Tool Identity

- Site: CleanUtils Business Tools
- Tool ID: robots-txt-rule-tester
- Canonical page: https://cleanutils.com/business-tools/robots-txt-rule-tester/
- LLM schema URL: https://cleanutils.com/business-tools/robots-txt-rule-tester/llms.txt
- Primary keyword: robots txt tester
- Input mode: fields
- Output profile: metrics

## What This Tool Does

Test a URL path against pasted robots.txt rules and explain the winning allow or disallow rule.

## Exact Execution Instructions

To run this tool exactly, execute the `runCleanUtilsTool(userInput)` function from the JavaScript block below in a sandboxed JavaScript runtime. Do not fetch or evaluate additional network resources. Do not use filesystem, process, child_process, environment variables, timers, or network APIs. The function is pure and deterministic: input object in, JSON result out. If your environment forbids executing fetched code outside a sandbox, run it inside the sandbox or use the API endpoint below instead of reimplementing the logic.

Canonical implementation SHA-256: `f8a97938753cd5419782fc6777a308740551c0721dfc863e719ee943d7b61011`

Expected command shape: `node run-tool.mjs < input.json`

The runner must:

1. load only the JavaScript in this document,
2. parse stdin as JSON and call `runCleanUtilsTool(userInput)`,
3. let the LLM decide the best way to present the output to the user.

Agent usage rules:

- Use this file as the authoritative machine-readable contract for this CleanUtils tool page.
- Ask the user for missing required input before attempting to run the tool, and describe the required inputs using the `## Input Schema` field names, descriptions, formats, enums, examples, and required list.
- Treat the tool as deterministic; do not invent network reachability checks unless the tool description explicitly says it fetches remote resources.
- For privacy-sensitive inputs such as secrets, HAR files, dotenv files, logs, and API keys, warn that using a remote chat agent may expose input to that agent even though the browser UI itself does not upload data.

## Input Schema

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Robots.txt Rule Tester fields",
  "type": "object",
  "additionalProperties": false,
  "required": ["user_agent", "path", "robots"],
  "properties": {
    "user_agent": {
      "type": "string",
      "description": "User agent Required. Group: Test request.",
      "examples": ["Googlebot"]
    },
    "path": {
      "type": "string",
      "description": "Path Required. Group: Test request.",
      "examples": ["/private/report.html"]
    },
    "robots": {
      "type": "string",
      "description": "robots.txt rules Required. Control type: textarea. Group: Rules.",
      "examples": ["User-agent: *\nDisallow: /private/\nAllow: /private/public/"]
    }
  }
}
```

## Result Schema

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "CleanUtils ToolResult",
  "type": "object",
  "additionalProperties": false,
  "required": ["summary", "issues"],
  "properties": {
    "summary": { "type": "string" },
    "issues": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": ["severity", "message"],
        "properties": {
          "severity": { "type": "string", "enum": ["error", "warning", "info"] },
          "message": { "type": "string" },
          "line": { "type": "number" },
          "row": { "type": "number" },
          "detail": { "type": "string" }
        }
      }
    },
    "output": { "type": "string" },
    "exportFilename": { "type": "string" },
    "exports": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": ["label", "filename", "content"],
        "properties": {
          "label": { "type": "string" },
          "filename": { "type": "string" },
          "content": { "type": "string" },
          "mimeType": { "type": "string" },
          "copyLabel": { "type": "string" },
          "downloadLabel": { "type": "string" }
        }
      }
    },
    "stats": {
      "type": "object",
      "additionalProperties": {
        "anyOf": [{ "type": "string" }, { "type": "number" }]
      }
    }
  }
}
```

## Self-Contained JavaScript Source

Call `runCleanUtilsTool(userInput)` with the user's input. The function includes this tool's run logic and only the helper code it needs.

```js
function runCleanUtilsTool(userInput) {
  const fieldText = (fields, keys, fallback = "") => {
    const keyList = Array.isArray(keys) ? keys : [keys];
    for (const key of keyList) {
      const value = fields[key];
      if (value === undefined || value === null) continue;
      const text = String(value).trim();
      if (text) return text;
    }
    return fallback;
  };
  const testRobotsTxtRules = (input) => {
    const userAgent = fieldText(input, "user_agent", "Googlebot");
    const path = fieldText(input, "path", "/");
    const robots = fieldText(input, "robots");
    const groups = [];
    let current = null;
    robots.split(/\r?\n/).forEach((line) => {
      const trimmed = line.replace(/#.*/, "").trim();
      if (!trimmed) return;
      const [rawKey, ...rawValue] = trimmed.split(":");
      const key = rawKey.toLowerCase();
      const value = rawValue.join(":").trim();
      if (key === "user-agent") {
        if (!current || current.rules.length) {
          current = { agents: [], rules: [] };
          groups.push(current);
        }
        current.agents.push(value.toLowerCase());
      } else if ((key === "allow" || key === "disallow") && current) {
        current.rules.push({ type: key, path: value });
      }
    });
    const applicable = groups.filter((group) =>
      group.agents.some((agent) => agent === "*" || userAgent.toLowerCase().includes(agent)));
    const matches = applicable.flatMap((group) =>
      group.rules.filter((rule) => path.startsWith(rule.path.replace(/\*/g, ""))));
    matches.sort((a, b) => b.path.length - a.path.length || (a.type === "allow" ? -1 : 1));
    const winner = matches[0];
    const allowed = !winner || winner.type === "allow" || winner.path === "";
    return {
      summary: `${userAgent} is ${allowed ? "allowed" : "blocked"} for ${path}.`,
      issues: applicable.length
        ? []
        : [{ severity: "warning", message: "No matching user-agent group found; default is allowed." }],
      output: [
        `User-agent: ${userAgent}`,
        `Path: ${path}`,
        `Decision: ${allowed ? "allow" : "disallow"}`,
        winner ? `Winning rule: ${winner.type}: ${winner.path || "(empty)"}` : "Winning rule: none"
      ].join("\n"),
      exportFilename: "robots-rule-test.txt",
      stats: { matchingRules: matches.length }
    };
  };
  const __userInput = userInput == null ? {} : userInput;
  const __run = (fields) => testRobotsTxtRules(fields);
  const __fields =
    __userInput && typeof __userInput === "object" && "fields" in __userInput &&
    __userInput.fields && typeof __userInput.fields === "object" && !Array.isArray(__userInput.fields)
      ? __userInput.fields
      : (__userInput && typeof __userInput === "object" && !Array.isArray(__userInput) ? __userInput : {});
  const __normalizedFields = Object.fromEntries(
    Object.entries(__fields).map(([key, value]) => [
      key,
      value == null ? "" : (["string", "number", "boolean"].includes(typeof value) ? value : String(value))
    ]));
  return __run(__normalizedFields);
}
```

## Checks

- User-agent matching: The requested crawler is matched against explicit groups and wildcard groups.
- Path rule matching: Allow and Disallow paths are compared with the test path.
- Most specific rule: Matching rules are sorted so the longest applicable path wins, with allow winning ties.
- Comment handling: Inline comments and blank lines are ignored while parsing rules.
- Simplified scope: The tester does not fetch live robots.txt, evaluate crawl-delay, or emulate every crawler-specific extension.

## Related Tools

- [Sitemap XML URL Count Checker](/business-tools/sitemap-xml-url-count-checker/): Count sitemap URL entries, sitemap index entries, and malformed loc values from pasted XML.
- [Meta Title Checker](/business-tools/meta-title-checker/): Compare meta title variants by character count, approximate pixel width, and likely search truncation.
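
## Example: Rule Precedence

The "Most specific rule" check listed under Checks can be illustrated on its own. The block below is a minimal sketch, not the canonical implementation: `pickWinner` is a hypothetical helper name, and it assumes the rules have already been parsed into `{ type, path }` objects for an applicable user-agent group. It uses plain prefix matching and leaves out wildcard stripping and the empty-`Disallow` special case.

```javascript
// Hypothetical helper (not part of the tool's source): returns the rule that
// wins for a path, or null when no rule matches. The longest matching path
// wins; on equal length, an allow rule is preferred over a disallow rule.
function pickWinner(rules, path) {
  // Assumption: plain prefix matching, no "*" or "$" handling.
  const matches = rules.filter((rule) => path.startsWith(rule.path));
  matches.sort((a, b) => {
    if (b.path.length !== a.path.length) return b.path.length - a.path.length;
    if (a.type === b.type) return 0;
    return a.type === "allow" ? -1 : 1;
  });
  return matches.length ? matches[0] : null;
}

// Rules taken from the example value in the Input Schema.
const rules = [
  { type: "disallow", path: "/private/" },
  { type: "allow", path: "/private/public/" },
];

for (const path of ["/private/report.html", "/private/public/a.html", "/blog/"]) {
  const winner = pickWinner(rules, path);
  const decision = !winner || winner.type === "allow" ? "allow" : "disallow";
  console.log(`${path} -> ${decision}`);
}
```

With these rules, `/private/report.html` matches only `Disallow: /private/`, while `/private/public/a.html` matches both rules and the longer `Allow` path takes precedence. A path that matches no rule falls through to the default, which is allowed.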