# Sitemap XML URL Count Checker
> Count sitemap URL entries, sitemap index entries, and malformed loc values from pasted XML.
## Tool Identity
- Site: CleanUtils Business Tools
- Tool ID: sitemap-xml-url-count-checker
- Canonical page: https://cleanutils.com/business-tools/sitemap-xml-url-count-checker/
- LLM schema URL: https://cleanutils.com/business-tools/sitemap-xml-url-count-checker/llms.txt
- Primary keyword: sitemap url counter
- Input mode: textarea
- Output profile: line-check
## What This Tool Does
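Counts `<loc>` values inside `<url>` entries and inside `<sitemap>` index entries in pasted sitemap XML, and flags `<loc>` values that do not parse as absolute `http:` or `https:` URLs. It reports totals plus valid and malformed counts for each entry type.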
## Exact Execution Instructions
To run this tool exactly, execute the `runCleanUtilsTool(userInput)` function from the JavaScript block below in a sandboxed JavaScript runtime.
Do not fetch or evaluate additional network resources. Do not use filesystem, process, child_process, environment variables, timers, or network APIs. The function is pure and deterministic: input string in, JSON result out.
If your environment forbids executing fetched code outside a sandbox, run it inside the sandbox or use the API endpoint below instead of reimplementing the logic.
Canonical implementation SHA-256:
`6674d1980120c5490ec07c39909c2c66182a48cbda2aaa7894eacded0947d610`
Expected command shape:
`node run-tool.mjs < input.txt`
The runner must:
1. load only the JavaScript in this document,
2. call `runCleanUtilsTool(inputText)`,
3. let the LLM decide the best way to present the output to the user.
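The three runner steps above can be sketched as follows. This is a minimal sketch, not the CleanUtils runner: `runOnce` and the stand-in `fakeTool` are hypothetical names, and a real `run-tool.mjs` would pass the `runCleanUtilsTool` function from this document instead.

```javascript
// Hypothetical helper: call a tool function once and serialize its result.
// `tool` is assumed to behave like runCleanUtilsTool (string in, plain object out).
function runOnce(inputText, tool) {
  const result = tool(inputText == null ? "" : String(inputText));
  const hasRequiredFields =
    result !== null &&
    typeof result === "object" &&
    typeof result.summary === "string" &&
    Array.isArray(result.issues);
  if (!hasRequiredFields) {
    throw new Error("Tool result is missing the required summary or issues fields.");
  }
  // Return the raw JSON; how to present it to the user is left to the LLM.
  return JSON.stringify(result, null, 2);
}

// Stand-in tool used only for this illustration.
const fakeTool = (text) => ({ summary: `${text.length} characters received`, issues: [] });

console.log(runOnce("<urlset></urlset>", fakeTool));
```

In Node, `inputText` could be read with `fs.readFileSync(0, "utf8")` so that the script matches the `node run-tool.mjs < input.txt` command shape.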
Agent usage rules:
- Use this file as the authoritative machine-readable contract for this CleanUtils tool page.
- Ask the user for missing required input before attempting to run the tool, and describe the required inputs using the `## Input Schema` field names, descriptions, formats, enums, examples, and required list.
- Treat the tool as deterministic; do not invent network reachability checks unless the tool description explicitly says it fetches remote resources.
- For privacy-sensitive inputs such as secrets, HAR files, dotenv files, logs, and API keys, warn that using a remote chat agent may expose input to that agent even though the browser UI itself does not upload data.
## Input Schema
```json
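{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Sitemap XML URL Count Checker input",
"type": "string",
"description": "Sitemap XML pasted as text, for example <urlset><url><loc>https://example.com/</loc></url></urlset>",
"examples": [
"<urlset><url><loc>https://example.com/</loc></url><url><loc>https://example.com/tools/</loc></url></urlset>"
]
}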
```
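Although the schema declares a plain string, the wrapper at the end of `runCleanUtilsTool` also accepts an object with an `input` property and coerces other values with `String()`. A minimal sketch of that normalization, using the hypothetical name `normalizeInput`:

```javascript
// Hypothetical stand-alone copy of the input handling at the end of runCleanUtilsTool.
function normalizeInput(userInput) {
  // null and undefined become an empty string.
  const value = userInput == null ? "" : userInput;
  // An object such as { input: "<urlset>...</urlset>" } is unwrapped.
  const inner = value && typeof value === "object" && "input" in value ? value.input : value;
  // Everything else is coerced to a string.
  return inner == null ? "" : String(inner);
}

console.log(normalizeInput("<urlset/>"));            // <urlset/>
console.log(normalizeInput({ input: "<urlset/>" })); // <urlset/>
console.log(normalizeInput(null) === "");            // true
```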
## Result Schema
```json
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "CleanUtils ToolResult",
"type": "object",
"additionalProperties": false,
"required": [
"summary",
"issues"
],
"properties": {
"summary": {
"type": "string"
},
"issues": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"required": [
"severity",
"message"
],
"properties": {
"severity": {
"type": "string",
"enum": [
"error",
"warning",
"info"
]
},
"message": {
"type": "string"
},
"line": {
"type": "number"
},
"row": {
"type": "number"
},
"detail": {
"type": "string"
}
}
}
},
"output": {
"type": "string"
},
"exportFilename": {
"type": "string"
},
"exports": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"required": [
"label",
"filename",
"content"
],
"properties": {
"label": {
"type": "string"
},
"filename": {
"type": "string"
},
"content": {
"type": "string"
},
"mimeType": {
"type": "string"
},
"copyLabel": {
"type": "string"
},
"downloadLabel": {
"type": "string"
}
}
}
},
"stats": {
"type": "object",
"additionalProperties": {
"anyOf": [
{
"type": "string"
},
{
"type": "number"
}
]
}
}
}
}
```
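As an illustration of the schema above, the sketch below checks only the required fields (`summary`, `issues`, and each issue's `severity` and `message`). The name `hasRequiredResultShape` and the sample object are hypothetical, and this is not a full JSON Schema validator.

```javascript
const SEVERITIES = ["error", "warning", "info"];

// Hypothetical checker covering only the required fields of the result schema.
function hasRequiredResultShape(value) {
  if (value === null || typeof value !== "object") return false;
  if (typeof value.summary !== "string" || !Array.isArray(value.issues)) return false;
  return value.issues.every(
    (issue) =>
      issue !== null &&
      typeof issue === "object" &&
      SEVERITIES.includes(issue.severity) &&
      typeof issue.message === "string"
  );
}

// Hand-written sample in the shape the tool returns.
const sample = {
  summary: "1 URL loc entry found: 0 valid, 1 malformed. 0 sitemap index entries found.",
  issues: [{ severity: "warning", message: "Malformed loc URL: /tools/" }],
  stats: { urlLocEntries: 1, validUrlLocEntries: 0, malformedUrlLocEntries: 1 }
};

console.log(hasRequiredResultShape(sample));            // true
console.log(hasRequiredResultShape({ summary: "ok" })); // false
```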
## Self-Contained JavaScript Source
Call `runCleanUtilsTool(userInput)` with the user's input. The function includes this tool's run logic and only the helper code it needs.
```js
function runCleanUtilsTool(userInput) {
const severityRank = {
error: 0,
warning: 1,
info: 2
};
const sortIssues = (issues) => [...issues].sort((a, b) => {
const severity = severityRank[a.severity] - severityRank[b.severity];
if (severity !== 0)
return severity;
return (a.line ?? a.row ?? 0) - (b.line ?? b.row ?? 0);
});
const looksLikeUrl = (value) => {
try {
const url = new URL(value.trim());
return url.protocol === "http:" || url.protocol === "https:";
}
catch {
return false;
}
};
const countSitemapUrls = (input) => {
const issues = [];
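// Capture the <loc> text nested inside each <url> (urlset) or <sitemap> (sitemap index) element.
const urlLocs = [...input.matchAll(/<url>[\s\S]*?<loc>([\s\S]*?)<\/loc>[\s\S]*?<\/url>/gi)].map((match) => match[1].trim());
const sitemapLocs = [...input.matchAll(/<sitemap>[\s\S]*?<loc>([\s\S]*?)<\/loc>[\s\S]*?<\/sitemap>/gi)].map((match) => match[1].trim());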
const validUrlLocs = urlLocs.filter(looksLikeUrl);
const validSitemapLocs = sitemapLocs.filter(looksLikeUrl);
const malformedLocs = [...urlLocs, ...sitemapLocs].filter((loc) => !looksLikeUrl(loc));
malformedLocs.forEach((loc) => {
issues.push({ severity: "warning", message: `Malformed loc URL: ${loc}` });
});
if (!urlLocs.length && !sitemapLocs.length)
issues.push({ severity: "error", message: "No <url> or <sitemap> entries found." });
return {
summary: `${urlLocs.length} URL loc entr${urlLocs.length === 1 ? "y" : "ies"} found: ${validUrlLocs.length} valid, ${urlLocs.length - validUrlLocs.length} malformed. ${sitemapLocs.length} sitemap index entr${sitemapLocs.length === 1 ? "y" : "ies"} found.`,
issues: sortIssues(issues),
output: [
`URL loc entries: ${urlLocs.length}`,
`Valid URL loc entries: ${validUrlLocs.length}`,
`Malformed URL loc entries: ${urlLocs.length - validUrlLocs.length}`,
`Sitemap index loc entries: ${sitemapLocs.length}`,
`Valid sitemap index loc entries: ${validSitemapLocs.length}`,
`Malformed sitemap index loc entries: ${sitemapLocs.length - validSitemapLocs.length}`,
"",
...urlLocs.slice(0, 100)
].join("\n"),
exportFilename: "sitemap-url-count.txt",
stats: {
urlLocEntries: urlLocs.length,
validUrlLocEntries: validUrlLocs.length,
malformedUrlLocEntries: urlLocs.length - validUrlLocs.length,
sitemapIndexLocEntries: sitemapLocs.length,
validSitemapIndexLocEntries: validSitemapLocs.length,
malformedSitemapIndexLocEntries: sitemapLocs.length - validSitemapLocs.length
}
};
};
const __userInput = userInput == null ? "" : userInput;
const __run = countSitemapUrls;
const __input = __userInput && typeof __userInput === "object" && "input" in __userInput ? __userInput.input : __userInput;
return __run(__input == null ? "" : String(__input));
}
```
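A minimal usage sketch of the regex-based extraction idea behind `countSitemapUrls`. The helper name `extractLocs` is hypothetical, and the pattern assumes tags without attributes or namespace prefixes; it is an illustration, not the canonical implementation.

```javascript
// Hypothetical helper: collect trimmed <loc> text nested in a given parent element.
function extractLocs(xml, parentTag) {
  const pattern = new RegExp(
    `<${parentTag}>[\\s\\S]*?<loc>([\\s\\S]*?)</loc>[\\s\\S]*?</${parentTag}>`,
    "gi"
  );
  return [...xml.matchAll(pattern)].map((match) => match[1].trim());
}

const xml = [
  "<urlset>",
  "  <url><loc> https://example.com/ </loc></url>",
  "  <url><loc>/relative/path</loc><lastmod>2024-01-01</lastmod></url>",
  "</urlset>"
].join("\n");

console.log(extractLocs(xml, "url"));     // two entries: "https://example.com/" and "/relative/path"
console.log(extractLocs(xml, "sitemap")); // no entries
```

Because `<urlset>` and `<sitemapindex>` are not followed directly by `>` after `url` or `sitemap`, the wrapper elements are not mistaken for entries.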
## Checks
- URL entries: The tool counts `<loc>` values inside `<url>` entries.
- Sitemap index entries: The tool also counts `<loc>` values inside `<sitemap>` entries of a sitemap index.
- Malformed loc warnings: `<loc>` values that do not parse as absolute `http:` or `https:` URLs are flagged as warnings.
- Paste or upload workflow: The checker works from pasted XML without crawling a live website.
- No recursion: Sitemap indexes are counted, but child sitemap URLs are not fetched in the browser.
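
The malformed-loc check can be illustrated with a stand-alone copy of the `looksLikeUrl` logic from the source. The name `isAbsoluteHttpUrl` and the sample values are hypothetical; the sketch only shows which kinds of values pass or fail.

```javascript
// Hypothetical stand-alone copy of the looksLikeUrl check.
function isAbsoluteHttpUrl(value) {
  try {
    // new URL() throws for relative paths, bare hostnames, and empty strings.
    const url = new URL(value.trim());
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false;
  }
}

const samples = [
  "https://example.com/",
  "  https://example.com/tools/  ", // surrounding whitespace is trimmed first
  "/tools/",                        // relative path
  "example.com",                    // no scheme
  "ftp://example.com/file.xml"      // absolute, but not http or https
];

for (const value of samples) {
  console.log(JSON.stringify(value), isAbsoluteHttpUrl(value));
}
```

Only the first two samples pass; the remaining three would each produce a `Malformed loc URL` warning.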
## Related Tools
- [Robots.txt Rule Tester](/business-tools/robots-txt-rule-tester/): Test a URL path against pasted robots.txt rules and explain the winning allow or disallow rule.
- [Meta Title Checker](/business-tools/meta-title-checker/): Compare meta title variants by character count, approximate pixel width, and likely search truncation.