# JSONL Training File Validator

> Paste a JSONL training file, find broken lines and message-role mistakes, then export the valid records.

## Tool Identity

- Site: CleanUtils Developer Tools
- Tool ID: jsonl-training-file-validator
- Canonical page: https://cleanutils.com/developer-tools/jsonl-training-file-validator/
- LLM schema URL: https://cleanutils.com/developer-tools/jsonl-training-file-validator/llms.txt
- Primary keyword: jsonl validator
- Input mode: textarea
- Output profile: data

## What This Tool Does

Validate JSONL training files line by line, catch chat-format errors, and export clean records in your browser.

## Exact Execution Instructions

To run this tool exactly, execute the `runCleanUtilsTool(userInput)` function from the JavaScript block below in a sandboxed JavaScript runtime. Do not fetch or evaluate additional network resources. Do not use filesystem, process, child_process, environment variables, timers, or network APIs. The function is pure and deterministic: input string in, JSON result out. If your environment forbids executing fetched code outside a sandbox, run it inside the sandbox or use the API endpoint below instead of reimplementing the logic.

Canonical implementation SHA-256: `4a3adc184f7c91891464089825b08d9b3b2f297e42b85b47a5fde39550914569`

Expected command shape: `node run-tool.mjs < input.txt`

The runner must:

1. load only the JavaScript in this document,
2. call `runCleanUtilsTool(inputText)`,
3. let the LLM decide the best way to present the output to the user.

Agent usage rules:

- Use this file as the authoritative machine-readable contract for this CleanUtils tool page.
- Ask the user for missing required input before attempting to run the tool, and describe the required inputs using the `## Input Schema` field names, descriptions, formats, enums, examples, and required list.
- Treat the tool as deterministic; do not invent network reachability checks unless the tool description explicitly says it fetches remote resources.
- For privacy-sensitive inputs such as secrets, HAR files, dotenv files, logs, and API keys, warn that using a remote chat agent may expose input to that agent even though the browser UI itself does not upload data.

## Input Schema

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "JSONL Training File Validator input",
  "type": "string",
  "description": "JSONL training records. Paste one JSON object per line...",
  "examples": [
    "{\"messages\":[{\"role\":\"system\",\"content\":\"Be concise.\"},{\"role\":\"user\",\"content\":\"Write a subject line.\"},{\"role\":\"assistant\",\"content\":\"Launch notes inside\"}]}\n{\"messages\":[{\"role\":\"customer\",\"content\":\"Bad role\"}]}\n{\"prompt\":\"Product: rain jacket\",\"completion\":\" Waterproof shell for wet commutes.\"}\n{bad json}"
  ]
}
```

## Result Schema

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "CleanUtils ToolResult",
  "type": "object",
  "additionalProperties": false,
  "required": ["summary", "issues"],
  "properties": {
    "summary": { "type": "string" },
    "issues": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": ["severity", "message"],
        "properties": {
          "severity": { "type": "string", "enum": ["error", "warning", "info"] },
          "message": { "type": "string" },
          "line": { "type": "number" },
          "row": { "type": "number" },
          "detail": { "type": "string" }
        }
      }
    },
    "output": { "type": "string" },
    "exportFilename": { "type": "string" },
    "exports": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": ["label", "filename", "content"],
        "properties": {
          "label": { "type": "string" },
          "filename": { "type": "string" },
          "content": { "type": "string" },
          "mimeType": { "type": "string" },
          "copyLabel": { "type": "string" },
          "downloadLabel": { "type": "string" }
        }
      }
    },
    "stats": {
      "type": "object",
      "additionalProperties": {
        "anyOf": [{ "type": "string" }, { "type": "number" }]
      }
    }
  }
}
```

## Self-Contained JavaScript Source

Call `runCleanUtilsTool(userInput)` with the user's input. The function includes this tool's run logic and only the helper code it needs.

```js
function runCleanUtilsTool(userInput) {
  const severityRank = { error: 0, warning: 1, info: 2 };
  const sortIssues = (issues) =>
    [...issues].sort((a, b) => {
      const severity = severityRank[a.severity] - severityRank[b.severity];
      if (severity !== 0) return severity;
      return (a.line ?? a.row ?? 0) - (b.line ?? b.row ?? 0);
    });
  const summarizeIssues = (issues) => {
    const errors = issues.filter((issue) => issue.severity === "error").length;
    const warnings = issues.filter((issue) => issue.severity === "warning").length;
    const infos = issues.filter((issue) => issue.severity === "info").length;
    const parts = [];
    if (errors) parts.push(`${errors} error${errors === 1 ? "" : "s"}`);
    if (warnings) parts.push(`${warnings} warning${warnings === 1 ? "" : "s"}`);
    if (infos) parts.push(`${infos} note${infos === 1 ? "" : "s"}`);
    return parts.length ? parts.join(", ") : "No issues found";
  };
  const countNonEmptyLines = (input) =>
    input
      .split(/\r?\n/)
      .map((line) => line.trim())
      .filter(Boolean).length;
  const tryParseJson = (input) => {
    try {
      return { ok: true, value: JSON.parse(input) };
    } catch (error) {
      return { ok: false, error: error instanceof Error ? error.message : "Invalid JSON" };
    }
  };
  const validateJsonlTrainingFile = (input) => {
    const issues = [];
    const validRecords = [];
    const lines = input.split(/\r?\n/);
    lines.forEach((rawLine, index) => {
      const lineNumber = index + 1;
      const line = rawLine.trim();
      if (!line) return;
      const parsed = tryParseJson(line);
      if (!parsed.ok) {
        issues.push({
          severity: "error",
          line: lineNumber,
          message: "Line is not valid JSON.",
          detail: parsed.error
        });
        return;
      }
      const record = parsed.value;
      if (!record || typeof record !== "object" || Array.isArray(record)) {
        issues.push({
          severity: "error",
          line: lineNumber,
          message: "Each JSONL line must be a JSON object."
        });
        return;
      }
      const objectRecord = record;
      const hasMessages = Array.isArray(objectRecord.messages);
      const hasPromptCompletion =
        typeof objectRecord.prompt === "string" && typeof objectRecord.completion === "string";
      if (!hasMessages && !hasPromptCompletion) {
        issues.push({
          severity: "warning",
          line: lineNumber,
          message: "Record does not look like chat messages or prompt/completion training data."
        });
        validRecords.push(record);
        return;
      }
      if (hasMessages) {
        const messages = objectRecord.messages;
        if (!messages.length) {
          issues.push({
            severity: "error",
            line: lineNumber,
            message: "Chat training record has an empty messages array."
          });
          return;
        }
        messages.forEach((message, messageIndex) => {
          if (!message || typeof message !== "object" || Array.isArray(message)) {
            issues.push({
              severity: "error",
              line: lineNumber,
              message: `Message ${messageIndex + 1} must be an object.`
            });
            return;
          }
          const messageObject = message;
          if (!["system", "user", "assistant"].includes(String(messageObject.role))) {
            issues.push({
              severity: "error",
              line: lineNumber,
              message: `Message ${messageIndex + 1} has unsupported role "${String(messageObject.role)}".`
            });
          }
          if (typeof messageObject.content !== "string" && !Array.isArray(messageObject.content)) {
            issues.push({
              severity: "error",
              line: lineNumber,
              message: `Message ${messageIndex + 1} needs string or array content.`
            });
          }
        });
        const hasAssistant = messages.some((message) => {
          if (!message || typeof message !== "object" || Array.isArray(message)) return false;
          return message.role === "assistant";
        });
        if (!hasAssistant) {
          issues.push({
            severity: "warning",
            line: lineNumber,
            message: "Chat record has no assistant message."
          });
        }
      }
      const lineHasError = issues.some(
        (issue) => issue.line === lineNumber && issue.severity === "error"
      );
      if (!lineHasError) {
        validRecords.push(record);
      }
    });
    const output = validRecords.map((record) => JSON.stringify(record)).join("\n");
    return {
      summary: `${validRecords.length} valid record${validRecords.length === 1 ? "" : "s"} out of ${countNonEmptyLines(input)} non-empty line${countNonEmptyLines(input) === 1 ? "" : "s"}. ${summarizeIssues(issues)}.`,
      issues: sortIssues(issues),
      output,
      exportFilename: "clean-training-file.jsonl",
      stats: {
        records: countNonEmptyLines(input),
        validRecords: validRecords.length
      }
    };
  };
  const __userInput = userInput == null ? "" : userInput;
  const __run = validateJsonlTrainingFile;
  const __input =
    __userInput && typeof __userInput === "object" && "input" in __userInput
      ? __userInput.input
      : __userInput;
  return __run(__input == null ? "" : String(__input));
}
```

## Checks

- One JSON object per line: Every non-empty line is parsed independently, so one broken line does not hide the rest of the file.
- Chat message shape: The validator checks that `messages` is a non-empty array, that each message is an object whose `role` is `system`, `user`, or `assistant`, and that `content` is a string or an array. A chat record with no assistant message gets a warning.
- Prompt/completion fallback: Older completion-style records are accepted when both prompt and completion are strings.
- Clean export boundary: Rows with syntax or structural errors are left out of the downloadable clean JSONL output. Records that match neither the chat nor the prompt/completion shape are kept but flagged with a warning.
- Local privacy: Training examples, prompts, and uploaded text stay in the browser while the report is built.

## Related Tools

- [CSV to JSONL Converter](/developer-tools/csv-to-jsonl-converter/): Turn spreadsheet rows into one JSON object per line with a local CSV parser and copy-ready export.
- [NDJSON Schema Consistency Checker](/developer-tools/ndjson-schema-consistency-checker/): Scan NDJSON records for field drift, missing keys, invalid lines, and mixed value types.
- [Structured Output JSON Schema Validator](/developer-tools/structured-output-json-schema-validator/): Check a JSON Schema for common structured-output constraints before wiring it into an LLM request.
- [Token Counter and Cost Calculator](/developer-tools/ai-token-counter-cost-calculator/): Estimate prompt tokens and request cost from pasted text plus editable per-million token rates.
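The runner contract in the execution instructions (`node run-tool.mjs < input.txt`) could be sketched in Node.js roughly as follows. This is a hypothetical sketch, not CleanUtils code: `extractToolSource`, `sha256Hex`, and `runInSandbox` are illustrative names, and comparing `sha256Hex` against the published hash assumes that hash is computed over the exact text of the fenced JavaScript block, which this page does not state. Node documents `node:vm` as not being a security mechanism, so it only keeps `process`, `require`, and network globals out of scope; use a stronger isolate for untrusted code.

```javascript
// run-tool.mjs: hypothetical runner sketch, not part of CleanUtils.
import { createHash } from "node:crypto";
import vm from "node:vm";

// Build the fence marker at runtime so this file never contains a literal fence.
const FENCE = "`".repeat(3);
const JS_BLOCK = new RegExp(FENCE + "js\\r?\\n([\\s\\S]*?)\\r?\\n" + FENCE);

// Return the body of the first js-tagged fenced block in the document text.
function extractToolSource(markdown) {
  const match = JS_BLOCK.exec(markdown);
  if (!match) throw new Error("No js code block found in the document.");
  return match[1];
}

// Hex SHA-256 of a string.
// ASSUMPTION: the published canonical hash covers exactly the extracted block text.
function sha256Hex(text) {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Evaluate the tool source in a fresh context that exposes only the input string,
// call runCleanUtilsTool there, and hand back a plain object parsed from JSON.
function runInSandbox(source, inputText) {
  const context = vm.createContext({ __inputText: String(inputText) });
  vm.runInContext(source, context, { timeout: 1000 });
  const json = vm.runInContext(
    "JSON.stringify(runCleanUtilsTool(__inputText))",
    context,
    { timeout: 1000 }
  );
  return JSON.parse(json);
}
```

A CLI wrapper around these helpers would read standard input, extract the source from a local copy of this file, compare its hash, call `runInSandbox(source, input)`, and print the result. Serializing inside the context keeps the returned object free of cross-context prototypes.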