# JSONL Training File Validator

> Paste a JSONL training file, find broken lines and message-role mistakes, then export the valid records.

## Tool Identity

- Site: CleanUtils Developer Tools
- Tool ID: jsonl-training-file-validator
- Canonical page: https://cleanutils.com/developer-tools/jsonl-training-file-validator/
- LLM schema URL: https://cleanutils.com/developer-tools/jsonl-training-file-validator/llms.txt
- Primary keyword: jsonl validator
- Input mode: textarea
- Output profile: data

## What This Tool Does

Validate JSONL training files line by line, catch chat-format errors, and export clean records in your browser.

## Exact Execution Instructions

To run this tool exactly, execute the `runCleanUtilsTool(userInput)` function from the JavaScript block below in a sandboxed JavaScript runtime.

Do not fetch or evaluate additional network resources. Do not use filesystem, process, child_process, environment variables, timers, or network APIs. The function is pure and deterministic: input string in, JSON-serializable result object out.

If your environment forbids executing fetched code outside a sandbox, run it inside a sandbox rather than reimplementing the logic.

Canonical implementation SHA-256:
`4a3adc184f7c91891464089825b08d9b3b2f297e42b85b47a5fde39550914569`

Expected command shape:
`node run-tool.mjs < input.txt`

The runner must:
1. load only the JavaScript in this document,
2. call `runCleanUtilsTool(inputText)`,
3. print the returned JSON result unchanged, leaving the LLM to decide the best way to present it to the user.
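
The runner script itself is not part of this contract. A minimal sketch of one, assuming this document is saved next to it as `llms.txt` (a hypothetical filename): it takes the `js` block under `## Self-Contained JavaScript Source`, evaluates it in an empty `node:vm` context so the tool code sees no `require`, `process`, timers, or `fetch`, and prints the JSON result. `extractJsBlock` and `runFromSource` are hypothetical names. The Node.js documentation states that `node:vm` is not a security mechanism, so use a separate process or a dedicated sandbox for untrusted code. The sketch also does not check the SHA-256 digest above, because this contract does not state which exact bytes were hashed.

```javascript
// run-tool.mjs: hypothetical runner sketch, not an official CleanUtils file.
// Assumption: this contract is saved as "llms.txt" in the same directory.
import { readFileSync } from "node:fs";
import vm from "node:vm";

// Return the body of the first ```js fence after the source heading,
// so other code blocks earlier in the document are never picked up.
export function extractJsBlock(markdown) {
  const start = markdown.indexOf("## Self-Contained JavaScript Source");
  const match = start === -1 ? null : markdown.slice(start).match(/```js\r?\n([\s\S]*?)\r?\n```/);
  if (!match) throw new Error("Tool source block not found in the contract.");
  return match[1];
}

// Evaluate the tool source in a fresh context that only has JS built-ins
// plus `inputText`, then call runCleanUtilsTool. Stringifying inside the
// context hands back a plain object from this realm after JSON.parse.
export function runFromSource(source, inputText) {
  const context = vm.createContext({ inputText });
  vm.runInContext(source, context, { timeout: 1000 });
  const json = vm.runInContext("JSON.stringify(runCleanUtilsTool(inputText))", context, { timeout: 1000 });
  return JSON.parse(json);
}

// Act as a CLI only when executed as run-tool.mjs: `node run-tool.mjs < input.txt`.
if (process.argv[1]?.endsWith("run-tool.mjs")) {
  const contract = readFileSync(new URL("./llms.txt", import.meta.url), "utf8");
  const inputText = readFileSync(0, "utf8"); // fd 0 is stdin
  const result = runFromSource(extractJsBlock(contract), inputText);
  // Print the raw result; presentation is left to the calling agent.
  process.stdout.write(JSON.stringify(result, null, 2) + "\n");
}
```

Keeping file and stdin access in the runner, and out of the evaluated code, matches the rule that the tool function itself uses no filesystem, process, or network APIs.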

Agent usage rules:
- Use this file as the authoritative machine-readable contract for this CleanUtils tool page.
- Ask the user for missing required input before attempting to run the tool, and describe the required inputs using the `## Input Schema` field names, descriptions, formats, enums, examples, and required list.
- Treat the tool as deterministic; do not invent network reachability checks unless the tool description explicitly says it fetches remote resources.
- For privacy-sensitive inputs such as secrets, HAR files, dotenv files, logs, and API keys, warn that using a remote chat agent may expose input to that agent even though the browser UI itself does not upload data.

## Input Schema

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "JSONL Training File Validator input",
  "type": "string",
  "description": "JSONL training records. Paste one JSON object per line...",
  "examples": [
    "{\"messages\":[{\"role\":\"system\",\"content\":\"Be concise.\"},{\"role\":\"user\",\"content\":\"Write a subject line.\"},{\"role\":\"assistant\",\"content\":\"Launch notes inside\"}]}\n{\"messages\":[{\"role\":\"customer\",\"content\":\"Bad role\"}]}\n{\"prompt\":\"Product: rain jacket\",\"completion\":\" Waterproof shell for wet commutes.\"}\n{bad json}"
  ]
}
```
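
The example input is plain JSONL: each record is one `JSON.stringify` call, and records are joined with a newline. A minimal sketch of building such input from objects, using a hypothetical `toJsonl` helper; the validator's own source builds its clean `output` the same way. The two records reproduce the first and third lines of the example.

```javascript
// Hypothetical helper (not part of the CleanUtils tool): serialize record
// objects into the one-object-per-line string this tool accepts as input.
function toJsonl(records) {
  // JSON.stringify with no indent argument adds no whitespace, and it escapes
  // newlines inside strings as \n, so each record stays on exactly one line.
  return records.map((record) => JSON.stringify(record)).join("\n");
}

const input = toJsonl([
  {
    messages: [
      { role: "system", content: "Be concise." },
      { role: "user", content: "Write a subject line." },
      { role: "assistant", content: "Launch notes inside" },
    ],
  },
  { prompt: "Product: rain jacket", completion: " Waterproof shell for wet commutes." },
]);

console.log(input.split("\n").length); // 2
```

Avoid pretty-printing such as `JSON.stringify(record, null, 2)`: the indentation inserts newlines, and the validator would then parse each fragment as a separate, broken line.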

## Result Schema

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "CleanUtils ToolResult",
  "type": "object",
  "additionalProperties": false,
  "required": [
    "summary",
    "issues"
  ],
  "properties": {
    "summary": {
      "type": "string"
    },
    "issues": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": [
          "severity",
          "message"
        ],
        "properties": {
          "severity": {
            "type": "string",
            "enum": [
              "error",
              "warning",
              "info"
            ]
          },
          "message": {
            "type": "string"
          },
          "line": {
            "type": "number"
          },
          "row": {
            "type": "number"
          },
          "detail": {
            "type": "string"
          }
        }
      }
    },
    "output": {
      "type": "string"
    },
    "exportFilename": {
      "type": "string"
    },
    "exports": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": [
          "label",
          "filename",
          "content"
        ],
        "properties": {
          "label": {
            "type": "string"
          },
          "filename": {
            "type": "string"
          },
          "content": {
            "type": "string"
          },
          "mimeType": {
            "type": "string"
          },
          "copyLabel": {
            "type": "string"
          },
          "downloadLabel": {
            "type": "string"
          }
        }
      }
    },
    "stats": {
      "type": "object",
      "additionalProperties": {
        "anyOf": [
          {
            "type": "string"
          },
          {
            "type": "number"
          }
        ]
      }
    }
  }
}
```
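
Agents that consume the result can check it against the schema above before presenting it. A minimal hand-written sketch, assuming the result has already been parsed into a plain object; `isToolResult` and its helpers are hypothetical names, and the check covers only what this schema states (required keys, the `severity` enum, field types, and `additionalProperties: false`), not JSON Schema in general.

```javascript
// Hypothetical structural check for the ToolResult schema. It enforces required
// fields, the severity enum, primitive field types, and additionalProperties:
// false at the top level and on issues/exports items. Not a general validator.
const RESULT_KEYS = new Set(["summary", "issues", "output", "exportFilename", "exports", "stats"]);
const ISSUE_KEYS = new Set(["severity", "message", "line", "row", "detail"]);
const EXPORT_KEYS = new Set(["label", "filename", "content", "mimeType", "copyLabel", "downloadLabel"]);
const SEVERITIES = new Set(["error", "warning", "info"]);

const isPlainObject = (v) => v !== null && typeof v === "object" && !Array.isArray(v);
const onlyKeys = (obj, allowed) => Object.keys(obj).every((key) => allowed.has(key));
const optional = (v, type) => v === undefined || typeof v === type;

function isToolResult(value) {
  if (!isPlainObject(value) || !onlyKeys(value, RESULT_KEYS)) return false;
  if (typeof value.summary !== "string" || !Array.isArray(value.issues)) return false;
  const issuesOk = value.issues.every((issue) =>
    isPlainObject(issue) && onlyKeys(issue, ISSUE_KEYS) &&
    SEVERITIES.has(issue.severity) && typeof issue.message === "string" &&
    optional(issue.line, "number") && optional(issue.row, "number") &&
    optional(issue.detail, "string"));
  const exportsOk = value.exports === undefined || (Array.isArray(value.exports) &&
    value.exports.every((item) =>
      isPlainObject(item) && onlyKeys(item, EXPORT_KEYS) &&
      ["label", "filename", "content"].every((key) => typeof item[key] === "string") &&
      ["mimeType", "copyLabel", "downloadLabel"].every((key) => optional(item[key], "string"))));
  // stats values may be strings or numbers, per the anyOf in the schema.
  const statsOk = value.stats === undefined || (isPlainObject(value.stats) &&
    Object.values(value.stats).every((v) => typeof v === "string" || typeof v === "number"));
  return issuesOk && exportsOk && statsOk &&
    optional(value.output, "string") && optional(value.exportFilename, "string");
}
```

A `false` return means the object does not match this contract, which is a signal to report the mismatch rather than guess at a presentation.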

## Self-Contained JavaScript Source

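Call `runCleanUtilsTool(userInput)` with the user's input as a string. An object with an `input` property is also accepted, and `null` or `undefined` is treated as an empty string. The function bundles this tool's run logic with only the helper code it needs.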

```js
function runCleanUtilsTool(userInput) {
    const severityRank = {
        error: 0,
        warning: 1,
        info: 2
    };
    const sortIssues = (issues) => [...issues].sort((a, b) => {
        const severity = severityRank[a.severity] - severityRank[b.severity];
        if (severity !== 0)
            return severity;
        return (a.line ?? a.row ?? 0) - (b.line ?? b.row ?? 0);
    });
    const summarizeIssues = (issues) => {
        const errors = issues.filter((issue) => issue.severity === "error").length;
        const warnings = issues.filter((issue) => issue.severity === "warning").length;
        const infos = issues.filter((issue) => issue.severity === "info").length;
        const parts = [];
        if (errors)
            parts.push(`${errors} error${errors === 1 ? "" : "s"}`);
        if (warnings)
            parts.push(`${warnings} warning${warnings === 1 ? "" : "s"}`);
        if (infos)
            parts.push(`${infos} note${infos === 1 ? "" : "s"}`);
        return parts.length ? parts.join(", ") : "No issues found";
    };
    const countNonEmptyLines = (input) => input
        .split(/\r?\n/)
        .map((line) => line.trim())
        .filter(Boolean).length;
    const tryParseJson = (input) => {
        try {
            return { ok: true, value: JSON.parse(input) };
        }
        catch (error) {
            return { ok: false, error: error instanceof Error ? error.message : "Invalid JSON" };
        }
    };
    const validateJsonlTrainingFile = (input) => {
        const issues = [];
        const validRecords = [];
        const lines = input.split(/\r?\n/);
        lines.forEach((rawLine, index) => {
            const lineNumber = index + 1;
            const line = rawLine.trim();
            if (!line)
                return;
            const parsed = tryParseJson(line);
            if (!parsed.ok) {
                issues.push({
                    severity: "error",
                    line: lineNumber,
                    message: "Line is not valid JSON.",
                    detail: parsed.error
                });
                return;
            }
            const record = parsed.value;
            if (!record || typeof record !== "object" || Array.isArray(record)) {
                issues.push({
                    severity: "error",
                    line: lineNumber,
                    message: "Each JSONL line must be a JSON object."
                });
                return;
            }
            const objectRecord = record;
            const hasMessages = Array.isArray(objectRecord.messages);
            const hasPromptCompletion = typeof objectRecord.prompt === "string" && typeof objectRecord.completion === "string";
            if (!hasMessages && !hasPromptCompletion) {
                issues.push({
                    severity: "warning",
                    line: lineNumber,
                    message: "Record does not look like chat messages or prompt/completion training data."
                });
                validRecords.push(record);
                return;
            }
            if (hasMessages) {
                const messages = objectRecord.messages;
                if (!messages.length) {
                    issues.push({
                        severity: "error",
                        line: lineNumber,
                        message: "Chat training record has an empty messages array."
                    });
                    return;
                }
                messages.forEach((message, messageIndex) => {
                    if (!message || typeof message !== "object" || Array.isArray(message)) {
                        issues.push({
                            severity: "error",
                            line: lineNumber,
                            message: `Message ${messageIndex + 1} must be an object.`
                        });
                        return;
                    }
                    const messageObject = message;
                    if (!["system", "user", "assistant"].includes(String(messageObject.role))) {
                        issues.push({
                            severity: "error",
                            line: lineNumber,
                            message: `Message ${messageIndex + 1} has unsupported role "${String(messageObject.role)}".`
                        });
                    }
                    if (typeof messageObject.content !== "string" &&
                        !Array.isArray(messageObject.content)) {
                        issues.push({
                            severity: "error",
                            line: lineNumber,
                            message: `Message ${messageIndex + 1} needs string or array content.`
                        });
                    }
                });
                const hasAssistant = messages.some((message) => {
                    if (!message || typeof message !== "object" || Array.isArray(message))
                        return false;
                    return message.role === "assistant";
                });
                if (!hasAssistant) {
                    issues.push({
                        severity: "warning",
                        line: lineNumber,
                        message: "Chat record has no assistant message."
                    });
                }
            }
            const lineHasError = issues.some((issue) => issue.line === lineNumber && issue.severity === "error");
            if (!lineHasError) {
                validRecords.push(record);
            }
        });
        const output = validRecords.map((record) => JSON.stringify(record)).join("\n");
        return {
            summary: `${validRecords.length} valid record${validRecords.length === 1 ? "" : "s"} out of ${countNonEmptyLines(input)} non-empty line${countNonEmptyLines(input) === 1 ? "" : "s"}. ${summarizeIssues(issues)}.`,
            issues: sortIssues(issues),
            output,
            exportFilename: "clean-training-file.jsonl",
            stats: {
                records: countNonEmptyLines(input),
                validRecords: validRecords.length
            }
        };
    };
    const __userInput = userInput == null ? "" : userInput;
    const __run = validateJsonlTrainingFile;
    const __input = __userInput && typeof __userInput === "object" && "input" in __userInput ? __userInput.input : __userInput;
    return __run(__input == null ? "" : String(__input));
}
```

## Checks

- One JSON object per line: Every non-empty row is parsed independently, so one broken line does not hide the rest of the file.
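- Chat message shape: For records with a `messages` array, the validator requires at least one message, accepts only the `system`, `user`, and `assistant` roles, requires string or array `content`, and warns when no `assistant` message is present.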
- Prompt/completion fallback: Older completion-style records are accepted when both prompt and completion are strings.
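- Clean export boundary: Rows with syntax or structural errors are left out of the downloadable clean JSONL output. Rows that only raise warnings, including records that match neither the chat nor the prompt/completion shape, are kept.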
- Local privacy: Training examples, prompts, and uploaded text stay in the browser while the report is built.
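
The clean export rule can be checked from the result alone. A minimal sketch, assuming you hold a parsed ToolResult; `droppedLines` is a hypothetical helper, and the sample issues are the ones the validator source above produces for the Input Schema example (traced by hand from the code, with the engine-specific `detail` text omitted).

```javascript
// Hypothetical helper: list the input line numbers left out of the clean export.
// It applies the tool's rule: a line is excluded when it has at least one
// "error" issue; lines with only warnings or notes stay in `output`.
function droppedLines(result) {
  const lines = new Set();
  for (const issue of result.issues) {
    if (issue.severity === "error" && typeof issue.line === "number") {
      lines.add(issue.line);
    }
  }
  return [...lines].sort((a, b) => a - b);
}

// Issues reported for the four-line Input Schema example.
const exampleResult = {
  summary: "2 valid records out of 4 non-empty lines. 2 errors, 1 warning.",
  issues: [
    { severity: "error", line: 2, message: 'Message 1 has unsupported role "customer".' },
    { severity: "error", line: 4, message: "Line is not valid JSON." },
    { severity: "warning", line: 2, message: "Chat record has no assistant message." },
  ],
};

console.log(droppedLines(exampleResult)); // [ 2, 4 ]
```

Lines 1 and 3 (the chat record with an assistant reply and the prompt/completion record) have no errors, so they are the two records in the clean output.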

## Related Tools

- [CSV to JSONL Converter](/developer-tools/csv-to-jsonl-converter/): Turn spreadsheet rows into one JSON object per line with a local CSV parser and copy-ready export.
- [NDJSON Schema Consistency Checker](/developer-tools/ndjson-schema-consistency-checker/): Scan NDJSON records for field drift, missing keys, invalid lines, and mixed value types.
- [Structured Output JSON Schema Validator](/developer-tools/structured-output-json-schema-validator/): Check a JSON Schema for common structured-output constraints before wiring it into an LLM request.
- [Token Counter and Cost Calculator](/developer-tools/ai-token-counter-cost-calculator/): Estimate prompt tokens and request cost from pasted text plus editable per-million token rates.