The developer behind NotHumanAllowed (NHA) has shipped two open-source releases that together cover both ends of the fine-tuning pipeline: tooling to generate training data, and a real-world dataset produced using that approach.

The first is DataForge v0.1.0, an Apache 2.0 Python toolkit for producing synthetic SFT and DPO datasets aimed at tool-calling LoRA fine-tuning. Its main selling point is reproducibility. A SHA-256-based RNG means the same seed produces identical output regardless of machine or Python version — useful when you need to track exactly where your training data came from. The codebase runs to 8,500+ lines with 65 passing tests and includes a streaming pipeline to keep memory usage flat, four layers of anti-template detection (Bloom filters and trigram overuse analysis among them), and seven quality gates to filter out repetitive patterns before they corrupt training. Two domain examples — restaurant booking and customer support — ship out of the box, along with QLoRA training scripts compatible with HuggingFace models.
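The reproducibility claim rests on a well-known technique: deriving random values by hashing a seed together with a counter, so the stream depends only on the inputs rather than on platform-specific RNG state. The sketch below illustrates the general idea with a hypothetical `DeterministicRNG` class; DataForge's actual API and internals may differ.

```python
import hashlib

class DeterministicRNG:
    """Counter-based RNG: hashing (seed, counter) with SHA-256 yields
    an identical stream on any machine or Python version.
    Illustrative sketch, not DataForge's actual implementation."""

    def __init__(self, seed: str):
        self.seed = seed
        self.counter = 0

    def _next_digest(self) -> bytes:
        # Each call hashes "seed:counter" and advances the counter.
        digest = hashlib.sha256(f"{self.seed}:{self.counter}".encode()).digest()
        self.counter += 1
        return digest

    def random(self) -> float:
        # Map the first 8 bytes of the digest to a float in [0, 1).
        return int.from_bytes(self._next_digest()[:8], "big") / 2**64

    def choice(self, items: list):
        # Deterministic selection from a list.
        return items[int(self.random() * len(items))]
```

Because the stream is a pure function of the seed, two runs with the same seed on different machines produce byte-identical training data, which is what makes provenance tracking tractable.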

The second release is NHA Epistemic Deliberations v1, a 16.6 MB JSONL dataset of 183 multi-agent deliberation sessions across nine domains including agriculture, medicine, AI ethics, and industrial automation. Unlike synthetic debate datasets where a single model argues with itself, each session uses three to seven agents drawn from different providers — Anthropic, OpenAI, Google, DeepSeek, and xAI — with agents in later rounds reading their counterparts' actual outputs before responding. Convergence is tracked via pairwise Jaccard similarity scores and complementarity metrics, giving researchers a way to study not just what agents concluded, but how positions shifted across rounds.
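Pairwise Jaccard similarity is a standard way to quantify how much a set of texts overlap: the size of the intersection of their token sets divided by the size of the union, averaged over all pairs. A minimal sketch of how such a convergence score could be computed per round follows; the token-set representation and the averaging are assumptions, since the dataset's exact metric definitions are not spelled out here.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two texts over their lowercased word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not (sa | sb):
        return 0.0
    return len(sa & sb) / len(sa | sb)

def round_convergence(responses: list[str]) -> float:
    """Mean pairwise Jaccard score across all agent responses in one round."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Comparing this score across rounds shows whether agents' positions are drifting together or apart, which is the behavior the dataset's convergence tracking is meant to expose.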

One unusual component is CASSANDRA, a Qwen 7B model fine-tuned via LoRA on NHA's own self-hosted infrastructure, which acts as a structured adversarial participant during sessions. It injects critiques tagged [WEAKNESS], [COUNTER-EVIDENCE], or [FAILURE-SCENARIO] to pressure the other agents into defending or revising their positions. A separate LLM, one that did not participate in the session, handles quality validation, scoring factual accuracy, logical coherence, completeness, and synthesis fairness. Sessions scoring below 80% are dropped entirely. The published dataset averages an 88.1% quality score and a +14.1% average Convergence Index gain. NHA says a broader generation campaign targeting 2,480 sessions across 62 domains is still underway.
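The validation step amounts to a quality gate: score each session on a fixed rubric, then keep only sessions whose score clears the threshold. The sketch below assumes the four rubric dimensions named above and a simple mean with an 80% cutoff; the actual aggregation scheme NHA uses is not documented here, so treat the names and weighting as hypothetical.

```python
from statistics import mean

# Rubric dimensions from the article; equal weighting is an assumption.
DIMENSIONS = (
    "factual_accuracy",
    "logical_coherence",
    "completeness",
    "synthesis_fairness",
)

def passes_quality_gate(scores: dict[str, float], threshold: float = 0.80) -> bool:
    """Keep a session only if its mean rubric score meets the threshold."""
    return mean(scores[d] for d in DIMENSIONS) >= threshold

def filter_sessions(sessions: list[dict], threshold: float = 0.80) -> list[dict]:
    """Drop sessions whose validator scores fall below the gate."""
    return [s for s in sessions if passes_quality_gate(s["scores"], threshold)]
```

Gating on an aggregate rather than per-dimension minimums lets a strong session absorb one weak score; a stricter variant would require every dimension to clear the bar individually.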

DataForge is available under Apache 2.0; the deliberations dataset under CC-BY-NC-SA.