Five rounds of adversarial testing:
- 75/100; 38 bugs found, including narrator identity mismatches, mursal-sahih violations, and false companions.
- Mursal cap gap identified; Sahihayn T10+ contamination found.
- Arabic matn quality, isnad-matn split verification, clustering validation, teacher-student graph checks, chronological coherence.
- With findings: Hammad ibn Salamah T9 (should be T2), 3 Sahihayn PID misresolutions, parser edge cases.
- Nothing new to find: remaining errors are long-tail disambiguation; same error class shrinking each round.
33 automated tests organized into the following categories:
NRS Integrity (12 tests)
- Specific narrator spot checks: al-Zuhri, Aisha, Sa'id ibn al-Musayyab, Malik
- T1/companion consistency, death year ranges, registry counts
Corpus Structure (9 tests)
- 449,285 record count (v3.4 deployed; 457,526 at v3.26 staged)
- Mursal-sahih = 0
- T10+ Sahihayn = 0
- Multi-PID = 0
- Grade agreement, Bukhari rate >= 95.9% (canonical floor)
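
The zero-count invariants in this category could be expressed roughly as below. This is a minimal sketch, not the project's test code: the `Record` fields, the reading of "Multi-PID" as a chain position resolved to more than one PID, and the reading of "Bukhari rate" as the share of Bukhari records graded sahih are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical record shape; these field names are illustrative only.
@dataclass
class Record:
    collection: str          # e.g. "bukhari", "muslim", "ahmad"
    grade: str               # e.g. "sahih", "hasan", "weak"
    is_mursal: bool          # chain is flagged mursal
    max_tabaqah: int         # highest tabaqah tier (T-number) in the chain
    pids_per_position: list  # number of PIDs resolved at each chain position

SAHIHAYN = {"bukhari", "muslim"}
BUKHARI_FLOOR = 0.959  # the 95.9% canonical floor listed above

def check_corpus(records):
    """Return violation counts for each invariant plus the Bukhari rate check."""
    mursal_sahih = sum(1 for r in records if r.is_mursal and r.grade == "sahih")
    t10_sahihayn = sum(
        1 for r in records if r.collection in SAHIHAYN and r.max_tabaqah >= 10
    )
    multi_pid = sum(1 for r in records if any(n > 1 for n in r.pids_per_position))
    bukhari = [r for r in records if r.collection == "bukhari"]
    # Assumed meaning of "Bukhari rate": fraction of Bukhari records graded sahih.
    rate = sum(1 for r in bukhari if r.grade == "sahih") / len(bukhari) if bukhari else 0.0
    return {
        "mursal_sahih": mursal_sahih,
        "t10_sahihayn": t10_sahihayn,
        "multi_pid": multi_pid,
        "bukhari_rate_ok": rate >= BUKHARI_FLOOR,
    }
```

A test in this category would then assert that the three counts are 0 and that `bukhari_rate_ok` is true.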
Cross-field consistency (7 tests, added v3.6+)
- Regression detail vs computedGrade alignment
- Provenance array completeness
- Override flag respect (`_naqd3Override`, `_ilalCap`)
- `resolutionRate` freshness (Bug 1.2 regression guard)
Regression Anchors (6 tests)
- Bukhari #1 sahih, #2 sahih, #8 sahih, #15 sahih, #7280 hasan
- Ahmad #22007 very-weak
Documentation (2 tests)
- Methodology file exists and >30KB
- Manifest file exists
| Category | Tests | Role |
|---|---|---|
| NRS Integrity | 12 | Narrator and registry invariants |
| Corpus Structure | 9 | Counts, caps, and grade sanity |
| Regression Anchors | 6 | Anchor hadith grades stable |
| Documentation | 2 | Shipped artifacts present |
CI-ready: exit code 0 = all pass; exit code 1 = any fail. Run after every correction batch.
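
The exit-code convention can be illustrated with a small runner. This is a sketch under assumptions, not the actual suite: `run_suite`, `main`, and the placeholder test are hypothetical names.

```python
import sys

def run_suite(tests):
    """Run (name, callable) pairs; return 0 if all pass, 1 if any fail."""
    failures = 0
    for name, test in tests:
        try:
            test()
            print(f"PASS {name}")
        except Exception as exc:  # assertion failures and crashes both count as failures
            failures += 1
            print(f"FAIL {name}: {exc!r}")
    return 0 if failures == 0 else 1

def placeholder_test():
    # Stand-in for a real check such as "Mursal-sahih = 0".
    assert 1 + 1 == 2

def main():
    # A CI job treats any non-zero exit status as a failed step.
    sys.exit(run_suite([("placeholder", placeholder_test)]))
```

Calling `main()` from the script entry point after each correction batch gives CI a single pass/fail signal.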
Six automated phases:
| Phase | Focus | Outcome |
|---|---|---|
| 1 | Garbage collectors — PIDs absorbing multiple people's names | 45 found, all cleared (99,000+ positions) |
| 2 | Taqrib linkage — NRS entry describes wrong person | 216 issues, 148 mismatches, 75+ corrected |
| 3 | Companion integrity — false companions / missing flags | 6,982 total companions, clean |
| 4 | Death years — outside tabaqah range | 103 outliers, 357+ corrected |
| 5 | Assessment extraction — contradictory assessments | 164 tier mismatches reviewed |
| 6 | Duplicate PIDs — same person with multiple IDs | 44 pairs investigated — all confirmed different people |
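
As an illustration of Phase 1, a garbage-collector PID can be detected by counting how many distinct name forms resolve to it. The sketch below is a simplified stand-in, not the pipeline's implementation: the `(pid, name)` input shape and the `max_names` threshold are assumptions.

```python
from collections import defaultdict

def find_garbage_collectors(positions, max_names=3):
    """Flag PIDs that absorb more than max_names distinct name forms.

    positions: iterable of (pid, name_as_written) pairs, one per chain position.
    max_names: hypothetical threshold; a real value would need tuning.
    """
    names_by_pid = defaultdict(set)
    for pid, name in positions:
        # Collapse whitespace and case so trivial variants are not counted twice.
        names_by_pid[pid].add(" ".join(name.split()).lower())
    return {
        pid: sorted(names)
        for pid, names in names_by_pid.items()
        if len(names) > max_names
    }
```

A flagged PID is only a candidate: one person can legitimately appear under several name forms, so each hit still needs review against the classical sources.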
The Eve-Theology f5/reasoner multi-model pipeline identifies inconsistencies. The classical scholarly source provides the correction.
Every computational correction cites a specific:
- Taqrib al-Tahdhib entry number
- Tahdhib al-Tahdhib volume and page reference
- Named classical scholar's documented assessment
The model is the tool that finds the error. The book is the source of the correction.
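
One way to enforce this rule in code is to refuse any correction that lacks a citation. The sketch below assumes that at least one of the three citation types is sufficient; the `Citation` class, its field names, and `apply_correction` are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Citation:
    taqrib_entry: Optional[int] = None     # Taqrib al-Tahdhib entry number
    tahdhib_volume: Optional[int] = None   # Tahdhib al-Tahdhib volume
    tahdhib_page: Optional[int] = None     # Tahdhib al-Tahdhib page
    scholar: Optional[str] = None          # named classical scholar

    def is_specific(self):
        """True if at least one complete citation type is present (assumed rule)."""
        has_taqrib = self.taqrib_entry is not None
        has_tahdhib = self.tahdhib_volume is not None and self.tahdhib_page is not None
        has_scholar = bool(self.scholar and self.scholar.strip())
        return has_taqrib or has_tahdhib or has_scholar

def apply_correction(registry, pid, field, value, citation):
    """Update registry[pid][field] only when the correction carries a citation."""
    if not citation.is_specific():
        raise ValueError(f"correction to PID {pid} has no specific citation")
    registry[pid][field] = value
    # Keep the citation alongside the data so the correction stays auditable.
    registry[pid].setdefault("provenance", []).append((field, citation))
```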
- NRS death-year quality: Chronological pair analysis revealed cases where estimated death years (derived from tabaqah rather than documented sources) produce implausible teacher-student timelines. Ongoing correction via tabaqah-range validation.
- Maqtu classification edge cases: Some entries ending at Companions may be better classified as mawquf. Under review.
- Taqwiyah review cases: A small number of hadith flagged by fabrication-detection may still receive elevated grades through taqwiyah from supporting chains. Each case requires individual scholarly judgment.
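
The chronological pair analysis mentioned in the first item can be sketched as a check on teacher-student death-year gaps. This is an illustrative simplification: the `max_gap` threshold of 90 years and the input shapes are assumptions, and the actual analysis may use tabaqah ranges instead.

```python
def implausible_pairs(death_years, teacher_student_edges, max_gap=90):
    """Flag teacher-student pairs whose death years are too far apart.

    death_years: dict mapping PID -> death year (AH), possibly estimated.
    teacher_student_edges: iterable of (teacher_pid, student_pid) pairs.
    max_gap: hypothetical limit on the plausible gap between the two deaths.
    """
    flagged = []
    for teacher, student in teacher_student_edges:
        t = death_years.get(teacher)
        s = death_years.get(student)
        if t is None or s is None:
            continue  # a missing death year cannot be checked
        gap = s - t  # positive when the student outlived the teacher
        if abs(gap) > max_gap:
            flagged.append((teacher, student, gap))
    return flagged
```

Pairs flagged this way point at a death year worth re-deriving from documented sources; the check itself does not say which side of the pair is wrong.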