الموجز Summary

This corpus is an applied-AI / data-engineering work product grounded in classical rijāl literature. It is NOT classical mujtahid scholarship.

Canonical external citation: "AI-assisted hadith corpus, structurally validated against documented teacher-student relationships, with an open scholar-collaboration program."

المنهجية الحاسوبية Computational Provenance

Build method

Automated Node.js streaming pipelines

Ingestion source

OpenITI digital editions of classical Arabic texts (2025-1-9 release)

Models used

MindHYVE Eve-Theology™ F5/reasoner — Microsoft Phi-4-derived, LoRA fine-tuned on Eve-Genesis™ (Uṣūl Edition)
Microsoft Phi-3 — classifier
Frontier model — deep synthesis and judging
Frontier model — alternative reasoning paths
Frontier model — long-context analysis across the full classical compendium
Azure embedding model — 3,072-dim vectors for semantic search infrastructure

Roles AI was used for

PID disambiguation tiebreaking when 2+ structural candidates remain (v3.9 multi-stage validator and v3.26 LLM tiebreaker)
Matn re-extraction from arabicText for parser-failed records
NAQD-3 theological subtlety / cross-document contradiction / linguistic register analysis on flagged subsets
Phase 2C temporal-issue triage (skip vs chain-break vs pid-fix classification)
Glossary v1 term synthesis with mandatory classical-citation requirement
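
The first role above, tiebreaking only when two or more candidates remain, can be pictured as a small routing step. The following is a minimal sketch, not the pipeline's actual code: the `Candidate` shape, the `routeChainPosition` name, and the three routing labels are hypothetical, and the 0.5 floor is borrowed from the similarity figure quoted under Known Limits.

```typescript
// Hypothetical shapes: field names are illustrative, not the pipeline's schema.
interface Candidate {
  pid: string;        // narrator identifier (format assumed)
  similarity: number; // name-match score in [0, 1]
}

type Routing =
  | { kind: "no-candidate" }
  | { kind: "deterministic"; pid: string }
  | { kind: "llm-tiebreak"; pids: string[] };

// 0.5 is the similarity figure quoted in this disclosure; using it as a
// strict ">" cutoff is an assumption of this sketch.
const SIMILARITY_FLOOR = 0.5;

function routeChainPosition(candidates: Candidate[]): Routing {
  const viable = candidates.filter((c) => c.similarity > SIMILARITY_FLOOR);
  if (viable.length === 0) return { kind: "no-candidate" };
  if (viable.length === 1) return { kind: "deterministic", pid: viable[0].pid };
  // Two or more candidates remain: only here would an LLM tiebreaker be consulted.
  return { kind: "llm-tiebreak", pids: viable.map((c) => c.pid) };
}
```

Under these assumptions the model is never consulted when the structural evidence already yields a single candidate, which keeps LLM-assisted decisions confined to the ambiguous residue.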

Roles AI was NOT used for

Per-narrator scholarly verification at scale
Manuscript collation (al-muqābalah) against original witnesses
Issuing fatwa or replacing qualified muhaddith judgment

Human role scope

Humans were responsible for architecture, threshold-setting, regression policy, and sample audits at the decision-architecture level. Per-narrator and per-chain scholarly verification was NOT performed.

Regression method

33-test automated suite anchored to canonical hadith (Bukhārī #1, #2, #8, #15, #7280; Aḥmad #22007); 7-test cross-field consistency check; NAQD-1/2/3 framework runs.

إعادة المعايرة المنهجية Methodology recalibration (v3.11)

Trigger: A recurring external-examiner review cycle (most recent: 2026-05-02) found that earlier external-facing framing overstated the human-scholarly role relative to the applied-AI-with-classical-grounding posture the pipeline actually represents.
Outcome: The v3.11 recalibration tightened external-facing language to: "AI-assisted, structurally validated against documented teacher-student relationships, scholar-in-the-loop pending on residue queues." This wording is now the canonical external citation for IPSC.
Standing discipline: AI-involvement disclosure travels with every record via the _provenanceDisclosure block; LLM-assisted decisions carry a verifiedBy: "llm-..." provenance stamp; stylistic and historical-context documents are explicitly marked as such.
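
A consumer might read the verifiedBy stamp along the following lines. This is a hedged sketch only: the "llm-" prefix is the one given above, while the "scholar-" and "rule-" prefixes, the `tierOf` and `countTiers` helpers, and the example stamp values are assumptions made for illustration, not the published schema.

```typescript
type VerificationTier = "deterministic" | "llm-assisted" | "scholar-confirmed" | "unknown";

// Only the "llm-" prefix appears in this disclosure. The "scholar-" and
// "rule-" prefixes are hypothetical placeholders for the other two tiers.
function tierOf(verifiedBy: string | undefined): VerificationTier {
  if (!verifiedBy) return "unknown";
  if (verifiedBy.startsWith("llm-")) return "llm-assisted";
  if (verifiedBy.startsWith("scholar-")) return "scholar-confirmed";
  if (verifiedBy.startsWith("rule-")) return "deterministic";
  return "unknown";
}

// Tally how many stamps in a batch fall into each tier.
function countTiers(stamps: Array<string | undefined>): Record<VerificationTier, number> {
  const counts: Record<VerificationTier, number> = {
    "deterministic": 0,
    "llm-assisted": 0,
    "scholar-confirmed": 0,
    "unknown": 0,
  };
  for (const stamp of stamps) {
    counts[tierOf(stamp)] += 1;
  }
  return counts;
}
```

Falling back to "unknown" rather than guessing keeps an unrecognised or missing stamp from being presented as a stronger form of verification than it is.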

The wider quality-and-audit cycle is documented at /honesty.

القيود المعروفة Known Limits

Per-narrator scholar verification: Verification at this scale is the scope of the open scholar-collaboration program (~98K records currently in the program). IPSC stamps a verifiedBy provenance value on every grading decision so consumers know whether a verdict is deterministic, LLM-assisted, or scholar-confirmed.
Manuscript collation: NOT performed. Source is OpenITI digital editions, not original witnesses.
Tahdhīb graph coverage: 386,520 explicit teacher-student edges; al-Mizzī's "wa-jamāʿah" / "wa-khalq kathīr" expansions are NOT enumerated. The edge count is therefore a lower bound on actual relationships.
No-candidate chain positions: 274,164 chain positions (74% of v3.9 task scan) had no NRS candidate above similarity 0.5. These are kunyah-only references, generic names, and regional variants outside the index.
Mudallis registry: 107 narrators tracked; Ibn Ḥajar's full Ṭabaqāt al-Mudallisīn has ~152. ~30% gap.
Ikhtilāṭ onset years: 26% have onset year filled; 74% have onsetYear: null (deterioration period unknown for those records).
Kanz al-ʿUmmāl: v3.12 recovered citation symbols from the OpenITI source: 14,603 of 46,650 records (31.3%) now carry _citationAttribution, plus 1,034 grade inferences. ~28K Kanz records remain without attribution. Note that 'attributed' ≠ 'graded': computedGrade still requires chain analysis, and ~89% of Kanz records remain computedGrade='not-graded'.
Bukhārī sahih+hasan rate: 95.8% by computed grade, down from 97.3% in earlier releases. The drop reflects de-inflation of stale supportingChain counts; the prior figure was inflated.
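
Two of the percentages in the list above can be reproduced from the counts quoted alongside them. The sketch below is a plain arithmetic check; the `sharePercent` helper is a hypothetical name, and the mudallis total of 152 is itself approximate in the source.

```typescript
// Share of `part` in `whole`, as a percentage rounded to one decimal place.
function sharePercent(part: number, whole: number): number {
  if (whole <= 0) throw new RangeError("whole must be positive");
  return Math.round((part / whole) * 1000) / 10;
}

// Mudallis registry: 107 tracked against roughly 152 (approximate in the source).
const mudallisGap = sharePercent(152 - 107, 152);   // 29.6, quoted as "~30% gap"

// Kanz al-ʿUmmāl: 14,603 of 46,650 records carry _citationAttribution.
const kanzAttributed = sharePercent(14603, 46650);  // 31.3
```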

كيفية الاستخدام How to use this page

Cite alongside any IPSC claim. When making a public statement about what IPSC accomplishes (corpus size, convergence rates, NAQD findings, etc.), pair the claim with a link or quotation from this disclosure.
Surface in API responses. Customers integrating the corpus must surface the disclosure summary in any externally-facing claim about provenance.
Read in context. Pair this with /honesty (audit practice + scholar collaboration) and /changelog (v3.0 → v3.176.1 release history) for a full picture.
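
For integrators, surfacing the disclosure in API responses could look roughly like this. It is a sketch under stated assumptions: the _provenanceDisclosure key and the summary sentence come from this page, whereas the `ProvenanceDisclosure` fields, the `withDisclosure` helper, and the example payload are hypothetical.

```typescript
// Hypothetical shape for the disclosure block; only the key name
// `_provenanceDisclosure` is taken from this page.
interface ProvenanceDisclosure {
  summary: string;
  seeAlso: string[];
}

const DISCLOSURE: ProvenanceDisclosure = {
  // Canonical external citation as quoted in the Summary section.
  summary:
    "AI-assisted hadith corpus, structurally validated against documented " +
    "teacher-student relationships, with an open scholar-collaboration program.",
  // Companion pages named on this page.
  seeAlso: ["/honesty", "/changelog"],
};

// Return a copy of the payload with the disclosure attached; the input is not mutated.
function withDisclosure<T extends object>(
  payload: T
): T & { _provenanceDisclosure: ProvenanceDisclosure } {
  return { ...payload, _provenanceDisclosure: DISCLOSURE };
}

// Usage with an illustrative payload (field names are made up for the example).
const response = withDisclosure({ recordCount: 3, query: "example" });
```

Attaching the block at the response boundary, rather than per call site, is one way to make it hard to publish a provenance claim without the disclosure alongside it.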

Provenance Disclosure