الموجز Summary
This corpus is an applied-AI / data-engineering work product grounded in classical rijāl literature. It is NOT classical mujtahid scholarship.
Canonical external citation: "AI-assisted hadith corpus, structurally validated against documented teacher-student relationships, with an open scholar-collaboration program."
المنهجية الحاسوبية Computational Provenance
- Build method
- Automated Node.js streaming pipelines
- Ingestion source
- OpenITI digital editions of classical Arabic texts (2025-1-9 release)
- Models used
- MindHYVE Eve-Theology f5/reasoner
- Anthropic Claude Opus 4.6
- Anthropic Claude Haiku 4.5
- OpenAI text-embedding-3-large (3,072-dim vectors)
- Roles AI was used for
- PID disambiguation tiebreaking when 2+ structural candidates remain (v3.9 multi-stage validator and v3.26 LLM tiebreaker)
- Matn re-extraction from arabicText for parser-failed records
- NAQD-3 theological subtlety / cross-document contradiction / linguistic register analysis on flagged subsets
- Phase 2C temporal-issue triage (skip vs chain-break vs pid-fix classification)
- Glossary v1 term synthesis with mandatory classical-citation requirement
- Roles AI was NOT used for
- Per-narrator scholarly verification at scale
- Manuscript collation (al-muqābalah) against original witnesses
- Issuing fatwa or replacing qualified muhaddith judgment
- Human role scope
- Architecture, threshold-setting, regression policy, sample-audit at decision-architecture level. Per-narrator and per-chain scholarly verification was NOT performed.
- Regression method
- 33-test automated suite anchored to canonical hadith (Bukhārī #1, #2, #8, #15, #7280; Aḥmad #22007); 7-test cross-field consistency check; NAQD-1/2/3 framework runs.
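As a rough illustration of where the LLM tiebreaker sits in PID disambiguation, the sketch below routes a single chain position by how many candidates survive a similarity cutoff. It is written in Python for brevity and is not the pipeline's Node.js code. The `Candidate` type, the `route_position` function, and the three route labels are hypothetical, and treating the 0.5 similarity cutoff cited under Known Limits as the filter for "structural candidates" is an assumption.

```python
from dataclasses import dataclass

# Assumed: the 0.5 cutoff quoted for the no-candidate statistic.
SIMILARITY_CUTOFF = 0.5


@dataclass(frozen=True)
class Candidate:
    pid: str           # hypothetical narrator identifier
    similarity: float  # hypothetical name-match score in [0, 1]


def route_position(candidates: list[Candidate]) -> str:
    """Return a hypothetical routing label for one chain position."""
    surviving = [c for c in candidates if c.similarity > SIMILARITY_CUTOFF]
    if not surviving:
        return "no-candidate"   # nothing above the cutoff; position stays unresolved
    if len(surviving) == 1:
        return "deterministic"  # a single candidate needs no tiebreak
    return "llm-tiebreak"       # 2+ candidates remain, so the LLM tiebreaker applies


print(route_position([Candidate("pid-a", 0.91), Candidate("pid-b", 0.87)]))
```

Under these assumptions, the LLM is consulted only on the third branch; the first two branches never involve a model call.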
إعادة المعايرة المنهجية Methodology recalibration (v3.11)
- Trigger
- The recurring external-examiner cycle (most recent: 2026-05-02) found that earlier external-facing framing overstated the human-scholarly role relative to the applied-AI-with-classical-grounding posture the pipeline actually represents.
- Outcome
- v3.11 framing recalibration tightened external-facing language to: "AI-assisted, structurally validated against documented teacher-student relationships, scholar-in-the-loop pending on residue queues." This wording is now the canonical external citation for IPSC.
- Standing discipline
- AI-involvement disclosure travels with every record via the _provenanceDisclosure block; LLM-assisted decisions carry a verifiedBy: "llm-..." provenance stamp; stylistic and historical-context documents are explicitly marked as such.
The wider quality-and-audit cycle is documented at /honesty.
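A consumer can check the two per-record signals named in the standing rules with a few lines of code. The sketch below is a minimal illustration, assuming records arrive as JSON objects parsed into Python dicts. The `describe_provenance` helper and its output keys are hypothetical. Only the `_provenanceDisclosure` and `verifiedBy` field names and the `llm-` prefix come from this page.

```python
def describe_provenance(record: dict) -> dict:
    """Summarize the disclosure signals carried by one record (hypothetical helper)."""
    stamp = record.get("verifiedBy")
    return {
        # True when the AI-involvement disclosure block travels with the record.
        "has_disclosure": "_provenanceDisclosure" in record,
        # True when the grading decision carries an LLM provenance stamp.
        "llm_assisted": isinstance(stamp, str) and stamp.startswith("llm-"),
        "stamp": stamp,
    }


# The suffix after "llm-" is elided on this page, so it is left elided here too.
example = {"verifiedBy": "llm-...", "_provenanceDisclosure": {}}
print(describe_provenance(example))
```

The helper deliberately does not guess at stamps for deterministic or scholar-confirmed verdicts, since their exact format is not given here.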
القيود المعروفة Known Limits
- Per-narrator scholar verification
- Per-narrator scholarly verification at scale is the scope of the open scholar-collaboration program (~98K records currently in the program). IPSC stamps a verifiedBy provenance on every grading decision so consumers know whether a verdict is deterministic, LLM-assisted, or scholar-confirmed.
- Manuscript collation
- NOT performed. Source is OpenITI digital editions, not original witnesses.
- Tahdhīb graph coverage
- 386,520 explicit teacher-student edges; al-Mizzī's "wa-jamāʿah" / "wa-khalq kathīr" expansions are NOT enumerated, so this count is a lower bound on actual relationships.
- No-candidate chain positions
- 274,164 chain positions (74% of the v3.9 task scan) had no NRS candidate above similarity 0.5: kunyah-only references, generic names, and regional variants outside the index.
- Mudallis registry
- 107 narrators tracked; Ibn Ḥajar's full Ṭabaqāt al-Mudallisīn lists ~152, a gap of ~30%.
- Ikhtilāṭ onset years
- 26% of records have an onset year filled; the remaining 74% have onsetYear: null (deterioration period unknown for those records).
- Kanz al-ʿUmmāl
- v3.12 recovered citation symbols from the OpenITI source: 14,603 of 46,650 records (31.3%) now have _citationAttribution, plus 1,034 grade inferences. ~28K Kanz records remain without attribution. Note that 'attributed' ≠ 'graded': computedGrade still requires chain analysis, and ~89% of Kanz records still have computedGrade='not-graded'.
- Bukhārī sahih+hasan rate
- 95.9% computed grade, down from 97.3% in earlier releases. The drop reflects honest de-inflation of stale supportingChain counts; the prior number was inflated.
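Several of the percentages above can be re-derived from the counts quoted alongside them. The short check below does that arithmetic. The `pct` helper is hypothetical. The ~152 figure is approximate in the source, and the scan total is only implied by the rounded 74% share, so it is an estimate and not a reported number.

```python
def pct(part: float, whole: float) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Mudallis registry: 107 tracked against ~152 in the full Tabaqat.
mudallis_gap = pct(152 - 107, 152)

# Kanz al-Ummal: records that now carry _citationAttribution.
kanz_attributed = pct(14_603, 46_650)

# No-candidate positions: 274,164 is stated as 74% of the v3.9 task scan,
# so the scan total is roughly the count divided by 0.74 (an estimate).
implied_scan_total = round(274_164 / 0.74)

print(mudallis_gap, kanz_attributed, implied_scan_total)
```

The first two values come out at 29.6 and 31.3, matching the "~30% gap" and "31.3%" figures in the list.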
كيفية الاستخدام How to use this page
- Cite alongside any IPSC claim. When making a public statement about what IPSC accomplishes (corpus size, convergence rates, NAQD findings, etc.), pair the claim with a link or quotation from this disclosure.
- Surface in API responses. Customers integrating the corpus must surface the disclosure summary in any externally-facing claim about provenance.
- Read in context. Pair this with /honesty (audit practice + scholar collaboration) and /changelog (v3.0 → v3.26 release history) for a full picture.
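One way an integrator might surface the disclosure in API responses is to attach it to every outgoing payload. This is a minimal sketch, assuming responses are plain dicts. The `with_disclosure` function and the `disclosure`, `summary`, and `see_also` keys are hypothetical names and not part of any IPSC API. The summary text is the canonical external citation quoted in the Summary section.

```python
CANONICAL_CITATION = (
    "AI-assisted hadith corpus, structurally validated against documented "
    "teacher-student relationships, with an open scholar-collaboration program."
)


def with_disclosure(payload: dict) -> dict:
    """Return a copy of `payload` with the disclosure summary attached."""
    return {
        **payload,
        "disclosure": {                              # hypothetical key
            "summary": CANONICAL_CITATION,
            "see_also": ["/honesty", "/changelog"],  # pages named on this page
        },
    }


response = with_disclosure({"collection": "Bukhārī", "number": 1})
print(response["disclosure"]["summary"])
```

Returning a copy keeps the original payload untouched, so the same record can be reused internally without the disclosure wrapper.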