بسم الله الرحمن الرحيم

الإفصاح عن المصدر

Provenance Disclosure

The authoritative AI-involvement and known-limits disclosure for the IPSC corpus, sourced verbatim from the _provenanceDisclosure block of corpus-v3/manifest.json. It is published as a permanent page so that it can be cited alongside any external claim about what IPSC is and is not.

IPSC v3.4 deployed · v3.26 staged · manifest-sourced 2026-05-04
الموجز Summary

This corpus is an applied-AI / data-engineering work product grounded in classical rijāl literature. It is NOT classical mujtahid scholarship.

Canonical external citation: "AI-assisted hadith corpus, structurally validated against documented teacher-student relationships, with an open scholar-collaboration program."

المنهجية الحاسوبية Computational Provenance
Build method
Automated Node.js streaming pipelines
Ingestion source
OpenITI digital editions of classical Arabic texts (2025-1-9 release)
Models used
  • MindHYVE Eve-Theology f5/reasoner
  • Anthropic Claude Opus 4.6
  • Anthropic Claude Haiku 4.5
  • OpenAI text-embedding-3-large (3,072-dim vectors)
Roles AI was used for
  • PID disambiguation tiebreaking when 2+ structural candidates remain (v3.9 multi-stage validator and v3.26 LLM tiebreaker)
  • Matn re-extraction from arabicText for parser-failed records
  • NAQD-3 theological subtlety / cross-document contradiction / linguistic register analysis on flagged subsets
  • Phase 2C temporal-issue triage (skip vs chain-break vs pid-fix classification)
  • Glossary v1 term synthesis with mandatory classical-citation requirement
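The tiebreaking role in the first bullet can be read as a gate: the model is consulted only when two or more structural candidates remain. Below is a minimal TypeScript sketch of that gate, not the pipeline's actual code. The names `Candidate`, `Resolution`, `Tiebreaker`, and `resolvePid`, and the stamp values `unresolved`, `deterministic`, and `llm-tiebreaker`, are hypothetical. Only the "2+ structural candidates" condition and the `llm-` stamp prefix come from this disclosure.

```typescript
// Hypothetical shapes for illustration; these are NOT the IPSC schema.
interface Candidate {
  pid: string;           // narrator person-ID
  structuralOk: boolean; // survived the structural (teacher-student) checks
}

interface Resolution {
  pid: string | null;
  verifiedBy: string;    // provenance stamp carried with the decision
}

// Stand-in for the model call. A real tiebreaker would likely be asynchronous.
type Tiebreaker = (candidates: Candidate[]) => string;

function resolvePid(candidates: Candidate[], tiebreak: Tiebreaker): Resolution {
  const remaining = candidates.filter((c) => c.structuralOk);
  const first = remaining[0];

  // No structural candidate: nothing to decide, and the model is not consulted.
  if (first === undefined) {
    return { pid: null, verifiedBy: "unresolved" };
  }
  // Exactly one candidate: the decision is deterministic.
  if (remaining.length === 1) {
    return { pid: first.pid, verifiedBy: "deterministic" };
  }
  // Two or more candidates remain: only now is the tiebreaker invoked,
  // and the result carries an "llm-" stamp.
  return { pid: tiebreak(remaining), verifiedBy: "llm-tiebreaker" };
}
```

Keeping the model call behind the candidate count means a single-candidate position never receives an LLM stamp.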
Roles AI was NOT used for
  • Per-narrator scholarly verification at scale
  • Manuscript collation (al-muqābalah) against original witnesses
  • Issuing fatwa or replacing qualified muhaddith judgment
Human role scope
Architecture, threshold-setting, regression policy, sample-audit at decision-architecture level. Per-narrator and per-chain scholarly verification was NOT performed.
Regression method
33-test automated suite anchored to canonical hadith (Bukhārī #1, #2, #8, #15, #7280; Aḥmad #22007); 7-test cross-field consistency check; NAQD-1/2/3 framework runs.
إعادة المعايرة المنهجية Methodology recalibration (v3.11)
Trigger
A recurring external-examiner cycle (most recent: 2026-05-02) found that earlier external-facing framing overstated the human-scholarly role relative to what the pipeline actually represents: applied AI with classical grounding.
Outcome
v3.11 framing recalibration tightened external-facing language to: "AI-assisted, structurally validated against documented teacher-student relationships, scholar-in-the-loop pending on residue queues." This wording is now the canonical external citation for IPSC.
Standing discipline
  • AI-involvement disclosure travels with every record via the _provenanceDisclosure block.
  • LLM-assisted decisions carry a verifiedBy: "llm-..." provenance stamp.
  • Stylistic and historical-context documents are explicitly marked as such.

The wider quality-and-audit cycle is documented at /honesty.

القيود المعروفة Known Limits
Per-narrator scholar verification
Per-narrator scholarly verification at scale is the scope of the open scholar-collaboration program (~98K records currently in the program). IPSC stamps a verifiedBy provenance on every grading decision so consumers know whether a verdict is deterministic, LLM-assisted, or scholar-confirmed.
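A consumer could map the verifiedBy stamp onto the three tiers named above. The TypeScript sketch below is an illustration under assumptions, not the IPSC API: only the `llm-` prefix is documented on this page. The `scholar-` prefix, the literal `deterministic` value, the `unknown` fallback, and the function name `verificationTier` are all hypothetical.

```typescript
type VerificationTier =
  | "deterministic"
  | "llm-assisted"
  | "scholar-confirmed"
  | "unknown";

function verificationTier(verifiedBy: string): VerificationTier {
  // Documented on this page: LLM-assisted decisions carry an "llm-..." stamp.
  if (/^llm-/.test(verifiedBy)) return "llm-assisted";
  // ASSUMPTION: scholar-confirmed stamps use a "scholar-" prefix.
  if (/^scholar-/.test(verifiedBy)) return "scholar-confirmed";
  // ASSUMPTION: deterministic verdicts are stamped with this literal value.
  if (verifiedBy === "deterministic") return "deterministic";
  // Anything unrecognized is reported as unknown rather than guessed.
  return "unknown";
}
```

Falling back to `unknown` keeps an unrecognized stamp from being presented as a stronger verification tier than it has earned.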
Manuscript collation
NOT performed. Source is OpenITI digital editions, not original witnesses.
Tahdhīb graph coverage
386,520 explicit teacher-student edges; al-Mizzī's "wa-jamāʿah" ("and a group") / "wa-khalq kathīr" ("and many others") expansions are NOT enumerated, so the count is a lower bound on actual relationships.
No-candidate chain positions
274,164 chain positions (74% of v3.9 task scan) had no NRS candidate above similarity 0.5 — kunyah-only refs, generic names, regional variants outside the index.
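The "no candidate above similarity 0.5" condition can be sketched as a threshold filter. This TypeScript fragment is illustrative only: the 0.5 cutoff is the figure stated above, while the `ScoredCandidate` shape and both function names are hypothetical, and "above" is read as a strict comparison, which is an assumption.

```typescript
const SIMILARITY_THRESHOLD = 0.5; // cutoff stated in this disclosure

// Hypothetical shape: one scored index match for a chain position.
interface ScoredCandidate {
  pid: string;
  similarity: number;
}

// Keep only candidates strictly above the threshold, best match first.
function candidatesAbove(
  scored: ScoredCandidate[],
  threshold: number = SIMILARITY_THRESHOLD
): ScoredCandidate[] {
  return scored
    .filter((c) => c.similarity > threshold)
    .sort((a, b) => b.similarity - a.similarity);
}

// Share of chain positions that end up with no usable candidate.
function noCandidateShare(positions: ScoredCandidate[][]): number {
  if (positions.length === 0) return 0;
  const empty = positions.filter((p) => candidatesAbove(p).length === 0).length;
  return empty / positions.length;
}
```

Under this reading, a position whose best match scores exactly 0.5 counts as having no candidate.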
Mudallis registry
107 narrators tracked; Ibn Ḥajar's full Ṭabaqāt al-Mudallisīn has ~152. ~30% gap.
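The stated gap follows from the two counts above: (152 − 107) / 152 ≈ 29.6%, which rounds to the quoted ~30%. A small TypeScript sketch of that arithmetic, with hypothetical function names:

```typescript
// Fraction of a reference list that is not yet covered by the tracked set.
function coverageGap(tracked: number, reference: number): number {
  if (reference <= 0) {
    throw new RangeError("reference count must be positive");
  }
  return (reference - tracked) / reference;
}

// Format the gap as a percentage with one decimal place.
function formatGap(tracked: number, reference: number): string {
  return `${(coverageGap(tracked, reference) * 100).toFixed(1)}%`;
}

console.log(formatGap(107, 152)); // prints "29.6%"
```

Because the reference figure is itself approximate (~152), the result should be treated as approximate too.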
Ikhtilāṭ onset years
26% have onset year filled; 74% have onsetYear: null (deterioration period unknown for those records).
Kanz al-ʿUmmāl
v3.12 recovered citation symbols from the OpenITI source: 14,603 of 46,650 records (31.3%) now have _citationAttribution, plus 1,034 grade inferences. ~28K Kanz records are still without attribution. Note: 'attributed' ≠ 'graded'; computedGrade still requires chain analysis, and ~89% of Kanz records still have computedGrade='not-graded'.
Bukhārī sahih+hasan rate
95.9% computed grade, down from 97.3% in earlier releases. The drop reflects honest de-inflation of stale supportingChain counts; the prior number was inflated.
كيفية الاستخدام How to use this page
  • Cite alongside any IPSC claim. When making a public statement about what IPSC accomplishes (corpus size, convergence rates, NAQD findings, etc.), pair the claim with a link or quotation from this disclosure.
  • Surface in API responses. Customers integrating the corpus must surface the disclosure summary in any external-facing claim about provenance.
  • Read in context. Pair this with /honesty (audit practice + scholar collaboration) and /changelog (v3.0 → v3.26 release history) for a full picture.
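One way an integrator might satisfy the "surface in API responses" item is to wrap each payload with the disclosure summary. The TypeScript sketch below is hypothetical: the `DisclosureSummary` shape and the `withDisclosure` helper are illustrative names, not part of any IPSC API. The two strings are quoted from the Summary section of this page, and the `_provenanceDisclosure` key mirrors the block name used in the manifest.

```typescript
// Hypothetical summary shape; the real block may carry more fields.
interface DisclosureSummary {
  citation: string;
  notice: string;
}

// Both strings are quoted from the Summary section of this disclosure.
const DISCLOSURE: DisclosureSummary = {
  citation:
    "AI-assisted hadith corpus, structurally validated against documented " +
    "teacher-student relationships, with an open scholar-collaboration program.",
  notice:
    "This corpus is an applied-AI / data-engineering work product grounded in " +
    "classical rijāl literature. It is NOT classical mujtahid scholarship.",
};

// Attach the disclosure to any response payload without altering the payload.
function withDisclosure<T>(
  data: T
): { data: T; _provenanceDisclosure: DisclosureSummary } {
  return { data, _provenanceDisclosure: DISCLOSURE };
}

// Usage: the payload here is a made-up example.
const response = withDisclosure({ query: "example", results: [] as string[] });
console.log(Object.keys(response).join(","));
```

Wrapping at the response boundary keeps the disclosure attached even when downstream code only forwards the payload.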