Name: Islamic Primary Source Corpus (IPSC) v3.176.1
License: https://www.theogrid.ai/terms

v3.176.1 Deployed (Azure) 2026-06

Evidence Map + full matn-core re-embedding

Corpus reaches 617,931 records across 128 works (480,350 graded; weak-rate 45.47%). Tier-3 Evidence Map assembled: statement-level evidence aggregation across all collections. Full matn-core re-embedding (588,737 records, 3,072-dim). Multi-narrator (qālā/qālū) chain resolution. Narrator denominator corrected to 68,299 named / 28,586 reliability-assessed. Bukhārī sahih+hasan rate: 95.8%.
- Tier-3 Evidence Map: every chain for a statement assembled into one cross-collection view (surfacing tawātur)
- Full matn-core re-embedding: 588,737 records re-embedded at 3,072-dim
- Multi-narrator (qālā / qālū) chain resolution
- Narrator denominator corrected to 68,299 named / 28,586 reliability-assessed
- 5-layer integrity infrastructure + NAQD audits; 40/40 regression + 7/7 cross-field gates green
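
The statement-level aggregation behind the Evidence Map can be pictured as a group-by over records. The sketch below is illustrative only: the field names `statement_id`, `collection`, and `chain` are hypothetical, and the release notes do not describe the actual schema or how statements are matched across collections.

```python
from collections import defaultdict


def build_evidence_map(records):
    """Group every chain attested for a statement into one cross-collection view.

    Each record is assumed (hypothetically) to carry a statement_id,
    the collection it came from, and its chain of narrators.
    """
    evidence = defaultdict(lambda: {"chains": [], "collections": set()})
    for rec in records:
        entry = evidence[rec["statement_id"]]
        entry["chains"].append(rec["chain"])
        entry["collections"].add(rec["collection"])
    return dict(evidence)


records = [
    {"statement_id": "s1", "collection": "bukhari", "chain": ["A", "B", "C"]},
    {"statement_id": "s1", "collection": "muslim", "chain": ["D", "E", "C"]},
    {"statement_id": "s2", "collection": "bukhari", "chain": ["F", "G"]},
]
evidence_map = build_evidence_map(records)
```

Counting chains and collections per statement only surfaces candidates for wide attestation; it does not by itself establish tawātur.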

v3.172–v3.175 Deployed (Azure) 2026-06

Tier-3 Evidence Map foundations

Multi-narrator (qālā/qālū) chain resolution and a full matn-core re-embedding of 588,737 records — the data and embedding layers that the Evidence Map is built on.
- Multi-narrator (qālā / qālū) chain resolution
- Full matn-core re-embedding of 588,737 records
- Data + embedding layers underpinning the Tier-3 Evidence Map

v3.119–v3.170 Deployed (Azure) 2026-06

Data-quality hardening & narrator deduplication

Systematic narrator-identity cleanup: write-time NRS sanity validation, batch-job silent-failure tracking, and a multi-tier canonical-anchor deduplication workstream consolidating duplicate narrator records across the rijāl graph.
- Write-time NRS sanity validation
- Batch-job silent-failure tracking
- Multi-tier canonical-anchor narrator deduplication across the rijāl graph

v3.38–v3.67 Deployed (Azure) 2026-05

Corpus depth: biographical & tārīkh ingests

Ingested the major biographical, tārīkh and ḍuʿafāʾ corpora — Ibn ʿAsākir’s Tārīkh Dimashq, al-Khaṭīb’s Tārīkh Baghdād, Ibn Saʿd’s Ṭabaqāt, al-Ḥākim’s Mustadrak, al-Dāraquṭnī, Ibn ʿAdī’s al-Kāmil, al-ʿUqaylī’s Ḍuʿafāʾ, and the major Mawḍūʿāt (fabrication) catalogs — growing the corpus to ~128 works and 617,931 records. A dual-extraction pass recovered 13,213 embedded classical verdicts that single-pass ingestion had missed.
- Ibn ʿAsākir Tārīkh Dimashq, al-Khaṭīb Tārīkh Baghdād, Ibn Saʿd Ṭabaqāt
- al-Ḥākim Mustadrak, al-Dāraquṭnī, Ibn ʿAdī al-Kāmil, al-ʿUqaylī Ḍuʿafāʾ + major Mawḍūʿāt catalogs
- Corpus grows to ~128 works / 617,931 records
- Dual-extraction pass recovered 13,213 embedded classical verdicts missed by single-pass ingestion

v3.30 Deployed (Azure) 2026-05

Citation grading tiers + integrity infrastructure

Introduced three tiers of citation rigor (tier-1: 32 checks available now; tier-2/3 for deeper aḥkām-level defensibility) and a five-layer integrity infrastructure with twelve mechanical bug-prevention rules — Arabic-regex safety, LLM bulk-apply preflight gates, a provenance-flag registry, write-time NRS sanity validation, and cascade-coverage lints.
- Three tiers of citation rigor (tier-1: 32 checks live; tier-2/3 for aḥkām-level defensibility)
- Five-layer integrity infrastructure with twelve mechanical bug-prevention rules
- Arabic-regex safety, LLM bulk-apply preflight gates, provenance-flag registry
- Write-time NRS sanity validation + cascade-coverage lints

v3.26 Staged (local) 2026-05-04

LLM PID Tiebreaker + Ship-Blocker Remediation

Batched 25-narrator prompts sent to the Eve-Theology f5/reasoner assigned 12,660 PIDs across the six collections ingested from v3.19 onward, raising PID resolution from 47.4% to 69.5%. Source-collection caps were applied to 2,168 records in fabrication anthologies. NRS-Taqrīb reconciliation: 6,034 confirmed, 1,345 mismatches flagged.
- 12,660 LLM-assigned PIDs (each carrying _pidTiebreakerVerdict provenance)
- 2,168 records capped via _naqd3Override (Mawḍūʿāt 424, Tanzīh 1,238, Fawāʾid 506)
- 7,364 new matn embeddings; cluster merge to 39.2% coverage
- All hard gates green: 33/33 regression + 7/7 cross-field verify
- Cumulative LLM cost since v3.5: ~$420

v3.13–v3.25 Staged (local) 2026-05-02 to 2026-05-03

Citation Cascade + 6 Phase 2-A Collection Ingests

v3.13 to v3.18 generalized citation-grade promotion beyond the Sahihayn. v3.19 introduced the OpenITI-to-IPSC ingestion pipeline (a token-Jaccard + containment narrator matcher with multi-tier confidence). v3.21 to v3.25 ingested 6 new primary collections (+8,241 records). v3.25.1 fixed 5 pipeline robustness bugs.
- Ibn al-Sunnī ʿAmal al-Yawm wa al-Laylah (770 records)
- Hannād Zuhd (1,429 records)
- al-Quḍāʿī Shihāb (1,497 records)
- al-ʿUqaylī Ḍuʿafāʾ (2,103 records)
- Ibn al-Mubārak Jihād (268 records)
- Ibn ʿAdī al-Kāmil (2,174 records)
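
A token-Jaccard + containment matcher of the kind introduced in v3.19 can be sketched as below. This is a minimal illustration, not the pipeline's implementation: the function names, the whitespace tokenizer, the tier labels, and the thresholds (0.8 and 0.5) are all assumptions made for the example.

```python
def tokens(name):
    """Split a narrator name into a set of whitespace-separated tokens."""
    return set(name.split())


def jaccard(a, b):
    """Shared tokens divided by all distinct tokens in either name."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Shared tokens divided by the size of the shorter name."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def match_tier(candidate, canonical, high=0.8, medium=0.5):
    """Map the two scores to a confidence tier (thresholds are hypothetical)."""
    a, b = tokens(candidate), tokens(canonical)
    j, c = jaccard(a, b), containment(a, b)
    if j >= high:
        return "high"      # names are nearly identical
    if c >= 1.0 and j >= medium:
        return "medium"    # shorter name fully contained in the longer one
    if c >= medium:
        return "low"       # partial overlap only
    return "none"
```

Containment complements Jaccard because isnads often cite a narrator by a shortened name: "sufyan ibn uyaynah" against "sufyan ibn uyaynah al-hilali" scores only 0.75 on Jaccard but 1.0 on containment.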

v3.12 Staged (local) 2026-05-02

Source-Data Items + Fresh Embeddings

After verifying the OpenITI 2025-1-9 source files, refreshed all 448,237 matn embeddings (the previous embeddings had been computed on broken matn data). Kanz al-ʿUmmāl re-ingestion recovered 14,603 newly attributed records via the OpenITI symbol parser.
- NAQD-3 fresh-embedding re-run: 1,851 findings (300 critical, 638 high)
- Previous near-zero N3-CC contradiction count was an artifact of broken embeddings
- Mudallis registry 105 → 107
- 21,264 semantic clusters at threshold 0.92
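
Clustering at a similarity threshold can be illustrated with the sketch below. The release notes give only the threshold (0.92); the choice of cosine similarity, the single-linkage (union-find) grouping, and the brute-force pairwise loop are assumptions for the example and would not scale to the full corpus as written.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(vectors, threshold=0.92):
    """Group vector indices whose pairwise similarity reaches the threshold.

    Uses union-find, so clusters are connected components (single linkage).
    """
    n = len(vectors)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


clusters = cluster([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]])
```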

v3.11 Audit response 2026-05-02

Methodology recalibration cycle

External-examiner cycle drove a framing recalibration on one stylistic document. v3.11 added the _provenanceDisclosure manifest block so AI-involvement scope travels with every record, established the standing provenance-discipline rules, and closed NAQD-1 V1 (112 → 0).
- _provenanceDisclosure block added to corpus-v3/manifest.json
- Standing provenance-discipline rules established
- External framing recalibrated to match pipeline scope
- 11,380 ikhtilāt onset years backfilled
- 1,338 NRS undefined-tier entries explicitly flagged

v3.5–v3.10 Staged (local) 2026-04-26 to 2026-05-01

Corpus Integrity Push

Six release cycles surfacing and remediating multiple defect classes. v3.5 made audit-driven corrections (256K stale temporal flags removed, 127K real issues found). The v3.7 first-attempt PID validator failed regression (12 T10+ violations); the lesson was incorporated into the v3.9 multi-stage architecture (24,485 safe PID swaps). v3.8 performed a critical matn recovery from broken-regex corruption that had survived two release cycles because regression did not sample matn content.
- v3.8 lesson: regression tests must include content sampling, not just structural checks
- v3.9 multi-stage validator: 5 structural pre-filters + LLM tiebreaker = 24,485 swaps
- v3.10 anonymous-narrator detector reclassified 3,787 chains as munqati
- 8 git tags v3.5 → v3.12 with 33/33 regression maintained throughout
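
The v3.8 lesson, that a structural check can pass while the content is corrupted, can be illustrated as follows. This is a hedged sketch, not the project's regression suite: the record fields (`id`, `matn`, `isnad`), the Arabic-letter heuristic, and the `min_chars` cutoff are hypothetical.

```python
import random


def structural_check(record):
    """Structural test: only confirms the expected fields exist."""
    return all(key in record for key in ("id", "matn", "isnad"))


def content_check(record, min_chars=10):
    """Content test: the matn must hold a minimum number of Arabic letters."""
    matn = record.get("matn", "")
    arabic = sum(1 for ch in matn if "\u0600" <= ch <= "\u06FF")
    return arabic >= min_chars


def sample_content_failures(records, sample_size, seed=0):
    """Draw a reproducible random sample and return ids failing the content test."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    return [rec["id"] for rec in sample if not content_check(rec)]


records = [
    {"id": "r1", "matn": "إنما الأعمال بالنيات", "isnad": ["A", "B"]},
    {"id": "r2", "matn": "", "isnad": ["C", "D"]},  # corrupted: matn wiped out
]
failures = sample_content_failures(records, sample_size=2)
```

Both records pass `structural_check`, yet only the sampled content test reports `r2`, which is the class of defect a purely structural regression would miss.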

v3.4 Deployed (Azure) 2026-04-25

First Azure AI Search Deployment + Glossary v1

First IPSC corpus deployed to Azure AI Search: 10 tier indexes, plus 3 for Glossary v1, for 13 indexes and 1,569,379 docs in total. 8 vector indexes plus the glossary termEmbedding indexes. 7,652 narrators with classical scholarly quotations. Glossary v1 shipped with 730 canonical hadith-science terms across 3 tier indexes.
- Tier-separated indexes (Public / Research / Scholar) with atomic version cutover
- 11 vector search indexes total (matn x 3, narrator x 3, defect x 2, term x 3)
- 7,652 narrators with Taqrīb + Tahdhīb al-Kamāl classical quotations
- Glossary v1: 730 canonical hadith-science terms with multi-source classical citations

v3.2 Staged (local) 2026-04-22

Matn-Criticism Pipeline (Phases A–G)

Two-pass architecture: Pass 1 deterministic string-op scan (420,110 records, 11,441 flagged). Pass 2 multi-tier Eve-Theology™ Fusion v5 reasoning: lightweight triage → Eve-Theology™ F5/reasoner detail → frontier-tier scholar-grade analysis on top 5,000 concerns. Reference databases: 304 Quran rulings, 271 anachronisms, 335 fabrication patterns, 1,301 mutawatir canon entries, prophetic linguistic baseline (39 words avg, sajʿ density 0.033).
- 815 chain-matn conflicts identified
- 808 scholar-ready defense documents generated
- Sahihayn matn-criticism cleanliness: 99.4%
- Sahihayn chain-grade convergence: 97.3% (later corrected to 95.9% in v3.8)

v3.1 Staged (local) 2026-04 (early)

Cross-Reference Layers

Matn criticism pipeline analyzed 437,740 matns. 87,844 hadith cross-linked to al-Dāraquṭnī’s ilal works. Teacher-student graph constructed: 7,973 nodes, 889,913 directed edges (later corrected to 386,520 explicit edges in the v3.11 methodology recalibration). 8,258 hawala chain-switch records split. 1,279,676 Quran term cross-references + 6,447 direct quotations.
- 437,740 matns analyzed by criticism pipeline
- 87,844 hadith linked to al-Dāraquṭnī’s ilal works
- 8,258 hawala chain-switch markers identified and split
- 45 garbage-collector PIDs cleared (99,477 positions cleaned)

v3.0 Staged (local) 2026-03-20

Initial corpus release

449,415 hadith from 86 classical works. 27,099 NRS entries with 12-tier reliability assessments anchored to Ibn Ḥajar Taqrīb al-Tahdhīb. Five-condition Ibn al-Salāh grading engine. Person ID resolution pipeline (6 iterations). Structured isnad parsing (1,820,033 chain positions). Bag-of-words matn clustering (54,270 clusters).
- 449,415 hadith / 27,099 NRS entries / 86 classical works
- Five-condition framework: ittisāl al-sanad, ʿadālah, dabt, no shudhūdh, no ʿillah
- 12-tier narrator classification (Sahābi → Kadhdhāb)
- 6,983 Companions identified
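
The five-condition framework listed above can be expressed as a simple checklist. The sketch below only reports which conditions fail; the class and function names are hypothetical, and it makes no claim about how the grading engine maps failures to grades.

```python
from dataclasses import dataclass

# The five conditions, in the order given in the release notes.
CONDITIONS = ("ittisal_al_sanad", "adalah", "dabt", "no_shudhudh", "no_illah")


@dataclass
class ChainAssessment:
    """Hypothetical per-chain verdicts, one boolean per condition."""
    ittisal_al_sanad: bool  # chain is connected
    adalah: bool            # narrators are upright
    dabt: bool              # narrators are precise
    no_shudhudh: bool       # no anomaly against stronger reports
    no_illah: bool          # no hidden defect


def failed_conditions(assessment):
    """Return the names of the conditions the chain does not satisfy."""
    return [name for name in CONDITIONS if not getattr(assessment, name)]


def meets_all_five(assessment):
    """True only when every one of the five conditions holds."""
    return not failed_conditions(assessment)


example = ChainAssessment(True, True, False, True, True)
```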
