بسم الله الرحمن الرحيم

سجل الإصدارات

Changelog

Every release of the Islamic Primary Source Corpus from v3.0 (March 2026) to v3.176.1 (June 2026). Each entry shows what shipped, what changed, and whether it is currently serving on Azure AI Search.

Deployed: currently serving on Azure AI Search
Staged: tagged locally, awaiting Azure v4 cutover
Audit: methodology-recalibration cycle
  1. v3.176.1 Deployed (Azure)

    Evidence Map + full matn-core re-embedding

    Corpus reaches 617,931 records across 128 works (480,350 graded; weak-rate 45.47%). Tier-3 Evidence Map assembled: statement-level evidence aggregation across all collections. Full matn-core re-embedding (588,737 records, 3,072-dim). Multi-narrator (qālā/qālū) chain resolution. Narrator denominator corrected to 68,299 named / 28,586 reliability-assessed. Bukhārī sahih+hasan rate: 95.8%.

    • Tier-3 Evidence Map: every chain for a statement assembled into one cross-collection view (surfacing tawātur)
    • Full matn-core re-embedding: 588,737 records re-embedded at 3,072-dim
    • Multi-narrator (qālā / qālū) chain resolution
    • Narrator denominator corrected to 68,299 named / 28,586 reliability-assessed
    • 5-layer integrity infrastructure + NAQD audits; 40/40 regression + 7/7 cross-field gates green
  2. v3.172–v3.175 Deployed (Azure)

    Tier-3 Evidence Map foundations

    Multi-narrator (qālā/qālū) chain resolution and a full matn-core re-embedding of 588,737 records — the data and embedding layers that the Evidence Map is built on.

    • Multi-narrator (qālā / qālū) chain resolution
    • Full matn-core re-embedding of 588,737 records
    • Data + embedding layers underpinning the Tier-3 Evidence Map
  3. v3.119–v3.170 Deployed (Azure)

    Data-quality hardening & narrator deduplication

    Systematic narrator-identity cleanup: write-time NRS sanity validation, batch-job silent-failure tracking, and a multi-tier canonical-anchor deduplication workstream consolidating duplicate narrator records across the rijāl graph.

    • Write-time NRS sanity validation
    • Batch-job silent-failure tracking
    • Multi-tier canonical-anchor narrator deduplication across the rijāl graph
  4. v3.38–v3.67 Deployed (Azure)

    Corpus depth: biographical & tārīkh ingests

    Ingested the major biographical, tārīkh and ḍuʿafāʾ corpora — Ibn ʿAsākir’s Tārīkh Dimashq, al-Khaṭīb’s Tārīkh Baghdād, Ibn Saʿd’s Ṭabaqāt, al-Ḥākim’s Mustadrak, al-Dāraquṭnī, Ibn ʿAdī’s al-Kāmil, al-ʿUqaylī’s Ḍuʿafāʾ, and the major Mawḍūʿāt (fabrication) catalogs — growing the corpus to ~128 works and 617,931 records. A dual-extraction pass recovered 13,213 embedded classical verdicts that single-pass ingestion had missed.

    • Ibn ʿAsākir Tārīkh Dimashq, al-Khaṭīb Tārīkh Baghdād, Ibn Saʿd Ṭabaqāt
    • al-Ḥākim Mustadrak, al-Dāraquṭnī, Ibn ʿAdī al-Kāmil, al-ʿUqaylī Ḍuʿafāʾ + major Mawḍūʿāt catalogs
    • Corpus grows to ~128 works / 617,931 records
    • Dual-extraction pass recovered 13,213 embedded classical verdicts missed by single-pass ingestion
  5. v3.30 Deployed (Azure)

    Citation grading tiers + integrity infrastructure

    Introduced three tiers of citation rigor (tier-1: 32 checks available now; tier-2/3 for deeper aḥkām-level defensibility) and a five-layer integrity infrastructure with twelve mechanical bug-prevention rules — Arabic-regex safety, LLM bulk-apply preflight gates, a provenance-flag registry, write-time NRS sanity validation, and cascade-coverage lints.

    • Three tiers of citation rigor (tier-1: 32 checks live; tier-2/3 for aḥkām-level defensibility)
    • Five-layer integrity infrastructure with twelve mechanical bug-prevention rules
    • Arabic-regex safety, LLM bulk-apply preflight gates, provenance-flag registry
    • Write-time NRS sanity validation + cascade-coverage lints
  6. v3.26 Staged (local)

    LLM PID Tiebreaker + Ship-Blocker Remediation

    Eve-Theology f5/reasoner, prompted in batches of 25 narrators, assigned 12,660 PIDs across the 6 collections ingested from v3.19 onward (PID resolution 47.4% → 69.5%). Source-collection caps were applied to 2,168 records in fabrication anthologies. NRS-Taqrīb reconciliation: 6,034 confirmed, 1,345 mismatches flagged.

    • 12,660 LLM-assigned PIDs (each carrying _pidTiebreakerVerdict provenance)
    • 2,168 records capped via _naqd3Override (Mawḍūʿāt 424, Tanzīh 1,238, Fawāʾid 506)
    • 7,364 new matn embeddings; cluster merge to 39.2% coverage
    • All hard gates green: 33/33 regression + 7/7 cross-field verify
    • Cumulative LLM cost since v3.5: ~$420
  7. v3.13–v3.25 Staged (local)

    Citation Cascade + 6 Phase 2-A Collection Ingests

    v3.13 to v3.18 generalized citation grade promotion beyond the Sahihayn. v3.19 introduced the OpenITI-to-IPSC ingestion pipeline (a narrator matcher combining token-Jaccard and containment scores, with multi-tier confidence). v3.21 to v3.25 ingested 6 new primary collections (+8,241 records). v3.25.1 fixed 5 pipeline robustness bugs.

    • Ibn al-Sunnī ʿAmal al-Yawm wa al-Laylah (770 records)
    • Hannād Zuhd (1,429 records)
    • al-Quḍāʿī Shihāb (1,497 records)
    • al-ʿUqaylī Ḍuʿafāʾ (2,103 records)
    • Ibn al-Mubārak Jihād (268 records)
    • Ibn ʿAdī al-Kāmil (2,174 records)
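
A minimal sketch of how a token-Jaccard plus containment narrator matcher of the kind introduced in v3.19 might score a pair of names. The function names, the tokenizer, and every threshold below are illustrative assumptions, not the pipeline's actual values.

```python
import re


def tokens(name: str) -> set[str]:
    """Lowercase a narrator name and split it into a set of word tokens."""
    return set(re.findall(r"\w+", name.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a: set[str], b: set[str]) -> float:
    """Share of the smaller token set that also appears in the other set."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def match_confidence(query: str, candidate: str) -> str:
    """Map both scores onto hypothetical confidence tiers (assumed thresholds)."""
    q, c = tokens(query), tokens(candidate)
    j, k = jaccard(q, c), containment(q, c)
    if j >= 0.8:
        return "high"      # near-identical token sets
    if k >= 1.0 and j >= 0.3:
        return "medium"    # short form fully contained in the long form
    if k >= 0.5:
        return "low"       # partial overlap only
    return "none"
```

Containment is what lets a short form such as "al-Zuhri" match "Muhammad ibn Muslim al-Zuhri" even though their Jaccard score is low; on its own it would over-match common tokens like "ibn", which is why this sketch also requires a Jaccard floor.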
  8. v3.12 Staged (local)

    Source-Data Items + Fresh Embeddings

    After verifying the OpenITI 2025-1-9 source files, refreshed all 448,237 matn embeddings (the previous embeddings had been computed on broken matn data). Kanz al-ʿUmmāl re-ingestion recovered 14,603 newly-attributed records via the OpenITI symbol parser.

    • NAQD-3 fresh-embedding re-run: 1,851 findings (300 critical, 638 high)
    • Previous near-zero N3-CC contradiction count was an artifact of broken embeddings
    • Mudallis registry 105 → 107
    • 21,264 semantic clusters at threshold 0.92
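
The cluster count at threshold 0.92 implies grouping matn embeddings whose similarity clears a cutoff. The changelog does not say which similarity measure or clustering algorithm is used, so the following is only a sketch under two assumptions: cosine similarity, and single-link grouping with union-find. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.92):
    """Group vector indexes so that any pair at or above the threshold shares a cluster."""
    parent = list(range(len(vectors)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

The pairwise loop is quadratic, so at the scale of hundreds of thousands of embeddings a real pipeline would presumably rely on an approximate nearest-neighbour index instead; the sketch only shows the thresholding idea.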
  9. v3.11 Audit response

    Methodology recalibration cycle

    An external-examiner cycle drove a framing recalibration on one stylistic document. v3.11 added the _provenanceDisclosure manifest block so that AI-involvement scope travels with every record, established the standing provenance-discipline rules, and closed NAQD-1 V1 (112 → 0).

    • _provenanceDisclosure block added to corpus-v3/manifest.json
    • Standing provenance-discipline rules established
    • External framing recalibrated to match pipeline scope
    • 11,380 ikhtilāt onset years backfilled
    • 1,338 NRS undefined-tier entries explicitly flagged
  10. v3.5–v3.10 Staged (local)

    Corpus Integrity Push

    Six release cycles that surfaced and remediated multiple defect classes. v3.5 made audit-driven corrections (256K stale temporal flags removed, 127K real issues found). The first-attempt PID validator in v3.7 failed regression (12 T10+ violations); the lesson was incorporated into the v3.9 multi-stage architecture (24,485 safe PID swaps). v3.8 performed a critical matn recovery from broken-regex corruption, which had survived two release cycles because regression did not sample matn content.

    • v3.8 lesson: regression tests must include content sampling, not just structural checks
    • v3.9 multi-stage validator: 5 structural pre-filters + LLM tiebreaker = 24,485 swaps
    • v3.10 anonymous-narrator detector reclassified 3,787 chains as munqati
    • 8 git tags v3.5 → v3.12 with 33/33 regression maintained throughout
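
The v3.8 lesson (structural checks alone can pass while matn text is corrupted) can be illustrated with a content-sampling check. This is a sketch, not the project's regression suite: the record layout, the "matn" field name, and the idea of a verified reference text per record are all assumptions.

```python
import random


def sample_ids(record_ids, k, seed=0):
    """Pick up to k record ids reproducibly, independent of input order."""
    ids = sorted(record_ids)
    return random.Random(seed).sample(ids, min(k, len(ids)))


def content_regressions(corpus, reference, k=100, seed=0):
    """Return sampled ids whose matn text no longer matches the reference text.

    corpus:    {record_id: {"matn": str, ...}}  (assumed layout)
    reference: {record_id: str}                 (assumed verified source text)
    """
    failures = []
    for rid in sample_ids(reference.keys(), k, seed):
        record = corpus.get(rid)
        if record is None or record.get("matn", "").strip() != reference[rid].strip():
            failures.append(rid)
    return sorted(failures)
```

A record whose matn has been truncated still has a well-formed "matn" field, so a purely structural test passes; comparing sampled text against a reference is what surfaces the damage.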
  11. v3.4 Deployed (Azure)

    First Azure AI Search Deployment + Glossary v1

    First IPSC corpus deployed to Azure AI Search: 10 tier indexes, plus 3 added with Glossary v1, for 13 indexes and 1,569,379 docs in total. 8 vector indexes plus glossary termEmbedding. 7,652 narrators with classical scholarly quotations. Glossary v1 shipped with 730 canonical hadith-science terms across 3 tier indexes.

    • Tier-separated indexes (Public / Research / Scholar) with atomic version cutover
    • 11 vector search indexes total (matn × 3, narrator × 3, defect × 2, term × 3)
    • 7,652 narrators with Taqrīb + Tahdhīb al-Kamāl classical quotations
    • Glossary v1: 730 canonical hadith-science terms with multi-source classical citations
  12. v3.2 Staged (local)

    Matn-Criticism Pipeline (Phases A–G)

    Two-pass architecture: Pass 1 deterministic string-op scan (420,110 records, 11,441 flagged). Pass 2 multi-tier Eve-Theology™ Fusion v5 reasoning: lightweight triage → Eve-Theology™ F5/reasoner detail → frontier-tier scholar-grade analysis on top 5,000 concerns. Reference databases: 304 Quran rulings, 271 anachronisms, 335 fabrication patterns, 1,301 mutawatir canon entries, prophetic linguistic baseline (39 words avg, sajʿ density 0.033).

    • 815 chain-matn conflicts identified
    • 808 scholar-ready defense documents generated
    • Sahihayn matn-criticism cleanliness: 99.4%
    • Sahihayn chain-grade convergence: 97.3% (later corrected to 95.9% in v3.8)
  13. v3.1 Staged (local)

    Cross-Reference Layers

    Matn criticism pipeline analyzed 437,740 matns. 87,844 hadith cross-linked to al-Dāraquṭnī’s ilal works. Teacher-student graph constructed: 7,973 nodes, 889,913 directed edges (later corrected to 386,520 explicit edges in the v3.11 methodology recalibration). 8,258 hawala chain-switch records split. 1,279,676 Quran term cross-references + 6,447 direct quotations.

    • 437,740 matns analyzed by criticism pipeline
    • 87,844 hadith linked to al-Dāraquṭnī’s ilal works
    • 8,258 hawala chain-switch markers identified and split
    • 45 garbage-collector PIDs cleared (99,477 positions cleaned)
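
As a rough illustration of how a teacher-student graph can be derived from parsed chains, the sketch below adds one directed edge per adjacent pair of narrators. It assumes each chain is a list of person IDs ordered from the collector back toward the source, so that each narrator reports from the next one; the function names and that ordering are assumptions, not the corpus's documented format.

```python
from collections import defaultdict


def build_teacher_student_graph(chains):
    """Map each teacher PID to the set of student PIDs observed next to it."""
    graph = defaultdict(set)
    for chain in chains:
        # chain[i] reports from chain[i + 1], so chain[i + 1] is the teacher.
        for student, teacher in zip(chain, chain[1:]):
            if student != teacher:
                graph[teacher].add(student)
    return graph


def edge_count(graph):
    """Number of distinct directed teacher -> student edges."""
    return sum(len(students) for students in graph.values())


def node_count(graph):
    """Number of distinct narrators appearing as teacher or student."""
    nodes = set(graph)
    for students in graph.values():
        nodes |= students
    return len(nodes)
```

Because edges are stored in sets, a pair that recurs across many chains is counted once, which is one way an explicit-edge total can come out far lower than a count of raw adjacent positions.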
  14. v3.0 Staged (local)

    Initial corpus release

    449,415 hadith from 86 classical works. 27,099 NRS entries with 12-tier reliability assessments anchored to Ibn Ḥajar Taqrīb al-Tahdhīb. Five-condition Ibn al-Salāh grading engine. Person ID resolution pipeline (6 iterations). Structured isnad parsing (1,820,033 chain positions). Bag-of-words matn clustering (54,270 clusters).

    • 449,415 hadith / 27,099 NRS entries / 86 classical works
    • Five-condition framework: ittisāl al-sanad, ʿadālah, dabt, no shudhūdh, no ʿillah
    • 12-tier narrator classification (Sahābi → Kadhdhāb)
    • 6,983 Companions identified
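
The five-condition framework listed above can be pictured as a record of five checks, where a chain qualifies only when none fails. This is a hedged sketch: the class, the field names, and the boolean simplification are assumptions, and the actual grading engine's mapping from conditions to grades is not described in this changelog.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Conditions:
    ittisal: bool      # ittisāl al-sanad: the chain is connected
    adalah: bool       # ʿadālah: the narrators are upright
    dabt: bool         # dabt: the narrators are precise
    no_shudhudh: bool  # no shudhūdh: no anomalous contradiction
    no_illah: bool     # no ʿillah: no hidden defect


def failed_conditions(c: Conditions) -> list[str]:
    """Names of the conditions that are not satisfied, in declaration order."""
    return [f.name for f in fields(c) if not getattr(c, f.name)]


def meets_all_five(c: Conditions) -> bool:
    """True only when every one of the five conditions holds."""
    return not failed_conditions(c)
```

Returning the list of failed conditions rather than a bare pass/fail keeps the reason for a downgrade attached to the result.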