In the name of God, the Most Gracious, the Most Merciful

Release Log

Changelog

This page lists every release of the Islamic Primary Source Corpus (IPSC) from v3.0 (March 2026) to v3.26 (May 2026). Each entry shows what shipped, what changed, and whether the release is currently serving on Azure AI Search.

Status legend:
  Deployed: currently serving on Azure AI Search
  Staged: tagged locally, awaiting Azure v4 cutover
  Audit: methodology-recalibration cycle

  1. v3.26 Staged (local)

    LLM PID Tiebreaker + Ship-Blocker Remediation

    Eve-Theology f5/reasoner, run on batched 25-narrator prompts, assigned 12,660 PIDs across the 6 collections ingested since v3.19, raising PID resolution from 47.4% to 69.5%. Source-collection caps were applied to 2,168 records in fabrication anthologies. NRS-Taqrīb reconciliation: 6,034 confirmed, 1,345 mismatches flagged.

    • 12,660 LLM-assigned PIDs (each carrying _pidTiebreakerVerdict provenance)
    • 2,168 records capped via _naqd3Override (Mawduʿāt 424, Tanzīh 1,238, Fawāʾid 506)
    • 7,364 new matn embeddings; cluster merge to 39.2% coverage
    • All hard gates green: 33/33 regression + 7/7 cross-field verify
    • Cumulative LLM cost since v3.5: ~$420
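The batching step in this entry can be illustrated with a minimal sketch. It assumes only that narrators are grouped into prompts of at most 25; the function names and the sample identifiers are hypothetical and are not taken from the IPSC pipeline.

```python
from typing import Iterator, Sequence


def batched(items: Sequence[str], size: int = 25) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` items (25 per prompt here)."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def resolution_rate(resolved: int, total: int) -> float:
    """PID resolution as a percentage of all chain positions considered."""
    return 100.0 * resolved / total if total else 0.0


# Hypothetical narrator identifiers, for illustration only.
narrators = [f"narrator-{i}" for i in range(60)]
batch_sizes = [len(batch) for batch in batched(narrators)]
print(batch_sizes)  # [25, 25, 10]
```

Fixed-size batches keep each prompt bounded, and the final short batch is passed through rather than padded.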
  2. v3.13–v3.25 Staged (local)

    Citation Cascade + 6 Phase 2-A Collection Ingests

    v3.13 to v3.18 generalized citation-grade promotion beyond the Sahihayn. v3.19 introduced the OpenITI-to-IPSC ingestion pipeline (a token-Jaccard + containment narrator matcher with multi-tier confidence). v3.21 to v3.25 ingested 6 new primary collections (+8,241 records). v3.25.1 fixed 5 pipeline robustness bugs.

    • Ibn al-Sunnī ʿAmal al-Yawm wa al-Laylah (770 records)
    • Hannād Zuhd (1,429 records)
    • al-Quḍāʿī Shihāb (1,497 records)
    • al-ʿUqaylī Duʿafāʾ (2,103 records)
    • Ibn al-Mubārak Jihād (268 records)
    • Ibn ʿAdī al-Kāmil (2,174 records)
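As a rough illustration of the token-Jaccard + containment matcher named in this entry, the sketch below scores two narrator names by set overlap and maps the scores to confidence tiers. The thresholds, tier labels, and function names are assumptions chosen for the example; the actual matcher's cutoffs are not stated in this changelog.

```python
def tokens(name: str) -> set[str]:
    """Lowercase whitespace tokenization; real normalization would do more."""
    return set(name.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a: set[str], b: set[str]) -> float:
    """Share of the smaller token set that also appears in the larger one."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def match_confidence(query: str, candidate: str) -> str:
    """Map the two scores to a tier; all thresholds here are illustrative."""
    q, c = tokens(query), tokens(candidate)
    j, k = jaccard(q, c), containment(q, c)
    if j >= 0.8:
        return "high"
    if k == 1.0 and j >= 0.4:
        return "medium"
    if j >= 0.3:
        return "low"
    return "none"


print(match_confidence("sufyan ibn uyaynah", "sufyan ibn uyaynah al-hilali"))  # medium
```

Containment helps when one name is a shortened form of the other. Both measures ignore token order, so "malik ibn anas" and "anas ibn malik" score identically, which is one reason such a matcher would need further disambiguation.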
  3. v3.12 Staged (local)

    Source-Data Items + Fresh Embeddings

    After verification of the OpenITI 2025-1-9 source files, all 448,237 matn embeddings were refreshed (the previous embeddings had been computed on broken matn data). Kanz al-ʿUmmāl re-ingestion recovered 14,603 newly attributed records via the OpenITI symbol parser.

    • NAQD-3 fresh-embedding re-run: 1,851 findings (300 critical, 638 high)
    • Previous near-zero N3-CC contradiction count was an artifact of broken embeddings
    • Mudallis registry 105 → 107
    • 21,264 semantic clusters at threshold 0.92
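To make "clusters at threshold 0.92" concrete, here is a minimal sketch that links any two embeddings whose cosine similarity is at least 0.92 and groups linked items with union-find. Single-linkage grouping and the use of cosine similarity are assumptions; the changelog does not say which clustering method or similarity measure the pipeline uses.

```python
import math

THRESHOLD = 0.92  # value quoted in this entry; the similarity measure is assumed


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def cluster(vectors: list[list[float]], threshold: float = THRESHOLD) -> list[list[int]]:
    """Group vector indexes so that any pair at or above threshold is linked."""
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups: dict[int, list[int]] = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


print(cluster([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]))  # [[0, 1], [2]]
```

The pairwise loop is quadratic, so a corpus of this size would need an approximate nearest-neighbor index rather than this brute-force comparison.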
  4. v3.11 Audit response

    Methodology recalibration cycle

    An external-examiner cycle drove a framing recalibration on one stylistic document. v3.11 added the _provenanceDisclosure manifest block so that the scope of AI involvement travels with every record, established the standing provenance-discipline rules, and closed NAQD-1 V1 (112 → 0).

    • _provenanceDisclosure block added to corpus-v3/manifest.json
    • Standing provenance-discipline rules established
    • External framing recalibrated to match pipeline scope
    • 11,380 ikhtilāt onset years backfilled
    • 1,338 NRS undefined-tier entries explicitly flagged
  5. v3.5–v3.10 Staged (local)

    Corpus Integrity Push

    Six release cycles that surfaced and remediated multiple defect classes. v3.5 made audit-driven corrections (256K stale temporal flags removed, 127K real issues found). The first-attempt PID validator in v3.7 failed regression (12 T10+ violations); that lesson was incorporated into the v3.9 multi-stage architecture (24,485 safe PID swaps). v3.8 recovered matn content from a critical broken-regex corruption that had survived two release cycles because regression did not sample matn content.

    • v3.8 lesson: regression tests must include content sampling, not just structural checks
    • v3.9 multi-stage validator: 5 structural pre-filters + LLM tiebreaker = 24,485 swaps
    • v3.10 anonymous-narrator detector reclassified 3,787 chains as munqaṭiʿ
    • 8 git tags v3.5 → v3.12 with 33/33 regression maintained throughout
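The v3.8 lesson about content sampling can be sketched as a regression check that compares a random sample of matn texts against pinned digests, so that corruption inside a field fails the suite even when the record structure is intact. The golden-digest approach, the function names, and the record shape are assumptions for illustration, not the project's actual test harness.

```python
import hashlib
import random


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def content_regressions(
    records: dict[str, str],
    golden: dict[str, str],
    sample_size: int = 100,
    seed: int = 0,
) -> list[str]:
    """Return sampled record ids whose matn no longer matches its pinned digest."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    ids = sorted(golden)
    sample = rng.sample(ids, min(sample_size, len(ids)))
    return sorted(i for i in sample if digest(records.get(i, "")) != golden[i])


# Hypothetical two-record corpus with one corrupted matn.
golden = {"h1": digest("first matn"), "h2": digest("second matn")}
current = {"h1": "first matn", "h2": "second ma(?:tn"}
print(content_regressions(current, golden))  # ['h2']
```

A structural check alone would pass both records here, since each still has a non-empty matn field.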
  6. v3.4 Deployed (Azure)

    First Azure AI Search Deployment + Glossary v1

    First IPSC corpus deployed to Azure AI Search: 10 tier indexes, plus 3 added with Glossary v1, for 13 indexes and 1,569,379 documents in total. 8 vector indexes plus glossary termEmbedding. 7,652 narrators carry classical scholarly quotations. Glossary v1 shipped 730 canonical hadith-science terms across 3 tier indexes.

    • Tier-separated indexes (Public / Research / Scholar) with atomic version cutover
    • 11 vector search indexes total (matn x 3, narrator x 3, defect x 2, term x 3)
    • 7,652 narrators with Taqrīb + Tahdhīb al-Kamāl classical quotations
    • Glossary v1: 730 canonical hadith-science terms with multi-source classical citations
  7. v3.2 Staged (local)

    Matn-Criticism Pipeline (Phases A–G)

    Two-pass architecture: Pass 1 deterministic string-op scan (420,110 records, 11,441 flagged). Pass 2 multi-tier LLM reasoning: Haiku triage → Eve-Theology f5/reasoner detail → Opus 4.6 1M scholar-grade analysis on top 5,000 concerns. Reference databases: 304 Quran rulings, 271 anachronisms, 335 fabrication patterns, 1,301 mutawatir canon entries, prophetic linguistic baseline (39 words avg, sajʿ density 0.033).

    • 815 chain-matn conflicts identified
    • 808 scholar-ready defense documents generated
    • Sahihayn matn-criticism cleanliness: 99.4%
    • Sahihayn chain-grade convergence: 97.3% (later corrected to 95.9% in v3.8)
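The two-pass shape described in this entry can be sketched as a cheap deterministic scan followed by a chain of increasingly expensive reviewers, each narrowing the set passed to the next. The reviewers below are plain stand-in functions, and the pattern strings are placeholders; nothing here calls a real model or reflects the pipeline's actual reference databases.

```python
from typing import Callable

Reviewer = Callable[[list[str]], list[str]]


def pass_one(records: dict[str, str], patterns: list[str]) -> list[str]:
    """Deterministic substring scan: ids of records containing any pattern."""
    return [rid for rid, matn in records.items() if any(p in matn for p in patterns)]


def pass_two(flagged: list[str], tiers: list[tuple[str, Reviewer]]) -> list[tuple[str, list[str]]]:
    """Run flagged ids through each tier in order; a tier returns what it escalates."""
    remaining = list(flagged)
    history = []
    for name, review in tiers:
        remaining = review(remaining)
        history.append((name, list(remaining)))
    return history


# Placeholder data and stand-in tiers, for illustration only.
records = {"h1": "text with term-a", "h2": "clean text", "h3": "term-b appears"}
flagged = pass_one(records, ["term-a", "term-b"])
tiers = [("triage", lambda ids: ids), ("detail", lambda ids: ids[:1])]
print(pass_two(flagged, tiers))  # [('triage', ['h1', 'h3']), ('detail', ['h1'])]
```

Putting the string scan first means the costlier tiers only ever see the flagged subset, which is the point of the 420,110 to 11,441 reduction reported for Pass 1.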
  8. v3.1 Staged (local)

    Cross-Reference Layers

    Matn criticism pipeline analyzed 437,740 matns. 87,844 hadith cross-linked to al-Dāraquṭnī’s ilal works. Teacher-student graph constructed: 7,973 nodes, 889,913 directed edges (later corrected to 386,520 explicit edges in the v3.11 methodology recalibration). 8,258 hawala chain-switch records split. 1,279,676 Quran term cross-references + 6,447 direct quotations.

    • 437,740 matns analyzed by criticism pipeline
    • 87,844 hadith linked to al-Dāraquṭnī’s ilal works
    • 8,258 hawala chain-switch markers identified and split
    • 45 garbage-collector PIDs cleared (99,477 positions cleaned)
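One way to read the teacher-student graph in this entry is as directed edges taken from adjacent positions in each parsed chain. The sketch below counts only those adjacent pairs; treating them as the "explicit" edges is an assumption, as are the chain ordering and the sample PIDs.

```python
from collections import defaultdict


def teacher_student_edges(chains: list[list[str]]) -> dict[tuple[str, str], int]:
    """Count directed (teacher, student) edges from adjacent chain positions.

    Each chain is assumed to be ordered from the latest narrator back toward
    the origin, so chain[i] narrates from chain[i + 1].
    """
    edges: dict[tuple[str, str], int] = defaultdict(int)
    for chain in chains:
        for student, teacher in zip(chain, chain[1:]):
            if student != teacher:  # skip self-loops from duplicated positions
                edges[(teacher, student)] += 1
    return dict(edges)


# Hypothetical PIDs, for illustration only.
chains = [["P3", "P2", "P1"], ["P4", "P2", "P1"]]
edges = teacher_student_edges(chains)
nodes = {pid for edge in edges for pid in edge}
print(len(nodes), len(edges))  # 4 3
```

Counting distinct pairs rather than occurrences keeps the edge total independent of how often a transmission link is attested.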
  9. v3.0 Staged (local)

    Initial corpus release

    449,415 hadith from 86 classical works. 27,099 NRS entries with 12-tier reliability assessments anchored to Ibn Ḥajar's Taqrīb al-Tahdhīb. Five-condition Ibn al-Salāh grading engine. Person ID resolution pipeline (6 iterations). Structured isnad parsing (1,820,033 chain positions). Bag-of-words matn clustering (54,270 clusters).

    • 449,415 hadith / 27,099 NRS entries / 86 classical works
    • Five-condition framework: ittisāl al-sanad, ʿadālah, dabt, no shudhūdh, no ʿillah
    • 12-tier narrator classification (Sahābi → Kadhdhāb)
    • 6,983 Companions identified
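The five-condition framework listed above lends itself to a small data-structure sketch: a record of five boolean findings, where a chain meets the sahih definition only when all five hold. The class and function names are hypothetical, and the sketch deliberately does not model how the engine assigns any grade other than this all-conditions check.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Conditions:
    """One boolean per condition named in the v3.0 entry."""
    ittisal_al_sanad: bool  # continuity of the chain
    adalah: bool            # narrator integrity
    dabt: bool              # narrator precision
    no_shudhudh: bool       # no anomalous contradiction
    no_illah: bool          # no hidden defect


def failed_conditions(c: Conditions) -> list[str]:
    """Names of the conditions that do not hold, in declaration order."""
    return [f.name for f in fields(c) if not getattr(c, f.name)]


def meets_sahih(c: Conditions) -> bool:
    return not failed_conditions(c)


sound = Conditions(True, True, True, True, True)
broken = Conditions(False, True, True, True, False)
print(meets_sahih(sound), failed_conditions(broken))
# True ['ittisal_al_sanad', 'no_illah']
```

Returning the list of failed conditions, rather than a bare verdict, keeps the reason for a downgrade attached to the record.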