In the name of God, the Most Gracious, the Most Merciful

Data Files

449,285 structured, graded hadith records in JSONL (v3.4 is deployed; the staged v3.26 release adds 8,241 more). Each record has 67–70 fields. Built for Azure AI Search, Elasticsearch, or any document store.

Looking for the commercial API on Azure infrastructure? See /api for the three-tier access model and licensing.
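For readers loading the corpus into Elasticsearch, the sketch below shows one way to turn records into the newline-delimited body that the `_bulk` endpoint expects (an action line followed by the document). The index name `ipsc-hadith-v3` and the choice to reuse the record `id` as the document `_id` are assumptions made for illustration, not documented IPSC conventions.

```python
import json
from typing import Iterable, Iterator


def to_bulk_lines(records: Iterable[dict], index_name: str = "ipsc-hadith-v3") -> Iterator[str]:
    """Yield NDJSON lines for the Elasticsearch _bulk endpoint.

    Each record becomes two lines: an "index" action, then the document itself.
    The index name is a hypothetical placeholder.
    """
    for record in records:
        action = {"index": {"_index": index_name, "_id": record["id"]}}
        yield json.dumps(action, ensure_ascii=False)
        yield json.dumps(record, ensure_ascii=False)


# A trimmed stand-in for one corpus record (real records carry 67–70 fields).
sample = [
    {"id": "bukhari-sahih-000001", "collection": "Sahih al-Bukhari", "computedGrade": "sahih"}
]

# The _bulk body must end with a newline.
bulk_body = "\n".join(to_bulk_lines(sample)) + "\n"
```

The resulting string can be sent in batches with any HTTP client or an Elasticsearch client library; batching a 2.9 GB file into chunks of a few thousand records is usually more practical than one request.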

Corpus V3

ipsc-hadith-v3.jsonl

449,285 records (v3.4) · 2.9 GB

Full hadith records with parsed chains, grades, and enrichments. The staged v3.26 release adds 8,241 records.

ipsc-narrators-v3.jsonl

37,046 records · 13 MB

NRS database with tiers, assessments, and graph positions. 27,118 records carry NRS reliability assessments. The v3.33 reconciliation against Taqrib confirmed 6,034 assessments and flagged 1,345 mismatches.

ipsc-ilal-v3.jsonl

16,082 records · 45 MB

Hidden defect cross-references from al-Daraqutni

ipsc-matn-clusters-v3.jsonl

52,938 records · 39 MB

Cross-collection content clusters with attestation. Rebuilt in v3.12 with fresh embeddings after the v3.8 matn-corruption recovery.

ipsc-entities-v3.jsonl

54,885 records · ~15 MB

Hadith entity aggregations (all chains per teaching)

ipsc-glossary-v1.jsonl

730 terms · ~1 MB

Glossary v1 (2026-04-25): 730 canonical hadith-science terms with multi-source classical citations. Three tiers (Public / Research / Scholar).

manifest.json

_provenanceDisclosure · lightweight

Corpus statistics, file inventory, and the authoritative AI-involvement disclosure block (always public).
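Because the JSONL files listed above hold one JSON object per line, they can be streamed without loading a multi-gigabyte file into memory. The sketch below reads records lazily and tallies `computedGrade`; the in-memory demo data is made up for illustration, and the `"unknown"` bucket for records without a grade is an assumption of this example.

```python
import io
import json
from collections import Counter
from typing import IO, Iterator


def iter_records(handle: IO[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL stream."""
    for line in handle:
        line = line.strip()
        if line:  # tolerate blank lines
            yield json.loads(line)


def grade_counts(handle: IO[str]) -> Counter:
    """Count records by computedGrade; records without the field go to "unknown"."""
    return Counter(rec.get("computedGrade", "unknown") for rec in iter_records(handle))


# Demo on an in-memory stream; with the real file, use
# open("ipsc-hadith-v3.jsonl", encoding="utf-8") instead.
demo = io.StringIO(
    '{"id": "a", "computedGrade": "sahih"}\n'
    "\n"
    '{"id": "b", "computedGrade": "sahih"}\n'
    '{"id": "c"}\n'
)
counts = grade_counts(demo)
```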

67–70 Fields

Schema Overview

Category | Fields
Identity | id, workId, collection, hadithNumber, bookName, chapterName
Source text | arabicText, isnadText, matn, englishText
Chain analysis | isnadStructured (array), chainContinuity, chainAttribution, chainQualityIndex (number, 0.0–1.0 chain data reliability score)
Grading | computedGrade, autoGrade, autoGradeDetail (worstTier, worstNarrator, worstLabel, worstPosition, narratorCount, resolvedCount, resolutionRate, chainContinuity, mursalCap, supportingChains, taqwiyah, hasMudallisAnanah, hasIkhtilat), gradeConfidence (number, 0.0–1.0 confidence score), computedConfidence, gradingNotes
Provenance | _provenance (array): consolidated correction history with classical source citations. v3.26 entries: _v326PidTiebreaker, _v335ShipBlocker.
Enrichment | crossLinks_ilal, crossLinks_rijal, matnClusterId, clusterId (v3.26), attestationLevel, shudhudh
v3.26 fields | _pidTiebreakerVerdict (method=v3.26-llm-tiebreaker-sonnet, confidence, reasoning), _naqd3Override (sourceCollection, sourceAuthority, originalGrade, cappedGrade) (public-tier visible), _chainMatnConflict (public-tier visible), _v319MatchAlternatives (scholar-tier)
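The schema documents `chainQualityIndex` and `gradeConfidence` as numbers in the range 0.0–1.0. A minimal sanity check along those lines might look like the sketch below; the function name is hypothetical, and treating a missing field as acceptable is an assumption, since the schema does not say which fields are optional.

```python
from typing import List

# Fields the schema describes as numbers between 0.0 and 1.0.
UNIT_INTERVAL_FIELDS = ("chainQualityIndex", "gradeConfidence")


def range_violations(record: dict) -> List[str]:
    """Return the names of score fields that are present but not a number in [0.0, 1.0]."""
    problems = []
    for name in UNIT_INTERVAL_FIELDS:
        value = record.get(name)
        if value is None:
            continue  # assumption: absent fields are not an error
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            problems.append(name)
    return problems


ok = range_violations({"chainQualityIndex": 0.8, "gradeConfidence": 0.95})
bad = range_violations({"chainQualityIndex": 0.8, "gradeConfidence": 1.2})
```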

For Researchers

The IPSC provides structured data for computational hadith scholarship. JSONL format, 67–70 fields per record, ~2.9 GB for the hadith corpus alone.

Representative use cases

  • Computational hadith criticism
  • Narrator network analysis (889,913 graph edges)
  • Cross-collection transmission patterns
  • Matn clustering studies (52,938 clusters)
  • Hidden defect analysis (16,082 ilal cross-links)

Each record carries parsed chain positions, resolved Person IDs, NRS tiers, transmission formulas, grading provenance, and cross-references, so it is ready for analysis without preprocessing.
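As a sketch of the narrator network use case, the code below derives weighted edges from `isnadStructured` by linking narrators at consecutive positions. Two things here are assumptions rather than documented behaviour: that adjacent positions represent a direct transmission link, and that links touching an unresolved narrator (no `canonicalPersonId`) should be dropped rather than bridged. The `PERSON-A` style IDs are placeholders, not real corpus IDs.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]


def chain_edges(record: dict) -> List[Edge]:
    """Pair narrators at consecutive chain positions, skipping unresolved ones."""
    chain = sorted(record.get("isnadStructured", []), key=lambda n: n["position"])
    ids = [n.get("canonicalPersonId") for n in chain]
    return [(a, b) for a, b in zip(ids, ids[1:]) if a and b]


def edge_weights(records: Iterable[dict]) -> Counter:
    """Count how many chains contain each (narrator, next narrator) pair."""
    weights: Counter = Counter()
    for record in records:
        weights.update(chain_edges(record))
    return weights


# Placeholder records with only the fields this sketch needs.
records = [
    {"id": "h1", "isnadStructured": [
        {"position": 2, "canonicalPersonId": "PERSON-B"},
        {"position": 1, "canonicalPersonId": "PERSON-A"},
        {"position": 3, "canonicalPersonId": "PERSON-C"},
    ]},
    {"id": "h2", "isnadStructured": [
        {"position": 1, "canonicalPersonId": "PERSON-A"},
        {"position": 2, "canonicalPersonId": "PERSON-B"},
    ]},
    {"id": "h3", "isnadStructured": [
        {"position": 1, "canonicalPersonId": "PERSON-A"},
        {"position": 2},  # unresolved narrator: both adjacent links are dropped
        {"position": 3, "canonicalPersonId": "PERSON-C"},
    ]},
]
weights = edge_weights(records)
```

The resulting counter can be fed into a graph library of choice as a weighted edge list.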

Example

Hadith Record

{
  "id": "bukhari-sahih-000001",
  "collection": "Sahih al-Bukhari",
  "hadithNumber": "1",
  "arabicText": "...",
  "isnadStructured": [
    {
      "position": 1,
      "name": "...",
      "canonicalPersonId": "PERSON-005691",
      "_nrs": { "tier": 3, "label": "thiqah" }
    }
  ],
  "computedGrade": "sahih",
  "autoGradeDetail": {
    "worstTier": 3,
    "chainContinuity": "continuous",
    "supportingChains": 47
  }
}
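For tabular analysis it can help to flatten the nested chain into one row per narrator. The sketch below does that for the example record above, using only fields shown in it; the output column names (`hadithId`, `personId`, and so on) are chosen for this illustration and are not part of the schema.

```python
import json
from typing import Dict, List

# The example record from this page, trimmed to the fields used here.
RECORD_JSON = """
{
  "id": "bukhari-sahih-000001",
  "collection": "Sahih al-Bukhari",
  "isnadStructured": [
    {
      "position": 1,
      "name": "...",
      "canonicalPersonId": "PERSON-005691",
      "_nrs": { "tier": 3, "label": "thiqah" }
    }
  ],
  "computedGrade": "sahih"
}
"""


def chain_rows(record: dict) -> List[Dict[str, object]]:
    """Flatten isnadStructured into one row per narrator position."""
    rows = []
    for narrator in record.get("isnadStructured", []):
        nrs = narrator.get("_nrs") or {}  # tolerate narrators without an assessment
        rows.append({
            "hadithId": record["id"],
            "position": narrator.get("position"),
            "personId": narrator.get("canonicalPersonId"),
            "tier": nrs.get("tier"),
            "label": nrs.get("label"),
        })
    return rows


rows = chain_rows(json.loads(RECORD_JSON))
```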

API Access

Contact for Access