In the name of God, the Most Gracious, the Most Merciful

IPSC V3 Technical Methodology

A computational implementation of the classical Ibn al-Salah five-condition framework. Every grade traceable. Every decision documented.

IPSC V3.7 · April 2026 · 649-line methodology document

1. Overview

The Islamic Primary Source Corpus (IPSC) V3 is a computationally parsed and graded dataset of 449,415 hadith drawn from 38,787+ collection-level source entries across 86 classical works. Each hadith record carries a structured chain of transmission (isnad), a separated body text (matn), a computational authenticity grade, and metadata linking to narrator reliability assessments, hidden defect records, and textual parallels.

The corpus provides structured, machine-readable hadith data with a transparent and reproducible grading methodology. Every grade is derived from documented inputs — narrator reliability tiers, chain continuity classification, defect cross-links, and corroboration counts — so that any scholar can trace the reasoning behind any individual grade.

No pre-existing scholarly grades are imported as authoritative. The corpus grades hadith independently from the chain data, then compares its output against existing scholarly opinions where available.

Nine Major Collections

Collection | Compiler | Death (AH)
Sahih al-Bukhari | al-Bukhari | 256
Sahih Muslim | Muslim ibn al-Hajjaj | 261
Sunan Abi Dawud | Abu Dawud | 275
Jami' al-Tirmidhi | al-Tirmidhi | 279
Sunan al-Nasa'i | al-Nasa'i | 303
Sunan Ibn Majah | Ibn Majah | 273
Musnad Ahmad | Ahmad ibn Hanbal | 241
Muwatta' Malik | Malik ibn Anas | 179
Sunan al-Darimi | al-Darimi | 255

Supplementary Collections

Sahih Ibn Hibban, Sahih Ibn Khuzaymah, al-Mustadrak (al-Hakim), al-Sunan al-Kubra (al-Bayhaqi), Musannaf 'Abd al-Razzaq, Musannaf Ibn Abi Shaybah, al-Mu'jam al-Kabir/al-Awsat/al-Saghir (al-Tabarani), Musnad al-Bazzar, Musnad Abu Ya'la, Musnad al-Tayalisi, Sunan al-Daraqutni, Shu'ab al-Iman (al-Bayhaqi), Sharh Ma'ani al-Athar (al-Tahawi), Tafsir al-Tabari, and additional musnad, musannaf, and mu'jam works. Fabrication-detection references: Tanzih al-Shari'ah (Ibn 'Iraq) and al-La'ali al-Masnu'ah (al-Suyuti).

2. Narrator Resolution Pipeline

2.1 Arabic Text Normalization

  1. Harakat removal — all tashkil (fathah, dammah, kasrah, sukun, shaddah, tanwin) stripped
  2. Hamza normalization — أ , إ , آ normalized to bare alif ا
  3. Ta marbuta — terminal ة normalized to ه
  4. Alif maqsura — ى normalized to ي
  5. Whitespace — multiple spaces, zero-width joiners, non-breaking spaces collapsed
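
The five steps above can be sketched as a single function. This is a minimal illustration, not the corpus's actual normalizer: the function name `normalize_arabic`, the exact code-point ranges, and the decision to treat "terminal" as word-final are assumptions.

```python
import re

# Assumed tashkil range: tanwin, fathah, dammah, kasrah, shaddah, sukun (U+064B..U+0652).
_HARAKAT = re.compile("[\u064B-\u0652]")
# Hamza-carrying alif forms folded to bare alif.
_HAMZA_TO_ALIF = str.maketrans({"\u0623": "\u0627", "\u0625": "\u0627", "\u0622": "\u0627"})
# Ta marbuta that is not followed by another word character (assumed meaning of "terminal").
_FINAL_TA_MARBUTA = re.compile("\u0629(?!\\w)")
# Zero-width joiner and non-joiner (the non-joiner is an assumption).
_ZERO_WIDTH = re.compile("[\u200C\u200D]")
# Any run of whitespace, including the non-breaking space.
_SPACES = re.compile("[\\s\u00A0]+")


def normalize_arabic(text: str) -> str:
    """Apply the five normalization steps in the order listed above."""
    text = _HARAKAT.sub("", text)                # 1. strip harakat
    text = text.translate(_HAMZA_TO_ALIF)        # 2. hamza forms to bare alif
    text = _FINAL_TA_MARBUTA.sub("\u0647", text)  # 3. terminal ta marbuta to ha
    text = text.replace("\u0649", "\u064A")      # 4. alif maqsura to ya
    text = _ZERO_WIDTH.sub("", text)             # 5. drop zero-width characters
    return _SPACES.sub(" ", text).strip()        #    and collapse whitespace
```

Because every step is a pure string transformation, the same function can be applied to both the isnad text and the NRS name index so that exact matching compares like with like.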

2.2 Person ID (PID) Assignment

Each narrator position maps to at most one canonical Person ID. PIDs take the form PERSON-NNNNNN (six-digit, zero-padded) for Taqrib narrators and PERSON-6NNNNNNN (eight-digit, prefix 60) for supplementary sources. Ambiguous positions retain null rather than recording a potentially incorrect assignment.
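
A small validator illustrates the two PID shapes described above. It is a sketch under assumptions: the function name `pid_source` and its return labels are hypothetical, and the supplementary pattern only checks for eight digits beginning with 6.

```python
import re
from typing import Optional

_TAQRIB_PID = re.compile(r"PERSON-\d{6}")          # six digits, zero-padded
_SUPPLEMENTARY_PID = re.compile(r"PERSON-6\d{7}")  # eight digits, leading 6


def pid_source(pid: Optional[str]) -> str:
    """Classify a canonicalPersonId value by its shape."""
    if pid is None:
        return "unresolved"  # ambiguous positions keep null
    if _TAQRIB_PID.fullmatch(pid):
        return "taqrib"
    if _SUPPLEMENTARY_PID.fullmatch(pid):
        return "supplementary"
    raise ValueError(f"malformed PID: {pid!r}")
```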

2.3 NRS Database

The Narrator Reliability Score database contains 27,118 assessed entries (within a broader 37,046-entry narrator index). Sources, in precedence order:

  1. Taqrib al-Tahdhib — Ibn Hajar al-'Asqalani (d. 852 AH) — primary anchor
  2. Tahdhib al-Tahdhib — Ibn Hajar — detailed assessments
  3. Mizan al-I'tidal — al-Dhahabi (d. 748 AH)
  4. al-Thiqat — Ibn Hibban (d. 354 AH)
  5. al-Kamil fi Du'afa' al-Rijal — Ibn 'Adi (d. 365 AH)
  6. al-Jarh wa-l-Ta'dil — Ibn Abi Hatim (d. 327 AH)
  7. al-Tarikh al-Kabir — al-Bukhari (d. 256 AH)
  8. Tarikh Ibn Ma'in — Yahya ibn Ma'in (d. 233 AH)
  9. Lisan al-Mizan — Ibn Hajar

2.4 Resolution Approaches

a) Exact match — normalized name matches exactly one NRS entry
b) Kunyah disambiguation — graph-based contextual resolution using teacher-student network (7,973 nodes, 889,913 directed edges)
c) Companion-end detection — terminal position matching a known sahabi, validated by prophetic attribution formula
d) Relational reference — 'an abihi / 'an jaddihi resolved via genealogy; flagged with quality caps during grading

2.5 Coverage

78.5% of narrator positions carry a resolved PID. A further 4.9% are structural (collective or anonymous references). The remaining 16.6% are genuine nulls: ambiguous kunyahs, unresolvable relational references, and single-name narrators with multiple candidates. These are true disambiguation gaps; the system does not guess.

3. Assessment Reconciliation

3.1 Taqrib Anchoring

Ibn Hajar's Taqrib al-Tahdhib serves as the primary and authoritative source. When a Taqrib verdict exists, it is never overridden by other sources, reflecting scholarly consensus that Ibn Hajar's Taqrib represents the most careful synthesis of the earlier critical tradition.

3.2 Source Hierarchy

  1. Taqrib al-Tahdhib — if available, final. Never overridden.
  2. Tahdhib al-Tahdhib — detailed discussion when Taqrib absent.
  3. Multiple non-Ibn-Hajar sources — 2+ independent critics agree, consensus adopted.
  4. Single source — adopted with reduced confidence.

When sources conflict, the weaker assessment prevails unless the stronger comes from a higher-hierarchy source.
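
The hierarchy can be read as a short decision procedure. The sketch below is one possible reading, not the project's reconciliation code: the source keys (`"taqrib"`, `"tahdhib"`, and so on), the confidence labels, and the simplification that all non-Ibn-Hajar sources rank equally are assumptions. Tiers are integers where a larger number means a weaker assessment.

```python
from collections import Counter
from typing import Optional


def reconcile(assessments: dict[str, int]) -> tuple[Optional[int], str]:
    """Return (tier, confidence label) from a mapping of source name to tier."""
    # Levels 1 and 2: an Ibn Hajar verdict is taken as-is, Taqrib first.
    for anchor in ("taqrib", "tahdhib"):
        if anchor in assessments:
            return assessments[anchor], "anchored"
    tiers = list(assessments.values())
    if not tiers:
        return None, "unassessed"
    # Level 4: a single source is adopted with reduced confidence.
    if len(tiers) == 1:
        return tiers[0], "reduced"
    # Level 3: two or more critics agreeing on a tier form a consensus.
    agreed = [tier for tier, votes in Counter(tiers).items() if votes >= 2]
    if agreed:
        return max(agreed), "consensus"
    # Conflict without consensus: the weaker assessment prevails.
    return max(tiers), "conflict"
```

For example, `reconcile({"mizan": 4, "kamil": 7})` yields tier 7, because neither source outranks the other in this simplified model.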

3.3 Twelve-Tier System

Tier | Arabic | Transliteration | English | Grading Impact
T1 | صحابي | Sahabi | Companion | Automatic pass; beyond jarh wa-ta'dil
T2 | ثقة متقن | Thiqah mutqin | Very reliable, precise | Supports sahih
T3 | ثقة | Thiqah | Reliable | Supports sahih
T4 | صدوق | Saduq | Truthful | Supports hasan
T5 | صدوق يهم | Saduq yahim | Truthful but errs | Supports hasan
T6 | مقبول | Maqbul | Acceptable when supported | Da'if alone; hasan with corroboration
T7 | ضعيف / مجهول | Da'if / majhul | Weak / unknown | Da'if
T8 | ضعيف جدا | Da'if jiddan | Very weak | Da'if (eligible for taqwiyah)
T9 | متروك | Matruk | Abandoned | Very weak; also assigned to anonymous narrators
T10 | متروك | Matruk (severe) | Abandoned (severe) | Very weak; corroboration blocked
T11 | متهم بالكذب | Muttaham bi-l-kadhib | Accused of lying | Very weak; corroboration blocked
T12 | كذاب / وضاع | Kadhdhab / wadda' | Liar / fabricator | Mawdu' (fabricated); corroboration blocked

Key principle: T1–T3 support sahih. T4–T6 support hasan. T7–T8 produce da'if. T9–T11 produce very weak, and T12 produces mawdu' (fabricated). From T10 upward, corroboration cannot strengthen the grade, because the deficiency lies in 'adalah (moral integrity), not merely dabt (precision).

4. Grading Methodology

The grading engine implements the classical five-condition framework of Ibn al-Salah (Muqaddimah):

Ittisal al-sanad — Chain continuity from chainContinuity field
'Adalat al-ruwat — Narrator uprightness from NRS tier
Dabt al-ruwat — Narrator precision from NRS tier
'Adam al-shudhudh — Absence of anomaly from shudhudh flag
'Adam al-'illah — Absence of hidden defect from crossLinks_ilal and ilalDefectCount

4.1 Resolution Threshold

A hadith is graded only when 50% or more of its narrator positions carry resolved PIDs. Below this, the grade is set to not-graded with computedConfidence: "low".
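
The threshold check can be sketched as follows. The field name `canonicalPersonId` comes from the data format; the function names, and the assumption that every position (including structural ones) counts in the denominator, are illustrative only.

```python
def resolution_rate(positions: list[dict]) -> float:
    """Fraction of narrator positions that carry a resolved PID."""
    if not positions:
        return 0.0
    resolved = sum(1 for p in positions if p.get("canonicalPersonId") is not None)
    return resolved / len(positions)


def is_gradable(positions: list[dict], threshold: float = 0.5) -> bool:
    """A chain is graded only when at least `threshold` of positions are resolved."""
    return resolution_rate(positions) >= threshold
```

A four-narrator chain with two resolved PIDs sits exactly on the boundary and is graded; with only one resolved PID it would be marked not-graded.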

4.2 Base Grade from Weakest Narrator

Weakest Tier | Base Grade
T1–T3 | sahih
T4–T6 | hasan
T7–T8 | da'if
T9–T11 | very-weak
T12 | mawdu' (fabricated)
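
As a rough sketch, the base grade is a lookup on the numerically largest (weakest) tier in the chain. The function name is hypothetical; the grade strings follow the `computedGrade` values listed in the data format section.

```python
def base_grade(tiers: list[int]) -> str:
    """Map the weakest narrator tier in a chain (1 = best, 12 = worst) to a base grade."""
    if not tiers:
        raise ValueError("at least one narrator tier is required")
    weakest = max(tiers)
    if weakest <= 3:
        return "sahih"
    if weakest <= 6:
        return "hasan"
    if weakest <= 8:
        return "daif"
    if weakest <= 11:
        return "very-weak"
    return "mawdu"  # T12: liar or fabricator
```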

4.3 Quality Caps

  • Uncertain resolutions below T4 are capped at T4 (saduq) — benefit of the doubt
  • Original tier T8 or worse: cap rises to T6 (maqbul)
  • Phase-5 relational resolutions (father/grandfather) capped at T4 regardless

4.4 Chain Continuity Adjustments

  • Broken chain (munqati' / mu'allaq): downgraded one level
  • Uncertain chain with all T1–T3 narrators: conservatively set to hasan
  • Continuous chain: no adjustment

4.5 Mursal Cap

If chainContinuity = "mursal" and no companion PID is found at the terminal position, the grade is capped at da'if. With 2+ independent supporting chains, a mursal may reach hasan li-ghayrihi.
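
The continuity adjustments in 4.4 and the mursal cap in 4.5 can be sketched together. Several details here are assumptions rather than documented behaviour: the four-step grade ladder, the value `"munqati"` as a continuity label, and the parameter `terminal_is_companion`. The upgrade of a supported mursal to hasan li-ghayrihi is left out.

```python
LADDER = ["sahih", "hasan", "daif", "very-weak"]  # assumed ordering, best to worst
BROKEN = {"munqati", "muallaq"}                   # assumed labels for broken chains


def adjust_for_continuity(grade: str, continuity: str, tiers: list[int],
                          terminal_is_companion: bool) -> str:
    """Apply the continuity adjustments to a base grade."""
    if grade not in LADDER:
        return grade  # e.g. mawdu or not-graded: left untouched
    idx = LADDER.index(grade)
    if continuity in BROKEN:
        # Broken chain: one level down, never past the bottom of the ladder.
        return LADDER[min(idx + 1, len(LADDER) - 1)]
    if continuity == "uncertain" and max(tiers) <= 3:
        return "hasan"  # strong narrators, unverified connection
    if continuity == "mursal" and not terminal_is_companion:
        # Cap at daif: keep the grade only if it is already daif or worse.
        return LADDER[max(idx, LADDER.index("daif"))]
    return grade
```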

4.6 Taqwiyah (Mutual Strengthening)

  • Da'if + 2+ independent chains → upgraded to hasan li-ghayrihi
  • Hasan + 3+ independent chains → upgraded to sahih li-ghayrihi
  • Independence requirement: supporting chains must not share a common bottleneck narrator (madar). For 10+ chains, a square-root discount is applied.
  • Hard floor: taqwiyah blocked when weakest narrator is T10+. A liar corroborated by other liars does not become truthful.
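
A minimal sketch of the upgrade rules follows. It assumes the number of independent chains has already been computed (the madar check and the square-root discount for 10+ chains are not modelled), and the function name is hypothetical.

```python
def apply_taqwiyah(grade: str, weakest_tier: int, independent_chains: int) -> str:
    """Upgrade a grade through corroboration, subject to the T10+ hard floor."""
    if weakest_tier >= 10:
        return grade  # hard floor: corroboration is blocked
    if grade == "daif" and independent_chains >= 2:
        return "hasan-li-ghayrihi"
    if grade == "hasan" and independent_chains >= 3:
        return "sahih-li-ghayrihi"
    return grade
```

Keeping the hard-floor check first means no later rule can accidentally upgrade a chain whose weakest narrator is T10 or worse.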

4.7 Defect Handling

  • Single 'illah: flagged, confidence reduced, no automatic downgrade
  • Two+ defect records: downgraded one level
  • Shudhudh: if flagged and grade is sahih, reduced to hasan
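
The three defect rules can be expressed as one function. This is a sketch: the order in which the rules run, the grade ladder, and the wording of the notes are assumptions; the inputs mirror the `ilalDefectCount` and `shudhudh` fields.

```python
LADDER = ["sahih", "hasan", "daif", "very-weak"]  # assumed ordering, best to worst


def apply_defects(grade: str, ilal_defect_count: int, shudhudh: bool) -> tuple[str, list[str]]:
    """Return the adjusted grade and a list of explanatory notes."""
    notes: list[str] = []
    if ilal_defect_count >= 2 and grade in LADDER:
        grade = LADDER[min(LADDER.index(grade) + 1, len(LADDER) - 1)]
        notes.append("downgraded: two or more defect records")
    elif ilal_defect_count == 1:
        notes.append("single 'illah: flagged, confidence reduced")
    if shudhudh and grade == "sahih":
        grade = "hasan"
        notes.append("shudhudh: sahih reduced to hasan")
    return grade, notes
```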

4.8 Hawala Handling

The hawala marker (ح) indicates a chain switch; 8,751 records are flagged. The grading engine grades the primary (first) chain only. The secondary chain is noted in autoGradeDetail but does not override the primary-chain grade.

4.9 Anonymous Narrator Penalty

Collective or anonymous references (nas, rajul, ba'd ashabihi) are assigned T9 (matruk/majhul) because no individual can be identified for reliability assessment.

5. Chain Continuity Classification

Classification | Meaning
continuous | Standard connected chain: each narrator heard directly from the next, verified by temporal overlap and known teacher-student relationships
muttasil | Verified as connected to a companion; initially ambiguous, later confirmed
likely-continuous | Probable connection based on death-date overlap and generational proximity, without explicit documentation
scholarly-verified | Continuity confirmed by classical scholarship (e.g., al-Mizzi in Tahdhib al-Kamal)
mursal | Chain does not reach a companion through verified hearing: a tabi'i reports directly from the Prophet
muallaq | Suspended: one or more narrators at the beginning omitted by the compiler
compilation | Compiler's own chain or editorial arrangement
uncertain | Insufficient data to determine connectivity
parser-error | Chain text could not be reliably parsed; receives not-graded

Continuity is determined by checking adjacent narrator pairs. Chain break severity = impossible pairs / total pairs. Severity above 0.3 classifies the chain as broken.
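
The severity ratio is simple enough to show directly. In this sketch each adjacent narrator pair is represented by a boolean saying whether the transmission is possible; how that boolean is derived (death dates, teacher-student records) is outside the example, and the function names are hypothetical.

```python
def chain_break_severity(pair_possible: list[bool]) -> float:
    """Impossible adjacent pairs divided by total adjacent pairs."""
    if not pair_possible:
        return 0.0
    impossible = sum(1 for ok in pair_possible if not ok)
    return impossible / len(pair_possible)


def is_broken(pair_possible: list[bool], threshold: float = 0.3) -> bool:
    """A chain is classified as broken when severity exceeds the threshold."""
    return chain_break_severity(pair_possible) > threshold
```

A five-narrator chain has four adjacent pairs, so a single impossible pair gives a severity of 0.25 and the chain is not classified as broken; two impossible pairs give 0.5, which is.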

6. Specialized Enrichments

6a. Transmission Formulas

Tasrih (explicit hearing): haddathana/haddathani, sami'tu, akhbarana/akhbarani, anba'ana — these explicitly indicate direct hearing.

'An'anah (ambiguous): the formula 'an ("from") does not explicitly state direct hearing. When the narrator is a known mudallis of severity 3+, 'an'anah triggers a chain-level flag.

6b. Tadlis Detection

Registry of 48 narrators catalogued with severity levels 1–5, derived from Ibn Hajar's Tabaqat al-Mudalliseen.

Level | Description | Treatment of 'an'anah
1 | Rarely practiced tadlis | Accepted
2 | Scholars tolerated due to status or rarity | Generally accepted
3 | Scholars differed; significant number practiced frequently | Not accepted without tasrih
4 | Scholars rejected their 'an'anah altogether | Not accepted
5 | Weak narrators who also practiced tadlis | Not accepted

Level 3+ with 'an'anah triggers a one-level downgrade. Note: ~388,000 positions (~21%) have no parsed transmission formula — a parser-level limitation.
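
The trigger condition can be sketched against the isnad position fields. The keys `formulaType` and `_mudallis` come from the data format section; the nested key `severity` and the function name are assumptions. Positions with no parsed formula never trigger the flag in this sketch, which mirrors the parser limitation noted above.

```python
def has_tadlis_flag(positions: list[dict]) -> bool:
    """True when any severity 3+ mudallis transmits with an ambiguous 'an formula."""
    for pos in positions:
        mudallis = pos.get("_mudallis") or {}
        if mudallis.get("severity", 0) >= 3 and pos.get("formulaType") == "ananah":
            return True
    return False
```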

6c. Ikhtilat (Mental Deterioration)

Structured data on 29 mukhtalit narrators with onset year, pre/post student lists, and biographical sources. When detected, records are flagged with _ikhtilat: true.

6d. 'Ilal (Hidden Defects)

16,082 entries cross-linked from al-Daraqutni's al-'Ilal al-Waridah. Defect types: mursal, mawquf-as-marfu', tadlis, wahm (error), da'if chain. Two+ cross-links trigger a one-level downgrade.

6e. Attestation Levels

Level | Chains | Count
gharib | 1 (solitary) | (not given)
'aziz | 2 | (not given)
mashhur | 3+ | 12,209 clusters
mutawatir | Mass-transmitted | 1,161 clusters

The corpus also records 6,346 common-link clusters (chains converging on a single pivotal transmitter). Attestation is computed at the matn cluster level.
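
A lookup by chain count illustrates the lower three levels. The source does not state a numeric threshold for mutawatir, so the sketch takes it as a required, hypothetical parameter (`mutawatir_min`) rather than inventing a value; the function name is also an assumption.

```python
def attestation_level(chain_count: int, mutawatir_min: int) -> str:
    """Classify a matn cluster by its number of chains."""
    if chain_count < 1:
        raise ValueError("a cluster needs at least one chain")
    if chain_count >= mutawatir_min:
        return "mutawatir"  # cutoff supplied by the caller, not documented in the source
    if chain_count >= 3:
        return "mashhur"
    if chain_count == 2:
        return "aziz"
    return "gharib"
```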

7. Content Analysis (Matn Criticism)

The corpus applies a two-pass computational matn criticism architecture, which the authors present as the first of its kind applied at corpus scale.

What It Checks

  • Quran contradictions
  • Fabrication patterns
  • Anachronistic vocabulary
  • Chain-matn conflicts
  • Prophetic linguistic baseline deviation

Results

  • 280 likely fabrications identified
  • 91 chain-matn conflicts detected
  • 6,822 known fabrications validated against reference works

Prophetic Linguistic Baseline

Average word count: 39 words. Saj' (rhyming prose) density: 0.033. These baselines help identify texts that deviate significantly from the established prophetic speech patterns.

8. Quality Metrics

8.1 Convergence with Scholarly Consensus

Bukhari sahih + hasan: 95.8%
Muslim sahih + hasan: 96.0%
Muwatta' Malik: 97.1%

These rates were independently achieved — the system was not tuned to match Bukhari or Muslim.

8.2 Integrity Checks

Check | Criterion | Result
Mursal graded sahih | Should never independently receive sahih | 0 violations
T10+ in Sahihayn | Abandoned narrators should not appear in Bukhari/Muslim | 0 violations
Grade consistency | computedGrade = autoGrade across 250-record test set | 100% agreement
Top-500 PID audit | Manual verification of 500 most frequent PIDs | 436/500 (87.2%)

8.3 Grade Distribution

Grade | Count | %
sahih | 47,441 | 10.6%
sahih li-ghayrihi | 111,996 | 24.9%
hasan | 71,160 | 15.8%
hasan li-ghayrihi | 52,208 | 11.6%
da'if | 33,645 | 7.5%
very-weak | 24,859 | 5.5%
not-graded | 98,769 | 22.0%

Total graded: 350,646 (78.0%). Taqwiyah upgrades: 168,284. Quality caps applied: 139,531. Ilal-flagged: 7,009.

9. Documented Limitations

Narrator Resolution

  • 16.6% null PID rate — genuine disambiguation gaps (ambiguous kunyahs, relational references, single-name narrators)
  • Death year approximation — for ~39,799 entries, estimated from tabaqah rather than documented sources

Textual Verification

  • Arabic matn text not collated against critical printed editions
  • Hawala records flagged but not physically split (8,751 records)

Enrichment Coverage

  • Matn cluster coverage: 61% — hadith without cluster assignments do not benefit from cross-chain attestation
  • Mudallis registry covers 48 of ~150+ documented mudalliseen
  • Ikhtilat database covers 29 narrators — classical sources document dozens more
  • Shudhudh detection is flag-based, not comprehensive

Methodological Scope

  • No fiqhi (jurisprudential) context — grading is purely chain-based
  • Single-chain grading — the corpus does not perform full takhrij

10. Data Format

JSONL format — one JSON object per line. Primary index: ipsc-hadith-v3.jsonl (449,415 records).
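
Reading the primary index needs nothing beyond the standard library. The helper below is a sketch; its name is hypothetical, and it accepts any iterable of lines so it works equally with an open file or an in-memory list.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Typical use, assuming the file is available locally:
#   with open("ipsc-hadith-v3.jsonl", encoding="utf-8") as fh:
#       for record in iter_records(fh):
#           print(record["id"], record["computedGrade"])
```

Streaming line by line avoids loading all 449,415 records into memory at once.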

Core Fields

Field | Type | Description
id | string | Unique identifier (e.g., bukhari-sahih-000001)
workId | string | Collection identifier
collection | string | Human-readable collection name
hadithNumber | string | Number within collection
arabicText | string | Full Arabic text (isnad + matn)
isnad | string | Separated chain of transmission (Arabic)
matn | string | Separated body text (Arabic)
hadithType | string | marfu, mawquf, maqtu, tafsir, mawdu

Isnad Structure

Field | Type | Description
position | number | Ordinal position (1 = compiler's source)
name | string | Narrator name as it appears in Arabic
canonicalPersonId | string or null | Resolved PID or null
formula | string | Transmission formula text
formulaType | string | tasrih or ananah
_nrs | object | Embedded NRS: tier, label, deathAH, isCompanion
_mudallis | object | Severity and requiresTasrih (if applicable)
_resolvedBystringResolution method

Grading Fields

Field | Type | Description
computedGrade | string | Final grade: sahih, sahih-li-ghayrihi, hasan, hasan-li-ghayrihi, daif, very-weak, mawdu, not-graded
autoGrade | string | V3 final regrade pass
chainContinuity | string | Connectivity classification
computedConfidence | string | high, medium, or low
gradingNotes | array | Human-readable grading explanations

Enrichment Flags

Field | Type | Description
_ikhtilat | boolean | Mukhtalit narrator in chain
crossLinks_ilal | array | Defect record references
ilalDefectCount | number | Count of known defects
shudhudh | boolean | Textual anomaly detected
isCompound | boolean | Contains hawala chain-switch marker
attestationLevel | string | gharib, aziz, mashhur, or mutawatir
resolutionRate | number | Fraction of positions with resolved PIDs

Supplementary Indexes

Index | Records | Description
ipsc-narrators-v3 | 37,046 | Full narrator database (NRS + biographical)
ipsc-ilal-v3 | 16,082 | Hidden defect records from al-Daraqutni
ipsc-matn-clusters-v3 | 54,270 | Matn cluster records with English summaries
ipsc-presentation-v3 | 6 | Presentation-layer summary statistics

11. Citation

Individual Hadith

IPSC V3 Corpus, TheoAI / Eve Theology LLC, 2026. Hadith [id], graded [computedGrade]. Narrator resolution via NRS v3 (27,118 entries). Methodology: docs/methodology-v3.md.

Corpus-Level

TheoAI / Eve Theology LLC. Islamic Primary Source Corpus (IPSC), Version 3. 2026. 449,415 hadith across 86 classical works, computationally graded via chain analysis with 27,118 narrator reliability entries.

Methodology

TheoAI / Eve Theology LLC. "IPSC V3 Technical Methodology." 2026. Covers narrator resolution pipeline, twelve-tier assessment reconciliation, five-condition grading algorithm, tadlis detection, ikhtilat flagging, and ilal cross-linking.

Specific Grading Decision

Hadith [id] graded [grade] per IPSC V3 methodology: weakest narrator [worstNarrator] at tier [worstTier] ([worstLabel]), chain continuity: [chainContinuity], supporting chains: [supportingChains]. Taqwiyah: [applied/not applied]. See autoGradeDetail for full provenance.
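
The template above can be filled programmatically. This sketch assumes the bracketed placeholders are available as keys on a record dictionary; `taqwiyahApplied` in particular is a hypothetical key, and the mapping of [grade] to `computedGrade` is an assumption.

```python
def grading_citation(record: dict) -> str:
    """Fill the 'Specific Grading Decision' citation template from a record."""
    taqwiyah = "applied" if record.get("taqwiyahApplied") else "not applied"
    return (
        f"Hadith {record['id']} graded {record['computedGrade']} per IPSC V3 methodology: "
        f"weakest narrator {record['worstNarrator']} at tier {record['worstTier']} "
        f"({record['worstLabel']}), chain continuity: {record['chainContinuity']}, "
        f"supporting chains: {record['supportingChains']}. Taqwiyah: {taqwiyah}. "
        "See autoGradeDetail for full provenance."
    )
```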