Concordance Methodology
This page documents the methods used to build the concordance between the folio references in Russell's dissertation and the manuscript photograph collections. All statistics are drawn from the project database. Uncertainty and provisional status are marked explicitly throughout.
The Problem Russell Presents #
Russell's thesis references folios by signature (e.g., "b6v", "h1r") rather than by page number, since the 1499 edition has no printed pagination. Our manuscript photograph collections, however, use different naming conventions: the Siena images use folio numbers with recto/verso suffixes (e.g., O.III.38_0014r.jpg), while the BL images use sequential numbers (C_60_o_12-001.jpg through C_60_o_12-196.jpg). The core challenge is mapping Russell's 282 signature-based references to the 674 available photographs.
BL vs. Siena Image Sets #
| Property | Siena O.III.38 | BL C.60.o.12 |
|---|---|---|
| Images | 478 | 196 |
| Naming | Folio number + r/v suffix | Sequential number only |
| Edition | 1499 Aldine | 1545 Aldine (second edition) |
| Folio mapping | Directly encoded in filename | Requires inference |
| Matching confidence | HIGH / MEDIUM | LOW (all provisional) |
How the Signature Map Works #
The signature map is a deterministic lookup table of 448 entries generated from the 1499 collation formula: a–z8 (omitting j, u, w), A–F8, G4. Each quire has 8 leaves (except G with 4), and each leaf has recto and verso sides. The map converts any valid signature (e.g., "b6v") to a sequential folio number and vice versa.
This map is generated by build_signature_map.py and is fully deterministic.
It is correct for the 1499 edition. Its applicability to the 1545 edition is assumed but
not verified.
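As an illustration of how such a table can be derived from the collation formula, here is a minimal sketch. It is not the code of build_signature_map.py: the function name, the dictionary layout, and the assumption that folio 1 is leaf a1 with no preliminary leaves are ours. Read literally, the formula gives 236 leaves (472 sides) rather than the 448 entries quoted, so the real script presumably differs in some detail not documented here.

```python
from string import ascii_lowercase

# Quire letters: the 23-letter signature alphabet (no j, u, w), then A-G.
LOWER = [c for c in ascii_lowercase if c not in "juw"]
UPPER = list("ABCDEFG")

def build_signature_map():
    """Return (sig_to_folio, folio_to_sig) for a-z8 (no j, u, w), A-F8, G4.

    Assumption: folios are numbered from 1 starting at leaf a1.
    """
    sig_to_folio = {}
    folio_to_sig = {}
    folio = 0
    for quire in LOWER + UPPER:
        leaves = 4 if quire == "G" else 8  # G is the only short quire
        for leaf in range(1, leaves + 1):
            folio += 1
            for side in "rv":  # recto, then verso
                sig = f"{quire}{leaf}{side}"
                sig_to_folio[sig] = (folio, side)
                folio_to_sig[(folio, side)] = sig
    return sig_to_folio, folio_to_sig

sig_to_folio, folio_to_sig = build_signature_map()
print(sig_to_folio["b6v"])       # (14, 'v')
print(folio_to_sig[(57, "r")])   # h1r
```

Because the table depends only on the formula, running the function twice always yields the same mapping, which is what makes the lookup deterministic.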
How Dissertation References Were Extracted #
Russell's thesis PDF was processed by extract_references.py using PyMuPDF
for text extraction and regular expressions for signature pattern matching. The script
extracted 282 references, distributed across manuscripts:
| Manuscript | References |
|---|---|
| Buffalo RBR | 59 |
| Inc.Stam.Chig.II.610 | 55 |
| C.60.o.12 | 50 |
| INCUN A.5.13 | 42 |
| Modena (Panini) | 29 |
| O.III.38 | 28 |
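The pattern-matching step can be sketched as follows. The regular expression is an assumption based on the signature examples on this page ("b6v", "h1r"), not the pattern used in extract_references.py, and the sample sentence is invented for illustration. In the real pipeline the input text would come from PyMuPDF (for instance via `page.get_text()`).

```python
import re

# Assumed signature shape: one quire letter (a-z without j, u, w, or A-G),
# a leaf number 1-8, and a side marker r or v.
# The pattern is deliberately loose: it would also accept e.g. "G7r",
# which the collation formula does not contain.
SIGNATURE_RE = re.compile(r"\b([a-ik-tvx-zA-G])([1-8])([rv])\b")

def find_signatures(text):
    """Return every signature-like token in text, in order of appearance."""
    return [m.group(0) for m in SIGNATURE_RE.finditer(text)]

# Invented sample sentence, not a quotation from the thesis.
sample = "The annotator returns to b6v and again at h1r (cf. A3v)."
print(find_signatures(sample))  # ['b6v', 'h1r', 'A3v']
```

A pattern this short will also catch unrelated tokens that happen to look like signatures, so extracted references would still need checking against the map of valid signatures.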
How Image Filenames Were Parsed #
catalog_images.py parses image filenames from two manuscript collections into
structured records with folio number, side (recto/verso), and page type. The Siena images
encode folio information directly (O.III.38_0014r.jpg = folio 14 recto). The BL images
use sequential numbers that do not directly encode folio information.
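The difference between the two naming schemes can be shown with a short sketch. The patterns are inferred from the example filenames quoted on this page and the record fields are hypothetical. This is not the code of catalog_images.py, and the page-type field is omitted because its values are not described here.

```python
import re

# Patterns inferred from the example filenames on this page (assumed).
SIENA_RE = re.compile(r"^O\.III\.38_(\d+)([rv])\.jpg$")
BL_RE = re.compile(r"^C_60_o_12-(\d+)\.jpg$")

def parse_filename(name):
    """Return a record dict for a recognised filename, else None."""
    m = SIENA_RE.match(name)
    if m:
        # Siena: folio number and side are read straight from the name.
        return {"manuscript": "O.III.38", "folio": int(m.group(1)),
                "side": m.group(2), "sequence": None}
    m = BL_RE.match(name)
    if m:
        # BL: only the photograph's position in the sequence is known.
        return {"manuscript": "C.60.o.12", "folio": None,
                "side": None, "sequence": int(m.group(1))}
    return None

print(parse_filename("O.III.38_0014r.jpg"))
print(parse_filename("C_60_o_12-001.jpg"))
```

Leaving the BL folio and side empty, rather than guessing them at parse time, keeps the inference step separate and easy to revisit.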
How Matching Was Performed #
1. Signature Lookup: Convert each dissertation reference's signature (e.g., "b6v") to a folio number using the signature map.
2. Manuscript Filter: Filter images to the manuscript shelfmark specified in the dissertation reference.
3. Folio Match: Match the computed folio number to images with the same folio number and side.
4. Fallback Cross-Match: If no direct match, attempt cross-manuscript matching (lower confidence).
5. Confidence Assignment: Assign HIGH (exact Siena match), MEDIUM (folio-based), or LOW (BL sequential inference).
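The five steps above can be sketched as a single function. This is an illustration under stated assumptions, not the project's matching code: the record fields, the tiny inline lookup table, and the order in which the LOW and MEDIUM fallbacks are tried are all hypothetical.

```python
def match_reference(ref, sig_to_folio, images):
    """Return (filename, confidence) for one reference, or None."""
    # Step 1: signature lookup.
    target = sig_to_folio.get(ref["signature"])
    if target is None:
        return None
    folio, side = target

    # Step 2: keep only images from the manuscript named in the reference.
    own = [im for im in images if im["manuscript"] == ref["manuscript"]]

    # Step 3: exact folio and side match (filename encodes both).
    for im in own:
        if im["folio"] == folio and im["side"] == side:
            return (im["file"], "HIGH")

    # Sequential inference (assumed): photo number taken to equal folio.
    for im in own:
        if im["folio"] is None and im["sequence"] == folio:
            return (im["file"], "LOW")

    # Step 4: fall back to the same folio and side in another manuscript.
    for im in images:
        if (im["manuscript"] != ref["manuscript"]
                and im["folio"] == folio and im["side"] == side):
            return (im["file"], "MEDIUM")
    return None

# Tiny illustrative inputs (hypothetical records, not project data).
sig_map = {"b6r": (14, "r"), "b6v": (14, "v")}
images = [
    {"file": "O.III.38_0014r.jpg", "manuscript": "O.III.38",
     "folio": 14, "side": "r", "sequence": None},
    {"file": "O.III.38_0014v.jpg", "manuscript": "O.III.38",
     "folio": 14, "side": "v", "sequence": None},
    {"file": "C_60_o_12-014.jpg", "manuscript": "C.60.o.12",
     "folio": None, "side": None, "sequence": 14},
]

print(match_reference({"manuscript": "O.III.38", "signature": "b6v"},
                      sig_map, images))
# ('O.III.38_0014v.jpg', 'HIGH')
```

Step 5 is folded into the return values here: each branch reports the confidence level that corresponds to how the match was found.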
Where Confidence Is High vs. Low #
| Confidence | Matches | Description |
|---|---|---|
| HIGH | 431 | Siena images with explicit folio+side in filename, matched by signature lookup |
| MEDIUM | 0 | Siena images matched by folio number (cross-manuscript or side ambiguous) |
| LOW | 0 | BL images where sequential photo number is assumed to equal folio number |
By Manuscript
| Manuscript | HIGH | MEDIUM | LOW |
|---|---|---|---|
| Siena O.III.38 | 392 | 0 | 0 |
| BL C.60.o.12 | 39 | 0 | 0 |
Why BL Matches Are Provisional #
BL matches rest on two assumptions that have not yet been verified: that each sequential photograph number corresponds to the folio number, and that the 1545 edition follows the 1499 collation on which the signature map is based. Until both are checked, every BL match should be treated as provisional.
What Human Review Is Still Needed #
- Verify BL photograph-to-folio correspondence against the physical volume or high-resolution images
- Confirm that the 1545 edition follows the same collation as the 1499 edition
- Spot-check MEDIUM-confidence Siena matches for side (recto/verso) accuracy
- Review hand attribution for edge cases where multiple hands annotate the same folio
- Validate marginal text transcriptions against original manuscript images
How This Methodology Supports Future Scholarship #
The concordance pipeline produces a structured, queryable dataset linking Russell's close reading to digital images. Once the BL matches are verified, this enables:
- Folio-level browsing of annotations alongside manuscript images
- Systematic comparison of annotation density across copies
- Identification of folios that attracted multiple annotators
- Cross-referencing between annotations and the dictionary of terms
- Future multimodal analysis using computer vision on manuscript images