Scholarly digital edition of the Chronicon by Romualdus Salernitanus (XII century)



Romualdus Project


This repository includes the source code (mostly TEI XML and Python) of the digital scholarly edition of the Chronicon by Romualdus Salernitanus (Romualdus Guarna, XII century) edited by Paolo Monella within the ALIM Project (2017-2020).

The home page of this edition is

The textus constitutus of the edition, including an introduction and some textual statistics, is in the TEI XML P5 file XML/chronicon.xml in this repository.

The edition was published in the ALIM digital library in 2020. There you can read an HTML visualization of the edition and download it in XML (the original source), HTML, PDF and plain text format.

Sigla and editions

Repository structure

  • csv folder includes CSV tables:

    • tables of signs for each manuscript (MS) transcribed at the graphematic layer, such as a-tos.csv for MS A and b-tos.csv for MS B
    • tables of common abbreviation combinations for those MSS (manuscripts), such as a-combi.csv for MS A and b-combi.csv for MS B (note that the transcription has been done at the graphematic layer only for the first paragraphs of MSS A, B and C; MS A has been entirely transcribed at the alphabetic layer only; MSS B and C have not been transcribed except for the first paragraphs)
  • db folder includes the SQLite3 database file romualdus.sqlite3 with tables on variant types and subtypes, on textual decisions and on collation

  • fonts folder includes fonts used for HTML visualization (see file index.html and folder html below)

  • html folder includes HTML visualization of MSS transcriptions at the graphematic/alphabetic layer

    • transcription at the graphematic layer has only been created for the first paragraphs of MSS A, B and C, then abandoned
  • img folder in my local version of this repository includes links to PDF digital facsimiles of MSS A and O

    • those PDF files are not available online due to (questionable) copyright restrictions
  • python folder includes the Python 3 code used in this project

  • scan/ocr folder includes .txt files resulting from OCR, at different stages of manual revision

  • xml folder includes the XML source files

  • Other files in the root of the repository:

    • index.html: the home page of the project website, including

      • a brief MSS description
      • links to the graphematic/alphabetic layer transcription visualization
    • romualdus.png: a screenshot that may be used in the website

    • stylesheet.css: CSS stylesheet associated with index.html

Files in the xml folder

Transcription and OCR files

Original files (all .xml files are valid TEI P5 XML):

  • a.xml: complete TEI XML transcription of MS A
    • only the first paragraphs have been transcribed at the graphematic layer
    • the others have been transcribed at the alphabetic layer only
    • for paragraphs from g116.6-118.8 through g163.1-163.5 I only transcribed major variants
    • Garufi's edition was the collation base until (and including) paragraph g163.1-163.5
    • Bonetti's edition was the collation base for the collation from (and including) paragraph g163.6-163.7
  • o.xml: transcription of the fragment of MS O (Schwartz's Aa) including the text of the "short version" of the Chronicon, i.e. paragraphs g168.5-168.7 through g185.8-186.5
    • the text has been transcribed at the alphabetic layer
    • Bonetti's edition was the collation base
  • g.xml: reviewed OCR of Garufi's edition (1914)
  • bonetti.xml: reviewed OCR of Bonetti's edition (only the critical text, which Bonetti drew from Garufi 1914 and Arndt 1866), covering only the second part of the Chronicon (from Garufi page 163, i.e. par. g163.6-163.7, to the end of the work, including the Peace of Venice)
  • b.xml: graphematic transcription of the first paragraphs of MS B
  • c.xml: graphematic transcription of the first paragraphs of MS C

Split and sorted versions of a.xml:

  • a.xml was split by script python/ into two chunks to facilitate collation:
    • a1.xml to be collated with g.xml
    • a2.xml to be collated with bonetti.xml (paragraphs g163.6-163.7 to the end of the work, including the Peace of Venice)
    • a2-sorted.xml is a version of a2.xml (created by script python/) in which the order of paragraphs matches that of Bonetti's edition, to facilitate collation
  • a2-sorted.xml and bonetti.xml were further split (again, by script python/) into three chunks, covering the same portions of the Chronicon, to facilitate collation:
    • a2-sorted-2-alfa.xml and bonetti-2-alfa.xml: paragraphs g163.6-163.7 through g167.4-168.4, for which only Bonetti and A must be collated
    • a2-sorted-2-bravo.xml and bonetti-2-bravo.xml: par. g168.5-168.7 through g185.8-186.5, i.e. the part for which Bonetti, A and O must be collated
    • a2-sorted-2-charlie.xml and bonetti-2-charlie.xml: par. g186.6-186.7 to the end of the work, including the Peace of Venice, for which only Bonetti and A must be collated
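
The splitting step described above can be sketched as follows. This is only an illustration, not the project's script: the function name split_at is hypothetical, the sample has no TEI namespace, and the paragraphs are assumed to be siblings identified by xml:id.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def split_at(paragraphs, first_id_of_second_chunk):
    """Split a list of <p> elements in two, cutting before the given xml:id.

    Hypothetical helper: raises ValueError if the xml:id is not found.
    """
    ids = [p.get(XML_ID) for p in paragraphs]
    cut = ids.index(first_id_of_second_chunk)
    return paragraphs[:cut], paragraphs[cut:]


# Toy input; the xml:id values are paragraph identifiers quoted in this README.
body = ET.fromstring(
    '<body>'
    '<p xml:id="g163.1-163.5">a</p>'
    '<p xml:id="g163.6-163.7">b</p>'
    '<p xml:id="g167.4-168.4">c</p>'
    '</body>'
)
first, second = split_at(list(body), "g163.6-163.7")
print(len(first), len(second))
```

Cutting before g163.6-163.7 mirrors the point where the collation base changes from Garufi to Bonetti.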

Those files were further processed by a number of scripts in the python folder (mainly python/) to produce simplified versions that were fed to JuxtaCommons for collation. Most TEI XML markup was removed. For paragraph tags, < and > were replaced with brackets. The simplified version of each of the above files has a -simple suffix, e.g.:

  • a1-simple.xml is the simplified version of a1.xml
  • a2-sorted-2-bravo-simple.xml, of a2-sorted-2-bravo.xml
  • bonetti-2-charlie-simple.xml, of bonetti-2-charlie.xml etc.
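
A minimal sketch of this kind of simplification is given below. It is not the project's script: it assumes that paragraph tags have the form <p ...> and that "brackets" means square brackets, and the function name simplify and the sample sentence are made up for illustration.

```python
import re

# Assumptions: paragraphs are <p ...> elements and are protected as [p ...];
# every other tag is simply dropped, keeping its textual content.
P_OPEN = re.compile(r"<(p\b[^<>]*)>")
P_CLOSE = re.compile(r"</p>")
ANY_TAG = re.compile(r"<[^<>]+>")


def simplify(tei_fragment: str) -> str:
    """Turn paragraph tags into bracketed markers, then remove all other tags."""
    text = P_OPEN.sub(r"[\1]", tei_fragment)
    text = P_CLOSE.sub("[/p]", text)
    text = ANY_TAG.sub("", text)
    # Collapse runs of spaces and tabs left behind by removed tags.
    return re.sub(r"[ \t]+", " ", text).strip()


sample = (
    '<p xml:id="g163.6-163.7">Anno <num value="1127">MCXXVII</num> '
    "<rs>Rogerius</rs> dux</p>"
)
print(simplify(sample))
# prints: [p xml:id="g163.6-163.7"]Anno MCXXVII Rogerius dux[/p]
```

Keeping the paragraph markers in a non-XML form lets the paragraph boundaries survive a collation tool that would otherwise discard or reinterpret the markup.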

Collation files

Direct output of JuxtaCommons (later edited manually to improve the result):

  • m1.xml: the result of the JuxtaCommons collation between g-simple.xml and a1-simple.xml
  • m2-alfa.xml: result of collation between a2-sorted-2-alfa-simple.xml and bonetti-2-alfa-simple.xml
  • m2-bravo.xml: result of collation between a2-sorted-2-bravo-simple.xml and bonetti-2-bravo-simple.xml
  • m2-charlie.xml: result of collation between a2-sorted-2-charlie-simple.xml and bonetti-2-charlie-simple.xml

Script python/ then processed those files to produce well-formed XML files in which the brackets for paragraph tags were converted back to < and >. The resulting files were respectively:

  • m1-par.xml
  • m2-alfa-par.xml
  • m2-bravo-par.xml
  • m2-charlie-par.xml
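
The reverse transformation can be sketched as follows, again under the assumption that paragraph tags were stored as square-bracketed markers such as [p xml:id="..."] and [/p]. The function names are hypothetical, and the real collation files contain further markup that this sketch ignores.

```python
import re
import xml.etree.ElementTree as ET

# Matches [p ...] and [/p] only; other bracketed text such as [sic] is left alone.
BRACKETED_P = re.compile(r"\[(/?p\b[^\[\]]*)\]")


def restore_paragraph_tags(simplified: str) -> str:
    """Convert bracketed paragraph markers back to real XML tags."""
    return BRACKETED_P.sub(r"<\1>", simplified)


def is_well_formed(xml_text: str) -> bool:
    """Return True if the string parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


simplified = '[p xml:id="g163.6-163.7"]Anno MCXXVII Rogerius dux[/p]'
restored = restore_paragraph_tags(simplified)
print(restored)
print(is_well_formed("<body>" + restored + "</body>"))
```

Checking well-formedness right after the substitution catches paragraph markers that were damaged during manual editing of the collation output.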

Then, with the help of script python/ and other modules imported by it (e.g. python/, to detect the variant subtype), I carried out the constitutio textus, storing the information in the db/romualdus.sqlite3 database. The output files for each chunk were:

  • m1-par-out.xml
  • m2-alfa-par-out.xml
  • m2-bravo-par-out.xml
  • m2-charlie-par-out.xml
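
How textual decisions might be stored in SQLite can be sketched as follows. The table name, column names and sample readings are hypothetical and do not reflect the actual schema of romualdus.sqlite3; only the subtype label 'missing-in-print' is taken from this README.

```python
import sqlite3

# Hypothetical schema: one row per variant, with the editor's choice.
SCHEMA = """
CREATE TABLE IF NOT EXISTS decisions (
    par_id        TEXT NOT NULL,
    print_reading TEXT,
    ms_reading    TEXT,
    subtype       TEXT,
    chosen        TEXT CHECK (chosen IN ('print', 'ms'))
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_decision(conn, par_id, print_reading, ms_reading, subtype, chosen):
    """Store one textual decision; the transaction is committed on success."""
    with conn:
        conn.execute(
            "INSERT INTO decisions VALUES (?, ?, ?, ?, ?)",
            (par_id, print_reading, ms_reading, subtype, chosen),
        )


def accepted_readings(conn, par_id):
    """Return the chosen reading of each variant in a paragraph, in insertion order."""
    rows = conn.execute(
        "SELECT print_reading, ms_reading, chosen FROM decisions "
        "WHERE par_id = ? ORDER BY rowid",
        (par_id,),
    )
    return [ms if chosen == "ms" else pr for pr, ms, chosen in rows]


conn = open_db()
# Made-up example: a word present in the manuscript but absent from the print.
record_decision(conn, "g163.6-163.7", "", "et", "missing-in-print", "ms")
print(accepted_readings(conn, "g163.6-163.7"))
```

Keeping decisions in a database rather than only in the XML makes it possible to query them later, for instance to count decisions by variant subtype.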

Finally, script python/ re-unified the latter files, attaching their content to a template teiHeader taken from file xml/teiHeader_template.xml. It thus produced file chronicon.xml, the textus constitutus of the edition.
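
The re-unification step can be sketched with the standard library as follows. This is an illustration under stated assumptions, not the project's script: the template is assumed to be a full TEI document with an empty body, each chunk is assumed to hold its paragraphs as p elements in the TEI namespace, and the function name reunify is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements without a prefix (default namespace).
ET.register_namespace("", TEI_NS)


def reunify(template_xml, chunk_xmls):
    """Append the paragraphs of every chunk, in order, to the template body."""
    tei = ET.fromstring(template_xml)
    body = tei.find(f".//{{{TEI_NS}}}body")
    if body is None:
        raise ValueError("template has no <body> element")
    for chunk in chunk_xmls:
        root = ET.fromstring(chunk)
        # Collect first, then append, so the chunk tree is not modified while iterating.
        for par in list(root.iter(f"{{{TEI_NS}}}p")):
            body.append(par)
    return ET.tostring(tei, encoding="unicode")


template = (
    f'<TEI xmlns="{TEI_NS}"><teiHeader><fileDesc/></teiHeader>'
    "<text><body/></text></TEI>"
)
chunks = [
    f'<div xmlns="{TEI_NS}"><p xml:id="g163.1-163.5">a</p></div>',
    f'<div xmlns="{TEI_NS}"><p xml:id="g163.6-163.7">b</p></div>',
]
print(reunify(template, chunks))
```

The order of the chunk list determines the order of the text, so the chunks must be passed in the sequence of the work.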

Python code in the python folder

Transcription and OCR:

  • check named entities in transcription/OCR files
  • non_rs.txt: a list of words that are not named entities in transcription/OCR files
  • mark numerals in TEI XML and relative checks
  • post-process OCR txt files to produce TEI XML
  • romanranges: text files to help the processing of Roman numerals
  • rs_bonetti_all.txt, rs_garufi_pp_1-20.txt, rs.txt: lists of named entities in transcription/OCR files
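
As an illustration of the numeral processing mentioned above, here is a minimal sketch that converts a Roman numeral to an integer. It is not taken from the project's code; the function name roman_to_int is hypothetical, and the function accepts both subtractive forms (IV) and the additive forms (IIII) often found in manuscripts.

```python
# Values of the seven standard Roman numeral letters.
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral to an integer.

    A letter smaller than the one that follows it is subtracted,
    otherwise it is added. Raises KeyError on any other character.
    """
    values = [VALUES[ch] for ch in numeral.upper()]
    total = 0
    for i, value in enumerate(values):
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


print(roman_to_int("MCXXVII"))  # prints: 1127
print(roman_to_int("iiii"))     # prints: 4
```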

Initial attempts to collate transcriptions with CollateX:

  • collate transcriptions with CollateX
  • xmlns_collatex: originally used to store versions of the XML files compatible with CollateX

Visualization of MS transcriptions at the graphematic layer:

  • find abbreviation combinations in transcriptions and compare them with the relevant CSV file
  • check that transcriptions of MSS A, B and C all have the same TEI XML paragraph tags with the same xml:id's
  • extract/divide the transcription layers from the TEI XML source file, producing HTML code for visualization
  • check that only graphemes in the table of signs have been used in graphematic transcription
  • modules utilized to visualize MS transcriptions at the graphematic layer
  • produce HTML visualizations of MS transcriptions at the graphematic layer
  • produce the alphabetic representation of sequences of graphemes (for MS transcriptions at the graphematic layer)
  • trigger for modules producing an HTML visualization of MS transcriptions at the graphematic layer
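
The check that only graphemes listed in the table of signs occur in a graphematic transcription can be sketched as follows. The layout of the CSV tables is an assumption (grapheme in the first column, one Unicode character per grapheme), and the function names are hypothetical.

```python
import csv
import io


def load_signs(csv_file, column=0):
    """Read the set of allowed graphemes from one column of a CSV table."""
    return {row[column] for row in csv.reader(csv_file) if row and row[column]}


def unknown_graphemes(transcription, signs, ignore=" \n"):
    """Return, sorted, the characters of the transcription missing from the table."""
    return sorted(
        {ch for ch in transcription if ch not in signs and ch not in ignore}
    )


# Toy table of signs standing in for a file such as csv/a-tos.csv.
table = io.StringIO("a,letter a\nn,letter n\no,letter o\n")
signs = load_signs(table)
print(unknown_graphemes("anno 7", signs))  # prints: ['7']
```

With a real file, the table would be opened with open(path, newline="", encoding="utf-8") and passed to load_signs in the same way.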

Pre-processing of transcription/OCR files before collation:

  • split transcription/OCR files of the second part of the work (see XML file description above)
  • replace long tags with XML entities
  • simplify TEI XML markup (see XML file description above)
  • re-arrange paragraphs in the second part of the work (see XML file description above)
  • (see XML file description above)

Post-processing of collation files:

  • import bibliography from a BibTeXML file to the 'front' element of chronicon.xml
  • diff: spreadsheets to check the diffs found by module
  • trigger for all other scripts (pre-processing before collation and post-processing after collation)
  • re-unify collation files (see XML file description above)
  • function to easily import sqlite3 DB tables
  • repeat collation of a paragraph with CollateX (I planned to use it after JuxtaCommons collation)
  • assist during the constitutio textus, working on collation files and the sqlite3 DB
  • post-process JuxtaCommons files (see XML file description above)
  • extract the readings (deriving from chronicon.xml) from a temporary text file, then insert them in the DB
  • detect the variant subtype (e.g. 'num-num', 'missing-in-print' etc.)


  • module to easily get all textual content in an XML element
  • module including global variables to be used by other scripts
  • ripostiglio: previous versions of Python modules and scripts
  • create textual statistics and insert them in the front element of chronicon.xml
  • strip an XML node and keep its text and tail textual content
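
Stripping an XML node while keeping its text and tail can be sketched with the standard library as follows. This is an illustration only, not the project's module: the function name strip_node is hypothetical, and nested markup inside the stripped node is flattened to plain text.

```python
import xml.etree.ElementTree as ET


def strip_node(parent, child):
    """Remove `child` from `parent`, keeping its textual content and its tail."""
    # itertext() yields all text inside the element; the tail is stored separately.
    kept = "".join(child.itertext()) + (child.tail or "")
    index = list(parent).index(child)
    if index == 0:
        # No previous sibling: the text goes to the parent's own text.
        parent.text = (parent.text or "") + kept
    else:
        # Otherwise it is appended to the previous sibling's tail.
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + kept
    parent.remove(child)


par = ET.fromstring("<p>Anno <num>MCXXVII</num> Rogerius <rs>dux</rs> obiit</p>")
strip_node(par, par.find("num"))
print(ET.tostring(par, encoding="unicode"))
# prints: <p>Anno MCXXVII Rogerius <rs>dux</rs> obiit</p>
```

ElementTree stores the text following an element in that element's tail attribute, which is why removing an element without this bookkeeping would also delete the words after it.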

