
About Transcript Files and the OH Solution Pack

kstapelfeldt edited this page Jun 28, 2017 · 13 revisions

The Oral History module supports two transcript file formats on ingest, and each has pros and cons. The datastreams produced on ingest depend on how you have configured the module; the documentation home page includes a diagram of the workflow for both types of source files. The supported formats are:

  • Custom XML Format
  • WebVTT format

The Oral History module does not support transcripts that contain the following:

  • In-text markup (such as HTML styling)
  • Overlapping tiers

Custom XML Format

An XML transcript is structured as follows:

<cues>
    <!-- 'cues' is the root element of the XML file. -->
    <solespeaker>One Speaker</solespeaker>
    <!-- Use the 'solespeaker' element if there is only one speaker throughout the transcript. -->
    <cue>
        <speaker>Different Speaker</speaker>
        <!-- Only declare the 'speaker' element if you have not declared the 'solespeaker' element at the 'cues' level. -->
        <start>0.000</start>
        <end>12.124</end>
        <!-- 'start' and 'end' are the start time and end time of the cue, in seconds. -->
        <transcript>This is the transcript text content.</transcript>
        <translation>This is the annotation content.</translation>

        <!-- 'transcript' and/or 'translation' are the default content tiers of the cue.
             Extra tiers can be added as long as they are listed in the configuration page.
             The 'transcript' element is required if 'Enable captions/subtitles display' is configured to be true,
             as this element is crosswalked to a WebVTT file on ingest and used to power closed captioning in the viewer. -->
    </cue>

    <!-- Add more cues with the above structure. -->

</cues>
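
To make the crosswalk mentioned in the comments above more concrete, here is a minimal Python sketch that reads the 'transcript' tier of each cue and writes WebVTT cue blocks. It is an illustration only, not the module's own ingest code: the function names `to_timestamp` and `cues_to_webvtt` and the `SAMPLE` string are hypothetical, and the sketch assumes every cue has numeric 'start' and 'end' values.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that follows the structure documented above.
SAMPLE = """<cues>
    <cue>
        <speaker>Interviewer</speaker>
        <start>0.000</start>
        <end>12.124</end>
        <transcript>This is the transcript text content.</transcript>
        <translation>This is the annotation content.</translation>
    </cue>
</cues>"""


def to_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def cues_to_webvtt(xml_text: str, tier: str = "transcript") -> str:
    """Build WebVTT text from one content tier of a <cues> document."""
    root = ET.fromstring(xml_text)
    lines = ["WEBVTT", ""]
    for cue in root.findall("cue"):
        # Assumption: 'start' and 'end' are present and hold seconds.
        start = float(cue.findtext("start"))
        end = float(cue.findtext("end"))
        text = (cue.findtext(tier) or "").strip()
        if not text:
            continue  # skip cues that have no content in this tier
        lines.append(f"{to_timestamp(start)} --> {to_timestamp(end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(cues_to_webvtt(SAMPLE))
```

Cues with an empty 'transcript' element are skipped here; whether the module does the same on ingest is not stated in this page.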

Recording speakers in the transcript

  • If only one person speaks throughout the interview, you do not have to add a speaker to each cue. Declare the speaker once at the top of the file with the solespeaker element. For a sample XML transcript with a single tier and single speaker, visit our testing repository.

  • If multiple people speak throughout the interview, declare a speaker for each cue. An example is provided, and speakers are described in more detail in the module's README.
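
The two speaker rules above can be sketched in Python as a lookup that returns one speaker per cue. This is a hedged illustration rather than the module's behaviour: the function name `speakers_by_cue` and both sample strings are hypothetical, and the choice to let a per-cue 'speaker' take precedence over 'solespeaker' when both appear is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-speaker transcript: the speaker is declared once.
SINGLE = """<cues>
    <solespeaker>One Speaker</solespeaker>
    <cue><start>0.000</start><end>5.000</end><transcript>First.</transcript></cue>
    <cue><start>5.000</start><end>9.500</end><transcript>Second.</transcript></cue>
</cues>"""

# Hypothetical multi-speaker transcript: each cue declares its speaker.
MULTI = """<cues>
    <cue><speaker>Interviewer</speaker><start>0.000</start><end>5.000</end>
         <transcript>Question.</transcript></cue>
    <cue><speaker>Narrator</speaker><start>5.000</start><end>9.500</end>
         <transcript>Answer.</transcript></cue>
</cues>"""


def speakers_by_cue(xml_text: str) -> list:
    """Return the speaker for each cue, or None where none is declared."""
    root = ET.fromstring(xml_text)
    sole = (root.findtext("solespeaker") or "").strip()
    result = []
    for cue in root.findall("cue"):
        own = (cue.findtext("speaker") or "").strip()
        # Assumption: a per-cue speaker wins over the sole speaker.
        result.append(own or sole or None)
    return result


if __name__ == "__main__":
    print(speakers_by_cue(SINGLE))
    print(speakers_by_cue(MULTI))
```

A `None` entry in the result marks a cue with no speaker at either level, which can be a useful check before ingest.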

Multi-tiered transcripts

Tiers are additional layers of information in your transcript that go beyond the transcribed speech itself, such as translations, transliterations, and annotations. Annotation and Transcription are enabled by default, and additional tiers can be defined in the administration screen for the module.
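
As a rough pre-ingest check, the Python sketch below separates the content tiers of a cue into those listed in a configured set and those that are not. It is a sketch under stated assumptions, not part of the module: the function name `split_tiers`, the `STRUCTURAL` set, and the 'gloss' tier are hypothetical, and the configured set stands in for whatever tiers you have listed in the administration screen.

```python
import xml.etree.ElementTree as ET

# Elements of a cue that are not content tiers.
STRUCTURAL = {"speaker", "start", "end"}

# Hypothetical cue with the two default tiers plus an extra 'gloss' tier.
SAMPLE = """<cues>
    <cue>
        <start>0.000</start>
        <end>12.124</end>
        <transcript>Hello.</transcript>
        <translation>Bonjour.</translation>
        <gloss>greeting</gloss>
    </cue>
</cues>"""


def split_tiers(cue, configured):
    """Return (tiers, unlisted): tier text by name, and tier names not configured."""
    tiers, unlisted = {}, []
    for child in cue:
        if child.tag in STRUCTURAL:
            continue
        if child.tag in configured:
            tiers[child.tag] = (child.text or "").strip()
        else:
            unlisted.append(child.tag)
    return tiers, unlisted


if __name__ == "__main__":
    first_cue = ET.fromstring(SAMPLE).find("cue")
    # Assumption: only the two default tiers are configured.
    print(split_tiers(first_cue, {"transcript", "translation"}))
```

With only the default tiers configured, 'gloss' is reported as unlisted; adding it to the configured set moves it into the returned tiers.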