Transcription Manual
- Version Control: David Germano
- Author(s): Ed Garrett, Roger Andersen
This article provides guidelines for transcribing spoken Tibetan texts. This task differs considerably from writing a story or letter. To represent the spoken record faithfully, it is important to write down everything everybody says and to capture the texture of the text as much as possible, while still preserving the principles and values of Tibetan's long-standing literary tradition. The article starts with general principles and then discusses a list of crucial transcription issues in alphabetical order.
Transcription in THDL is done in a special format that allows transcriptions to be used in the Savant software system. Savant allows transcriptions to be combined with translations, annotations, and time codes, so that the transcription can be played back in synchronization with the original audio or video. These transcripts are created in QuillDriver, a sister program that provides a user-friendly way to create them. Transcripts can, however, be drafted in an intermediary form, such as pen on paper or a word processing document, and then migrated to QuillDriver. Whichever route you intend to take, please refer to the QuillDriver Transcription Manual before you begin transcribing; it documents how to create a QuillDriver transcript and how to use QuillDriver. The current document focuses on the content of transcription and on time coding transcripts.
- Do it right the first time! Don't start by doing a quick and easy transcript, thinking you'll go back later and edit and revise it. You won't. So be careful and do it right from the start.
- Write down everything you hear, and only what you hear! In transcripts of colloquial speech lesson videos, you must write down everything everybody says, including, for example, every 'ong, a, lags so, and so on. Conversely, you should not add particles to the text that do not occur in the spoken sample.
- However, for academic and other non-colloquial speech lesson videos, you should not write down every instance of habitual particles such as 'ong, a, lags so, and so on. When these videos are studied for their academic content, such particles can be distracting.
- Try as much as possible to capture the texture and communicative impact of the speech. That means noting down pauses, interruptions, and so on.
If you encounter a Chinese loanword in your transcript, attempt to spell the word in Chinese. Follow this word with the Tibetan equivalent in parentheses. For example:
In some cases, speaker Ka may split an utterance over two turns. We propose to code this by using a series of tshegs at the beginning of the speaker's second utterance:
Note that unlike "overlap" (for which, see below), here Ka and Kha are speaking in turn. There is a pause after Ka's first utterance, and Kha speaks in cooperation with Ka.
Speakers often make mistakes, for example mispronouncing or mischoosing words in sometimes embarrassing ways. We do not correct their speech in our transcriptions. Instead, we simply note down what speakers say, as they say it. If the mistake is so severe that the speaker's meaning is impossible to determine without a hint, transcribers may follow the mistake with the target word in brackets.
If one speaker interrupts another and prevents the first speaker from completing an utterance, still type the first speaker's partial utterance. End the partial utterance with a series of tshegs instead of a shad.
A misstart occurs when a speaker starts to say something, stops, and then continues or says something else. In cases of misstart, type the first part of the speaker's utterance, followed by a space, and then the second part of the utterance. (To get a space in THDL Extended Wylie, type '_'.)
An example from the video Phurdron with her Dad:
ཀ འོང་ཁ་སེང་པཱ་ལགས་- - - པཱ་ལགས་ཁྱེད་རང་བཞུགས་མེད་དུས་ཙམ་པར་ཟླ་སྒྲོན་འདིར་སླེབས་སོང་ང་ལགས།
ཀ འོང་འོང་རེད་རེད་འབྲེལ་བ་མང་ཙམ་བྱེད་ཕར་ཚུར།
དཔེར་ན། ཕུར་སྒྲོན་དང་ཁོ་མའི་ཕ་ཞེས་པའི་བརྙན་ཕབ་ལས།
ཀ འོང་ཁ་སེང་པཱ་ལགས་་་་པཱ་ལགས་ཁྱེད་རང་བཞུགས་མེད་དུས་ཙམ་པར་ཟླ་སྒྲོན་འདིར་སླེབས་སོང་ང་ལགས།
ཀ འོང་འོང་རེད་རེད་འབྲེལ་བ་མང་ཙམ་བྱེད་ཕར་ཚུར།
When two or more speakers' speech overlaps, try not to split a speaker's continuous sentence into two separate turns, interrupted by the other speaker's interjection. Instead, consider putting all of each speaker's utterance in a single speaker turn. Since we have direct and immediate access to audio and video files of the interaction, overlap need not be resolved in the transcript itself.
Pauses should be represented graphically whenever they are longer than normal. Our convention is to represent such pauses as a series of tshegs: three tshegs for a short pause, six for a medium-length pause, and twelve for a long pause.
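The pause convention can be sketched as a small lookup. This is only an illustration: the function name `pause_marker` and the category keys "short", "medium", and "long" are assumptions made for the sketch, and the manual does not define how long each category lasts, so that judgment stays with the transcriber.

```python
# Hypothetical helper, not part of QuillDriver or Savant.
TSHEG = "\u0f0b"  # Unicode TIBETAN MARK INTERSYLLABIC TSHEG

# Tsheg counts taken from the pause guideline above.
PAUSE_TSHEGS = {"short": 3, "medium": 6, "long": 12}


def pause_marker(length: str) -> str:
    """Return the run of tshegs that codes a pause of the given category."""
    if length not in PAUSE_TSHEGS:
        raise ValueError(f"unknown pause category: {length!r}")
    return TSHEG * PAUSE_TSHEGS[length]
```

A transcriber working in a word processor could paste the result of `pause_marker("medium")` wherever a medium-length pause occurs.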
When the speaker successfully completes an utterance, follow that utterance with a shad, even if it is not a complete sentence. (Note: do not use a shad if the speaker has been interrupted or has misstarted.)
When naming speakers, do not include the honorific lags, except in special cases such as pA lags and the like. If the participant's name cannot be figured out or is irrelevant in the context, use general terms of address such as cog lags 'brother', pho lags 'grandfather', and the like. Valid speaker names include:
པཱ་ལགས་ ཕུར་སྒྲོན་ཅོག ཨ་ཅག བཀྲ་ཤིས
The general idea of our spelling guidelines is to use literary spellings, except where spoken forms excessively diverge from literary forms. In such cases, the literary form is given first, followed by the colloquial form in parentheses. For example:
A list of approved and rejected spellings is maintained as the Spelling Database. When considering a spelling, transcribers should first consult this list to see whether the proposed spelling has already been approved or rejected. The Spelling Database was created largely by applying the spelling rules found in the Spelling Guidelines document, so transcriber familiarity with that document is essential. If a spelling is not in the Spelling Database, the transcriber should then consult a dictionary. If the dictionary also fails to shed light on the matter, the transcriber has no choice but to propose a potentially controversial spelling. Transcribers should mark proposed spellings in their transcripts as follows:
This makes it easy to search for proposed spellings and replace them with approved spellings. In addition, transcribers should add their proposal to their Spelling Proposal file. For each proposed new spelling, transcribers keep track of the following fields in their Spelling Proposal document: Proposed Spelling, Rationale for Spelling, Definition in Tibetan, Chinese or English translation, rendering of the pronunciation of the word in Tibetan script, proposed transcription practice, literary root, example passage or sentence, and source (i.e. title of transcript). Periodically, each transcriber's Spelling Proposal file is reviewed, and an authority makes the necessary adjustments to the Spelling Database, which is then made available to transcribers.
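One way to keep the Spelling Proposal fields consistent is to treat each proposal as a record. The sketch below is an assumption about how a transcriber might organize the file, not a THDL format: the class name `SpellingProposal`, the attribute names, and the helper `missing_fields` are all hypothetical.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class SpellingProposal:
    """One entry in a transcriber's Spelling Proposal file (hypothetical layout)."""
    proposed_spelling: str
    rationale: str
    definition_tibetan: str
    translation: str                   # Chinese or English translation
    pronunciation_tibetan_script: str  # pronunciation rendered in Tibetan script
    proposed_transcription_practice: str
    literary_root: str
    example: str                       # example passage or sentence
    source: str                        # title of the transcript


def missing_fields(proposal: SpellingProposal) -> List[str]:
    """List the fields that are still empty, in declaration order."""
    return [name for name, value in asdict(proposal).items() if not value.strip()]
```

Running `missing_fields` before the periodic review would show which entries still lack, for example, a literary root or a source transcript.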
The QuillDriver software allows time codes to be inserted at any level of granularity. At one extreme, time codes could delimit each clause. At the other extreme, they could delimit each page of text. Our guidelines for granularity of time coding depend on the type of material being transcribed as well as the way the material will be used. In all cases, insert both Start and Stop times for each block that is time coded (i.e. by pressing Ctrl-] in QuillDriver rather than Ctrl-[).
For such materials, we generally time code each clause separately. Usually, there is enough of a pause between clauses to make this possible. For example, in MST 36, the following lines are time coded separately:
There is a noticeable pause after byung, which enables a time code to be placed there. In some cases, the decision to time code a series of clauses or phrases as one or more separate segments depends on whether there is any pause to hold the time code. For example, the following line, also from MST 36, could in theory be split after bkra shis bde legs, but was kept as one line in practice because there was virtually no pause after bde legs:
It may sometimes make sense in colloquial conversations to enclose two speakers' turns together within a single set of time codes. For example, if person A is saying 'ong 'ong 'ong throughout person B's utterance, it may make more sense to time code A and B's utterances together rather than to give A's utterance a separate set of time codes.
In the case of lectures, one may want to limit time coding to every "paragraph" of the transcript. For non-colloquial speech lesson videos, time codes should delimit paragraphs no longer than three to four sentences in length.
For interviews, either (a) questions and answers should be time coded together, as a unit; or (b) questions should be time coded separately from answers, and answers should be time coded at the paragraph level. Option (a) is feasible only if both question and answer are relatively short.
If possible, each line of a song should be time coded separately. If this is not possible given the flow of the song, then each verse should be time coded separately.
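The granularity guidelines for lectures, non-colloquial speech lesson videos, interviews, and songs can be summarized as a lookup. This is a rough sketch only: the function name `time_code_unit`, the material labels, and the two flags are assumptions introduced for illustration, and whether an exchange is "short" or a song can be coded line by line remains the transcriber's judgment.

```python
def time_code_unit(material: str, short_exchange: bool = False,
                   line_level_possible: bool = True) -> str:
    """Return the suggested time-coding unit for a type of material."""
    if material == "lecture":
        return "paragraph"
    if material == "non_colloquial_lesson":
        return "paragraph of at most three to four sentences"
    if material == "interview":
        if short_exchange:
            # Option (a): only when question and answer are both short.
            return "question and answer together"
        # Option (b): question on its own, answer by paragraph.
        return "question separately, answer by paragraph"
    if material == "song":
        return "line" if line_level_possible else "verse"
    raise ValueError(f"no guideline sketched for material type: {material!r}")
```

For instance, `time_code_unit("song", line_level_possible=False)` falls back to verse-level coding, as the song guideline suggests.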
Sometimes you can't hear clearly what's being said, which means that you either have to make a guess, or just leave an empty space in the transcription. Such cases must be clearly marked, to alert readers and subsequent transcribers to the transcription difficulty. If you have no guess as to what's being said, you should put a series of tshegs inside parentheses, as in the first example below. If you have a guess, then precede and follow your guess by a series of tshegs, and enclose the whole in parentheses, as in the second example:
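The marking rule for unclear speech can be sketched as follows. The function name `mark_unclear` is hypothetical, and the default of three tshegs is an assumption: the guideline says only "a series of tshegs" and does not fix the number.

```python
from typing import Optional

TSHEG = "\u0f0b"  # Unicode TIBETAN MARK INTERSYLLABIC TSHEG


def mark_unclear(guess: Optional[str] = None, tshegs: int = 3) -> str:
    """Wrap an unclear passage in parentheses, with tshegs around any guess.

    The tsheg count is an assumed default, not a value given in the manual.
    """
    run = TSHEG * tshegs
    if not guess:
        # No guess at all: only the series of tshegs inside parentheses.
        return f"({run})"
    # A guess: tshegs before and after it, the whole enclosed in parentheses.
    return f"({run}{guess}{run})"
```

Calling `mark_unclear()` yields the no-guess form, while `mark_unclear(text)` encloses the transcriber's best guess so that later readers can find and review it.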
Tibetan version has been removed by Bradley Aaron due to corrupt fonts. Original with corrupted text can be obtained from him.
Provided for unrestricted use by the Tibetan and Himalayan Digital Library