Marking Up Tibetan Text in Word with Styles
Contributor(s): Chris Hatchell, THL Staff.
This page describes how to format Tibetan texts using Microsoft Word’s “styles” feature.
- Getting Started
- Basic Markup Goals
- Steps for the Basic Markup of a Tibetan Text
- Advanced Markup
- Detailed Guidelines
- Style Sheet
To aid in the markup of Tibetan texts in Word, we have created a Tibetan-language template:
This template contains the same Word styles as the English-language template used for THL documents. In the Tibetan-language template, however, the font is specified as "Tibetan Machine Uni" and the language is specified as "Tibetan (PRC)." The Tibetan-language template also contains some simple formatting conventions to ensure that the Tibetan font displays nicely; in particular, it sets the font-size and paragraph spacing, and includes an option that makes lines break properly at the edges of the page. Once you download the template, place a copy in the folder where Microsoft Word stores its templates (usually this is C > Documents and Settings > user > Application Data > Microsoft > Templates). Once you have installed the template, open a copy of it and paste your Tibetan text into it.
The basic goal of text markup is, first, to create an electronic edition of a text that is easy to read, search, and navigate. Here, you will add luxuries that aren’t there in the original text: subheads, paragraph formatting, clearly identifiable text-titles and personal names, and so forth. Second, doing this markup with a standardized set of styles makes it easy for the text to be converted to XML and put on the web.
- For more on Word styles, see Using Microsoft Word Styles; a detailed description of text markup principles can be found in Using Word Styles for THL Markup. For some technical documentation about the XML language, see Introduction to XML and THL XML Markup.
The first step in marking up a Tibetan text using Word styles is to make sure that the basic input has been done correctly. At minimum, the text should be based on the Tibetan-language template, use a Unicode font, have page numbers entered, and have a proper filename. These steps should have been taken care of by the text inputter, and are described in Inputting a Tibetan Text. Once the basic input has been done, the markup process involves applying styles to different elements of the text. The basic steps here are:
- (a) Structure the text with subheads
- (b) Mark paragraph and verse styles
- (c) Mark citations
- (d) Mark the text’s topical outline (sa bcad)
- (e) Mark names and text titles
Topic headings are often implicit in a Tibetan text, but given the format of a traditional Tibetan book, they are difficult to identify: they are not marked in any special font, are not numbered, and are often barely distinguished from the body text that surrounds them. A first read-through of a difficult text might involve hours (or days) of trying to identify the basic chapter and topic divisions around which the work is organized. Thus, having an electronic edition with clearly marked sections, chapters, and subject divisions is a major benefit. Adding subheads also makes lengthy texts navigable; by using features like Word’s “document map,” a list of chapters and subheads can be easily browsed, and clicking on one can take you to that section in the document.
Subheads are material that you add to the text. Thus, no part of the text itself (none of the author’s words) is marked as a subhead. Even though an author might identify a topic heading, saying for instance “Now, the third topic: a detailed etymology,” you would not mark this as a subhead. Rather, you add a subhead to the text and give it a sensible wording (in Tibetan): “Topic Three: A Detailed Etymology.” The subhead text that you create should end with a final shad.
The first subhead that you should add to the text is the title of the work itself. (This may have already been done by the inputter.) In keeping with the above rule, this is something that you add to the text; it is not the title that was typed in when the title page was input. After typing the text title, mark it with the style Heading 1.
Next, add subheads to separate out the three most basic divisions of the text, which are its (1) front matter, (2) body, and (3) concluding material. (These sections of a Tibetan text are described in detail in Tibetan Text Section Names.) Mark these three subheads with the style Heading 2, and give them each a unique number in brackets. The end result will look like:
[1] ཀླད།
[2] གཞུང་།
[3] མཇུག
The remainder of the process of adding subheads is essentially marking the divisions of each of these three sections, applying the proper subhead style to them, and giving them a unique number and title. For a lengthy, complicated text this might be days of work, while a short unstructured text might not have any more subheads than these three.
Below is an example of the subheads for a simple, hypothetical text, which consists of a front section (containing a title page, and the author’s statement of intent), a body (containing three chapters), and a back section (containing a colophon and a closing invocation).
[1] ཀླད།
[1.1] ཁ་བྱང་།
[1.2] དམ་བཅའ།
[2] གཞུང་།
[2.1] ལེའུ་དང་པོ་གཞི་བསྟན་པ།
[2.2] ལེའུ་གཉིས་པ་ལམ་བསྟན་པ།
[2.3] ལེའུ་གསུམ་པ་འབྲས་བུ་བསྟན་པ།
[3] མཇུག
[3.1] མཇུག་བྱང་།
[3.2] ཤིས་བརྗོད།
Here, the front, body, and back headings would be marked with the style Heading 2, and the divisions of them would be marked as Heading 3. To create further divisions, for instance to create three internal divisions of the first chapter, you could make subheads numbered 2.1.1, 2.1.2, and 2.1.3, each marked with the Heading 4 style.
Note that the enumerations included in the subheads currently need to be marked with the style Added by Editor, as they may need to be removed later; having them marked will make them easy to remove. Be sure to mark the brackets and the space following the closing bracket as well.
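The numbering convention above implies a simple rule: the heading level is one more than the number of parts in the enumeration, because Heading 1 is reserved for the title of the work. The following Python sketch illustrates that rule and also isolates the span (brackets plus the following space) that would be marked as Added by Editor. The function names are hypothetical and are not part of any THL tool.

```python
import re

# An editor-added enumeration such as "[2.1.3] " at the start of a subhead.
# The brackets and the single space after the closing bracket are part of
# the match, since they are marked as "Added by Editor" as well.
ENUMERATION = re.compile(r"^\[(\d+(?:\.\d+)*)\] ")


def heading_style(number: str) -> str:
    """Return the Word heading style for an outline number like '2.1.1'."""
    depth = len(number.split("."))
    # Heading 1 is used for the title of the work, so depth 1 -> Heading 2.
    return f"Heading {depth + 1}"


def split_subhead(subhead: str):
    """Split a subhead into (Added by Editor span, title text, heading style)."""
    match = ENUMERATION.match(subhead)
    if match is None:
        raise ValueError("subhead does not start with a bracketed number")
    return match.group(0), subhead[match.end():], heading_style(match.group(1))
```

For example, `split_subhead("[3.2] ཤིས་བརྗོད།")` separates the enumeration `"[3.2] "` from the Tibetan title and reports the style Heading 3.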
- Outlining a Tibetan Text and Cataloging a Tibetan Text provide more information on the sections typically found in Tibetan texts.
- It is usually easy to come up with the names for chapter-level subheads, as these are written in the text itself. However, many divisions won’t be labeled in the text (such as “title page,” “colophon,” “invocation,” and so forth). For the names of these in Tibetan, see our Tibetan Text Section Names.
The text that is contained within the subheads should be marked so that it will display properly. Prose should be marked with the style Paragraph, while lines of verse should be broken up and marked with Verse 1 or Verse 2. (Verse 1 is used for the initial line, while the remaining lines are marked as Verse 2.) Insert a carriage return after each line of verse; if a line is followed by two shads, the return is inserted after the second shad. Lines of verse can also be separated into stanzas by marking the first line of each stanza as Verse 1 (note that this keeps you from having to enter an empty line between stanzas).
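As a rough illustration of the verse rules above, the following Python sketch splits one stanza into lines, breaking after the second shad when there are two, and pairs the first line with Verse 1 and the rest with Verse 2. It is only a model of the procedure, not a THL tool; the function name is hypothetical, and it assumes that every verse line ends in at least one shad.

```python
import re

SHAD = "\u0F0D"  # Tibetan shad

# One verse line: text up to a shad, an optional second shad (possibly after
# a space), and the white space that follows, which stays with the line.
VERSE_LINE = re.compile(rf"[^{SHAD}]+{SHAD}(?:\s?{SHAD})?\s?")


def mark_verse(stanza: str):
    """Split a stanza into lines and pair each with 'Verse 1' or 'Verse 2'."""
    lines = VERSE_LINE.findall(stanza)
    # Trailing text with no closing shad is not returned by findall.
    return [("Verse 1" if i == 0 else "Verse 2", line)
            for i, line in enumerate(lines)]
```

Calling `mark_verse` once per stanza gives each stanza its own Verse 1 line, which matches the way stanzas are separated in the template.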
Citations from other works are a common feature of many Tibetan texts. While Tibetans of course have conventions for distinguishing quoted material from the author’s own words, these are sometimes imperfectly implemented, leaving the reader to struggle to decipher what is intended to be a quote and what isn’t. Having quotations clearly delineated (formatting them like “inset quotations” in Western typesetting) thus adds major value to an edition. The process is much like that described in step (b), above.
Prose citations are marked with the style Citation Prose 1; if you want to break a prose citation into paragraphs, paragraphs following the first one are marked with the style Citation Prose 2.
Verse citations are marked with the styles Citation Verse 1 (for the first line of a stanza) and Citation Verse 2 (for any subsequent lines). In both cases, these are separated from the author’s text by carriage returns at the beginning and end of the citation. Citations should (but unfortunately don’t always) end with some sort of “close quote marker” in Tibetan, such as ces so, or zhes so. These markers should not be included in the quotation, but appear on the following line. Note that the style Paragraph Continued is used following a quote, to indicate that there is no change in topic following the quote.
Following is an (abbreviated) example from Longchenpa’s Tshig don mdzod that will illustrate how to mark up quotes:
དང་པོ་ནི། ཀློང་དྲུག་པ་ལས།
ཡེ་ཤེས་ཉིད་ནི་རྣམ་གསུམ་གྱིས། །
གཞི་ཡི་ཁྱད་པར་ཚིག་ཏུ་བསྟན། །
ཞེས་པ་དང༌། རྡོ་རྗེ་སེམས་དཔའ་སྙིང་གི་མེ་ལོང་ལས།
གཞིའི་ཆོས་ཐམས་ཅད་ངོ་བོ་རང་བཞིན་ཐུགས་རྗེ་གསུམ་དུ་ཤེས་པར་གྱིས་ཤིག་
ཅེས་སོ། །
Here, the author gives a brief topical heading, and then states the source of his first citation. This would be in Paragraph style. Following are two lines of verse, which would be in Citation Verse 1 and Citation Verse 2 styles. The close quote marker appears on the next line, and should be in Paragraph Continued style. The author then gives a prose citation, which would be marked in the style Citation Prose 1. Note that for this prose citation, there is no closing shad; a carriage return is made after the final tsheg in the citation, and the close quote marker appears on the next line (in Paragraph Continued style).
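The paragraph-by-paragraph result described above can also be written out as data. The Python sketch below pairs each paragraph of the example with the paragraph style named in the description. The list-of-tuples representation is purely illustrative and is not a THL file format; white space at the ends of paragraphs is omitted for readability.

```python
# Each entry is (paragraph style, paragraph text), in document order.
marked_paragraphs = [
    # Topical heading and the source of the first citation.
    ("Paragraph", "དང་པོ་ནི། ཀློང་དྲུག་པ་ལས།"),
    # Two lines of cited verse.
    ("Citation Verse 1", "ཡེ་ཤེས་ཉིད་ནི་རྣམ་གསུམ་གྱིས། །"),
    ("Citation Verse 2", "གཞི་ཡི་ཁྱད་པར་ཚིག་ཏུ་བསྟན། །"),
    # Close quote marker, followed by the source of the second citation.
    ("Paragraph Continued", "ཞེས་པ་དང༌། རྡོ་རྗེ་སེམས་དཔའ་སྙིང་གི་མེ་ལོང་ལས།"),
    # Prose citation: ends on a tsheg, with no closing shad.
    ("Citation Prose 1", "གཞིའི་ཆོས་ཐམས་ཅད་ངོ་བོ་རང་བཞིན་ཐུགས་རྗེ་གསུམ་དུ་ཤེས་པར་གྱིས་ཤིག་"),
    # Final close quote marker.
    ("Paragraph Continued", "ཅེས་སོ། །"),
]

# The sequence of styles alone, useful for checking a marked-up passage.
style_sequence = [style for style, _ in marked_paragraphs]
```

Note that this covers paragraph styles only; character styles such as Sa bcad and Text Title are applied to spans within these paragraphs.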
A text’s topical outline should be marked in the style named Sa bcad. This is a character style rather than a paragraph style.
The sa bcad will usually come right after a subhead, but occasionally appears within the body of a section. The sa bcad may be a brief statement of what the topic of the section is (“Now an etymology will be given”), or it may simply be an enumeration (“Now, first…”). If the sa bcad ends with a closing shad, also mark that shad in the Sa bcad style. (In the above example, dang po ni/ would be marked in the Sa bcad style.)
The Tibetan-language template contains several styles for marking personal names. In basic markup, you should apply the style Author to the author’s name when it appears in colophons. Also mark other names that appear in colophons, such as translators, treasure revealers, scribes, and so forth. More advanced markup might involve marking the names of deities, places, historical figures, clans, and so forth in the body of the text. If there is no style appropriate for the names you need to mark, you could either create a new one in conjunction with the director of your project, or you could use a generic style like Name Personal Human.
Mark any names of texts with the style Text Title. (In the above example, Klong drug pa would be marked as a text title.) When text titles appear in colophons, mark them with the style Colophon Text Title. Similarly, chapter titles that appear in colophons should be marked in the style Colophon Chapter Title.
Authors often refer to texts without actually giving their names, making oblique statements like “as it says in sutra” (mdo las), “the root tantra states” (rtsa rgyud las), or simply “the same source mentioned above says” (de nyid las). Mark these as text titles as well (but as with actual text titles, don’t include the particle las in the Text Title style).
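To make the boundary of the Text Title style concrete, here is a small Python sketch that splits a reference such as ཀློང་དྲུག་པ་ལས། into the part that would receive the style and the remainder. It assumes the reference ends with a tsheg followed by the particle las, and it keeps the final tsheg inside the styled span, in line with the guideline on character styles given on this page. The function name is hypothetical.

```python
SHAD = "\u0F0D"       # shad
TSHEG = "\u0F0B"      # tsheg
LAS = "\u0F63\u0F66"  # the particle las


def split_title_reference(phrase: str):
    """Split 'TITLE + tsheg + las (+ shad)' into (Text Title span, remainder)."""
    # Ignore any closing shad or spaces when looking for the particle.
    body = phrase.rstrip(" " + SHAD)
    if not body.endswith(TSHEG + LAS):
        raise ValueError("phrase does not end with the particle las")
    cut = len(body) - len(LAS)
    # The title span keeps its final tsheg; las and the shad are left out.
    return phrase[:cut], phrase[cut:]
```

Applied to མདོ་ལས།, this returns མདོ་ as the span for the Text Title style and ལས། as the unstyled remainder.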
The markup of simple texts may just involve creating a few subheads and marking names and colophons. But depending on the project, much more detailed markup can be done. For complicated works, it might be appropriate to apply styles to historical events, dates, religious practices, place names, and so forth. Commentaries that have a root text embedded in them can also have the root text marked, which makes for much easier reading. The Tibetan-language template already contains a wide variety of styles for such purposes, but if a particular project requires styles that have not been created yet, we can easily add these to the template.
Below are some guidelines that should help with the finer details of text markup.
(a) When inserting carriage returns (such as at the end of a paragraph), make sure you insert the carriage return after the shad+white space, and not after the shad but before the white space. It is also important to leave the space there: do not delete it! The idea here is that an electronic edition should be able to be converted into traditional pecha formatting, without all of the international formatting. As this space is intrinsic to the text, if you remove it, the pecha formatting will not appear correctly.
(b) When you have two shad marks after a verse line, insert the carriage return after the second shad, so that both shads appear at the end of the line and the next line does not begin with a shad.
(c) When applying character styles to something that is not a whole sentence (such as for personal names), make sure you highlight the full term including the final tsheg. However, do not highlight a final shad at the end of a term. For character styles that are used to indicate whole phrases or sentences (such as sa bcad), do include the final shad in the style.
(d) Perform all special formatting (such as creating inset quotes, lists, and so forth) by using styles. Any formatting that does not use styles will be lost when the text is converted to XML. If you change the display attributes of particular styles to your own preferences, do so in the styles, but leave the style names the same.
(e) Occasionally you may want to add something to the text to make reading more clear, such as adding numbers before elements in a list. If you do so, mark these with the style Added by Editor. This makes it clear that your addition is not in the actual text, and makes it easy to find additions if you want to remove them.
(f) Note that Unicode Tibetan does not always display properly in Windows XP. Microsoft Word’s built-in subheads also will sometimes display oddly. The markup process in Word is primarily about applying styles, rather than worrying about how those styles look. As long as any element of text has the proper style applied to it, it will convert properly to XML, and display properties can be set at that time.
(g) As you read through the text, you may well see problem areas that stem from misinput, conversion mistakes, or other issues. We suggest you use the Word highlighting function to color those areas yellow so that you can easily find them when reviewing the text. Likewise, you may have questions about where a citation begins, whether a shad has been mistakenly omitted, and so forth. These can also be marked so they can be found later.
(h) The above steps for “basic markup” can be done in any order. It may be easiest to first mark your whole text in Paragraph style, then to put in the subheads. Sa bcad style is often easiest to apply as you insert the subheads. Marking text titles and citations at the same time is another obvious way to save time.
(i) To add styles for page numbers and line numbers in Dege Kangyur etexts that have had these styles stripped before Esukhia proofreading, run two sets of wildcard replacements.
For page numbers:
Open the Replace box.
Select Use Wildcards.
Find what: (\[)(?a)(\])
Replace with: \1\2\3 (and in formatting select the PageNumber,pgn style)
Press Replace All.
Then repeat Replace All with each of the following in Find what:
(\[)(?b)(\])
(\[)(??a)(\])
(\[)(??b)(\])
(\[)(???a)(\])
(\[)(???b)(\])
For line numbers:
Open the Replace box.
Select Use Wildcards.
Find what: (\[)(?a.?)(\])
Replace with: \1\2\3 (and in formatting select the TibLineNumber,tln style)
Press Replace All.
Then repeat Replace All with each of the following in Find what:
(\[)(?b.?)(\])
(\[)(??a.?)(\])
(\[)(??b.?)(\])
(\[)(???a.?)(\])
(\[)(???b.?)(\])
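For readers who want to check an etext outside Word, the searches above can be approximated with a single regular expression. The Python sketch below finds bracketed markers of the form [1a] or [123b.7] and reports which of the two styles each would receive. It is an approximation under one stated assumption: where the Word searches accept any character for each "?", this sketch accepts only digits. The function name is hypothetical.

```python
import re

# A bracketed folio marker: one to three digits, a side letter (a or b), and
# an optional ".<digit>" line number, e.g. [1a], [12b], [123a.7].
# Assumption: the characters matched by "?" in the Word searches are digits.
MARKER = re.compile(r"\[\d{1,3}[ab](\.\d)?\]")


def classify_markers(text: str):
    """Return (marker, style name) pairs in the order they occur."""
    results = []
    for match in MARKER.finditer(text):
        # A marker with a ".<digit>" part is a line number, otherwise a page.
        style = "TibLineNumber,tln" if match.group(1) else "PageNumber,pgn"
        results.append((match.group(0), style))
    return results
```

Bracketed subhead enumerations such as [2.1] are not matched, because they lack the a or b side letter.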
Documentation on constructing the syntax for such search-and-replace operations can be found on the following webpages:
This contains a list of official THL markup styles and procedures, which will be updated as more are formalized. Try to follow these styles on your markup project, though not all of them will be applicable to any particular project.
Introductory scenes go in the main body of the outline (i.e. in the gzhung section); they are not part of the front matter (the klad section).
There appear to be two ways that introductory scenes are conceptualized by Tibetan authors. (1) They are given their own chapter, for instance “chapter 1” of a tantra being the introductory scene. (2) They are not distinguished from the first chapter, although the first chapter really has its own subject. In other words, they are not really part of the stated subject matter of the first chapter, but there is no sa bcad that distinguishes them from the first chapter.
The first case presents no problem for markup: just make the first chapter the introductory scene, as it is in the work you are marking up. In the second case, break the introductory scene out of the first chapter, and give it its own subhead (even though the work itself doesn’t do this). Here, if there are two introductory scenes, you will want to give them an overarching subhead “introductory scene” (gleng gzhi), and then within that make subheads for the individual introductory scenes.
It is quite helpful to have names identified in the markup. The NormalTib template has a selection of name-styles (Name Buddhist Deity, Name Personal Human, etc.). There are some special cases for names that should be kept in mind:
If the name is the name of someone who is speaking (“Then Dorje Chang said…”), mark it with a “speaker” style. In XML, a name could be marked as a “name” and then have a qualifier attached indicating that it is also a “speaker,” a “deity,” and so forth. This is not possible in Word, so we have created a group of specialized name-styles like Speaker Buddhist Deity to deal with the problem.
Sometimes it may be helpful to mark “epithets.” For example in “Then the great Teacher said…,” the word “Teacher” is an epithet rather than a name. These can be marked with the Epithet style. Note that there is also a Speaker Epithet style.
It is fairly common to have collective speakers: “Then the host of deities spoke with a single voice.” Such collective names are marked with the Epithet style.
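The special cases above amount to a small decision rule, sketched below in Python. Only the style names mentioned on this page (Name Buddhist Deity, Name Personal Human, Speaker Buddhist Deity, Epithet, Speaker Epithet) are confirmed; the sketch assumes that other speaker styles follow the same "Speaker" plus category pattern, which should be checked against the template. The function and parameter names are hypothetical.

```python
def name_style(category: str = "", is_speaker: bool = False,
               is_epithet: bool = False, is_collective: bool = False) -> str:
    """Choose a name style from the special cases described above."""
    # Collective speakers ("the host of deities") take the Epithet style.
    if is_collective:
        return "Epithet"
    # Epithets have their own pair of styles.
    if is_epithet:
        return "Speaker Epithet" if is_speaker else "Epithet"
    # Assumption: every "Name X" style has a matching "Speaker X" style.
    prefix = "Speaker" if is_speaker else "Name"
    return f"{prefix} {category}"
```

For instance, `name_style("Buddhist Deity", is_speaker=True)` gives the Speaker Buddhist Deity style used for “Then Dorje Chang said…”.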
In projects that intend to create an exact copy of the paper version of a text, the rin chen spungs shad should be used. In other projects, it can simply be replaced with an ordinary shad, as the rin chen spungs shad is merely an artifact of where the line breaks occur in a paper text.