Text Markup - Structural Divisions
Contributor(s): Nathaniel Garson & David Germano
The major structural divisions of a text are built into the TEI DTD as the children of the <text> element: <front>, <body>, and <back>. Only <body> is required; a text need not have a <front> or a <back>. None of the three requires an ID or any other attribute, because their names are distinct enough. In general, the front matter contains the title page, preface, dedication, and so forth; the body contains the chapters of the text (usually represented by <div1> elements); and the back matter contains afterwords, indices, colophons, and so forth. Subdivisions of any of the three are represented by "div" elements, described in the next section.
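As a rough illustration, the skeleton of a text with all three major divisions might look like the fragment below. The headings and paragraph contents are placeholders, and the choice to leave the type attribute off the front and back divisions is an assumption made for this sketch, not a rule stated in this guide.

```xml
<text>
  <!-- Optional front matter: title page, preface, dedication, and so forth -->
  <front>
    <div1>
      <head>Preface</head>
      <p>Text of the preface.</p>
    </div1>
  </front>
  <!-- Required: the body of the work, here with a single chapter -->
  <body>
    <div1 type="chapter" n="1">
      <head>Title of chapter one</head>
      <p>Text of chapter one.</p>
    </div1>
  </body>
  <!-- Optional back matter: afterwords, indices, colophons, and so forth -->
  <back>
    <div1>
      <head>Colophon</head>
      <p>Text of the colophon.</p>
    </div1>
  </back>
</text>
```

A text with no front or back matter would simply omit those two elements and keep only the body.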
Subdivisions of the front, body, or back are recorded using <div> elements. There are two kinds: the unnumbered <div>, and a series of ten numbered ones, <div0> through <div9>. In the markup of modern scholars' essays or monographs, the fundamental sections of the body of the work (chapters of a book or sections of an article) should be marked up with <div1> tags, subsections with <div2> tags, sub-subsections with <div3> tags, and so forth. Tibetan texts may have an outline of 20 or more nested levels, so for them we use the unnumbered <div> tag, which can be nested to any depth. The attributes used for all <div> elements are:
- type: chapter, section, outline
- n: (number of chapter, section, etc.)
Chapter is self-explanatory. Section refers to a subsection of a chapter. Outline refers to the text's topical outline (sa bcad). In an outline of a Tibetan work, the <div>s can be nested but they cannot overlap: a <div> opened inside another <div> must be closed within that same <div>. Therefore, if outline and chapter breaks overlap, the <div>s can record only one of them, generally the outline; the other is recorded using <milestone> markers. The “n” attribute gives the plain number of the chapter or section. For example, <div n="4" type="chapter"> is the opening tag for chapter four.
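The fragment below sketches how nested outline divisions and a chapter break might coexist. The headings and text are placeholders, and the unit and n attributes on <milestone> are assumptions for illustration; this guide does not specify which attributes the milestone should carry.

```xml
<body>
  <div type="outline" n="1">
    <head>First outline heading</head>
    <p>Text under the first heading.</p>
    <div type="outline" n="1">
      <head>First subheading</head>
      <p>Text that belongs to chapter one.</p>
      <!-- Chapter two begins partway through this outline section, so the
           break is recorded with an empty milestone rather than a div.
           The unit and n attributes are assumed here. -->
      <milestone unit="chapter" n="2"/>
      <p>Text that belongs to chapter two.</p>
    </div>
    <!-- The inner div closes before the outer one: nesting, never overlap. -->
  </div>
</body>
```

Because the milestone is an empty element, it can sit anywhere inside the outline hierarchy without forcing a <div> to close early.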
Controlling the Section Numbering
To turn off the automatic numbering of the text's sections:
<text … rend="no-outline"> … </text>
This turns off all automatic numbering.
To turn off just the <head> tag's auto-numbering, add rend="unnumbered" to the <head> tag itself.
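A minimal sketch of the second case, with placeholder heading and text: only this heading loses its automatic number, while the rest of the text keeps the default numbering.

```xml
<div type="chapter" n="1">
  <!-- rend="unnumbered" suppresses the automatic number for this heading only -->
  <head rend="unnumbered">Introduction</head>
  <p>Text of the introduction.</p>
</div>
```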
Headers are marked up with the <head> element, which is generally the first child of a <div> element. If the editor adds a header for clarification where the actual text has none, an <add> element should be placed immediately inside the <head> element, enclosing the header's text:
<head><add resp="ndg">This is a header added by NDG</add></head>
In this case, “ndg” must match the ID attribute value of a <name> included in the metadata, indicating who “ndg” is.
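The pairing between the resp value and the metadata could look roughly like the two fragments below. The placement of <name> inside a <respStmt>, the wording of <resp>, and all text content are assumptions made for this sketch; only the matching of resp="ndg" to the ID value comes from this guide.

```xml
<!-- In the metadata: the id value is what resp points to.
     Placement inside <respStmt> is assumed for illustration. -->
<respStmt>
  <resp>Markup editor</resp>
  <name id="ndg">Full name of the editor</name>
</respStmt>

<!-- In the text: a header supplied by that editor -->
<div type="section" n="2">
  <head><add resp="ndg">Header supplied by the editor</add></head>
  <p>Text of the section.</p>
</div>
```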
The basic paragraph element is <p></p>. Prose of any sort should be marked up with <p> elements, with the text of each paragraph enclosed in <p> and </p> tags. Paragraphs have the standard attributes, but these generally do not need to be used unless a specific need arises. A paragraph can contain a number of child elements, including but not limited to clauses (<cl>), quotations (<q> or <quote>), phrases (<phr>), sentences (<s>), links (<ref>, <xref>, etc.), lists (<list>), numbers (<num>), titles (<title>), and a variety of name elements (<persName>, <placeName>, etc.). Paragraphs cannot be nested within one another; they are the basic content of <div>s, which can be nested. Within paragraphs, the <s> element can be used to mark individual sentences, but this is not required.
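As an illustration, a division holding two paragraphs might be marked up as follows. All text content is placeholder, and the particular mix of child elements is chosen only to show a few of the options listed above.

```xml
<div type="section" n="1">
  <p>
    <!-- Sentence-level markup is optional -->
    <s>The first sentence names <persName>a person</persName> and
       <placeName>a place</placeName>.</s>
    <s>The second sentence cites <title>the title of a work</title>.</s>
  </p>
  <!-- The second paragraph is a sibling of the first, never nested inside it -->
  <p>A paragraph without sentence-level markup is equally acceptable.</p>
</div>
```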
Quotations or citations that are a single paragraph in length can be marked up with the quotation elements in the same manner as a <p> element: <q> for spoken quotations and <quote> for textual quotations. If these elements are used outside of a <p> element, they display as a separate indented paragraph. If a quotation runs to multiple paragraphs, the <q> or <quote> element should contain <p> elements. See Text Markup - Citations.
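The three situations described above can be sketched in one division. The text content is placeholder throughout.

```xml
<div type="section" n="1">
  <!-- Inline: the quotation sits inside a paragraph -->
  <p>A paragraph that contains an inline <quote>textual quotation</quote>.</p>
  <!-- Outside a <p>: displays as its own indented paragraph -->
  <q>A single paragraph of quoted speech.</q>
  <!-- Several paragraphs: each one is wrapped in <p> inside the quotation -->
  <quote>
    <p>First paragraph of the quoted passage.</p>
    <p>Second paragraph of the quoted passage.</p>
  </quote>
</div>
```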
Two elements are used for marking up verse. The line group element, <lg>, bundles a number of lines into a group; each <lg> element represents a stanza of verse. The lines of verse are tagged with <l>, or line, elements. The <seg> elements that mark shad-delimited lines should be placed fully within the <l> tags. Thus, an <lg> element with four <l> elements within it represents a four-line stanza.
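A four-line stanza might therefore be sketched as below. The verse text is placeholder, and the n values are invented sample numbers that follow the page-dot-line convention this guide gives for <seg> elements.

```xml
<lg>
  <!-- Each shad-delimited segment sits wholly inside its line -->
  <l><seg n="2.10" type="shad">First line of the stanza</seg></l>
  <l><seg n="2.11" type="shad">Second line of the stanza</seg></l>
  <l><seg n="2.12" type="shad">Third line of the stanza</seg></l>
  <l><seg n="2.13" type="shad">Fourth line of the stanza</seg></l>
</lg>
```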
Lists are marked up with the <list></list> element, which contains <item> elements as children. The two basic types of list are bulleted lists and numbered lists, and numbered lists can use several different formats. The rend attribute distinguishes between these types and formats, using the same conventions as HTML:
rend="A": capital letters
rend="a": lowercase letters
rend="I": capital Roman numerals
rend="i": lowercase Roman numerals
rend="1": Arabic numerals
rend="bullet": bulleted (normal)
rend="disc": bulleted (discs)
rend="circle": bulleted (circles)
rend="square": bulleted (squares)
rend="none": no bullet or number; allows a customized number, such as (1) or A., to be placed at the beginning of each item
The “n” attribute can be used to specify the starting number of a numbered list if the first item does not begin with “1”.
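Two short sketches of these conventions follow, with placeholder item text. The first assumes that the n attribute takes a plain Arabic number as its starting value, which this guide does not state explicitly.

```xml
<!-- Numbered with Arabic numerals, starting at 3 (form of the n value assumed) -->
<list rend="1" n="3">
  <item>Third point</item>
  <item>Fourth point</item>
</list>

<!-- No automatic bullet or number: the numbers are typed into each item -->
<list rend="none">
  <item>(1) First point</item>
  <item>(2) Second point</item>
</list>
```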
For the digital text to be fully functional, there must be a way to reference a quotation or section within it, which requires some form of line numbering. We have decided to calculate line numbers in a digital text by counting the shad-delimited lines. Each shad-delimited line is therefore marked up with a <seg>, or segment, element with type="shad". There are 100 such lines to the digital page. The n attribute of the <seg> element gives the page number, followed by a dot, followed by the line number on that page. An example would be: <seg n="3.24" type="shad"> … </seg>. This would be line 24 of digital page 3, that is, the 324th shad-delimited line in the text if page numbering starts from 0.
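A run of consecutive shad-delimited lines in prose might be sketched as follows. The text is placeholder, and placing the <seg> elements directly inside a <p> is an assumption for this illustration.

```xml
<p>
  <!-- n is page number, dot, line number on that page -->
  <seg n="3.24" type="shad">Text of the shad-delimited line.</seg>
  <seg n="3.25" type="shad">Text of the next shad-delimited line.</seg>
  <seg n="3.26" type="shad">Text of the line after that.</seg>
</p>
```

A citation can then point to a passage by its n value alone, independent of how the text is displayed.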