Checking Converted Bibliographic Records
<p class="paragraph">
</p><h3 class="heading-h1"><a name="ProofreadingConvertedXMLBibliographicRecords" class="anchorpoint"></a>Proofreading Converted XML Bibliographic Records</h3><p class="paragraph"><strong class="bold">Contributor(s)</strong>: THL Staff.</p><p class="paragraph">Once the bibliographic records have been converted into XML files, they need to be checked and cleaned up. This involves the following tasks:</p><ul class="star"><li>Opening each XML record in an XML editor and making sure it validates against the DTD. If it does not, correct the problems in the document until it validates, or else try reconverting.</li>
<li>Looking at the bottom of the document for comments and notes left by the cataloger. Some of these comments may be information that needs to be entered into the XML as a note or discussion. All the cells in the third column of the entry table get converted into a comment at the bottom of the XML file. Each row's comment is preceded by the label in the first cell of the row, and the comments for different rows are separated by a row of asterisks. Some of the comments will be to add information in the markup, such as pagination for multi-volume texts. Others may be specific instructions to the converter or proofer, such as to check the accuracy of some information.</li>
<li>Once a volume of XML records has been sufficiently cleaned up, the entire volume should be zipped into a single file called N-Kg-v###-xml.zip and uploaded to that volume's folder in the Canons Resources page.</li></ul><p class="paragraph">
</p><h3 class="heading-h3"><a name="MarkupforNotes" class="anchorpoint"></a>Markup for Notes</h3><p class="paragraph">Insert a <note> tag with the following attributes: <note id="N-0004-bib-n1" resp="snw" type="translators">
</p><ul class="star"><li><ul class="star"><li><strong class="bold">id:</strong> the name of the file with <strong class="bold">xml</strong> removed and replaced with <strong class="bold">n1</strong> for the first note in the file, <strong class="bold">n2</strong> for the second note in the file, etc.</li>
<li> <strong class="bold">resp:</strong> your three-letter initials.</li>
<li><strong class="bold">type:</strong> the name of the tag to which this note refers. If the note is about the translators, then enter <em class="italic">translators</em>; if the note is about the colophon, then enter <em class="italic">colophon</em>; etc.</li>
<li><strong class="bold">Placement of note:</strong> insert the note just before the close tag of the section to which it refers (<note> is allowed in some places and not others, so you may have to experiment to find where it is allowed for a particular tag; keep a list of difficulties). For example, a note on the translators would go here:<br/></li></ul></li></ul><div class="code"><pre><origination>
<head>Provenance</head>
<respdecl type=<span class="java-quote">"translator"</span>>
<persName n=<span class="java-quote">"Indian Scholar"</span> lang=<span class="java-quote">"tib"</span> key=<span class="java-quote">""</span>><!--'phags pa gzhi thams cad yod par smra ba'i 'dul ba 'dzin pa kha che'i bye brag tu smra ba'i slob dpon dzi na mi tra/--><span class="size3">འཕགས་པ་གཞི་ཐམས་ཅད་ཡོད་པར་སྨྲ་བའི་འདུལ་བ་འཛིན་པ་ཁ་ཆེའི་བྱེ་བྲག་ཏུ་སྨྲ་བའི་སློབ་དཔོན་ཛི་ན་མི་ཏྲ།</span></persName>
<persName n=<span class="java-quote">"Tibetan Translator"</span> lang=<span class="java-quote">"tib"</span> key=<span class="java-quote">""</span>><!--zhu chen gyi lo ts+tsha ba ban de klu'i rgyal mtshan/--><span class="size3">ཞུ་ཆེན་གྱི་ལོ་ཙྪ་བ་བན་དེ་ཀླུའི་རྒྱལ་མཚན།</span></persName>
</respdecl>
<strong class="bold"><note id=<span class="java-quote">"N-0004-bib-n1"</span> resp=<span class="java-quote">"snw"</span> type=<span class="java-quote">"translators"</span>>This translator team was responsible for translating the first half of the text.</note></strong>
</origination></pre></div><ul class="star"><li>Correct problems that occurred in the Wylie-to-Unicode Tibetan conversion. Tibetan fields may contain chunks of Wylie in square brackets in the middle of the Tibetan. These are chunks of Wylie that did not convert and need to be fixed. This usually happens in Sanskrit titles and is most often the result of incorrect THL Extended Wylie (such as forgetting to add a + between letters in non-standard stacks) in the Word doc. Much less frequently it happens because the stack has not yet been created in Tibetan Machine Uni.<br/><strong class="bold">Example:</strong> <span class="size3">ཨཱ་</span>[rya]<span class="size3">་པ་ཉྩ་བི་</span>[ngsha]<span class="size3">་ཏི་ཀ་པྲ་ཛྙཱ་པཱ་ར་མི་ཏཱ་མུ་ཁ་ནཱ་མ་མ་ཧཱ་ཡཱ་ན་སཱུ་ཏྲ།</span><br/><strong class="bold">Solution:</strong> manually create <span class="size3">ཪྱ</span> and <span class="size3">ངྴ</span> and replace the Wylie (including the square brackets) with them. Also make the corresponding correction to the Wylie within the comment (<!-- -->) tag. <strong class="bold">Note:</strong> be sure there is only one <em class="italic">tsheg</em> between the text you correct and the syllable that follows it.</li></ul><p class="paragraph">
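The search for unconverted Wylie described above can be partly automated. The following is a minimal sketch, not part of the THL converter: it assumes that unconverted chunks always appear as ASCII letters (possibly with +, ' or .) inside square brackets, and the function names are hypothetical. It also flags doubled <em class="italic">tsheg</em> characters, which can be left behind after a manual correction.</p>

```python
import re

# The tsheg (syllable-separating dot) is U+0F0B in the Tibetan Unicode block.
TSHEG = "\u0f0b"

# Assumption: unconverted Wylie is ASCII letters plus "+", "'", "." or spaces
# enclosed in square brackets, as in the [rya] and [ngsha] example above.
BRACKETED_WYLIE = re.compile(r"\[([A-Za-z+'. ]+)\]")

# Two or more tsheg characters in a row.
DOUBLE_TSHEG = re.compile(TSHEG + "{2,}")


def find_unconverted_wylie(text):
    """Return the bracketed Wylie chunks found in a Tibetan field."""
    return BRACKETED_WYLIE.findall(text)


def has_double_tsheg(text):
    """Return True if the text contains two or more tsheg in a row."""
    return DOUBLE_TSHEG.search(text) is not None


if __name__ == "__main__":
    # Opening syllables of the example title, written with escapes.
    sample = "\u0f68\u0f71\u0f0b[rya]\u0f0b\u0f54\u0f0b"
    print(find_unconverted_wylie(sample))
    print(has_double_tsheg(sample))
```

<p class="paragraph">A check like this only locates the problem spots. The stacks themselves still have to be created and pasted in by hand, and the Wylie in the comment still has to be corrected manually.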
</p><h3 class="heading-h3"><a name="MarkupforDiscussionPhysfacet" class="anchorpoint"></a>Markup for Discussion, Physfacet</h3><p class="paragraph">Data in the Word general discussion field dealing with page numbering issues should have the following XML markup:</p><div class="code"><pre><physfacet type=<span class="java-quote">"Distinctive features"</span>>Page 619 is misnumbered 618. (snw)</physfacet></pre></div><p class="paragraph">This goes in the XML file here:</p><div class="code"><pre><physfacet lang=<span class="java-quote">"tib"</span> type=<span class="java-quote">"Script"</span>>
<!--dbu can/-->
<span class="size3">དབུ་ཅན།</span>
</physfacet>
<physfacet type=<span class="java-quote">"Distinctive features"</span>>Page 619 is misnumbered 618. (snw)</physfacet></pre></div><p class="paragraph">
</p><h3 class="heading-h2"><a name="MarkupIssues" class="anchorpoint"></a>Markup Issues</h3><p class="paragraph">The following is a list of various markup issues and how they are resolved in XML. These problems may prevent the XML document from validating, so if your document does not validate, check the list below. (It will gradually get longer as further questions come up.)
</p><h3 class="heading-h4"><a name="MasterIDIssues" class="anchorpoint"></a>Master ID Issues</h3><p class="paragraph">(problem with Narthang Converter prior to 3/26/2007): the Master ID #, which in the Canons Project is garnered from Phil Stanley's database, is marked up in an <idno> element that follows immediately after the <tibid> … </tibid> that provides unique identification information for that version of that text, preceding any other <idno> that is recorded in the Tibbibl record. The idno element must have a type attribute set to "master". Thus, in the Kangyur it would appear as follows:</p><div class="code"><pre>… </tibid>
<idno type=<span class="java-quote">"master"</span>>0014</idno>
<idno type=<span class="java-quote">"eimer"</span>>14</idno></pre></div><p class="paragraph"><strong class="bold">Note:</strong> In versions of the Narthang converter that pre-dated 3/26/2007, this information was dropped. Any records converted prior to that date should be checked and the information entered by hand. The information can be found in the Kangyur and Tengyur Title lists.
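</p><p class="paragraph">Records converted before that date could be screened with a short script. The following is a minimal sketch under several assumptions: the function name is hypothetical, the record is parsed with Python's standard ElementTree (which does not validate against the DTD and may fail on entities declared only in the DTD), and the test is only that the first <idno> in document order has type "master" and is not empty. It does not verify that the element immediately follows the closing </tibid>.</p>

```python
import xml.etree.ElementTree as ET


def master_idno_present(xml_text):
    """Return True if the first <idno> in the record is a non-empty master ID."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the tree in document order, so the first hit is the
    # <idno> that precedes all others.
    first = next(root.iter("idno"), None)
    if first is None:
        return False
    return first.get("type") == "master" and bool((first.text or "").strip())


if __name__ == "__main__":
    # <tibbibl> as the root element name is an assumption for illustration.
    record = (
        "<tibbibl><tibid>Ng</tibid>"
        '<idno type="master">0014</idno>'
        '<idno type="eimer">14</idno></tibbibl>'
    )
    print(master_idno_present(record))
```

<p class="paragraph">Any record for which this returns False should be opened and the Master ID entered by hand from the title lists.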
</p><h3 class="heading-h4"><a name="TextsSpanningMultipleVolumes" class="anchorpoint"></a>Texts Spanning Multiple Volumes</h3><p class="paragraph">About 60 texts span multiple volumes. In such cases, the cataloger will insert references to the ending volume in the third column. This will create notes in the comment at the end of the XML document, which will need to be converted by hand into XML, according to these instructions:
</p><ul class="star"><li><strong class="bold">volume information:</strong> this markup follows the comment "VOLUME INFO" and looks like this:</li></ul><div class="code"><pre><!--VOLUME INFO-->
<tibid type=<span class="java-quote">"volume"</span> system=<span class="java-quote">"number"</span>>24
<altid system=<span class="java-quote">"letter"</span> lang=<span class="java-quote">"tib"</span>><!--nga--><span class="size3">ང</span></altid>
<tibid type=<span class="java-quote">"text"</span> system=<span class="java-quote">"number"</span>>3</tibid>
</tibid></pre></div><p class="paragraph"><tibid type="volume"> is the sequential volume number within the edition and is the means for connecting to the volume bibliographic record; in this example, it is the 24th volume in the edition. The letter ID (<altid system="letter">) generally restarts with each genre and therefore is not unique; it is included for reference purposes. In this example, it is the fourth volume of the genre. The text number (<tibid type="text">) refers to the number of the text within the volume; in this example, the text is the third text in the volume.</p><p class="paragraph">When a text begins in one volume and ends in another volume, this markup should be repeated for each volume. Assign an "n" attribute to each <tibid type="volume"> tag, indicating which volume it is in the sequence of the volumes that contain this text: n="1" indicates the first volume in which the text occurs; n="2" indicates the second volume in which the text occurs; n="3" indicates the third volume in which the text occurs; etc. You also need to enter this number in the VOLUME INFO comment. <strong class="bold">Example:</strong> for a text that begins as the third text in volume 24 and ends as the first text in volume 26, the markup would be:</p><div class="code"><pre><!--VOLUME 1 INFO-->
<tibid type=<span class="java-quote">"volume"</span> system=<span class="java-quote">"number"</span> n=<span class="java-quote">"1"</span>>24
<altid system=<span class="java-quote">"letter"</span> lang=<span class="java-quote">"tib"</span>><!--nga--><span class="size3">ང</span></altid>
<tibid type=<span class="java-quote">"text"</span> system=<span class="java-quote">"number"</span>>3</tibid>
</tibid>
<!--VOLUME 2 INFO-->
<tibid type=<span class="java-quote">"volume"</span> system=<span class="java-quote">"number"</span> n=<span class="java-quote">"2"</span>>25
<altid system=<span class="java-quote">"letter"</span> lang=<span class="java-quote">"tib"</span>><!--ca--><span class="size3">ཅ</span></altid>
<tibid type=<span class="java-quote">"text"</span> system=<span class="java-quote">"number"</span>>1</tibid>
</tibid>
<!--VOLUME 3 INFO-->
<tibid type=<span class="java-quote">"volume"</span> system=<span class="java-quote">"number"</span> n=<span class="java-quote">"3"</span>>26
<altid system=<span class="java-quote">"letter"</span> lang=<span class="java-quote">"tib"</span>><!--cha--><span class="size3">ཆ</span></altid>
<tibid type=<span class="java-quote">"text"</span> system=<span class="java-quote">"number"</span>>1</tibid>
</tibid></pre></div><ul class="star"><li><strong class="bold">pagination and extent:</strong> these will need to be modified for texts that span multiple volumes. The standard markup for the page range of a text is:</li></ul><p class="paragraph">
</p><div class="code"><pre><pagination type=<span class="java-quote">"block"</span>>
<num n=<span class="java-quote">"begin"</span>>262b.1</num>
<num n=<span class="java-quote">"end"</span>>645b.7</num>
</pagination></pre></div><p class="paragraph">For multi-volume texts, there needs to be a list of page ranges for each volume. This is achieved by wrapping each pair of beginning and ending <num> elements within a parent <num> element whose type attribute is set to "volume" and whose n attribute is set to the number of the volume within the sequence of volumes in which the text occurs, as follows:</p><div class="code"><pre><pagination type=<span class="java-quote">"block"</span>>
<num n=<span class="java-quote">"1"</span> type=<span class="java-quote">"volume"</span>>
<num n=<span class="java-quote">"begin"</span>>262b.1</num>
<num n=<span class="java-quote">"end"</span>>653b.7</num>
</num>
<num n=<span class="java-quote">"2"</span> type=<span class="java-quote">"volume"</span>>
<num n=<span class="java-quote">"begin"</span>>1a.1</num>
<num n=<span class="java-quote">"end"</span>>703b.7</num>
</num>
<num n=<span class="java-quote">"3"</span> type=<span class="java-quote">"volume"</span>>
<num n=<span class="java-quote">"begin"</span>>1a.1</num>
<num n=<span class="java-quote">"end"</span>>453a.3</num>
</num>
</pagination></pre></div><p class="paragraph">Here, <num n="1" type="volume"> indicates that this is the pagination of the text in the volume in which the text begins; <num n="2" type="volume"> indicates that this is the pagination of the text within the second volume in which it occurs; etc.</p><p class="paragraph">The extent of the text has to be calculated by adding up the paginations (full or partial) for each volume into a total number of sides. This figure should be correct if the cataloger added all the sides of the volumes other than the first volume and entered this figure in the "page differential" field of the Word entry form. In the XML file there should be a note associated with the "page differential" field indicating how the figure in that field was calculated.
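</p><p class="paragraph">The addition can be double-checked with a few lines of code. The following is a minimal sketch, not a THL tool: it assumes that every page reference has the form folio number, side (a or b), a period and a line number, as in "262b.1", and that a partially filled side counts as one full side. The function names are hypothetical.</p>

```python
import re

# Assumed reference format: folio number, side "a" or "b", ".", line number.
PAGE_REF = re.compile(r"^(\d+)([ab])\.(\d+)$")


def side_index(page_ref):
    """Map a reference such as '262b.1' to a running side number (1a -> 2, 1b -> 3)."""
    match = PAGE_REF.match(page_ref)
    if match is None:
        raise ValueError(f"unrecognized page reference: {page_ref!r}")
    folio, side, _line = match.groups()
    return int(folio) * 2 + (1 if side == "b" else 0)


def sides_in_range(begin, end):
    """Count the sides from begin to end, both included."""
    return side_index(end) - side_index(begin) + 1


def total_extent(ranges):
    """Sum the sides over a list of (begin, end) pairs, one pair per volume."""
    return sum(sides_in_range(begin, end) for begin, end in ranges)


if __name__ == "__main__":
    # The three page ranges from the multi-volume example above.
    ranges = [("262b.1", "653b.7"), ("1a.1", "703b.7"), ("1a.1", "453a.3")]
    print(total_extent(ranges))
```

<p class="paragraph">For the three ranges in the example above, this gives 783 + 1406 + 905 = 3094 sides. If the cataloger's figure differs, check whether partial sides were counted in the same way before changing it.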
</p><h3 class="heading-h4"><a name="PaginationIssues" class="anchorpoint"></a>Pagination Issues</h3><p class="paragraph">Most paginations go within a <pagination> element. However, in a few situations the <source> element is used. In either case, there are two types of paginations that are marked up slightly differently:
</p><ul class="star"><li>if an item falls within a single page and line, that page and line reference simply goes within the pagination element, as in:</li></ul><p class="paragraph">
</p><div class="code"><pre><pagination n=<span class="java-quote">"bound"</span>>262b.2</pagination></pre></div><p class="paragraph">
</p><ul class="star"><li>a page range is marked up as two <num> elements within the parent pagination element, as follows:</li></ul><div class="code"><pre><pagination type=<span class="java-quote">"block"</span>>
<num n=<span class="java-quote">"begin"</span>>262b.1</num><num n=<span class="java-quote">"end"</span>>264a.4</num>
</pagination></pre></div><p class="paragraph"><strong class="bold">Note:</strong> this differs from how paginations are entered in the Word entry form (e.g., "262b.1-264a.4"). The converter eliminates the dash and adds the markup.
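</p><p class="paragraph">To illustrate the transformation, the following is a minimal sketch of how a range from the Word form could be turned into the markup shown above. It is not the converter's actual code; the function name is hypothetical and the sketch handles only ranges written with a single dash.</p>

```python
import xml.etree.ElementTree as ET


def range_to_pagination(word_range):
    """Turn a Word-form range such as '262b.1-264a.4' into <pagination> markup."""
    begin, dash, end = word_range.partition("-")
    begin, end = begin.strip(), end.strip()
    if not dash or not begin or not end:
        raise ValueError(f"not a page range: {word_range!r}")
    pagination = ET.Element("pagination", {"type": "block"})
    # The dash is dropped and each half becomes its own <num> element.
    ET.SubElement(pagination, "num", {"n": "begin"}).text = begin
    ET.SubElement(pagination, "num", {"n": "end"}).text = end
    return ET.tostring(pagination, encoding="unicode")


if __name__ == "__main__":
    print(range_to_pagination("262b.1-264a.4"))
```

<p class="paragraph">When proofreading, a <pagination> element that still contains a dash is a sign that the range was not converted and needs to be split by hand.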
</p><h3 class="heading-h4"><a name="ProvenanceIssues" class="anchorpoint"></a>Provenance Issues</h3><p class="paragraph"><em class="italic">Provenance</em> refers to the people who were involved in creating the artifact (text) being cataloged, from the author to the printer of the particular edition and all those in between, as well as other data about its creation such as the place, date, etc. In the Tibbibl markup, this information is wrapped in <origination> … </origination> tags that contain <respDecl>. The issues below deal with this section of the markup.</p><ul class="star"><li><strong class="bold">empty provenance ("origination") will not validate:</strong> if there is no colophon information for a text, there will be no "origination" (provenance) information either. This will make the resulting XML file NOT validate because it will contain an empty origination element that will look like this:</li></ul><div class="code"><pre><origination>
<head>Provenance</head>
</origination></pre></div><p class="paragraph">If you delete this markup, then the XML will validate.</p><p class="paragraph">Furthermore, since this means there is no colophonic information, the markup for the colophon should also be deleted. It looks like:</p><div class="code"><pre><physdecl rend=<span class="java-quote">"Colophon"</span> n=<span class="java-quote">"colophon"</span>>
<head>Colophon</head>
<discussion type=<span class="java-quote">"Colophon"</span> n=<span class="java-quote">"contents"</span> lang=<span class="java-quote">"tib"</span>><!----></discussion>
<pagination type=<span class="java-quote">"Colophon"</span> n=<span class="java-quote">"block"</span>/>
</physdecl></pre></div><p class="paragraph">If the tags are empty as above, they should be deleted. If you have any questions, contact Steve or Than.
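</p><p class="paragraph">Empty sections of this kind can be located before opening each file in the editor. The following is a minimal sketch with hypothetical function names. It only reports what it finds and leaves the deletion to the proofreader, because Python's standard ElementTree parser discards comments (including the Wylie comments) and so should not be used to rewrite the records. It also assumes the record parses without the DTD.</p>

```python
import xml.etree.ElementTree as ET


def has_text_outside_head(element):
    """Return True if any element other than <head> inside it carries text."""
    return any(
        (node.text or "").strip()
        for node in element.iter()
        if node.tag != "head"
    )


def find_empty_sections(xml_text):
    """List the names of empty provenance and colophon sections in a record."""
    root = ET.fromstring(xml_text)
    found = []
    for origination in root.iter("origination"):
        # Empty means: nothing inside except the <head> label.
        if all(child.tag == "head" for child in origination):
            found.append("origination")
    for physdecl in root.iter("physdecl"):
        if physdecl.get("n") == "colophon" and not has_text_outside_head(physdecl):
            found.append("physdecl")
    return found


if __name__ == "__main__":
    # <tibbibl> as the root element name is an assumption for illustration.
    record = (
        "<tibbibl><origination><head>Provenance</head></origination>"
        '<physdecl rend="Colophon" n="colophon"><head>Colophon</head>'
        '<discussion type="Colophon" n="contents" lang="tib"><!----></discussion>'
        '<pagination type="Colophon" n="block"/></physdecl></tibbibl>'
    )
    print(find_empty_sections(record))
```

<p class="paragraph">A record that reports either section should be opened in the XML editor and the empty markup deleted there.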
</p><h3 class="heading-h4"><a name="AdditionsMadetotheTibetanTextafterOriginalCarving" class="anchorpoint"></a>Additions Made to the Tibetan Text after Original Carving</h3><p class="paragraph">When an addition has been made to a Tibetan text – indicated by three or four dots that connect the inserted material to the place it was omitted, much like an annotation – the cataloger will add the XML markup below in the Word doc. The proofreader needs to check this to make sure it is correct. An example of this from a Tibetan text is:</p><p class="paragraph"><img src="https://collab.itc.virginia.edu/access/content/group/c06fa8cf-c49c-4ebc-007f-482de5382105/THDL%20Toolbox%20Images/inserted-text.bmp" alt="Screen shot of material added to a Tibetan text" title="Screen shot of material added to a Tibetan text" border="0"/></p><p class="paragraph">In the bottom line, a smaller <span class="size3">མ</span> was added below and slightly to the left of the regular-sized <span class="size3">མ</span> probably because the second <span class="size3">མ</span> was mistakenly omitted when the block was originally carved. If this represented the ligature <span class="size3">མྨ་</span> then there would not be space between the two letters and they would be directly above/below each other. <strong class="bold">Markup:</strong></p><div class="code"><pre><span class="size3">ནཱ་མ་</span><add place=<span class="java-quote">"infralinear"</span> resp=<span class="java-quote">"editor"</span>><span class="size3">མ་</span></add><span class="size3">ཧཱ་ཡཱ་ན་</span>…</pre></div><p class="paragraph">In the unlikely event that you know the name of the person responsible for making the addition to the Tibetan text, enter that rather than "editor." Also, select the value of the place attribute from the following list:</p><p class="paragraph"><br/><strong class="bold">inline</strong> addition is made in a space left in the witness by an earlier scribe
<br/><strong class="bold">supralinear</strong> addition is made above the line
<br/><strong class="bold">infralinear</strong> addition is made below the line
<br/><strong class="bold">left</strong> addition is made in left margin
<br/><strong class="bold">right</strong> addition is made in right margin
<br/><strong class="bold">top</strong> addition is made in top margin
<br/><strong class="bold">bottom</strong> addition is made in bottom margin
<br/><strong class="bold">opposite</strong> addition is made on opposite page
<br/><strong class="bold">verso</strong> addition is made on verso of sheet
<br/><strong class="bold">mixed</strong> addition is made somewhere, one or more of other values
</p><h3 class="heading-h3"><a name="IndividualDataFields" class="anchorpoint"></a>Individual Data Fields</h3><p class="paragraph"><strong class="bold">non-tib title in tibetan:</strong> if there is no non-Tibetan title given in the text, the cataloger will enter<br/>Not specified.<br/>In this case, make sure that lang="eng". The XML for such cases looks like this:</p><div class="code"><pre><title lang=<span class="java-quote">"eng"</span> type=<span class="java-quote">"nontibet"</span>><!---->Not specified.</title></pre></div><p class="paragraph"><strong class="bold">original language:</strong> if there is no original language specified in the text, the cataloger will enter<br/>Not specified.<br/>In this case, make sure that lang="eng". The XML for such cases looks like this:</p><div class="code"><pre><note type=<span class="java-quote">"original language"</span> lang=<span class="java-quote">"eng"</span> place=<span class="java-quote">"unspecified"</span> anchored=<span class="java-quote">"yes"</span>>
<!---->
Not specified.</note></pre></div><p class="paragraph">If there are multiple original languages, make a separate <titlediv> element for each language, nested within the <titleinfo> and <titlegrp> elements, and fill out the appropriate information for each language. Change the <titlediv subtype=""> for the new language as well. Here is an example using Sanskrit and <span class="nobr"><a href="http://babelstone.blogspot.com/2007/05/zhang-zhung-scripts.html" target="rwikiexternal">bru zha</a></span>:</p><div class="code"><pre><titlediv type=<span class="java-quote">"nontibet"</span> subtype=<span class="java-quote">"sanskrit"</span> lang=<span class="java-quote">"san"</span>>
<titledecl>
<title lang=<span class="java-quote">"tib"</span> type=<span class="java-quote">"nontibet"</span>><!--sa rba ta thA ga ta tsi ta ta dz+nyA na gu h+ya r+tha ga r+b+ha bU ha badz+ra tan+t+ra si d+d+hi yo ga a ga ma sa mA dza sa rba bi d+ya sU tra ma hA yA na sa b+hi sa ma ya d+ha rma pa r+ya ya bi byU ha nA ma sU traM/-->ས་རྦ་ཏ་ཐཱ་ག་ཏ་ཙི་ཏ་ཏ་ཛྙཱ་ན་གུ་ཧྱ་རྠ་ག་རྦྷ་བཱུ་ཧ་བཛྲ་ཏནྟྲ་སི་དྡྷི་ཡོ་ག་ཨ་ག་མ་ས་མཱ་ཛ་ས་རྦ་བི་དྱ་སཱུ་ཏྲ་མ་ཧཱ་ཡཱ་ན་ས་བྷི་ས་མ་ཡ་དྷ་རྨ་པ་རྱ་ཡ་བི་བྱཱུ་ཧ་ནཱ་མ་སཱུ་ཏྲཾ།</title>
<title lang=<span class="java-quote">"san"</span> type=<span class="java-quote">"nontibet"</span>>sarbatathāgatacitatajñānaguhyarthagarbhabūhabajratantrasiddhiyogaagamasamājasarbavidyasūtramahāyānasabhisamayadharmaparyayavibyūhanāmasūtraṃ</title>
<title type=<span class="java-quote">"normalized"</span> lang=<span class="java-quote">"san"</span>>sarvatathāgatacittajñānaguhyārthagarbhavyūhavajratantrasiddhiyogāgamasamājasarvavidyāsūtramahāyānābhisamayadharmaparyāyavivyūha-nāma-sūtra</title>
<note type=<span class="java-quote">"original language"</span> lang=<span class="java-quote">"tib"</span> place=<span class="java-quote">"unspecified"</span> anchored=<span class="java-quote">"yes"</span>><!--rgya gar skad/-->རྒྱ་གར་སྐད།</note>
</titledecl>
<pagination>
<num n=<span class="java-quote">"begin"</span>>120b.4</num>
<num n=<span class="java-quote">"end"</span>>120b.5</num>
</pagination>
</titlediv>
<titlediv type=<span class="java-quote">"nontibet"</span> subtype=<span class="java-quote">"bru zha"</span>>
<titledecl>
<title lang=<span class="java-quote">"tib"</span> type=<span class="java-quote">"nontibet"</span>><!--hon pa ni ral til pi bu bi til ti ta sing 'un 'ub hang pang ril la 'ub pi su bang ri zhe hal pa'i ma kyang gu'i dang rod ti/-->ཧོན་པ་ནི་རལ་ཏིལ་པི་བུ་བི་ཏིལ་ཏི་ཏ་སིང་འུན་འུབ་ཧང་པང་རིལ་ལ་འུབ་པི་སུ་བང་རི་ཞེ་ཧལ་པའི་མ་ཀྱང་གུའི་དང་རོད་ཏི།</title>
<note type=<span class="java-quote">"original language"</span> lang=<span class="java-quote">"tib"</span> place=<span class="java-quote">"unspecified"</span> anchored=<span class="java-quote">"yes"</span>><!--bru zha'i skad/-->བྲུ་ཞའི་སྐད།</note>
</titledecl>
<pagination>
<num n=<span class="java-quote">"begin"</span>>120b.5</num>
<num n=<span class="java-quote">"end"</span>>120b.6</num>
</pagination>
</titlediv></pre></div><p class="paragraph"><strong class="bold">Author's Colophon</strong>: As with all chapter-level elements, an author’s colophon is marked by <div2> tags. Chapter-level elements are distinguished from one another by their type attributes. Thus, the author’s colophon is marked by <div2 type="author's colophon"> tags.
</p><h3 class="heading-h6"><a name="ProvidedforunrestrictedusebythespanclassnobrimgsrcsakairwikitoolimagesicklearrowgifaltexternallinktitleexternallinkahrefhttpwwwthdlorgtargetrwikiexternalTibetanandHimalayanLibraryaspan" class="anchorpoint"></a><em class="italic">Provided for unrestricted use by the <span class="nobr"><img src="/" alt="external link: " title="external link"/><a href="http://www.thdl.org" target="rwikiexternal">Tibetan and Himalayan Library</a></span></em></h3>