Contributing to THL Offline
Contributor(s): David Germano
Initially, THL editing and contribution processes were purely offline. This quickly became a major challenge. With contributors located all around the world, and a single process often involving multiple people working on different phases, offline technologies quickly generate a bewildering chaos of problems: version conflicts (people holding the same data in different versions), large amounts of data lost because it was in private hands without backups, large amounts of staff time spent refining contributions into proper form for the digital library, and so on. Thus, since THL's founding in 2000, we have aimed at producing online interfaces for contributing, editing, and refining data. Such interfaces prevent version conflicts, ensure data is properly backed up, and help non-technical contributors put their data into formats that can be published easily, without the delays of extensive staff intervention. However, there is one drawback: online interfaces disenfranchise people from communities in Tibet and the Himalayas who may not have easy access to the Internet. Thus, we are trying to produce - though this is admittedly a long-term project - a system that offers offline mechanisms for contributions that can then be migrated into the online repositories. In this sense, the whole of THL's Toolbox supports this effort with its survey of tools and guidelines for contributors who lack extensive technical support.
Spreadsheets have great virtues for inputting and revising data on a large number of items. In this case, each row can represent a new instance of a feature - a monastery, polity, etc. - and each column represents a specific category of information for that feature. Since the data is all on a single page which doesn't require multiple clicks and saves, it is very efficient. When completed, the data can then be uploaded into the online repository. However, there are two problems with this workflow process. Firstly, spreadsheets are great if you are just drawing upon one source of information, but if you are drawing upon multiple sources, it can quickly get very messy as you struggle to record in a clear way the source of each item of information. Secondly, if the spreadsheet is documenting a feature which is already in, for example, the online Place Dictionary, importation can be very difficult since the spreadsheet data and online data have to be synchronized with each other.
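As a rough illustration of the second problem, the sketch below separates spreadsheet rows that describe new features from rows that overlap entries already online, since only the latter need synchronizing. It assumes the spreadsheet has been exported as CSV; the column names, the sample rows, and the list of existing names are all hypothetical and do not reflect THL's actual import process.

```python
import csv
import io

# Hypothetical spreadsheet export: one row per feature, one column per category.
SHEET = """name,feature_type,notes
Monastery A,monastery,already described online
Monastery B,monastery,not yet described online
"""

# Hypothetical set of feature names already present in the online repository.
EXISTING_NAMES = {"Monastery A"}


def split_rows(sheet_text, existing_names):
    """Separate rows for new features from rows that overlap existing entries."""
    new_rows, overlapping_rows = [], []
    for row in csv.DictReader(io.StringIO(sheet_text)):
        if row["name"] in existing_names:
            overlapping_rows.append(row)  # must be reconciled with online data
        else:
            new_rows.append(row)  # can be imported as a fresh entry
    return new_rows, overlapping_rows


new_rows, overlapping_rows = split_rows(SHEET, EXISTING_NAMES)
print("new:", [row["name"] for row in new_rows])
print("needs synchronizing:", [row["name"] for row in overlapping_rows])
```

Matching on a name alone is a simplification; a real comparison would probably need a more reliable identifier.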
In contrast, using the online editorial interface in THL has the advantage of inputting the data directly into the repository, so that the editor him/herself synchronizes the extant and new data as they work. In addition, each piece of information can be precisely sourced in accordance with the repository's own standards. However, there are significant drawbacks: the Internet connection can be slow, you have to constantly save pieces of information as you input them, and you typically have to navigate back and forth between different editing components.
It may also make the most sense to input the data in a structured way within a word processing document. Then, when you are satisfied with it, you can cut and paste into the online editing interface.
In practice, while we would like to offer a one-size-fits-all approach, even the same type of task - documenting monasteries, say - can be best done by very different workflow processes, depending on the precise character of the task and the proclivities of the person doing it. A few scenarios are offered to help you consider which workflow process best fits you and your task. In addition, a given task - such as documenting monasteries or polities - may in fact consist of a multitude of subsidiary tasks, each of which requires its own workflow process.
1. Spreadsheet & Import: using an Excel or other form of spreadsheet, and then importing directly into the relevant repository. This makes the best sense when you have a lot of individual items, and a set range of discrete bits of short information you want to record about each. However, you have to make sure that the columns of your spreadsheet precisely correspond to fields of data in the repository. It is important not to combine separate pieces of information in a single spreadsheet cell. An additional thing to watch out for is sourcing. Often people will leave one column for source information, but then put multiple sources into it. It then becomes entirely unclear where each piece of information came from. For example, which text and page number does the primary name come from, and which the secondary name? This may be fine for sloppy generic work, but is unacceptable for rigorously scholarly archives. Again, follow the principle that each column in the spreadsheet should precisely correspond to a field of information in the repository. If that is possible, you will be ok. Finally, if some types of data are very texty, and exceed a line or two, you may find Excel a cumbersome way to input the data. In that case, you may want to use Excel for most of the data, but also keep a word processing file which contains just the narrative information. The former might then be automatically imported into the repository, while the latter might be cut and pasted into it.
2. Word processing & cut-and-paste: using a word processing document, and then cutting and pasting into the repository using the repository's own online interface. This makes sense especially if there is extensive descriptive data. It makes less sense if the data is highly structured for each item, and consists of relatively short bits of text and numbers. The reason is that you then face a huge task of cutting and pasting when you are done with your data.
If you dislike spreadsheets, but are comfortable with tables in word processing programs, you can of course simply design tables following the same principles as spreadsheets - the two are easily convertible back and forth.
3. Online Editing: this makes the best sense if you just have a few items to fill in, or if the editing interface feels fairly efficient to you for the type of data you have to input.
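The spreadsheet principles in scenario 1 (every column corresponds to a repository field, and no cell combines several sources) can be checked mechanically before import. The following is a minimal sketch that assumes a CSV export; the field names, the sample rows, and the use of a semicolon to detect combined sources are illustrative assumptions, not THL's actual repository schema.

```python
import csv
import io

# Hypothetical repository fields; a real list would come from the repository itself.
REPOSITORY_FIELDS = [
    "primary_name",
    "primary_name_source",
    "secondary_name",
    "secondary_name_source",
]

# Hypothetical export with a catch-all "sources" column holding two sources.
SHEET = """primary_name,primary_name_source,secondary_name,sources
Name A,Text X p. 12,Name B,Text X p. 12; Text Y p. 40
"""


def check_sheet(sheet_text, fields=REPOSITORY_FIELDS):
    """Return a list of human-readable problems found in the spreadsheet."""
    reader = csv.DictReader(io.StringIO(sheet_text))
    problems = []
    # Every column heading should match a repository field.
    for column in reader.fieldnames or []:
        if column not in fields:
            problems.append(f"column '{column}' does not match a repository field")
    # Spreadsheet row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        for column, cell in row.items():
            # Assumption: a semicolon in a source column signals combined sources.
            if "source" in column and ";" in (cell or ""):
                problems.append(
                    f"row {row_number}, column '{column}': more than one source in a cell"
                )
    return problems


problems = check_sheet(SHEET)
for problem in problems:
    print(problem)
```

Here each name has its own source column, so the question of which text and page number a given name came from always has a single answer.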
We have worked extensively on a system for creating essays and Tibetan classical texts which relies only on the mastery of a word processing program - Microsoft Word. We then convert these at THL Central into powerful XML formats, but the contributors only need to master the use of the word processing program and our guidelines for its use. Admittedly, the guidelines can be complex, but they are far easier to deal with than the purchase and mastery of XML editing programs and processes.
We have created, and are still refining, offline databases for simplifying the work of creating structured descriptions of various places, communities and institutions, even as we also build new online editing mechanisms for our Gazetteer.
See Places & Geography.
Initially we worked extensively with free or commercial offline image management systems that we then converted into our online formats. In 2007 we created an online image management system, and in the future hope to create an integrated offline application. In the meantime, the previous systems can still be used for offline work, even if the conversion is rather cumbersome for THL staff.
All editing and preparation of audio-video is done offline, though the cataloging is typically done online. However, it is not difficult to do the cataloging in a word processing program and have someone manually migrate it. We do not, however, at present have a structured database program for such offline work. See Audio-Video.
We are particularly proud of our work in making an easy-to-use, standalone program for transcribing audio-video in Tibetan and other languages, which has been used extensively offline in Tibet by students and faculty. See QuillDriver.