THL Toolbox > THL Technologies & Open Standards > Technologies For Dummies > Database
Contributor(s): David Germano.
Creating structured information prededes even the birth of language, and is obviously an central need and feature of human culture. As new technologies – such as spoken language, written language and computers – have developed, the capacity and means to efficiently and creatively structure information have correspondingly transformed. Such formal structure allows for easy browsing of the information, or “navigation”, such that users can easily glance at the information and compehend the types of information and relationships between them. In addition, digital technology allows for sophisticated types of automated searching, thereby formalizing a bifurcation between browsing on the one hand, and searching on the other hand.
A simplisitic introduction to thinking about structuring information in a digital context can begin with the basic format of a “table” or “spreadsheet” consisting of rows and columns of data. Each column contains a specific type of data – say names, addresses, phone numbers, job position and so forth. The rows then contain an array of specific data for specific instances of the phenomena under examination – one row for each instance. Thus row one has the corresponding data for the first person, row two for the second person, and so forth.
A “database” at its most basic level is simply a table or spreadsheet that doesn’t have to be viewed in that tabular format. Thus instead of columns, the general types of data being recorded are “fields”, while instead of rows we have “entries” which fill out the fields with specific data for a given instance of the thing being considered. Thus a database of people may have fields for name, address, phone numbers, job positions and so forth. Databases can be viewed in flexible ways, such as a tabular view presenting fields as columns, or with the fields arrayed on the page in various ways.
The real power of databases however is in relational databases. Relational databases allow one to link different databases together, each of which is referred to as an individual “table”. Thus a given field of information in one table, can be actually filled out with a field from another table. For instance, a table of contact information could have people’s names, addresses, phone numbers, job positions and so forth. However, the name itself may be drawn from a “biographical” table, which is a table docuemnting a person’s name, birth date, social security number and other information. In this way, one can create a powerful series of linked sets of structured information allowing for powerful browsing and searching. A “database” in the sense of structured information can be expressed using a variety of technologies.
Whatever the type of database one is using, one of the central isues is whether the database is intended to be a local application residing on a single person's hard drive, or a networked database that can be accessed by multiple users over a local network or the Web. A database designed for one or the other use is not immediately usable in the other context, and may very well require a completely independent development. For local use, another issue is whether the database can be used freely by anyone once designed, or whether any user has to separately purchase the supporting database technology. For example, FileMaker sells a relatively inexpensive developer's kit that allows one to freely disseminate any databases built with it for use for data input. In contrast, if you design an Access database, each user who utilizes it has to indepenently purchase Access from Microsoft. A major challenge for collaborative projects can be how to merge local databases used for editing, and Web databases used for delivery to the public. One can design online interfaces for editing, but contributors often prefer to do much of their work offline. However, having work done offline means the work must be migrated into the online database, and this can create problems in version control.
In the early years of personal computing, a specific type of language known as “structured query language” developed by IBM came to be the dominant technology to use for constructing relational databases. This is most commonly known by the abbreviation SQL, pronounced “sequel”. It is a computer language that allows one to structure information and pose systematic questions, or queries, of those structures. Different flavors of SQL have emerged over time, some of the most popular in academic contexts being MYSQL and POSTGRES.
Conventional databases can be built using a variety of technologies, most of which reply upon some version of SQL language. These technologies can be free as well as commercial. The most common free database technology is MYSQL, while another such technology is PostgreSQL (often called Postgres). In terms of commercial databases, Microsoft offers several different technologies for creating databases: Access, FoxPro (or Visual Foxpro), and SQL Server (sometimes called MS SQL). Access is for small applications (recipes, rolodex etc., Foxpro is for small to medium sized databases, and SQL server is for large databases (over a million records). FileMaker Pro is another database technology reknowned for its ease of use and designed for modest sized databases. Oracle is an expensive database technology designed for large databases. All of these are commercial packages that need to be purchased, with the price increasing exponentially in direct relation to how large of a database the packages can support.
The issues that one must be concerned with in terms of deciding what type of database technology to employ include:
- how many simultaneous user connections can it support
- speed of performance
- backup/restore procedures
- scalability and capability in terms of dealing with large files
- versatility in interacting with a number of interfaces, including command-line clients, Web browsers, and various programming interfaces such as C+, Perl, Java, PHP, and Python.
In addition to the actual database structure, one must employ a scripting language to handle issues of display and interaction. The most popular free scripting language is PHP, while other free languages include JSP (Java Server Pages). Commercial database packages include their own proprietary scripting languages, while Cold Fusion is a popular commercial package often used with Access or MYSQL databases. Recently Ruby on Rails has become a rapidly expanding framework in which to build databases.
More recently, EXtensible Markup Language (XML) has emerged as an intriguing if at times frustrating medium for creating databases. Indeed, one of the most tricky issues in projects has been whether to use XML or SQL for creating databases, or some combination of the two. XML is now widely used for Web-based projects as a way to markup text, since it allows for the intellectual identification of information to be separated from its visual reprsentation. Thus one can “tag” a word as representing “strong emphasis” or “weak emphasis”, which later can be rendered however one wants – as bold and italic, or as red and blue, or however. This allows for powerful searching, such that one can search through a series of essays for a given word only as it appears in place names, which is possible if the XML has tagged words according to whether they are place names.
Many people have extended this use of XML to create structured templates of information. The basic difference between relational databases and XML databases is that the former keeps the structure of the information and the actual data seperate, while the latter actually embeds the structure direclty within the data. Thus XML data carries its structure with it. This allows for a much greater flexibility in expressing relationships between types of data, as well as a powerful capacity for refining and qualifying the typing of data. Thus complex relationships and analysis of data which in SQL can result in a messy spaghetti-like array of crisscrossing relationships between tables upon tables. In contrast, XML may be able to elegantly express the same relationships. Once built, XSLT stylesheets can then render the data in a variety of ways. One also must independently apply tools for indexing and searching the data.
The downside of XML is that tools for developing, processing and searching XML databases are not at the same level of SQL tools in terms of availiblity, user-friendliness and cost. In addition, the powerful relationships XML allows can also take an extraordinary amount of time to express in the creation of data.