Thursday, September 11, 2008

Week Four Readings

"Database" - Wiki
  • A computer database = a structured collection of records or data, organized by a database model, that is stored in a computer system
  • The computer database relies upon software, known as a database management system (DBMS), to organize the storage of data
  • The first DBMSs were developed in the 1960s, with the two key models being CODASYL (network model) and IMS (hierarchical model)
  • IDMS was also all the rage in the 1960s, and the PICK and MUMPS databases also date from that decade
  • In the 1970s the relational model was proposed, but for a long time it remained of academic, largely theoretical interest. Implementations did not appear until 1976 with System R and Ingres, and even those were research prototypes, not commercial products
  • By the early 1980s relational databases had become commercial products with the launch of Oracle and DB2
  • 1980s research focused on distributed database systems
  • 1990s research focused on object-oriented databases
  • 2000s - innovation focused on the XML database
  • Database Model Structure
  1. Hierarchical model - data is organized into an inverted tree-like structure with a downward link in each node to describe nesting.
  2. Network model - records can participate in any number of named relationships; each relationship associates one record, known as the owner, with multiple records, known as members
  3. Relational model - information is represented in columns and rows
  • Database Management Systems: Relational database management systems, post-relational database models, and object database models
  • DBMS internals
  1. Storage and physical database design - such as flat files, ISAM, heaps, hash buckets, or B+ trees (most common are B+ trees and ISAM)
  2. Indexing - the most common kind is a sorted list of the contents of a particular table column, with pointers to the row associated with each value (so the row can be located quickly)
  3. Transactions and concurrency - a DBMS should enforce the ACID rules (atomicity, consistency, isolation, and durability), but many DBMSs allow these rules to be relaxed for better performance
  4. Replication - closely related to transactions with concepts including: Master/Slave replication, quorum, and multimaster
  5. Security - protects the database from unintended activity through access control, auditing, and encryption
  6. Locking - how the database handles multiple concurrent operations; locks are generally either shared (several operations can hold the lock on the current data object at the same time) or exclusive (no other lock can be acquired on the current data object as long as the lock lasts)
  7. Architecture - a combination of strategies is used; for example, OLTP systems use a row-oriented datastore architecture
  • Applications of databases - the preferred method of storage for large multiuser applications, where coordination between many users is needed
  • Some DBMS products that might be familiar: BerkeleyDB, Datawasp, FileMaker, IBM IMS, Interbase, and Microsoft Access
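
As a rough illustration of three ideas from the list above (rows and columns, an index on one column, and an atomic transaction), here is a minimal sketch using Python's built-in sqlite3 module. It is not from the Wiki article; the table name, column names, and sample rows are all made up, and SQLite is just a convenient stand-in for a relational DBMS.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL, author TEXT)"
)
# Index: lets the DBMS find rows by author without scanning the whole table.
conn.execute("CREATE INDEX idx_books_author ON books (author)")


def add_books(connection, rows):
    """Insert all rows as one transaction: either every row is stored or none."""
    try:
        with connection:  # commits on success, rolls back on an exception
            connection.executemany(
                "INSERT INTO books (title, author) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.Error:
        return False


def count_books(connection):
    """Return the number of rows currently stored in the books table."""
    return connection.execute("SELECT COUNT(*) FROM books").fetchone()[0]


ok = add_books(conn, [("Sample Title A", "Author One"), ("Sample Title B", "Author Two")])
# The second row violates NOT NULL, so the first row of this batch is rolled back too.
failed = add_books(conn, [("Sample Title C", "Author One"), (None, "Author Three")])

titles_by_author_one = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM books WHERE author = ? ORDER BY title", ("Author One",)
    )
]
print(ok, failed, count_books(conn), titles_by_author_one)
```

The failed batch leaves the table exactly as it was, which is the "atomicity" part of ACID: "Sample Title C" never shows up even though its own insert was valid.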

"Introduction to Metadata, Pathways to Digital Information"

Metadata is a term used to describe many different forms of data by many different professions. Each profession may use the term in a different way, but what they have in common is that they are communities that "design, create, describe, preserve, and use information systems and resources" (1). All information objects have three key features: content, context, and structure, which are all reflected through metadata.
In our field, library metadata has focused on providing intellectual and physical access to content, whether through indexes, abstracts, or catalogue records. Another part of our field, archival and museum studies, also uses metadata to organize its information, but there the focus is on context: preserving the context is what preserves the value of records and artifacts.
The structure of information has been focused on less in this field, but even so, it is still an important component because the professionals realize that the more organized the structure is, the more it can be used to search for information objects.
Outside the repository, the term metadata is used even more broadly to describe different acts and information. For example, on the Internet it could refer to the information being encoded into HTML. An electronic archivist may use it to refer to "all the contextual, processing, and use information needed to identify and document....an active or archival record..." (3).
The Dublin Core Metadata Element Set is acknowledged in this article as identifying simple sets of metadata elements that can be used by any community to describe and search across many different information resources on the WWW. This is necessary in order to make sure that the different descriptions of metadata can be searched for and found throughout the WWW.
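
To picture how one simple element set lets different communities describe and search their resources the same way, here is a small sketch. It is not from the article: the fifteen element names are the simple Dublin Core set, but the helper functions `make_record` and `search` and the two sample records are hypothetical.

```python
# The fifteen elements of the simple Dublin Core set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def make_record(**fields):
    """Build a record, rejecting anything that is not a Dublin Core element."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return dict(fields)


def search(records, element, text):
    """Return records whose value for `element` contains `text` (case-insensitive)."""
    needle = text.lower()
    return [r for r in records if needle in str(r.get(element, "")).lower()]


# Made-up records: one from a museum, one from a library.
records = [
    make_record(title="Harbor at Dawn", creator="Example Museum",
                type="Image", format="image/jpeg"),
    make_record(title="Notes on Harbor Trade", creator="A. Librarian",
                type="Text", language="en"),
]

hits = search(records, "title", "harbor")   # matches both records
images = search(records, "type", "image")   # matches only the museum record
print(len(hits), [r["title"] for r in images])
```

Because both communities use the same element names, one search function works across both kinds of resource.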
The article further sorts metadata into categories to make it easier to understand: administrative, descriptive, preservation, use, and technical metadata. Metadata also has attributes, each with its own characteristics. For example, the attribute "method of metadata creation" has two characteristics: automatic (metadata generated by a computer) and manual (metadata created by humans) (5).
Figure 1 makes it easier to understand the life cycle of objects contained in a digital information system (the phases that information objects go through during their life in a digital environment). These phases are: creation and multi-versioning, organization, searching and retrieval, utilization, and preservation and disposition.
So why is metadata important? Because it increases accessibility, retains context, expands the use of information, and can heighten interest in multi-versioning of the information. It also helps with legal issues and the preservation of digital information, and it allows system improvements from both a technological and an economic standpoint.
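
The automatic versus manual distinction can be sketched in a few lines. This is only an illustration, not something from the article: the function `automatic_metadata`, the dictionary keys, and the sample values are all assumptions, and the temporary file stands in for any digital object.

```python
import os
import tempfile
from datetime import datetime, timezone


def automatic_metadata(path):
    """Technical metadata the computer can generate without human input."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        "extension": os.path.splitext(path)[1],
    }


# Manual, descriptive metadata: a person has to supply this (values are made up).
manual = {"title": "Week Four Notes", "subject": "metadata"}

# Create a throwaway file to describe.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("hello")
    path = handle.name

record = {"descriptive": manual, "technical": automatic_metadata(path)}
os.remove(path)
print(record["technical"]["size_bytes"], record["technical"]["extension"])
```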

Muddiest Point: Why do technology professionals allow the term metadata to describe so many things? You would think (and I have) that people would become confused and just be like "okay, let's come up with a better term for this distinctive set of metadata."

"An Overview of the Dublin Core Data Model"

At first, the article is very "muddy" and hard to understand. I think the introduction and DCMI Requirements were chock-full of terminology, which may put a less technically savvy person off reading and understanding the DCMI.
But thankfully, the article gets easier to understand, for a short period of time at least. I get the ideas behind its goals of internationalization, modularization/extensibility, element identity, semantic refinement, identification of encoding schemes, specification of controlled vocabularies, and identification of structured compound values. I think the important thing is for a student to understand the DCMI's goal of creating an easy-to-use description that is able to succeed at the global level. So in theory, the DCMI's importance is in providing each property with a unique identity along with "human readable labels and clear semantic definitions."
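
One way to picture "a unique identity along with human readable labels and clear semantic definitions" is the sketch below. It is an assumption-heavy illustration, not the article's data model: the `Property` and `Statement` classes are hypothetical, the French and German labels are informal translations, and the described resource URL is a placeholder. The title URI and its definition are the ones DCMI publishes.

```python
from dataclasses import dataclass, field


@dataclass
class Property:
    uri: str          # unique identity of the property
    definition: str   # clear semantic definition
    labels: dict = field(default_factory=dict)  # language code -> human-readable label

    def label(self, lang, fallback="en"):
        """Return the label for `lang`, falling back to the English label."""
        if lang in self.labels:
            return self.labels[lang]
        return self.labels[fallback]


@dataclass
class Statement:
    resource: str     # the thing being described
    prop: Property    # which property is being stated
    value: str        # the value of that property


TITLE = Property(
    uri="http://purl.org/dc/elements/1.1/title",
    definition="A name given to the resource.",
    labels={"en": "Title", "fr": "Titre", "de": "Titel"},  # informal translations
)

# Placeholder resource URL, for illustration only.
stmt = Statement("http://example.org/notes/week4", TITLE, "Week Four Readings")
print(stmt.prop.label("fr"), "=", stmt.value)
```

The identity (the URI) stays the same no matter which label a reader sees, which is roughly how the internationalization goal and the element identity goal fit together.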

1 comment:

John said...

Wow, looking at your notes is like looking into a mirror of my own, except yours are longer, haha. Anyway, obviously this means I agree with what you say. I also do not understand how metadata can be considered a form of organizing yet be so expansive.