Thursday, August 23, 2007

Quality Digital Repositories: Reviews

Historical Voices

Historical Voices was created to be a robust, fully searchable digital library of 20th century spoken word collections, the first significant repository of its kind. Using audio files as a key component, Historical Voices provides storage, online exhibits, and educational resources for students, teachers, researchers, and the public.

Historical Voices is part of the Digital Library Initiative II, funded in part by the National Science Foundation and the National Endowment for the Humanities. It is housed at Michigan State University’s MATRIX: the Center for Humane Arts, Letters, and Social Sciences. Historical Voices continues to build its collection by working with universities, historical societies, archives, and research partners around the world. Its goal is to create a trans-institutional resource and a federated archive. Additionally, it strives to create best practices in sound digitization and develop a digital archive structure and a metadata format.

Currently, it provides links to featured galleries, such as “Studs Terkel: Conversations with America.” The search feature and best practices are being developed. However, Historical Voices provides PDFs of research it has conducted, such as “Digitization Projects: Key Issues for Archivists, Curators and Librarians.” In its Education section, it has a small collection of sample lessons built around audio selections.

On-Line Books Page

The On-Line Books Page offers access to thousands of free online books and serials in English and foreign languages. Substantial links to directories, archives, and featured exhibits, such as Banned Books Online, are available. Users range from scholars to the intellectually curious.

The design of the page is spartan, yet effective. With a simple search interface, the user can browse books by Library of Congress Subject Headings or by Library of Congress call number. Titles and authors are both searchable and listed alphabetically. When clicking on an icon in front of the listing, the user can learn about a particular book, find other books with the same author, title, or subject, or find out how to make a stable link to the detailed book description.

The transparency of the On-Line Books Page is helpful. For example, the criteria for a listing are explicit: a book must be available at no charge, be the full text of a significant book in English, and be a stable, well-formatted text in a standard format. Lists of requested books and books in process show what will be available soon.

The repository was founded in 1993 by John Mark Ockerbloom, a digital library planner and researcher at the University of Pennsylvania, where the page is hosted. Its development, maintenance, and content are all volunteered.

The Modern English Collection

This collection, arranged for browsing by the author’s last name or by category of interest, contains fiction, nonfiction, poetry, drama, letters, serials, manuscripts, and illustrations from 1500 to the present. There are 10,000 texts and 164,000 images publicly accessible. Users access a search engine for results, which can be viewed at different levels of detail, such as by summary, chapter, or page.

Categories include texts by African Americans, texts by women writers, texts about the American Civil War, texts by and about Thomas Jefferson, texts for young readers, literature in translation, and items from the University of Virginia’s special collection. Each text is encoded in either SGML or XML and includes a bibliographic header with administrative data about the creation of the electronic text and its print source.

The Modern English Collection, founded in 1992, is part of the Electronic Text Center at the University of Virginia. The goals of the center are to create an online humanities archive of standards-based texts and images and to foster a community of users as a traditional library would. The Center strongly promotes the standardization of data in XML because XML-encoded data can be reshaped into new forms, so it need not become outdated and useless.

Project Bartleby

Project Bartleby claims to be the most comprehensive reference publisher on the web, meeting the needs of students and educators. Named after Herman Melville’s character Bartleby the Scrivener, this site features public domain literature in the classics, nonfiction, and reference. It is searchable within categories, such as fiction, nonfiction, verse, and reference, and by author, title, or subject. The user can also browse by clicking on titles, authors, or similar works or authors to find more information of interest. Poetry is indexed chronologically or by first line. Biographical information on most of the authors is also available.

Unlike the other digital repositories featured in this paper, Project Bartleby is a commercial site and, probably not by coincidence, the most comprehensive. Ads frame each page but are usually ignored due to the user’s banner blindness.

Project Bartleby began in January 1993 at Columbia University as a project to convert public domain resources into digital archives. It was constructed according to four principles: accurate editions created by professional editorial standards; free public access; careful, well-researched selection of works with public domain or copyright license; and up-to-date presentation using electronic-publishing methods as they become standardized.

Monday, August 20, 2007

Quality Digital Repositories: Attributes

A quality digital repository consists of standardized data delivered online through a common interface, as opposed to a CD-ROM collection with multiple interfaces. Common metadata standards across collections are already allowing repositories to achieve cross-database access, and this will continue in more and more sophisticated forms.

Digital repositories succeed to the extent that they can re-articulate traditional library skills (cataloging, collection development, acquisitions, preservation, reference, user services, and special collections) in a new medium. Activity in the digital medium allows librarians to state the social, pedagogical, and intellectual roles of the library in an academic institution, and to serve as a rich content provider for many thousands of other users worldwide. Nothing brings people back to an online service like a well-selected, reliable, integrated, and cataloged set of data, especially if it is searchable and can be delivered quickly through a common interface across collections.

Digital repositories should provide reliable, long-term access to managed digital resources for their designated communities, now and in the future. Trusted digital repositories may take different forms. For example, the On-Line Books Page was developed and maintained without cost, while Project Bartleby is a commercial site. However, they are quality repositories because they share the following attributes. They accept responsibility for the long-term maintenance of digital resources on behalf of their depositors and for the benefit of current and future users. They have an organizational system that supports not only the long-term viability of the repository, but also the digital information for which they are responsible. They also design their systems in accordance with commonly accepted standards to ensure the ongoing management, access, and security of materials deposited within them.

The central challenge to preservation in a digital repository is the ability to guarantee the interpretability of digital objects for future users. This includes a guarantee of integrity, authenticity, confidentiality and accessibility to the digital data, which can be compromised by aging storage media and technical advancements. Although the preservation of digital works is in its infancy, the four repositories I reviewed will be taking steps to improve and maintain their data over time by developing strategies to cope with the continuous change of information technology in a responsible—and lossless—way.

Friday, August 17, 2007

Library of Congress Global Gateway

The Library of Congress Global Gateway is a publicly accessible digital library, offering more than 80,000 digital items in both English and native languages. It is a portal to international research centers, collections, and other resources available at the Library of Congress and through its website.

Introduced in 2000, Global Gateway includes collaborations with Bibliothèque Nationale de France; the National Library of the Netherlands; the Russian State Library, the National Library of Russia, and other libraries located in Siberia; the National Library of Spain; and the National Library of Brazil. Each project is funded by the foreign library involved, as well as the Library of Congress. The bilingual, multimedia presentations at Global Gateway focus on documenting ties between these countries and American culture.

For example, the collaborative digital libraries include expositions like The Atlantic World: America and the Netherlands; France In America; Meeting of Frontiers: Siberia, Alaska, and the American West; Parallel Histories: Spain, the United States, and the American Frontier; and United States and Brazil: Expanding Frontiers, Comparing Cultures.

Their digital collections converging on history and cultures from around the world include Cuneiform Tablets: From the Reign of Gudea of Lagash to Shalmanassar III, Islamic Manuscripts from Mali, the Kraus Collection of Sir Francis Drake, the Lewis Carroll Scrapbook, Polish Declarations of Admiration and Friendship for the United States, and Selections from the Naxi Manuscript Collection.

In their Research Guides and Databases division, users can access country studies, the Handbook of Latin American Studies, the Global Legal Information Network, the Vietnam-Era POW/MIA Database, and the U.S.-Russian Joint Commission Database.

Additionally, the Library of Congress has archive and library catalogs and user guides for foreign destinations, as well as advice from reference specialists concerning research services and facilities and requirements for use. Users can access information through Area Studies divisions for international collections. Similarly, Portals to the World provides access to digital information of country profiles.

Through Global Gateway, the Library of Congress promotes its holdings, encourages better access through digitization, and forges important ties with foreign libraries. The intended audience of these collections is researchers, especially those traveling internationally. Before they arrive at their destinations, they can learn what resources are available to better plan their studies.

Global Gateway’s web presentations summarize and highlight their respective themes without overwhelming the user with myriad primary sources (Library). However, some collections exist as primary resources, such as the Polish Declarations of Admiration and Friendship for the United States, which is helpful for genealogical, historical, and sociological research because it includes the signatures of nearly one-sixth of the population of Poland in 1926 (Lamolinara).

Some Global Gateway collections were created when the Library of Congress requested proposals in 2002 for the digital conversion of underutilized or unknown collections whose use and popularity would increase if they were available online (Language). Because digital representations of materials of interest are offered online, the originals can remain archived, available only if researchers must examine them in person. In the case of ancient Arabic calligraphy manuscripts with pages that have been dispersed throughout the world, the online representations of the pages will encourage other institutions to digitize their collections, allowing researchers to reconstruct the manuscripts (Language).

Global Gateway also makes extremely rare collections available online. For instance, Selections from the Naxi Manuscript Collection presents the writing of a Chinese minority with the only living pictographic language in the world. When their priests die, the ceremonial books containing these writings are buried or burned, making them exceptionally rare (Words). Now they are offered online to interested parties.

My criticism of the project is that it focuses on the interests of the United States rather than taking an international view. However, Global Gateway is explicit about this concentration in its collection description. Global Gateway, along with American Memory, is a forerunner for the World Digital Library project announced in 2005. With three million dollars from Google, World Digital Library builds upon previous digitization projects to “celebrat[e] the depth and uniqueness of different cultures in a single global undertaking” (Fineberg). This new project will have a much-needed international scope.

Global Gateway differs from traditional libraries or archives because the collections have been curated by librarians. Instead of offering indexed primary and secondary sources for researchers, Global Gateway created comprehensive, edited collections for maximum usefulness. However, it is a “real” digital library as defined by our readings. It is a managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network (Arms).

Global Gateway will continue to be useful to researchers for many years to come, and its success has encouraged larger digital projects by the Library of Congress and foreign libraries.

Works Cited

Arms, W. Y. (2000). Digital libraries. Cambridge, MA: MIT Press, 15.

Fineberg, G. (2006). Promoting transcultural understanding: Librarian proposes World Digital Library [Electronic version]. Library of Congress Information Bulletin, 65, 1.

Lamolinara, G. (24 March 2005). Selected volumes of “Polish Declarations of Admiration and Friendship for the United States” now online [Electronic version]. Library of Congress News Releases.

Library of Congress. (2006). Language of beauty and meaning: Ancient Arabic calligraphy available online [Electronic version]. Library of Congress Information Bulletin, 65, 9.

Library of Congress. (2005). Library of Congress and National Library of France launch joint web site [Electronic version]. Library of Congress Information Bulletin, 64, 5.

Library of Congress. (2004). Words of the Naxi: Rare pictographic Chinese writings available online [Electronic version]. Library of Congress Information Bulletin, 63, 9.

Sunday, August 5, 2007

Metadata and Standards: Assessing VRA Core Categories 3.0

The VRA Core Categories 3.0, issued in June 2000 by the Visual Resources Association Data Standards Committee, was created to describe visual works, such as paintings, sculptures, and buildings, and their surrogate digital, photomechanical, and photographic images, as well as works of material culture and their surrogate images (VRA Core 3.0).

VRA Core arose from the need to create a shared standard for documenting images in visual collections. While it provides guidelines, not a specific structure, it is used as a template for creating records for visual resources collections.

It can be crosswalked with other formats such as MARC, Dublin Core, the Categories for the Description of Works of Art (CDWA), and the RLG REACH project element set (The Core Categories for Visual Resources – Introduction).
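A crosswalk of this kind can be pictured as a lookup table from one element set to another. The sketch below is only an illustration: the `VRA_TO_DC` pairs, the `crosswalk` function, and the sample record are assumptions made for demonstration, not the mapping published by the Visual Resources Association.

```python
# Illustrative crosswalk: VRA Core 3.0 element name -> Dublin Core element.
# These pairs are assumptions for demonstration, not the official mapping.
VRA_TO_DC = {
    "Title": "title",
    "Creator": "creator",
    "Date": "date",
    "Subject": "subject",
    "Description": "description",
    "Rights": "rights",
    "ID Number": "identifier",
}


def crosswalk(vra_record):
    """Convert a VRA-style record (element -> list of values) to Dublin Core.

    Elements without a mapping are returned separately so nothing is
    dropped silently.
    """
    dc_record = {}
    unmapped = {}
    for element, values in vra_record.items():
        target = VRA_TO_DC.get(element)
        if target is None:
            unmapped[element] = list(values)
        else:
            dc_record.setdefault(target, []).extend(values)
    return dc_record, unmapped


# Hypothetical sample record.
sample = {
    "Title": ["Example Painting"],
    "Creator": ["Example Artist"],
    "Technique": ["oil painting"],
}
dc, leftover = crosswalk(sample)
```

Returning the unmapped elements alongside the converted record makes visible what a simpler target format cannot express.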

Emerging from the library practice of defining bibliographic records from minimal to full, the core is an effort to find a satisfactory middle ground. More complete than a minimal level, which specifies only the fewest elements needed to uniquely identify, locate, and account for an item, the core is still less than an exhaustive level of description (The Core Categories for Visual Resources – Introduction).

The VRA Core relies on paired records, with the same element set, that describe both the work and its image. (See the chart at the end of this paper for a list of the 17 elements and their descriptions.) The institution using VRA Core 3.0 decides how the paired records will be linked. It is possible to create more than one image record for a single work, depending on the number of ways in which it is reproduced.
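As a rough sketch of the work and image pairing, the following models one work record linked to several image records. The class names, the linking approach, and the sample values are all hypothetical, since VRA Core 3.0 leaves the linking mechanism to each institution.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One VRA-style record; elements may repeat, so values are lists."""
    record_type: str  # "work" or "image"
    elements: dict = field(default_factory=dict)


@dataclass
class WorkWithImages:
    """Hypothetical linking structure: one work, any number of image records."""
    work: Record
    images: list = field(default_factory=list)

    def add_image(self, **elements):
        # Each image record draws on the same element set as the work record.
        image = Record("image", {k: list(v) for k, v in elements.items()})
        self.images.append(image)
        return image


# Hypothetical sample data.
painting = WorkWithImages(
    Record("work", {"Title": ["Example Painting"], "Material": ["oil on canvas"]})
)
painting.add_image(Title=["Full view"], Type=["slide"])
painting.add_image(Title=["Detail, upper left"], Type=["digital image"])
```

Here the link is simple containment; an institution could just as well link the records through shared ID numbers in a database.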

Similar to Dublin Core, VRA Core is intended to be simple and is widely used by museums, visual libraries, and other art-related institutions. The elements that comprise the VRA Core 3.0 are designed to facilitate the sharing of information among visual resource collections. Additional fields can be added, depending on the institution’s data needs. Elements may also be repeated as often as necessary to properly describe the work or image. The order of the VRA Core 3.0 categories is arbitrary, as well. There are no required fields.

VRA Core has no controlled vocabularies, but it suggests the development of each organization’s own data standards guidelines, using resources such as the Thesaurus for Graphic Materials (TGM), Art & Architecture Thesaurus (AAT), Union List of Artist Names (ULAN), and the Thesaurus of Geographic Names (TGN) (VRA Core 3.0).

An example of VRA Core used successfully is Harvard’s Visual Information Access database (Harvard University). After conducting a search, you can click on a result to display the VRA Core elements. Luna Imaging, an image digitization company, also uses VRA Core 3.0 with its Insight software (Luna Imaging). Additionally, Penn State’s Visual Image User Study (VIUS) used VRA Core 3.0 as a template for its work (Pennsylvania State University).

VRA Core 4.0 was introduced in April 2007. It differs most significantly from 3.0 in that it is compliant with XML. It will be exciting to see how VRA Core develops and refines itself further to catalog visual works and images in a simple, yet powerful, way.

VRA Core 3.0 Elements

Record Type: Either a work record, for a physical or created object, or an image record, for a visual surrogate.
Type: Identifies the specific type of work or image being described.
Title: The title or identifying phrase given to a work or an image.
Measurements: The size, shape, scale, dimensions, format of the work or image.
Material: The substance of which a work or an image is composed.
Technique: The production or manufacturing processes, techniques, and methods involved in the work or image.
Creator: The names and other identifiers assigned to the entity that has contributed to the design, creation, production, manufacture, or alteration of the work or image.
Date: Date(s) associated with the work or image.
Location: The geographic location or name of the repository or other entity whose boundaries include the work or image.
ID number: The unique identifiers assigned to a work or an image.
Style/Period: A style, historical period, group, school, or movement, whose characteristics are represented in the work or image.
Culture: The culture or people from which a work or image originates or has been associated.
Subject: Terms or phrases that describe the work or image, such as proper names, locations, topics, and terms.
Relation: Terms or phrases describing the identity of the related work and the relationship between the catalogued work and the related work.
Description: A note about the work or image that gives additional information not recorded in other categories.
Source: A reference to the source of the information recorded about the work or the image.
Rights: Rights management information including copyright and intellectual property statements.

Works Cited

Harvard University. (2004). About VIA. Retrieved July 14, 2007, from Harvard University web site:

Luna Imaging. (2005). Insight Software. Retrieved July 14, 2007, from Luna Imaging web site:

Pennsylvania State University. (December 10, 2003). Visual Image User Study (VIUS). Retrieved July 14, 2007, from Penn State University Library web site:

Visual Resources Association. (December 1, 1999). The Core Categories for Visual Resources – Introduction. Retrieved July 14, 2007, from Visual Resources Association web site:

Visual Resources Association. (n.d.). VRA Core 3.0. Retrieved July 14, 2007, from Visual Resources Association web site:

Visual Resources Association. (n.d.). VRA CORE CATEGORIES, Version 3.0. Retrieved July 14, 2007, from Visual Resources Association web site:

Wednesday, August 1, 2007

Information Architecture and Web 2.0

In his article "Information Architecture 2.0", Dan Brown discusses the particular challenges that Web 2.0 presents to information architects. Web 2.0, a phrase coined by O'Reilly Media in 2004, refers to a second generation of Internet-based services, such as social networking sites, wikis, communication tools, and folksonomies, that emphasize online collaboration and sharing among users. Since users are more comfortable with and trusting of online content, they are more willing to publish content and share personal information. They see the Internet as a virtual space for collaboration to rate, review, comment, tag, and blog. Additionally, Web 2.0 is ideal for “remixing” content. For instance, users can combine different content sources, allow other users to see their content source, and enable content and services to be mixed. The information architect must be able to think more abstractly while structuring online content in light of this increased user participation.

Brown targets three Web 2.0 features and questions how information architects can better facilitate their structure and use. The new challenges facing information architects are users expecting more control over information management (exemplified by RSS), large and dynamic information spaces (exemplified by wikis), and unstructured metadata (exemplified by tagging).

For RSS readers like Bloglines, the structure is primarily chronological, based on updates as they occur, and inherently dynamic. The potential for interesting organization and use of RSS is only beginning to be explored. Brown asks whether there should be recommendations, filters, or archives for content. What else could information architects do to enhance the RSS feed experience and make information retrieval easier for users?
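To make the idea of chronological structure plus filtering concrete, here is a minimal sketch. The feed items, field names, and helper functions are hypothetical; a real reader would parse the entries from RSS XML rather than hard-code them.

```python
from datetime import datetime

# Hypothetical feed items; a real reader would parse these from RSS XML.
items = [
    {"title": "Wiki structure notes", "published": datetime(2007, 7, 30)},
    {"title": "Tagging and taxonomies", "published": datetime(2007, 8, 1)},
    {"title": "Weekend links", "published": datetime(2007, 7, 31)},
]


def newest_first(entries):
    """Order entries reverse-chronologically, the default view in most readers."""
    return sorted(entries, key=lambda e: e["published"], reverse=True)


def filter_by_keyword(entries, keyword):
    """Keep only entries whose title contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [e for e in entries if needle in e["title"].lower()]


timeline = newest_first(items)
matches = filter_by_keyword(timeline, "tagging")
```

A keyword filter is only one of the options Brown raises; recommendations or archives would need additional data about the reader and the feed history.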

The advent of wikis, says Brown, “present[s] the information architect’s ultimate challenge: a virtual information space with no structure, to which anyone can contribute.” Although the most popular wiki, Wikipedia, does not have an information architect, the user community fills the role. Taking user participation to the next level, do we foresee Web 3.0 pushing users beyond creating content to becoming information architects themselves? Although there is some structure in Wikipedia, it is minimal: the categories are extemporaneous and the search feature is used much more than the subject taxonomy. Information architects will have to create frameworks for capturing content using standard formats and displaying the content categories.

Tagging turns traditional metadata on its ear: instead of adding structure to content, tagging is unstructured metadata. In this context, information architects must determine which projects would benefit from tagging, weighing the benefits of tagging against the problems of unstructured metadata. Additionally, they would have to design systems that leverage tagging with some type of structure.
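One way to picture "leveraging tagging with some type of structure" is to normalize free-form tags and fold known variants into a preferred term before counting them. The sketch below assumes a small, hand-made `PREFERRED` table and sample tags, all hypothetical, and is not drawn from Brown's article.

```python
from collections import Counter

# Hypothetical mapping from free-form tag variants to a preferred term.
PREFERRED = {
    "nyc": "new york city",
    "new york": "new york city",
    "ia": "information architecture",
}


def normalize(tag):
    """Lowercase, trim, and collapse whitespace, then apply a preferred term if one exists."""
    cleaned = " ".join(tag.lower().split())
    return PREFERRED.get(cleaned, cleaned)


def tag_counts(tags):
    """Count tags after normalization so that variants are tallied together."""
    return Counter(normalize(t) for t in tags)


counts = tag_counts(["NYC", "New  York", "IA", "wiki", " Wiki "])
```

Tags without a preferred term pass through unchanged, so users keep the freedom of tagging while frequent variants converge on a single term.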

Brown writes that, “The consequent explosion of content and functionality on the Web and the new ways in which we're making use of Web content has recast the role of the information architect.” Information architecture is moving away from taking set content, categorizing it, then creating a site. Instead, it is more about creating an experience of interaction with users. Web 2.0 sites have structures, but there is less design in the fine detail and the use of classification and less reliance on strict taxonomies. User classification through tagging structures the information to some extent. Web 2.0 sites are a cross between an application and a “traditional” website; there is both content and interaction. Information architects will not have to focus on the progression of the web pages so much as the interaction between the data and the user in the application. Additionally, user involvement and participation in the site must be viewed as just as important as the authoritative content. This can be done through tagging, creating the content itself, or creating a social network.

Brown argues that the information architect must now think about the “higher-level structures that create an overall information framework for a site.” If Web 2.0 creates a network that values participation, there is a clear need for structure to lend meaning to participation. Information architects must lead the efforts to define that structure and collaborate with graphic designers and interaction designers to make sure that structure is clearly communicated to users.

Web 2.0 is dynamic and interactive with user-generated online content, and the popularity of RSS feeds, wikis, and tagging will continue to grow. Information architects must find ways to help structure and present user-created content in an organized and usable way. While a majority of information architecture work is done by people who are not information architects, the specialists do play a role in building the community and advancing the discipline. Web 2.0 will continue to bring more challenges, more opportunities, and more work to information architects.