Sunday, July 29, 2007

Digital Libraries and Personalization Part 5 of 5

The following post is part of a series about personalization of digital libraries. Please click below to read further:

Part 1
Part 2
Part 3
Part 4

The Future of Personalized DLs

The future of DLs is bright, with a multitude of options to be researched, developed, and implemented. While conducting informal interviews with digital librarians, I found that although personalization tools are presently limited, a variety of tools are being considered for the future. For example, Karen Henry of the National Science Digital Library (NSDL) wrote that NSDL is building a number of community contribution and collaboration capabilities, such as blogs and wikis; implementing a bookmarking and tagging system to support personal views of library resources; and considering a system to construct a personal view of library tools and resources, similar to iGoogle or MyYahoo.

She also imagines the possibilities of future personalized services:

One thing that would be interesting (and probably very valuable) is a way to maintain a visual trace of a set of searches, identified resources and annotations—essentially a personal record of an exploration of the library for a particular project. This might also be combined with a way to cache the associated searches, rerun them automatically, and be alerted to new resources or changes in the set of identified materials. Finally, there are a wealth of opportunities at the intersection between personalized views/annotations of the library and social networking/community views and discussions. (personal communication, June 27, 2007).
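The idea of caching searches, rerunning them, and being alerted to new resources could be sketched roughly as follows. This is only an illustration under my own assumptions: the SavedSearch class, the search function, and the tiny catalog are all hypothetical and do not describe anything NSDL has built.

```python
class SavedSearch:
    """A cached query that remembers which resources it has already seen."""

    def __init__(self, query):
        self.query = query
        self.known_ids = set()

    def rerun(self, search_fn):
        """Rerun the cached query and return only the resource ids not seen before."""
        current = set(search_fn(self.query))
        new_ids = current - self.known_ids
        self.known_ids = current
        return new_ids


# Hypothetical library catalog: resource id -> description.
catalog = {"r1": "ocean currents", "r2": "ocean tides"}


def search(query):
    """Stand-in for a real library search: simple substring match."""
    return [rid for rid, text in catalog.items() if query in text]


saved = SavedSearch("ocean")
first = saved.rerun(search)        # on the first run, every hit counts as new
catalog["r3"] = "ocean salinity"   # a new resource enters the library
second = saved.rerun(search)       # only the new resource is reported
```

A real service would also need to notice changed or withdrawn resources, which this sketch does not attempt.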


This paper presents concepts related to how users can use information within digital libraries more effectively by means of personalization. Personalization increases the usability of digital libraries by adapting the digital library to the specific needs, experiences, skills, and tasks of the user. Through collaborative, content-based, and knowledge-based filtering, as well as personalized information environments (PIE), users can better employ the resources available in digital libraries.


Adam, N. R., & Gangopadhyay, A. (1998). Content-Based retrieval in digital libraries. Computer, 31, (1), 93-95.

Adams, A., & Blandford, A. (2002). Digital libraries in academia: challenges and changes. In E-P Lim, S. Foo, C. Khoo, H. Chen, E. Fox, S. Urs & T. Costantino (Eds.) Digital Libraries: People, Knowledge, and Technology, 392-403.

Brusilovsky, P., Farzan, R., & Ahn, J. (2005). Comprehensive personalized information access in an educational digital library. Fifth ACM/IEEE-CS Joint conference on Digital libraries, 9-18.

Chowdhury, G. G., & Chowdhury, S. (2003). Introduction to digital libraries. London: Facet.

French, J. C., & Viles, C. L., (1999). Personalized information environments. D-Lib Magazine 5, (6). Retrieved on June 30, 2007 from:

Gravano, L., Garcia-Molina, H., & Shivakumar, N. (1996). dSCAM: Finding document copies across multiple databases. Proceedings of the 4th International Conference on Parallel and Distributed Information Systems, Miami, Florida.

Jayawardana, C., Hewagamage, K., Priyantha, K., & Hirakawa, M. (2001). A personalized information environment for digital libraries. Information Technology and Libraries, 20, (4), 185-96.

Larsen, R. L. (1997). Relaxing assumptions …stretching the vision: a modest view of some technical issues. D-Lib Magazine, 4, Retrieved on June 28, 2007 from:

Ludäscher, B., Marciano, R. & Moore, R. (2001). Towards self-validating knowledge-based archives. 11th Workshop on Research Issues in Data Engineering. Heidelberg, Germany, 9-16. Retrieved June 30, 2007 from:

Mizzaro, S., & Tasso, C. (2002). Ephemeral and persistent personalization in adaptive information access to scholarly publications on the web. Proceedings of AH2002 Second International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems, Malaga, Spain, 306-316. Retrieved on June 28, 2007 from:

Rao, R., Pedersen, J. O., Hearst, M. A., Mackinlay, J. D., Card, S. K., Masinter, L., Halvorsen, P., & Robertson, G. G. (1995). Rich interaction in the digital library, Communications of the ACM 38, (4), 29-39.

Renda, M. E., & Straccia, U. (2005). A personalized collaborative Digital Library environment: a model and an application. Information Processing and Management, 41, 5-21. Retrieved on June 28, 2007 from:

Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P. & Riedl, J. (1994). GroupLens: An open architecture for collaborative filtering of netnews. Proceedings of ACM 1994 Conference on Computer Supported Cooperative Work, Chapel Hill, NC, 175-186.

Riecken, D. (2000). Personalized views of personalization. Communications of the ACM 43, (8), 27–28.

Schatz, B. (1997). Information retrieval in digital libraries: Bringing search to the net. Science 275, 327-334.

Van Gils, B., Proper, H. A., & Van Bommel, P. (2004). A conceptual model of information supply. Data & Knowledge Engineering 51, (2), 189-222.

Voorhees, E., Gupta, N. K., & Johnson-Laird, B. (1995). Learning collection fusion strategies. Proceedings of the 18th International Conference on Research and Development in Information Retrieval, Seattle, WA, 172-179.

Saturday, July 28, 2007

Digital Libraries and Personalization Part 4 of 5

Personalized Information Environments

A Personalized Information Environment (PIE) in a digital library is a framework that provides a set of integrated tools based on an individual user’s requirements and interests with respect to access to library materials (Jayawardana et al., 2001, 187). These tools integrate the user’s personal library and a remote digital library.

Lynne Davis, lead user interface designer of the Digital Library for Earth System Education, explains:

Personalization in the context of a digital library is the degree to which the library [user interface] assists the user in populating their own personal “space” with library assets. This might be an online environment connected in some way to the library (and potentially other “spaces” created by that user) where they would store and manage items of particular interest to them and an environment where those data can be combined in specific ways defined by that user for their personal use. The products of their efforts may or may not be shareable, controlled by that user. This can be approached at many levels of complexity (personal communication, June 28, 2007).

Users perform active consuming activities, such as reading, watching, and listening, when they access multimedia library materials with the tools in a PIE. In doing so, they can build personalized views of those materials and turn them into an easy-to-use reference collection. When gathering information, users may create their own documents by integrating selected segments of library materials with their own comments. These segments may be text, images, audio, or video, depending on the original source. The completed documents are kept in a personal library that sits on top of the digital library materials.

An integrated interface of the personal library and the digital library in PIE then allows users to organize and modify the library materials according to their needs. Since the digital library is a vast, ever-growing collection of materials, it is necessary to include services such as filtering and retrieval tools. Users are then able to seek suitable library materials more easily.

There are two types of personalization involved in PIEs (Voorhees et al., 1995, 173). Material personalization corresponds to facilities that let users work with documents according to their individual requirements, such as active consuming and information gathering. Technically, it describes the customization of multimedia library materials to define suitable views and how selected media segments from multimedia objects become part of users' annotated documents. Collection personalization, on the other hand, captures the user’s context and interests from the material personalization in order to provide a personalized view of the organization of the digital library. Collection personalization, then, includes personalized filtering and personalized retrieval (French & Viles, 1999). In a PIE, these two kinds of personalization benefit each other by creating a cycle of interaction.

There are four central conceptual requirements for PIEs: customizability, effective search, sharability, and privacy. PIEs should have tools that allow users to easily design and customize what they want to view. The building process is iterative. One can select a group of resources, send a query, evaluate the results, and then make adjustments. The PIE would have auxiliary information, such as topic maps or resource summaries, that allows users to better select resources. Users still want effective search capabilities, so a PIE must dynamically alter its context as resources are added or removed. PIEs should also be sharable. Since there may be considerable effort expended in building a PIE, it is logical that they might be shared and re-used. The original builder might design a core PIE and then add a small number of resources for particular tasks. The PIE might be used by a number of people with interests in the same area or who are working on the same project together. Sharing could be through reference to a single PIE or through copying. Sharability also implies access control. For a variety of reasons, a PIE owner may want to control access. The owner may have paid a fee to some of the constituent resources in order to gain access, or the PIE may be related to a proprietary or otherwise sensitive task. From the searcher's point of view, protection of usage and query patterns may be important.
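To make the sharing and access-control requirements a little more concrete, here is a minimal sketch of how a PIE might be modeled. The PIE class, its fields, and the user and resource names are my own hypothetical choices, not taken from any of the systems cited here.

```python
from dataclasses import dataclass, field


@dataclass
class PIE:
    """A hypothetical personalized information environment."""
    owner: str
    resources: set = field(default_factory=set)
    allowed_users: set = field(default_factory=set)

    def can_access(self, user):
        """Access control: only the owner and explicitly allowed users get in."""
        return user == self.owner or user in self.allowed_users

    def share_with(self, user):
        """Sharing by reference: the user works with this same PIE."""
        self.allowed_users.add(user)

    def copy_for(self, user):
        """Sharing by copying: the user receives an independent PIE."""
        if not self.can_access(user):
            raise PermissionError(f"{user} may not copy this PIE")
        return PIE(owner=user, resources=set(self.resources))


# A core PIE is built once, then copied and extended for a particular task.
core = PIE(owner="dana", resources={"catalog", "journal_index"})
core.share_with("eli")
task_pie = core.copy_for("eli")
task_pie.resources.add("patent_db")
```

Because the copy holds its own set of resources, adding a task-specific resource leaves the core PIE untouched, which matches the "core plus a few extras" pattern described above.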

Tuesday, July 17, 2007

Digital Libraries and Personalization Part 3 of 5

Collaborative Filtering

DLs are collaborative meeting places of people sharing common interests and exchanging information and knowledge with each other or with experts. Users may have overlapping interests if the information available in a DL matches their expectations, backgrounds, or motivations. Users might profit from each other’s knowledge by sharing opinions or experiences or offering advice. Some users might evolve into a community if only they were to become aware of each other. Such a service might be important for a DL as it supplies focused information. It is common for people within a community to discover resources via serendipitous means because they are tied into some larger web of social connections by community involvement.

As personalization expands, the services that support an individual user will expand to services supporting groups of users. The focus of DLs will move from studying individual human behavior to studying group behavior and its technical support. Before, DLs were seen as environments of data for search and usage by individuals. Now, they have expanded to include cooperation among individuals who are aware of their environment as well as of other users.

In information seeking, recommending items based on the preference patterns of other users is important. A recommender system is any system that offers information users did not request but that is either similar to what they requested or was requested by similar users. The use of the opinions and knowledge of other users to predict the relevance of items to be recommended to each user in a community is known as collaborative filtering. These methods are built on the assumption that a good way to find interesting content is to find other users who have similar interests, and then recommend items that those similar users like. In contrast to content-based filtering methods (explained below), collaborative filtering methods do not require any content analysis, as they are based on aggregated user ratings of the items.
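A minimal sketch of this idea, assuming a small table of made-up ratings, might look like the following. The user names, the ratings, and the choice of cosine similarity are illustrative assumptions on my part; the sources cited here do not prescribe a particular formula.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ana":   {"doc_a": 5, "doc_b": 4, "doc_c": 1},
    "ben":   {"doc_a": 5, "doc_b": 5, "doc_d": 4},
    "chris": {"doc_a": 1, "doc_c": 5, "doc_e": 3},
}


def cosine_similarity(r1, r2):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(r1) & set(r2)
    if not shared:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in shared)
    norm1 = sqrt(sum(r1[i] ** 2 for i in shared))
    norm2 = sqrt(sum(r2[i] ** 2 for i in shared))
    return dot / (norm1 * norm2)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings of others."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only recommend items the user has not seen
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


suggestions = recommend("ana", ratings)
```

Note that no document content is examined anywhere: the recommendation comes entirely from other people's ratings. With so few shared items the similarity values are crude, so treat this as a toy rather than a tuned recommender.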

Content-Based Filtering

Early research on DL personalization used simple models of user interests to make recommendations. The user profile is a representation of the preferences of a user. It can be acquired automatically or set up manually. In the former case, machine-learning techniques can be applied by observing user-system interactions and relying on implicit or explicit relevance assessments, while in the latter case the profile is defined by the user. The acquisition of a user profile and the subsequent matching of documents against it, in order to filter out the irrelevant ones, are known as content-based filtering or information filtering.

A content-based filtering system selects items based on the correlation between the content of the items and the user’s preferences, as opposed to a collaborative filtering system, which chooses items based on the correlation between people with similar preferences (Resnick et al., 1994). A content-based filtering system represents the content of a document with a set of terms. Terms are extracted from documents by parsing words (Adam & Gangopadhyay, 1998). First, all HTML tags and stop words are removed. The remaining words are then reduced to their stems by removing prefixes and suffixes. For instance, the words “computer,” “computers,” and “computing” could all be reduced to “comput.” The user profile is represented with the same terms and built by analyzing the content of documents that the user was satisfied with, as determined by either explicit or implicit feedback. Explicit feedback requires the user to evaluate examined documents on a scale; implicit feedback infers the user’s interests by observing his or her actions.
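The term-extraction and profile-matching steps can be sketched in a few lines. Everything here is a simplification under my own assumptions: the stop-word list is tiny, crude_stem only strips a handful of suffixes (a real system would use a proper stemmer such as Porter's), and the scoring is a plain term-overlap count.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "for", "to", "on"}


def crude_stem(word):
    """Strip a few common suffixes; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ers", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def extract_terms(html):
    """Remove HTML tags and stop words, then stem the remaining words."""
    text = re.sub(r"<[^>]+>", " ", html)
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(w) for w in words if w not in STOP_WORDS)


def build_profile(liked_documents):
    """Sum term counts over documents the user was satisfied with."""
    profile = Counter()
    for doc in liked_documents:
        profile.update(extract_terms(doc))
    return profile


def score(profile, document):
    """Overlap between profile terms and document terms (higher is more relevant)."""
    terms = extract_terms(document)
    return sum(profile[t] * count for t, count in terms.items())


# Hypothetical data: one liked document and two candidates to filter.
profile = build_profile(["<p>Computing in the digital library</p>"])
candidates = {
    "doc1": "<h1>Computers for digital archives</h1>",
    "doc2": "<h1>Gardening for beginners</h1>",
}
ranked = sorted(candidates, key=lambda d: score(profile, candidates[d]), reverse=True)
```

Here the liked document stands in for explicit or implicit feedback; a fuller version would weight each document by how strongly the user responded to it.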

Knowledge-Based Filtering

DLs can also extend the keyword-based index schema, which is mainly used for information searching and browsing, to a knowledge-based index schema, so that information in DLs can be easily retrieved by both keywords and knowledge. Knowledge in this context covers a wide range of knowledge from human experts in different areas. Each piece of knowledge in the knowledge subspace is linked to a set of documents in the document subspace (Ludäscher et al., 2001). The linkage between the two can be built in a number of ways. Experts can indicate relevant documents while inputting the knowledge, and DL systems can then perform keyword-based searching. From the results obtained, relevant documents are filtered by either experts or DL systems through a closer examination. DL users may then mark the corresponding documents, and other users can reuse these findings.
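One way to picture the link between the knowledge subspace and the document subspace is the small sketch below. The KnowledgeItem class, the sample documents, and the approve callback (standing in for the expert or system review step) are hypothetical names of my own, not part of any cited system.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeItem:
    """A piece of expert knowledge linked to its supporting documents."""
    topic: str
    statement: str
    document_ids: set = field(default_factory=set)


# Hypothetical document subspace: document id -> text.
documents = {
    1: "metadata standards for digital collections",
    2: "user studies of search interfaces",
    3: "interoperability of metadata harvesting protocols",
}


def keyword_search(keyword):
    """Plain keyword lookup over the document subspace."""
    return {doc_id for doc_id, text in documents.items() if keyword in text.split()}


def link_knowledge(item, keyword, approve):
    """Run a keyword search, then keep only the hits that pass closer examination."""
    for doc_id in keyword_search(keyword):
        if approve(doc_id):
            item.document_ids.add(doc_id)
    return item


item = KnowledgeItem(
    topic="metadata",
    statement="Shared metadata standards make collections easier to combine.",
)
# The approve callback stands in for the expert (or system) examination step.
link_knowledge(item, "metadata", approve=lambda doc_id: doc_id != 2)
```

Once the links are stored on the knowledge item, later users can go straight from the statement to its documents without repeating the search.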

Knowledge-based systems are designed to apply logical inference rules to make judgments in processing business routines or to reach a conclusion about a certain pre-defined problem (Van Gils et al., 2004, 189). By contrast, the mission of a DL system equipped with a knowledge subspace is to make expert knowledge widely available to the public. The DL acts as an information and knowledge dictionary, since a huge body of knowledge of various kinds, together with the associated documents, is preserved, classified, and maintained inside its information space. A knowledge-based system is intended to solve problems in a narrow domain. The rules stored in its knowledge base are thus limited to a particular field. By comparison, the scope of the knowledge subspace of a DL is broad, covering a wide range of disciplines. Users with different backgrounds can turn to DLs for expert help in carrying out their work. From DLs, users can obtain not only the requested documents, but also intelligent answers to their pressing questions.

Wednesday, July 4, 2007

Digital Libraries and Personalization Part 2 of 5

Personalized Digital Library Example

The International Children’s Digital Library is one DL with several personalization features to note:

• The library account feature allows users to identify preferences for their default search collection, interface language and search tool, either Simple or Advanced.

• The library interface permits users to choose their language and search tool preferences. A user can personalize the library interface without creating a library account; however, these preferences will not be remembered without an account and would have to be reset each time the user accesses the library.

• The book readers, in some cases, allow users to personalize how they view books by selecting specific book readers, such as standard HTML. For books with higher security levels, choosing a different reader may not be an option. The reader interface is not something that can be preset through a library account. Users who want to view books in anything other than the standard reader must select the reader they want to use each time they access a book.

• Personal bookshelves are spaces where users can gather their favorite books for future use. These personal bookshelves could be created on a theme, if that is useful to the user, but books are generally selected for their individual characteristics that catch a user’s interest.

Amy Datsko of The International Children’s Digital Library (ICDL) states, “it might be possible for the ICDL to develop personalized retrieval tools that update users’ bookshelves based on previous queries, but that is not in our immediate plans. It would also be an interesting option to include functionality whereby the library learns the context of a user’s search history in order to recommend other materials that might meet the user’s needs” (personal communication, June 22, 2007).

DL Research

The initial focus of DL research and application was on increasing digital content, basic DL services, metadata standards, interoperability, and rights management. The first generation of DLs provided a small set of services to relatively well-prepared and knowledgeable user communities. Later, many applications of personalization in DLs focused on basic personalization and rudimentary recommender systems. Though it is important to provide users with a means of personalization to help them deal with information resources more efficiently and effectively, realizing personalization concepts is a difficult task in a highly diverse and distributed information environment.

Some research has focused on the shortcomings of DLs with regard to personalization. Many DLs offer little personalization in accessing information (Renda & Straccia, 2005; Brusilovsky et al., 2005). For instance, users are commonly treated alike, even though they have different information needs and interests (Mizzaro & Tasso, 2002). Other studies found that resources are not integrated, which forces users to repeat the same queries in different systems (Rao et al., 1995). Yet another problem is the lack of information services that can pick out, from the large amount of newly available information, only what is relevant to a specific user, without the user having to request it explicitly or formulate precise search queries. Larsen (1997) notes the need for alternative methods of looking at searches because the increasing size of online information means that ever more documents fit the query criteria. A query that once yielded tens of hits now yields hundreds. In addition, most DLs are unable to search subsets of collections and to merge the accompanying results. Duplicate detection, while an active area of research, still needs to be further developed (Gravano et al., 1996).
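To show what merging results and detecting duplicates involve at the simplest level, here is a toy sketch. It interleaves two ranked result lists and treats titles that match after normalization as duplicates. The function names and sample titles are hypothetical, and this is far cruder than the copy-detection research cited above.

```python
import re
from itertools import zip_longest


def normalize(title):
    """Lowercase the title and drop punctuation so near-identical titles compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def merge_results(*result_lists):
    """Interleave ranked lists, skipping items whose normalized title was already seen."""
    seen, merged = set(), []
    for tier in zip_longest(*result_lists):
        for title in tier:
            if title is None:
                continue  # this list ran out of results
            key = normalize(title)
            if key not in seen:
                seen.add(key)
                merged.append(title)
    return merged


# Hypothetical result lists from two separate collections.
collection_a = ["Ocean Currents: An Atlas", "Tides Explained"]
collection_b = ["ocean currents - an atlas", "Coastal Erosion Survey"]
merged = merge_results(collection_a, collection_b)
```

Interleaving assumes the two collections rank results on comparable terms, which is exactly the hard part that collection fusion research tries to address.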

Sunday, July 1, 2007

Digital Libraries and Personalization Part 1 of 5


A large amount of newly created information is electronically published in digital libraries, whose aim is to satisfy users’ information needs. Digital libraries are not only an information resource where users may submit queries to satisfy their information needs but also a collaborative working and meeting space of people sharing common interests. As users become more familiar with digital libraries, these libraries will become personalized environments, where users can organize the information space according to their own subjective view, build communities, become aware of each other, exchange information and knowledge with other users, and get recommendations based on their preference patterns. This paper will explore the issues of personalization and digital libraries, giving an overview of collaborative, content-based, and knowledge-based filtering, as well as personalized information environments (PIE).


The amount of information published digitally and the number of users accessing it to satisfy their information needs are growing exponentially. Although information is easier and faster to access, it is increasingly difficult for users to control and seek specific information available on the Internet unless they know exactly what they need and where and how to obtain it. New services are needed to help users access and organize information within this growing data torrent.

Among the various information sources, digital libraries play an important role in terms of providing information and services to these users. Digital libraries (DLs) can be defined as consisting of collections of information (text, audio, video, and/or multimedia) which have associated services delivered to the user communities by a variety of technologies. An essential technological component of DLs is that they are networked, meaning that access is shared and collaborative.

Chowdhury and Chowdhury (2003) in their book Introduction to Digital Libraries provide a clear picture of why digital libraries are important:

Digital Libraries have the potential to make a tremendous impact on our everyday life. They will bring a paradigm shift in the ways we create, distribute, seek and use information, and thus will make significant impact on the way we do our day to day work—study, research, jobs, problems solving, decision making, and so on. Digital libraries will also have a tremendous impact on the information industry, affecting the information generators, publishers, and distributors, and information service providers (12).

Additionally, Schatz (1997) observes that the emergence of the Internet “has made the process of organizing and searching digital collections a critical international need. As the Internet itself becomes increasingly part of the structure of the world, so will the process of creating useful digital libraries become a critical part of society” (332). Internet access has resulted in DLs that are increasingly used by communities with diverse needs. As DLs become more commonplace and as their services and contents become more varied, users will expect advanced services, especially as they become more experienced with technology. In order to remain effective, DLs must offer and tailor information for their users and support community efforts to capture, structure, and share knowledge.

The emerging generation of DLs is more diverse than before. The collections themselves are becoming more heterogeneous in terms of their creators, content, media, and communities served. The range of library types is expanding to include long-term personal DLs, as well as DLs that serve specific organizations, educational needs, and cultural heritage and that vary in their reliability, authority, and quality. The user communities are also becoming diverse in terms of their interests, backgrounds, and skill levels, ranging from novices to experts in specific subject areas. The growing diversity of DLs, the communities accessing them, and the ways the information is used requires the next generation of DLs to be more effective at providing information that is tailored to a person’s background knowledge, skills, tasks, and intended use of the information. Research into making digital libraries as usable as possible for their users is therefore clearly needed.

Despite technological advances, usability issues in digital libraries and other information systems persist. The usability problems may be associated with the emergence of a new community of users who lack a technological orientation and are much less tolerant of poorly designed systems. Earlier, people were counted on to adapt to systems, but now they expect more. Adams and Blandford (2002) believe that “the invisible presence of [DLs], their poor usability, and user support has made their impact less dramatic. These issues cannot be avoided as digital libraries become more and more influential…” (393).


What is needed are tools that will enable users to create personal collections of information resources of interest to them. It will be necessary to cull tens of thousands of resources for those of a specific interest and to monitor available resources to detect new useful sources or to decide that others are no longer of interest. Efficient search strategies are required to support the discovery of resources and to search and fuse information gleaned from those resources.

Personalization is one way in which DLs can support this goal, offer users better service, and assure their patronage. Personalization, in the context of DLs, refers to the ability of users to customize the information they access and to adapt the presentation according to their personal preferences (Riecken, 2000). Personalized DLs bring to light the suite of resources being offered and empower users to create information systems that are responsive to their needs.

Lisa Cerrato of the Perseus Digital Library defines personalization as “the ability of a user to customize the display and actions available in the DL. Thus, limiting the results of a vocabulary tool to only textbooks the user has seen or remembering where the user left off in a particular text, storing search results, and allowing the user to view or hide certain default features” (personal communication, June 24, 2007).

Personalization can range from user-driven to automatic, with many personalization tools combining elements of both approaches. User-driven personalization asks users to provide explicit input about their information needs. Automatic personalization is engaged when the system observes user activity and tailors the system accordingly. Users who supply an array of information about their interests and skill levels might make their searches more effective and the results better tailored to their real needs. At some institutions, like universities, some of this information could be automatically supplied from registration files and other sources.