Tuesday, July 17, 2007

Digital Libraries and Personalization Part 3 of 5

The following post is part of a five-part series about personalization of digital libraries.

Collaborative Filtering

DLs are collaborative meeting places where people who share common interests exchange information and knowledge with each other or with experts. Users may have overlapping interests when the information available in a DL matches their expectations, backgrounds, or motivations, and they can profit from each other’s knowledge by sharing opinions and experiences or by offering advice. Some users might form a community if only they became aware of each other. A service that makes users aware of one another could therefore be important for a DL, because it supplies focused information. People within a community often discover resources serendipitously, because community involvement ties them into a larger web of social connections.

As personalization expands, services that support individual users will grow into services that support groups of users. The focus of DLs will shift from studying individual human behavior to studying group behavior and its technical support. DLs were previously seen as collections of data that individuals searched and used; they have now expanded to support cooperation among individuals who are aware of both their environment and other users.

For information seeking, recommending items based on the preference patterns of other users is especially important. A recommender system is any system that offers information the user did not request, either because it is similar to information the user did request or because it was requested by similar users. Using the opinions and knowledge of other users to predict how relevant an item will be to each user in a community is known as collaborative filtering. These methods assume that a good way to find interesting content is to find other users with similar interests and then recommend items those users like. In contrast to content-based filtering methods (explained below), collaborative filtering methods require no analysis of item content, because they rely on aggregated user ratings of the items.
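To make the idea concrete, here is a minimal sketch of user-based collaborative filtering in Python. It assumes ratings are stored as a dictionary mapping each user to their item ratings on a 1-5 scale, measures user similarity with the Pearson correlation over co-rated items, and predicts a rating from the positively correlated neighbors who rated the item. This is similar in spirit to the GroupLens approach of Resnick et al. (1994), but the function names, the neighborhood size `k`, and the sample data are all hypothetical; a real DL would need to handle sparse data and scale far more carefully.

```python
from math import sqrt
from typing import Dict, List, Optional

# Hypothetical data layout: user -> {item: rating on a 1-5 scale}
Ratings = Dict[str, Dict[str, float]]


def mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation between two users over the items both have rated."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = mean(a[i] for i in common)
    mean_b = mean(b[i] for i in common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(ratings: Ratings, user: str, item: str, k: int = 5) -> Optional[float]:
    """Predict a rating from the k most similar, positively correlated neighbors."""
    neighbors = sorted(
        (
            (pearson(ratings[user], ratings[other]), other)
            for other in ratings
            if other != user and item in ratings[other]
        ),
        reverse=True,
    )
    neighbors = [(sim, other) for sim, other in neighbors[:k] if sim > 0]
    if not neighbors:
        return None
    # Similarity-weighted average of each neighbor's deviation from their own mean
    offset = sum(
        sim * (ratings[other][item] - mean(ratings[other].values()))
        for sim, other in neighbors
    ) / sum(sim for sim, _ in neighbors)
    return mean(ratings[user].values()) + offset


def recommend(ratings: Ratings, user: str, top_n: int = 3) -> List[str]:
    """Rank items the user has not rated yet by their predicted rating."""
    unseen = {item for r in ratings.values() for item in r} - set(ratings[user])
    scored = []
    for item in unseen:
        score = predict(ratings, user, item)
        if score is not None:
            scored.append((score, item))
    return [item for _, item in sorted(scored, reverse=True)[:top_n]]


ratings = {
    "alice": {"intro_to_dls": 5, "sql_tuning": 1, "metadata": 4},
    "bob": {"intro_to_dls": 4, "sql_tuning": 2, "metadata": 5, "oai_pmh": 5},
    "carol": {"intro_to_dls": 1, "sql_tuning": 5, "metadata": 2, "oai_pmh": 1, "indexing": 5},
}
print(recommend(ratings, "alice"))  # ['oai_pmh']
```

Carol’s ratings run exactly opposite to Alice’s (a correlation of -1), so her opinion of `indexing` is ignored, while Bob’s agreement with Alice makes his high rating of `oai_pmh` the recommendation. At no point does the code look at what the items contain.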

Content-Based Filtering

Early research on DL personalization used simple models of user interests to make recommendations. A user profile is a representation of a user’s preferences. It can be acquired automatically or set up manually. In the former case, machine-learning techniques can be applied by observing user-system interactions and relying on implicit or explicit relevance assessments; in the latter case, the user defines the profile manually. Acquiring a user profile and then matching documents against it to filter out the irrelevant ones is known as content-based filtering, or information filtering.
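The matching step can be sketched with the vector space model, which is one common way (not the only one) to compare documents with a profile. In this minimal Python sketch the profile is a dictionary of term weights, each incoming document is a list of already-extracted terms, and a document passes the filter when its cosine similarity to the profile reaches a threshold. The function names, the 0.3 threshold, and the sample profile are illustrative assumptions; the post does not prescribe a particular matching function.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Tuple


def cosine(profile: Dict[str, float], doc: Dict[str, float]) -> float:
    """Cosine similarity between a profile vector and a document term vector."""
    dot = sum(w * doc.get(term, 0.0) for term, w in profile.items())
    norm_p = sqrt(sum(w * w for w in profile.values()))
    norm_d = sqrt(sum(w * w for w in doc.values()))
    return dot / (norm_p * norm_d) if norm_p and norm_d else 0.0


def filter_documents(
    profile: Dict[str, float], documents: Dict[str, List[str]], threshold: float = 0.3
) -> List[Tuple[str, float]]:
    """Keep documents scoring at or above the threshold, best match first."""
    kept = []
    for doc_id, terms in documents.items():
        score = cosine(profile, Counter(terms))  # term frequencies as weights
        if score >= threshold:
            kept.append((doc_id, score))
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# A manually set-up profile: term -> weight (hypothetical values)
profile = {"metadata": 1.0, "harvesting": 0.8, "archive": 0.5}
documents = {
    "d1": ["metadata", "harvesting", "protocol"],
    "d2": ["football", "league", "score"],
    "d3": ["archive", "metadata", "metadata"],
}
print([doc_id for doc_id, _ in filter_documents(profile, documents)])  # ['d3', 'd1']
```

The sports document shares no terms with the profile and is filtered out. A profile learned automatically from user-system interactions would simply supply different weights to the same function.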

A content-based filtering system selects items based on the correlation between the content of the items and the user’s preferences, as opposed to a collaborative filtering system, which chooses items based on the correlation between people with similar preferences (Resnick et al., 1994). A content-based filtering system represents the content of a document with a set of terms, which are extracted from documents by parsing words (Adam & Gangopadhyay, 1998). First, all HTML tags and stop words are removed. The remaining words are then reduced to their stems by removing prefixes and suffixes; for instance, the words “computer,” “computers,” and “computing” could all be reduced to “comput.” The user profile is represented with the same terms and is built by analyzing the content of documents the user was satisfied with, as determined by explicit or implicit feedback. Explicit feedback requires the user to rate examined documents on a scale; implicit feedback infers the user’s interests by observing his or her actions.
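The extraction steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: HTML tags are stripped with a regular expression, the stop-word list is a tiny hypothetical one, and `stem` is a crude suffix stripper standing in for a real stemmer such as Porter’s (it removes suffixes only, not prefixes). The profile is assumed to be the average normalized term frequency over documents the user was satisfied with; whether a document counts as satisfying would come from explicit ratings or implicit signals, and is simply passed in here.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Illustrative stop-word list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}
# Crude suffix list, longer suffixes checked first; a stand-in for a real stemmer.
SUFFIXES = ("ing", "ers", "er", "es", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_terms(html: str) -> List[str]:
    """Remove HTML tags and stop words, then stem the remaining words."""
    text = re.sub(r"<[^>]+>", " ", html)
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def build_profile(liked_docs: Iterable[str]) -> Dict[str, float]:
    """Average the normalized term frequencies of documents the user was satisfied with."""
    docs = list(liked_docs)
    totals: Counter = Counter()
    for html in docs:
        counts = Counter(extract_terms(html))
        size = sum(counts.values())
        for term, n in counts.items():
            totals[term] += n / size
    return {term: weight / len(docs) for term, weight in totals.items()} if docs else {}


print(extract_terms("<p>The computers are computing</p>"))  # ['comput', 'comput']
liked = ["<p>Computing the metadata</p>", "<b>metadata</b> harvesting"]
print(build_profile(liked))  # {'comput': 0.25, 'metadata': 0.5, 'harvest': 0.25}
```

Because documents and the profile are reduced to the same stemmed terms, the profile can be compared directly with any new document’s term vector.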

Knowledge-Based Filtering

DLs can also extend the keyword-based index schema, which is mainly used for searching and browsing, to a knowledge-based index schema, so that information in DLs can be retrieved by both keywords and knowledge. Knowledge in this context covers a wide range of knowledge contributed by human experts in different areas. Each piece of knowledge in the knowledge subspace is linked to a set of documents in the document subspace (Ludäscher et al., 2001). The linkage between the two can be built in several ways. Experts can indicate relevant documents while inputting the knowledge, after which the DL system performs a keyword-based search. Relevant documents are then filtered from the results through closer examination by either the experts or the DL system. DL users may also mark the corresponding documents, and other users can reuse these findings.
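The linkage between the knowledge subspace and the document subspace can be pictured as a second index sitting beside the keyword index. The Python sketch below is a toy model, not the design of any particular DL: the class and method names are hypothetical, keyword search is a naive lookup of whitespace-separated tokens, and the `is_relevant` callback stands in for the closer examination by an expert or by the system.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set


class KnowledgeIndex:
    """Toy DL index with a keyword subspace and a knowledge subspace over the same documents."""

    def __init__(self, documents: Dict[str, str]):
        self.documents = documents
        self.keyword_index: Dict[str, Set[str]] = defaultdict(set)
        for doc_id, text in documents.items():
            for word in text.lower().split():
                self.keyword_index[word].add(doc_id)
        # knowledge item -> documents linked to it
        self.knowledge_links: Dict[str, Set[str]] = defaultdict(set)

    def keyword_search(self, query: str) -> Set[str]:
        """Return documents containing every query keyword."""
        hits = [self.keyword_index.get(w, set()) for w in query.lower().split()]
        return set.intersection(*hits) if hits else set()

    def add_knowledge(self, knowledge_id: str, expert_docs: Iterable[str], query: str,
                      is_relevant: Callable[[str, str], bool]) -> None:
        """Link expert-indicated documents, then keyword-search and keep hits judged relevant."""
        links = self.knowledge_links[knowledge_id]
        links.update(expert_docs)
        for doc_id in self.keyword_search(query):
            if is_relevant(doc_id, self.documents[doc_id]):
                links.add(doc_id)

    def mark(self, knowledge_id: str, doc_id: str) -> None:
        """Record a user's finding so that other users can reuse it."""
        self.knowledge_links[knowledge_id].add(doc_id)

    def by_knowledge(self, knowledge_id: str) -> List[str]:
        """Retrieve documents through the knowledge subspace instead of by keyword."""
        return sorted(self.knowledge_links.get(knowledge_id, set()))


docs = {
    "d1": "metadata harvesting protocol",
    "d2": "metadata quality in library catalogues",
    "d3": "harvesting wheat in autumn",
}
index = KnowledgeIndex(docs)
index.add_knowledge("metadata-harvesting", expert_docs=["d1"], query="harvesting",
                    is_relevant=lambda doc_id, text: "metadata" in text)
index.mark("metadata-harvesting", "d2")
print(index.by_knowledge("metadata-harvesting"))  # ['d1', 'd2']
```

The keyword search for “harvesting” also returns the farming document, but the relevance check keeps it out of the knowledge link, and a later user’s mark adds a document the keyword search never found.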

Knowledge-based systems are designed to apply logical inference rules to make judgments in processing business routines or to reach a conclusion about a certain predefined problem (Van Gils et al., 2004, p. 189). By contrast, the mission of a DL system equipped with a knowledge subspace is to make expert knowledge widely available to the public. The DL acts as an information and knowledge dictionary, since a huge body of knowledge of various kinds, together with the associated documents, is preserved, classified, and maintained inside its information space. A knowledge-based system is intended to solve problems in a narrow domain, so the rules stored in its knowledge base are limited to a particular field. The knowledge subspace of a DL, in comparison, is broad, covering a wide range of disciplines. Users with different backgrounds can turn to DLs for expert help in carrying out their work, obtaining not only the requested documents but also intelligent answers to their pressing questions.
