Sunday, December 30, 2007

Information Management and the Nazis

Edwin Black's IBM and the Holocaust

Information management is usually a positive force. In some instances, though, it can be lethal. Edwin Black’s controversial book IBM and the Holocaust: The Strategic Alliance Between Nazi Germany and America's Most Powerful Corporation charges that IBM’s German affiliate combined the technology of the information age and modern warfare to enact the Final Solution.

The IBM Hollerith punch card machine

The IBM Hollerith punch card machine, above, a precursor to the computer originally created to compile census data, was reconfigured to streamline the Holocaust. Each category of “undesirable” had a punch hole (Jews were “8”), which helped the regime identify its victims, confine them to ghettos, deport them to concentration camps, and kill them. The custom-designed Hollerith machines made each step far smoother.

IBM CEO Thomas Watson, headquartered in New York, worked to keep both the United States and Germany satisfied. Black provides archival evidence that Watson willingly assisted in genocide, ultimately receiving a Merit Cross from Hitler and profiting from leasing deals and the sale of billions of punch cards. The German subsidiary knew what was happening as well. How could it not? Engineers had to service the machines monthly at the death camps.

From an NPR interview with the author:

"There were many people who found fascination with Hitler during this period . . . but the only one who bestowed the Promethean gift of data automation was IBM and at no point did they stop providing the materials that Hitler needed to proceed with his persecution of the Jews."

Below is an excerpt of the Introduction:

This book will be profoundly uncomfortable to read. It was profoundly uncomfortable to write. It tells the story of IBM's conscious involvement--directly and through its subsidiaries--in the Holocaust, as well as its involvement in the Nazi war machine that murdered millions of others throughout Europe.

Mankind barely noticed when the concept of massively organized information quietly emerged to become a means of social control, a weapon of war, and a roadmap for group destruction. The unique igniting event was the most fateful day of the last century, January 30, 1933, the day Adolf Hitler came to power. Hitler and his hatred of the Jews was the ironic driving force behind this intellectual turning point. But his quest was greatly enhanced and energized by the ingenuity and craving for profit of a single American company and its legendary, autocratic chairman. That company was International Business Machines, and its chairman was Thomas J. Watson.

Der Führer's obsession with Jewish destruction was hardly original. There had been czars and tyrants before him. But for the first time in history, an anti-Semite had automation on his side. Hitler didn't do it alone. He had help.

In the upside-down world of the Holocaust, dignified professionals were Hitler's advance troops. Police officials disregarded their duty in favor of protecting villains and persecuting victims. Lawyers perverted concepts of justice to create anti-Jewish laws. Doctors defiled the art of medicine to perpetrate ghastly experiments and even choose who was healthy enough to be worked to death--and who could be cost-effectively sent to the gas chamber. Scientists and engineers debased their higher calling to devise the instruments and rationales of destruction. And statisticians used their little known but powerful discipline to identify the victims, project and rationalize the benefits of their destruction, organize their persecution, and even audit the efficiency of genocide. Enter IBM and its overseas subsidiaries.

Solipsistic and dazzled by its own swirling universe of technical possibilities, IBM was self-gripped by a special amoral corporate mantra: if it can be done, it should be done. To the blind technocrat, the means were more important than the ends. The destruction of the Jewish people became even less important because the invigorating nature of IBM's technical achievement was only heightened by the fantastical profits to be made at a time when bread lines stretched across the world.

So how did it work?

When Hitler came to power, a central Nazi goal was to identify and destroy Germany's 600,000 Jews. To Nazis, Jews were not just those who practiced Judaism, but those of Jewish blood, regardless of their assimilation, intermarriage, religious activity, or even conversion to Christianity. Only after Jews were identified could they be targeted for asset confiscation, ghettoization, deportation, and ultimately extermination. To search generations of communal, church, and governmental records all across Germany--and later throughout Europe--was a cross-indexing task so monumental, it called for a computer. But in 1933, no computer existed.

When the Reich needed to mount a systematic campaign of Jewish economic disenfranchisement and later began the massive movement of European Jews out of their homes and into ghettos, once again, the task was so prodigious it called for a computer. But in 1933, no computer existed.

When the Final Solution sought to efficiently transport Jews out of European ghettos along railroad lines and into death camps, with timing so precise the victims were able to walk right out of the boxcar and into a waiting gas chamber, the coordination was so complex a task, this too called for a computer. But in 1933, no computer existed.

However, another invention did exist: the IBM punch card and card sorting system--a precursor to the computer. IBM, primarily through its German subsidiary, made Hitler's program of Jewish destruction a technologic mission the company pursued with chilling success. IBM Germany, using its own staff and equipment, designed, executed, and supplied the indispensable technologic assistance Hitler's Third Reich needed to accomplish what had never been done before--the automation of human destruction. More than 2,000 such multi-machine sets were dispatched throughout Germany, and thousands more throughout German-dominated Europe. Card sorting operations were established in every major concentration camp. People were moved from place to place, systematically worked to death, and their remains cataloged with icy automation.

IBM Germany, known in those days as Deutsche Hollerith Maschinen Gesellschaft, or Dehomag, did not simply sell the Reich machines and then walk away. IBM's subsidiary, with the knowledge of its New York headquarters, enthusiastically custom-designed the complex devices and specialized applications as an official corporate undertaking. Dehomag's top management was comprised of openly rabid Nazis who were arrested after the war for their Party affiliation. IBM NY always understood--from the outset in 1933--that it was courting and doing business with the upper echelon of the Nazi Party. The company leveraged its Nazi Party connections to continuously enhance its business relationship with Hitler's Reich, in Germany and throughout Nazi-dominated Europe.

Dehomag and other IBM subsidiaries custom-designed the applications. Its technicians sent mock-ups of punch cards back and forth to Reich offices until the data columns were acceptable, much as any software designer would today. Punch cards could only be designed, printed, and purchased from one source: IBM. The machines were not sold, they were leased, and regularly maintained and upgraded by only one source: IBM. IBM subsidiaries trained the Nazi officers and their surrogates throughout Europe, set up branch offices and local dealerships throughout Nazi Europe staffed by a revolving door of IBM employees, and scoured paper mills to produce as many as 1.5 billion punch cards a year in Germany alone. Moreover, the fragile machines were serviced on site about once per month, even when that site was in or near a concentration camp. IBM Germany's headquarters in Berlin maintained duplicates of many code books, much as any IBM service bureau today would maintain data backups for computers.

I was haunted by a question whose answer has long eluded historians. The Germans always had the lists of Jewish names. Suddenly, a squadron of grim-faced SS would burst into a city square and post a notice demanding those listed assemble the next day at the train station for deportation to the East. But how did the Nazis get the lists? For decades, no one has known. Few have asked.

The answer: IBM Germany's census operations and similar advanced people counting and registration technologies. IBM was founded in 1898 by German inventor Herman Hollerith as a census tabulating company. Census was its business. But when IBM Germany formed its philosophical and technologic alliance with Nazi Germany, census and registration took on a new mission. IBM Germany invented the racial census--listing not just religious affiliation, but bloodline going back generations. This was the Nazi data lust. Not just to count the Jews--but to identify them.

People and asset registration was only one of the many uses Nazi Germany found for high-speed data sorters. Food allocation was organized around databases, allowing Germany to starve the Jews. Slave labor was identified, tracked, and managed largely through punch cards. Punch cards even made the trains run on time and cataloged their human cargo. German Railway, the Reichsbahn, Dehomag's biggest customer, dealt directly with senior management in Berlin. Dehomag maintained punch card installations at train depots across Germany, and eventually across all Europe.

How much did IBM know? Some of it IBM knew on a daily basis throughout the 12-year Reich. The worst of it IBM preferred not to know--"don't ask, don't tell" was the order of the day. Yet IBM NY officials, and frequently Watson's personal representatives, Harrison Chauncey and Werner Lier, were almost constantly in Berlin or Geneva, monitoring activities, ensuring that the parent company in New York was not cut out of any of the profits or business opportunities Nazism presented. When U.S. law made such direct contact illegal, IBM's Swiss office became the nexus, providing the New York office continuous information and credible deniability. . .

More information about the book can be found on its webpage.

Monday, December 10, 2007

Karel Vredenburg and User Centered Design

Karel Vredenburg, the Program Director for Design Leadership at IBM and author of User-Centered Design: An Integrated Approach and Designing the Total User Experience at IBM: An Examination of Case Studies, Research Findings, and Advanced Methods, has posted several videos on YouTube illustrating a user-centered design approach to product development.

Task Analysis

Benchmark Assessment

Conceptual Design

Detailed Design

Design Validation

Competitive Evaluation

Design Walkthrough

Wednesday, December 5, 2007

Michael Wesch's Information R/evolution and The Machine is Us/ing Us

Michael Wesch's Information R/evolution and The Machine is Us/ing Us are short films that effectively illustrate how information is created, managed, searched, stored, and shared in the Internet age.

Information R/evolution

The Machine is Us/ing Us

Saturday, December 1, 2007

The Design of Everyday Things

Donald A. Norman's The Design of Everyday Things

As part of my Human-Computer Interaction class, I read Donald A. Norman’s The Design of Everyday Things, a seminal work in the field of User Centered Design (UCD). It is an engaging, highly readable book that teaches interactive design principles to apply to daily life. My boyfriend, for example, a gift processing manager for a large, quickly expanding non-profit organization, uses the concepts to create user-centered methods to centralize gift entry in the national office. I have used the theories myself to construct a number of information management protocols.

Donald Norman, a cognitive science professor, uses mundane examples of bad design from daily life to illustrate his points. Who among us hasn’t been flummoxed by our initial use of office telephones, shower knobs, or doors that we pushed when we should have pulled or vice versa?

The crux of the book is simple: user needs differ from designer needs. When designers fail to understand the processes by which devices work, they create unworkable technology. A well-designed object should be self-explanatory.

Norman recapitulates his points at the end of the book by listing the seven UCD principles for transforming difficult tasks into easy ones:

1. Use both knowledge in the world and in the head.
2. Simplify the structure of tasks.
3. Make things visible.
4. Get the mappings right.
5. Exploit the powers of constraints, both natural and artificial.
6. Design for error.
7. When all else fails, standardize.

Below is an excerpt from the book:

"You would need an engineering degree from MIT to work this," someone once told me, shaking his head in puzzlement over his brand new digital watch. Well, I have an engineering degree from MIT… Give me a few hours and I can figure out the watch. But why should it take hours? I have talked with many people who can’t figure out how to work a sewing machine or a video cassette recorder, who habitually turn on the wrong stove burner.

Why do we put up with the frustrations of everyday objects, with objects that we can’t figure out how to use, with those neat plastic-wrapped packages that seem impossible to open, with doors that trap people, with washing machines and dryers that have become too confusing to use, with audio-stereo-television-video-cassette-recorders that claim in their advertisements to do everything, but that make it almost impossible to do anything?

The human mind is exquisitely tailored to make sense of the world. Give it the slightest clue and off it goes, providing explanation, rationalization, understanding. Consider the objects—books, radios, kitchen appliances, office machines, and light switches—that make up our everyday lives. Well-designed objects are easy to interpret and understand. They contain visible clues to their operation. Poorly designed objects can be difficult and frustrating to use. They provide no clues—or sometimes false clues. They trap the user and thwart the normal process of interpretation and understanding. Alas, poor design predominates. The result is a world filled with frustration, with objects that cannot be understood, with devices that lead to error. This book is an attempt to change things.

Link to Donald A. Norman's website

Saturday, November 24, 2007

Information Needs of Prisoners: Annotated Bibliography Part 3 of 3

This post is part of a series exploring prisoners as a user group with distinct information needs. The next segments are an annotated bibliography with applied search strategies. Please click below to read further:

Information Needs of Prisoners: A Review of the Literature
Part 1
Part 2
Part 3

Information Needs of Prisoners: Annotated Bibliography
Part 1
Part 2

Lithgow, S., & Hepworth, J. B. (1993). Performance measurement in prison libraries: research methods, problems, and perspectives. Journal of Librarianship and Information Science, 25(2), 61-9.

The University of Wales’ Department of Information Science and Library Studies conducted a research project, consisting of a literature survey and a pilot study, to develop and test a series of performance indicators for evaluating prison libraries and improving their effectiveness and efficiency. The study assessed the current practices of prison libraries; its methods included structured interviews with inmates and staff, opinion leader interviews, test site selection, observation, and monitoring. Interviews with at least 10% of the inmate population were conducted at two women’s prisons and four men’s prisons.

I searched the Library Literature and Information Science database in Dialog with this search statement:
ss prison libraries/DE AND (research OR survey?)

Stevens, T. (1994). The information needs of prisoners: a study of three penal establishments. Library and Information Research News, 18(60), 29-33.

A 1992 study of three prisons examined inmates’ information needs and how effectively they were met. The length of sentence and the time left to serve exercise control over information needs, and the institution’s administration determines the inmates’ satisfaction. Formal information channels such as the library may be perceived as ineffective by inmates, regardless of their objective validity. Informal networks among inmates were the most preferred source of information.

I searched LISA for “prisoner* AND librar* AND research.”

Stevens, T. (1993). The role of the prison library in the reform and rehabilitation process. Library and Information Research News, 16(57), 13-5.

A study of prison libraries in England and Wales determined how prison libraries define their roles and objectives, their participation in rehabilitation programs within their institutions, and the effectiveness of their participation. It also examines how the prisoners’ information and usage needs are met through these programs.

I searched LISA with “prison libraries” in the descriptor field and “program*” in the keyword field.

Womboh, B. S. (1995). Research summary: an assessment of Nigerian prison libraries. Third World Libraries, 5(2), 74-5.

A questionnaire survey of prison officials and inmates at three Nigerian prisons analyzed prisoner usage of the library. Findings showed that the low use of prison libraries was caused by inaccessibility, poor resources, and inhuman prison conditions, not by a lack of desire or ability on the inmates’ part. Intellectual growth was found to play a significant role in the reform and rehabilitation of incarcerated criminals in Nigeria.

I searched LISA with “prison libraries” in the descriptor field and “survey” in the keyword field.

Wednesday, November 21, 2007

Information Needs of Prisoners: Annotated Bibliography Part 2 of 3

This post is part of a series exploring prisoners as a user group with distinct information needs. The next segments are an annotated bibliography with applied search strategies. Please click below to read further:

Information Needs of Prisoners: A Review of the Literature
Part 1
Part 2
Part 3

Information Needs of Prisoners: Annotated Bibliography
Part 1
Part 3

Jones, P. (2004). Reaching out to young adults in jail. Young Adult Library Services, 3(1), 16-8.

This survey of 44 public libraries, with 16 responses, revealed that few public libraries provide services to juvenile correctional facilities. Despite the restrictions on what materials could be accessed, the teenage users benefited from access. Details on prisoner usage in Hennepin County, Minnesota, demonstrated this, with the creation of reading and writing promotional activities and a 5,000-item collection.

I searched LISA for “prison libraries” and “surveys” as keywords.

Kozup, P. C. (1992). Major issues in prison librarianship: a case study of three libraries. Unpublished master's thesis, Kent State University, Kent, Ohio.

A case study of three prison librarians analyzed their methods of meeting the needs of prison library users. Issues examined were prisoner usage, including access, library-based programs, and outreach services; inmate staff in libraries, including selection, training, and evaluation; and collection development. Results indicate that the library’s service varies according to the prison administration and staff, physical space, civilian staffing availability, and the type and number of prisoners in the institution.

I searched the ERIC database through Dialog with this search statement:
ss ((correctional institutions/de AND librar?) OR (prisoners/de AND librar?) OR prison libraries/de) AND (use studies/de OR library surveys/de OR research needs/de OR state surveys/de OR users/de OR case studies/de)
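The Boolean structure of the search statement above can be sketched in a few lines of Python. This is only an illustration of how the nested AND/OR groups combine, not a model of how Dialog itself works. The sample records, the `has_stem` helper, and the reading of `librar?` as a simple word-stem match are all assumptions made for this example.

```python
import re

# Hypothetical records: descriptors and text are invented for illustration.
RECORDS = [
    {"id": 1, "descriptors": {"prison libraries", "case studies"},
     "text": "Major issues in prison librarianship"},
    {"id": 2, "descriptors": {"correctional institutions", "users"},
     "text": "Library services for inmates"},
    {"id": 3, "descriptors": {"prisoners", "recidivism"},
     "text": "Vocational training outcomes"},
    {"id": 4, "descriptors": {"prison libraries"},
     "text": "A history of prison library buildings"},
]

# Descriptors that must co-occur with the truncated term librar?
SUBJECT_TERMS = {"correctional institutions", "prisoners"}
# Descriptors from the second parenthesized group of the statement
STUDY_TERMS = {"use studies", "library surveys", "research needs",
               "state surveys", "users", "case studies"}


def has_stem(record, stem):
    """Assumed reading of truncation: any word beginning with the stem."""
    haystack = record["text"] + " " + " ".join(record["descriptors"])
    return re.search(r"\b" + re.escape(stem), haystack, re.IGNORECASE) is not None


def matches(record):
    """Apply (topic group) AND (study-type group) to one record."""
    de = record["descriptors"]
    topic = (bool(de & SUBJECT_TERMS) and has_stem(record, "librar")) \
        or "prison libraries" in de
    study_type = bool(de & STUDY_TERMS)
    return topic and study_type


def search(records):
    """Return the ids of records that satisfy the whole statement."""
    return [r["id"] for r in records if matches(r)]


print(search(RECORDS))
```

Record 4 shows why the second group matters: it carries the "prison libraries" descriptor but no study-type descriptor, so the final AND excludes it.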

LaPoint, V. A. (1997). Electronic resources in Ohio prison libraries. Unpublished master's thesis, Kent State University, Kent, Ohio.

A survey questionnaire was sent to 29 state prison libraries of the Ohio Department of Rehabilitation and Correction; 22 responded, for a response rate of 76%. The survey analyzed prisoner access and usage of electronic resources, describing general operational procedures and current standards. Prisoners were able to access a variety of data made possible by OPAC, CD-ROMs, modems, Internet access, cooperative networks and interlibrary loan activity, and periodicals. The data indicated that most prison librarians think the addition of electronic resources can save space, save costs, and provide better access to information to meet prisoner needs.

I searched the ERIC database through Dialog with this search statement:
ss prison libraries/DE AND questionnaire*
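The response rate reported for LaPoint's survey is simple arithmetic, and the same calculation puts the other surveys in this bibliography on a common footing. The sketch below uses only figures quoted in the annotations (22 of 29 for LaPoint, 16 of 44 for Jones). The `response_rate` function is a hypothetical helper, and rounding to the nearest whole percent is an assumption.

```python
def response_rate(responses, surveyed):
    """Return the response rate as a whole-number percentage."""
    if surveyed <= 0:
        raise ValueError("surveyed must be a positive count")
    if not 0 <= responses <= surveyed:
        raise ValueError("responses must be between 0 and surveyed")
    return round(100 * responses / surveyed)


# Figures as quoted in the annotations above
lapoint_rate = response_rate(22, 29)  # Ohio prison libraries
jones_rate = response_rate(16, 44)    # public libraries

print(f"LaPoint (1997): {lapoint_rate}%")
print(f"Jones (2004): {jones_rate}%")
```

Comparing the rates side by side is a quick reminder of how much weight each survey's findings can bear.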

Liggett, J. M. (1996). Survey of Ohio’s prison libraries. Journal of Interlibrary Loan, 7(1), 31-45.

A survey of Ohio prison libraries collected information about library users, types of materials, serial subscriptions, circulation, budget, size, security, hours, automation, and librarian qualifications.

I searched the ERIC database through Dialog with this search statement:
ss ((correctional institutions/de AND librar?) OR (prisoners/de AND librar?) OR prison libraries/de) AND (use studies/de OR users/de)

Saturday, November 17, 2007

Information Needs of Prisoners: Annotated Bibliography Part 1 of 3

This post is part of a series exploring prisoners as a user group with distinct information needs. The next segments are an annotated bibliography with applied search strategies. Please click below to read further:

Information Needs of Prisoners: A Review of the Literature
Part 1
Part 2
Part 3

Information Needs of Prisoners: Annotated Bibliography
Part 2
Part 3

Curry, A., Wolf, K., Boutilier, S., & Chan, H. (2003). Canadian federal prison libraries: a national survey. Journal of Librarianship and Information Science, 35(3), 141-152.

A nationwide survey of Canada’s 51 minimum-, medium-, and maximum-security federal prisons was conducted in 2001 through a questionnaire mailed to each head librarian. The survey gathered information about the users, as well as staffing, funding, the collection, and limitations on acquisitions and access. The respondents believed that prisoner usage of the library materials was meeting recreational, cultural, educational, and informative needs, but that more funding was needed.

I searched Library and Information Science Abstracts (LISA) with “prison libraries” in the descriptor field and “research OR survey?” in the keyword field.

Bowden, T. S. (2003). A snapshot of state prison libraries with a focus on technology. Behavioral & Social Sciences Librarian, 21(2), 1-12.

A nationwide survey of 207 state prisons, representative of the national population in terms of size, security level, and sex distribution, studied current trends of prisoner usage of technology. Almost 97% of the prisons had libraries offering a variety of print and electronic resources, although staffing, size, and budgets varied. Prisoners could access computers, with sporadic training, at almost half of the facilities, and access to word processing, CDs, and library catalogs was common. Monitored Internet usage was available at only one library.

I searched LISA with “prison libraries” in the descriptor field and “technology” in the keyword field.

Shirley, G. L. (2003). Correctional libraries, library standards, and diversity. Journal of Correctional Education, 54(2), 70-4.

An online questionnaire of 110 state prison libraries examined prisoner usage, service, programs, and collections; 35 responses from 12 states were received. The survey reviewed the racial breakdown of inmates and staff, racial conflicts, the collection and its development criteria, cultural programs, and the satisfaction level of users. Using as a model the Library Standards for Adult Correctional Institutions, which define the library's role as meeting the informational, cultural, educational, vocational, and recreational needs of its users, librarians analyzed the demographics of library users and determined other services and programs available in the institution. Although the survey indicates that most prison librarians aspire to operate under the same model as public libraries, in practice this is difficult because of the restrictions imposed by prison administration, which curtail prisoner library usage.

I searched the National Criminal Justice Reference Service (NCJRS) for “prison* AND librar* AND information* AND needs.”

Haymann-Diaz, B. (1989). Establishing a selection process model for an ethnic collection in a prison library. Behavioral & Social Sciences Librarian, 8(1-2), 33-49.

This case study examines the selection process used by two inmate/library assistants to develop Hispanic and African-American ethnic collections in a prison library to meet inmate informational needs. Its findings are used to formulate a selection process model for the development of ethnic collections to encourage and support prisoner usage of the library.

I searched the ERIC database in Dialog with this search statement:
ss prison libraries/de AND prison(w)librar?/ti AND (study or research)

Sunday, November 11, 2007

Information Needs of Prisoners: A Review of the Literature Part 3 of 3

This post is part of a series exploring prisoners as a user group with distinct information needs, beginning with a review of the literature. Please click below to read further:

Information Needs of Prisoners: A Review of the Literature
Part 1
Part 2

Information Needs of Prisoners: Annotated Bibliography
Part 1
Part 2
Part 3


Studies that examined technology in correctional libraries found that prisoners have access to computers with legal, reference, and educational software, but they usually do not have internet access due to safety concerns. This means that a prisoner with a long incarceration period, who will return to the community, will be at a disadvantage in seeking critical information. Shirley (2003), in an online questionnaire of 110 state prison libraries examining prisoner usage, service, programs, and collections, reported success with a CD-ROM in Maryland prisons that explained the internet and its practical uses to inmates. The purpose was not only to increase their knowledge of this vital resource, but also to temper unrealistic ideas about what a librarian can and cannot find when the prisoners submitted information requests.

LaPoint’s 1997 survey of Ohio prisoners’ access to and use of electronic resources described general operational procedures and current standards. Prisoners were able to fulfill their user needs through OPAC, CD-ROMs, modems, monitored internet access, cooperative networks and interlibrary loan activity, and periodicals. LaPoint found that 81% of the librarians she surveyed said they had difficulty answering reference questions because of the lack of electronic resources, and 95% believed security could still be maintained with improved, accessible electronic resources. One librarian stated, “in this system, standard library procedures are difficult to implement. Automation and information technologies [are] almost impossible to integrate into daily operations. I could dramatically increase service, reduce costs, and integrate into a local network if I was allowed” (p. 20). LaPoint argued that electronic resources would better meet inmate user needs by replacing printed materials, which can be damaged or stolen, and by providing access to more information sources for staff and inmates, while also saving costs, storage space, and librarians’ time.

Further Research

Future studies should continue to focus on user satisfaction, censorship, and technology. To improve user satisfaction, Wales should update or adapt its own set of prison guidelines with performance measures. Similarly, Canada should develop its own national set of standards and guidelines to measure its service. As censorship can be an issue between librarians and prison administration, Kozup suggested a study that investigates the relationship between these camps to determine any causal relationship that alters the censorship of materials. Since technology continues to advance at an astounding rate, studies of technology can be repeated every three to five years to gauge improvement in meeting the user needs of inmates. These are but a few suggestions for further research and documentation in these vast subject areas.


As library users, prisoners are disadvantaged by a disproportionately high level of illiteracy, lack of education, insufficient vocational skills, and a high rate of mental illness. One can safely say that incarcerated persons have a large number of unmet needs, which translate into a high demand for information, learning materials, and self-improvement resources; the library, in cooperation with other prison programs, can play a vital role in meeting these needs. Inmates who want to use their time constructively are likely to become avid library users. As release nears, the prison library can provide them with a wealth of job-related materials as well as community information that may help them survive the first critical months on the outside. Prisoners’ user needs, from initial lock-up to final discharge, can be met through implementation of the guidelines outlined in Library standards for adult correctional institutions. More than any other group, inmates, who have so many privileges stripped from them, retain the right to have their user needs fulfilled.


American Library Association. (1992). Library standards for adult correctional institutions. Chicago: Association of Specialized and Cooperative Library Agencies.

Bowden, T. S. (2003). A snapshot of state prison libraries with a focus on technology. Behavioral & Social Sciences Librarian, 21(2), 1-12.

Curry, A., Wolf, K., Boutilier, S., & Chan, H. (2003). Canadian federal prison libraries: A national survey. Journal of Librarianship and Information Science, 35, 141-152.

Haymann-Diaz, B. (1989). Establishing a selection process model for an ethnic collection in a prison library. Behavioral & Social Sciences Librarian, 8, 33-49.

Jones, P. (2004). Reaching out to young adults in jail. Young Adult Library Services, 3(1), 16-18.

Kozup, P. C. (1992). Major issues in prison librarianship: A case study of three libraries. Unpublished master's thesis, Kent State University, Kent, Ohio.

LaPoint, V. A. (1997). Electronic resources in Ohio prison libraries. Unpublished master's thesis, Kent State University, Kent, Ohio.

Library Association. (1981). Library Association guidelines for library provision in penal department establishments. 2nd ed. London: Library Association.

Liggett, J. M. (1996). Survey of Ohio’s prison libraries. Journal of Interlibrary Loan, 7(1), 31-45.

Lithgow, S., & Hepworth, J. B. (1993). Performance measurement in prison libraries: Research methods, problems, and perspectives. Journal of Librarianship and Information Science, 25, 61-69.

Shirley, G. L. (2003). Correctional libraries, library standards, and diversity. Journal of Correctional Education, 54, 70-74.

Stevens, T. (1994). The information needs of prisoners: A study of three penal establishments. Library and Information Research News, 18, 29-33.

U.S. Bureau of Justice Statistics. (2003). Tables No. 348, 349, 350. In U.S. Census Bureau. Statistical abstracts of the United States: 2003 (pp. 218-219). Washington, DC: GPO.

Monday, November 5, 2007

Information Needs of Prisoners: A Review of the Literature Part 2 of 3

This post is part of a series exploring prisoners as a user group with distinct information needs, beginning with a review of the literature. Please click below to read further:

Information Needs of Prisoners: A Review of the Literature
Part 1
Part 3

Information Needs of Prisoners: Annotated Bibliography
Part 1
Part 2
Part 3

Standards and Guidelines

The American Library Association’s Library standards for adult correctional institutions (1992) is the benchmark for establishing and maintaining prison library services. It outlines access, minimum standards for staffing, budget, facility size, and collection—elements that are necessary for the provision of acceptable library services in correctional facilities. Prison librarians aim to emulate the public library’s role as a community information center, a formal education support center, an independent learning center, a popular materials library, and a reference library. However, services and facilities vary among and within states, depending on politics, the administration’s philosophy, budget, and the nature of the institution. Certain prison facilities are concerned about complying with federal guidelines for access to the courts and have little interest in general library services. Others, recognizing the importance of rehabilitation for successful reentry into the community, have supported library collections and services that encourage information seeking and reading for pleasure or for self-help.

Widening the standards focus outside of the United States, the prisons studied in Wales use the Library Association’s (1981) Guidelines for library provision in penal establishments, although Lithgow and Hepworth (1993) warn that librarians must move beyond guidelines to universal performance measures. Curry, Wolf, Boutilier and Chan (2003) report that Canadian prison libraries have no official guidelines or standards and abide by region- or institution-specific guidelines.

User Satisfaction

Lithgow and Hepworth’s study revealed the difficulties associated with conducting research in a prison setting and showed that many indicators frequently used to assess library performance are inappropriate for a prison library. A survey of library opening hours is meaningless, for example, if inmates are allowed only one 20-minute visit per week. They suggested that customized indicators, which take into account the daily prison regime, are needed. Lithgow and Hepworth also discussed the challenges of prison library reference service, particularly the importance of serving all inmates in a non-judgmental manner because they can feel vulnerable or intimidated, especially if literacy is an issue. Their study indicated that 81% of prison library users would use the public library upon release.

User satisfaction was also increased by collections both developed by and representative of minorities. Haymann-Diaz (1989) explored ethnic collections created by minority inmate library assistants in the Greenhaven Correctional Facility in Stormville, New York, finding that the core criterion for the collections was that materials emphasize unification between the historical and cultural backgrounds of various nationalities considered African-American and Hispanic. Acquiring books suggested by prisoners and balancing educational and recreational materials increased the usage and satisfaction levels of the facility’s inmates and empowered the library assistants who developed the collections.

A 1996 survey of Ohio prison libraries by Liggett demonstrated that prison libraries are diverse, even within one state. Liggett found that Ohio’s federal prisons vary greatly regarding space per inmate, budgets, access to the library, available materials, librarian qualifications, and categories of censored materials. Despite their diversity, Ohio prison libraries are used by a large percentage of their population, and the inmates check out a large number of materials. One argument advanced by the study is that prison libraries should have a higher profile because they exhibit library use rates that are considerably higher (65% of the prison population) than those of public libraries (42% of the national population).

Stevens’ (1994) research on the role of the prison library in the reform and rehabilitation process identified a number of areas in which the work of the prison library can have an important influence. He argued that much of the information held in a prison library could be used by inmates to have a direct and positive influence on their future behavior. Perhaps most crucially, providing information relating to inmates’ offending behavior patterns is more important than trying to address their “everyday problems.” He also found that the length of sentence and the time left to serve affect the inmates’ information needs, and he reported that the nature of the administration at the institution determines user satisfaction. For instance, a greater emphasis on security will restrict the inmate’s opportunities to access information.

Stevens discovered that some inmates perceive that formal channels of information, like prison libraries, are ineffective in satisfying their information needs and may rely on informal information networks among inmates themselves instead. He explained, “There is easier access to fellow prisoners, other inmates may possess exclusive information, they may offer affective support and they are perceived as more reliable and trustworthy than staff” (p. 33). This critical attitude may influence both user needs and ultimately user satisfaction more than any other factor considered in these reviewed works.


Censorship

Although the Standards assert the prisoner’s right to read and the principle of non-censorship, except for obvious security and pornography concerns, many librarians face scrutiny by prison personnel who try to impose restrictions on certain library materials. This, in turn, impedes the prisoner’s information needs by obstructing or restraining access to important information.

Bowden (2003) conducted a nationwide survey of 207 state prisons, representative of the national population in terms of size, security level, and sex distribution. In an open-ended question about restricted resources, she noticed the following trends, representative of censorship queries in the other studies reviewed. Pornography (mentioned by 43% of respondents) was the most banned material, followed by weapons manufacturing (31%), violence, martial arts, drug manufacturing, and prison escapes (10% each). Receiving smaller percentages were literature about hate groups, gangs, homosexuals, survivalism, revolution, the occult, organized crime, true crime, and computer access, as well as road maps and medical information. Curry et al.’s survey of Canada’s 51 minimum-, medium-, and maximum-security federal prisons reported similar conclusions to Bowden, with the additional ban of manuals about the internet and home repairs. Bowden discovered that even though librarians order items, the administration has the ultimate decision regarding acquisition. This finding was echoed in every work in this review that queried on censorship.

Besides subject matter, some materials were restricted because of their physical nature, such as hardbound books and computer discs (Bowden). Jones (2004) surveyed public libraries that provided services to juvenile correctional facilities and reported that magazines and spiral-bound books were banned at some institutions because the staples and spirals could be used as weapons. Despite the restrictions on theme or physical format, prison library users mentioned in Jones’s study reported, in an independent study, a wish to use libraries on the outside, as well as improved reading levels and attitudes toward reading.

Surprisingly, Curry et al. were the only researchers to ask about the appeals process for challenged materials. Forty-one percent of respondents said the process was available, and 24% said it was not. The rest checked “unknown” or provided no answer, which is troubling because it indicates that the staff was uncertain about procedures in this controversial subject area. It also points to a possible tension in the relationship between the administration and librarians regarding censored materials.

For instance, Kozup’s (1992) case study of three prison librarians, which analyzed their methods of meeting the needs of prison library users, reported a procedure in which items were canceled from an order without the librarian being notified. Despite an agreement that items were not to be challenged until they arrived in the institution, materials continued to be canceled without the librarian being notified and without the items being viewed by authorities. Similarly, Jones noted that one librarian reported that a guard from the facility examined materials and decided whether the inmate could have them, and that other staff members removed items from prisoners that they found personally offensive.

Security is a shared concern between librarians and the prison administration, and the restriction of some items, such as pornography and escape literature, seems reasonable. However, Bowden found that in some prison libraries, medical literature is banned so that prisoners cannot fake illnesses, yet wondered how ailing prisoners could have their user needs filled. Bowden wrote, “For librarians, this is an example of the inherent conflict between the professional mandate to provide access to information and restrictive institutional requirements” (p. 10).

Thursday, November 1, 2007

Information Needs of Prisoners: A Review of the Literature Part 1 of 3

This post is part of a series exploring prisoners as a user group with distinct information needs, beginning with a review of the literature. Please click below to read further:

Information Needs of Prisoners: A Review of the Literature
Part 2
Part 3

Information Needs of Prisoners: Annotated Bibliography
Part 1
Part 2
Part 3

Libraries are reservoirs of strength, grace and wit, reminders of order, calm and continuity, lakes of mental energy, neither warm nor cold, light nor dark. The pleasure they give is steady, unorgastic, reliable, deep and long-lasting. In any library in the world, I am at home, unselfconscious, still and absorbed.
Germaine Greer, 1989

In prison, those things withheld from and denied to the prisoner become precisely what he wants most of all.
Eldridge Cleaver, 1968


This paper presents a literature review of user studies conducted on the information needs of prison inmates. Using primarily the American Library Association’s Library standards for adult correctional institutions as an outline for professional library service in an institutional setting, this review of literature examines issues concerning user satisfaction, censorship, and technology. Of the prison system, George Bernard Shaw once wrote that there were three official objectives: “vengeance, deterrence, and reformation of the criminal, only one is achieved; and that is the one which is nakedly abominable.” Yet he overlooked the prison library, a means of reformation, empowerment, and even entertainment to its patrons. As a correctional library user stated, “In a prison, one of the few things we rely on is getting the truth, the right information” (Stevens, 1994, p. 32). In this search for truth, and for survival, inmates use the library with a determined purpose, one that librarians must uphold through professional service.

Characterization of the User Group

According to a 2003 survey of the prison population by the U.S. Bureau of Justice Statistics, 3.1% of the adult population in the United States is either on probation, in jail, in prison, or on parole. The prison population is over 1.3 million, more than four times what it was in 1980. The population is overwhelmingly (93%) male and contains a disproportionate number of minorities. The largest demographic group is black and Hispanic males aged 20 to 39 (U.S. Bureau of Justice Statistics, 2003). In general, the prison population is less educated than the general population, but one out of every ten state prison inmates and one of every six federal prison inmates have some college education (U.S. Bureau of Justice Statistics). The prison population is a microcosm of the general population from which it is drawn and can be divided along many different lines for the purposes of analyzing information needs.

Scope of Literature Reviewed

A majority of the studies were conducted in the 1990s, with several in the early 2000s, and one from 1989. Geographically, the studies were conducted mostly in America, with three inquiries carried out in Ohio, due in part to Kent State’s esteemed library and information science program; other studies focused on prison library users in Canada and Wales. The articles were located in diverse databases, including ERIC, Library and Information Science Abstracts, Library Literature and Information Science, and the National Criminal Justice Reference Service. Two studies were master’s theses; two each were from Behavioral and Social Sciences Librarian and the Journal of Librarianship and Information Science. Other publications included the Journal of Correctional Education, the Journal of Interlibrary Loan, Library and Information Research News, and Young Adult Library Services.


Questionnaires by work reviewed, year, geographic scope, and response rate:
Bowden, 2003, US, 44% (200 of 455)
Curry et al., 2003, Canada, 73% (37 of 51)
Jones, 2004, US, 36% (16 of 44)
LaPoint, 1997, Ohio, 76% (22 of 29)
Liggett, 1996, Ohio, 57% (12 of 21)
Shirley, 2003, US, 32% (35 of 110)

Interviews by work reviewed, year, geographic scope, and details:
Haymann-Diaz, 1989, New York, 2 inmates at 1 site
Kozup, 1992, Ohio, 3 staff at 3 sites
Lithgow & Hepworth, 1993, Wales, 270 inmates, approx. 42 staff at 6 sites
Stevens, 1994, Wales, 36 inmates, 24 staff at 3 sites

Thursday, October 25, 2007

No Cookies In The Library

Just for fun, I've posted a video from Sesame Street. Cookie Monster and quiet libraries do not mix!

Friday, October 5, 2007

Library Anxiety and the Reference Interview

Shelf of vintage books

What is "Library Anxiety"?

In 1986, Constance Mellon, a library science professor, coined the term to describe a phenomenon that she and other librarians observed when patrons were overwhelmed by libraries and too intimidated to ask for assistance. One of the ways that information professionals can counteract this common occurrence is to conduct the gentle art of the reference interview.

I went undercover in a bustling academic library to experience a reference interview from a patron's point of view. To conceal the real reason for my visit, I posed as a student and asked, “Where can I find information about herbal medicine, especially its history?” I was immediately aware of the librarian’s body language. I could tell that she was frazzled, but she greeted me with direct eye contact and a smile, which I returned, and it put us both at ease. Throughout the interview, she looked as though she was actively listening to what I said and talked in a friendly, professional voice. An open, willing-to-listen attitude and neutral questioning on the part of search professionals contribute to the success of the reference interview.

Her first question was “When do you need this information by?” which surprised me until I realized that at this time of year, more students are asking for reference assistance to work on their final papers. I replied that I would like to access the information now, if she had the time to help me find it. She then asked about the uses of the information: “Why is this information needed, and what are you going to do with it?” I replied that I wanted it for my own knowledge and did not have any formal plans for its use. She then focused on the type and amount of material I needed: “What type of information would be most useful?” and “How much information do you need?” She asked about the gaps of my information: “What sources are you already familiar with?” I said that I was primarily concerned with books that the library owned, and that I had general knowledge of the history, but had never taken out books about the subject before. She then asked, “What regions do you want to focus on?” and I replied that I wanted to start with the United States and Europe.

She conducted a simple keyword search in the library catalog using “herbal medicine” and “history” and received these results:

Green Pharmacy: A History of Herbal Medicine by Barbara Griggs
Natural History of Medicinal Plants by Judith Sumner
Nature’s Pharmacy: A History of Plants and Healing by Christine Stockwell
The Story of Taxol: Nature and Politics in the Pursuit of an Anti-Cancer Drug by Jordan Goodman and Vivien Walsh.

Looking through the citations, she found the keywords “Materia medica” and “Herbs Therapeutic use History” which she also searched for, resulting in a promising title:

Heal Thyself: Nicholas Culpeper and the Seventeenth Century Struggle to Bring Medicine to the People by Benjamin Woolley

During the search, she asked if I was interested in finding information on homeopathy or the use of Ayurveda or traditional Chinese medicine in the West, which I declined.

The interview lasted only five minutes because two other patrons were waiting for reference assistance. I wrote down the call numbers, and the librarian asked, “Do you think these will be helpful?” It was the only closed question she asked and signaled the end of the interview. The question made me realize the importance of proper closure to the reference interview. It must be done in a manner that will encourage the patron to either revisit for more help or return in the future with another question. Since I was giving her feedback that the books she found would be helpful (by writing them down), she noticed my physical cues and asked whether I was satisfied.

As I left the interview, I realized that the librarian could have simply directed me to the library catalog to search for myself, as my query was simple and straight-to-the-point. However, she conducted the reference interview in a way that made me think more deeply about the question I had brought to her. One of the hidden bonuses of neutral questioning is that it allows the patron to ponder the Big Question she wants the librarian to answer and raises other, deeper questions about the information request.

Later, after I retrieved the books, the librarian passed me and asked, “Did you find everything okay?” The question made me feel comfortable approaching her again to revise or redo my search results in the future.

Overall, I left the reference interview satisfied. Although the interview was brief because of the other patrons, the librarian presented herself in a professional, efficient, and friendly manner by using a majority of neutral questions.


Monday, October 1, 2007

Finding Your Career Path

It seems as though many information professionals stumble into their career paths. We all have innate characteristics that made us "Information People," which can be useful to just about any industry.

What's your narrative?

Below, some Yale University archivists share their stories:

Thursday, September 20, 2007

Anchor Text Analysis Part 4 of 4

This post is part of a series about anchor text analysis. Please click on the links below to explore further:

Part 1
Part 2
Part 3

Anchor text can be used to guide a focused crawler, which fetches relevant pages for topic-oriented search engines (Chakrabarti et al., 1999). A focused crawler would begin at the homepage of a limited URL domain, such as a corporation’s website.

Since it usually provides a good summary of the destination page, anchor text can be used to develop a strategy for crawling unvisited URLs (Diligenti et al., 2000; Li et al., 2005). In 2005, Li et al. compared web crawlers across the data sets of four Japanese universities. They evaluated a standard breadth-first crawler, a hierarchical focused crawler that moved from parent to child pages, and a focused crawler guided by anchor texts using a decision tree. Li et al. (2005) discovered that the anchor text crawler significantly outperformed the other crawlers and had to crawl only 3% of the pages to retrieve 50% of the relevant pages. Additionally, the anchor text crawler was able to find deep, relevant pages more effectively than the others were.
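To make the idea concrete, here is a minimal sketch of a crawler that visits links in order of how promising their anchor text looks. It is not Li et al.'s method: their crawler used a decision tree, while this sketch swaps in a simple word-overlap score. The toy site, the topic terms, and the function names are all assumptions made up for illustration.

```python
import heapq

# Hypothetical toy site: page -> list of (anchor_text, target_url).
TOY_WEB = {
    "/": [("Admissions", "/admissions"),
          ("Faculty research labs", "/labs"),
          ("Campus map", "/map")],
    "/labs": [("Information retrieval research group", "/labs/ir"),
              ("Lab safety rules", "/labs/safety")],
    "/labs/ir": [("Anchor text research papers", "/labs/ir/papers")],
    "/admissions": [("Apply online", "/apply")],
}

# Hypothetical topic vocabulary standing in for a trained classifier.
TOPIC_TERMS = {"research", "retrieval", "papers", "labs"}


def anchor_score(anchor_text, topic_terms=TOPIC_TERMS):
    """Fraction of the anchor's words that are topic terms."""
    words = anchor_text.lower().split()
    if not words:
        return 0.0
    return sum(w in topic_terms for w in words) / len(words)


def focused_crawl(start, web=TOY_WEB, max_pages=4):
    """Visit pages best-first, ordered by the score of the anchor that led to them."""
    frontier = [(-1.0, start)]  # heapq is a min-heap, so scores are negated
    seen = {start}
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for anchor, target in web.get(url, []):
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-anchor_score(anchor), target))
    return visited


print(focused_crawl("/"))
# ['/', '/labs', '/labs/ir', '/labs/ir/papers']
```

With a budget of four pages, the sketch reaches the deep papers page without ever fetching the admissions or map pages, which is the behavior the anchor text crawler is credited with above.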

Anchor text is also helpful for locating homepages, which can be viewed as destination pages with many anchor texts pointing to them. Westerveld et al. (2002) experimented with combinations of link-, URL-, and anchor-text-based methods to create algorithms to find entry pages. They discovered that anchor-text-only methods outperformed full-text-only methods, and that anchor text provided high-precision results.

Chakrabarti et al. (1998a) suggest that anchor texts can be used to create an automatic resource compiler (ARC). An ARC is a program that combines link and text analysis to create a list of authoritative web resources like those on human-compiled resource lists like Yahoo! Building on Kleinberg’s (1999) theory of authority pages, hubs, and extended anchor text, their study created automatically generated resource lists that fared as well as or better than the human-compiled lists in user studies. The automated lists were generated faster and were easier to update than the human ones. The ARC received the worst scores for subjects such as “affirmative action,” which has many web pages and human-compiled resources. Topics such as “cheese,” “alcoholism,” and “Zen Buddhism” fared the best with the ARC because they are less political and have a smaller web presence.

Kumar et al. (2006), using the CLEVER (CLient-side EigenVector Enhanced Retrieval) search system developed at the IBM Almaden Research Center, discovered that link-based ranking with anchor text performed better than content-based techniques. They enhanced Kleinberg’s (1999) HITS algorithm with weighted relevancy values using extended anchor text. Kumar et al. also found that link-based ranking with anchor text was even more successful when combined with content.
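For readers who want to see what weighting HITS by anchor text might look like, here is a rough sketch. The hub and authority updates follow the usual HITS iteration, but the edge weight (1 plus the number of query terms in the anchor text) is an illustrative assumption and not necessarily the weighting used in CLEVER. The page names and links are made up.

```python
def normalize(scores):
    """Scale scores to unit Euclidean length so they do not grow without bound."""
    norm = sum(v * v for v in scores.values()) ** 0.5
    return {n: v / norm for n, v in scores.items()} if norm else scores


def edge_weight(anchor_text, query_terms):
    """Assumed weighting: 1 plus the number of query terms in the anchor text."""
    words = anchor_text.lower().split()
    return 1.0 + sum(1 for w in words if w in query_terms)


def weighted_hits(links, query_terms, iterations=30):
    """links: list of (source, target, anchor_text). Returns (hub, authority) dicts."""
    weighted = [(s, t, edge_weight(a, query_terms)) for s, t, a in links]
    nodes = sorted({n for s, t, _ in weighted for n in (s, t)})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: weighted sum of the hub scores of pages linking in.
        auth = {n: 0.0 for n in nodes}
        for s, t, w in weighted:
            auth[t] += w * hub[s]
        # Hub: weighted sum of the authority scores of pages linked to.
        hub = {n: 0.0 for n in nodes}
        for s, t, w in weighted:
            hub[s] += w * auth[t]
        auth = normalize(auth)
        hub = normalize(hub)
    return hub, auth


# Hypothetical link data for the query "zen buddhism".
LINKS = [
    ("list-1", "zen-guide", "zen buddhism resources"),
    ("list-1", "misc-page", "click here"),
    ("list-2", "zen-guide", "zen meditation guide"),
    ("list-2", "misc-page", "home"),
]
hub, auth = weighted_hits(LINKS, {"zen", "buddhism"})
```

Both list pages link to both targets, so plain HITS would score the two targets identically. With the anchor text weights, "zen-guide" ends up with the higher authority score.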

Anchor text was also efficient in searching the intranet. Fagin et al. (2003) point out the differences between querying the Internet and the intranet. Unlike the democracy of the Internet, the intranet reflects an autocratic or bureaucratic position of the organization it serves. Intranet documents are designed to be informative, and there is little incentive to design a page that will attract traffic or have many links. In fact, the concept of rank is inconsequential, as all pages, no matter their relevance or importance, are treated equally. Internet searches are deemed successful by users when they satisfice; intranet queries are more precise because users are looking for a specific answer. When the researchers crawled IBM’s intranet, they found that anchor text was surprisingly efficient. Its ranking performance increased as the recall parameter was relaxed. For recall at the 20th position, anchor text led to a 15% improvement in recall and influenced more than 87%, doubling its recall overall. It also outperformed title indexing.
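Since measures like "recall at the 20th position" come up repeatedly in these studies, a small sketch of how precision and recall at a cutoff are usually computed may help. The document ids and relevance judgments below are invented, and the function name is my own.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall over the top k results of a ranked list.

    ranked:   list of document ids, best first
    relevant: set of document ids judged relevant
    """
    top_k = ranked[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranking and judgments.
ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d4", "d9"}
precision, recall = precision_recall_at_k(ranked, relevant, k=4)
# precision is 2/4 and recall is 2/3: two of the top four are relevant,
# and two of the three relevant documents were found.
```

Relaxing the cutoff (a larger k) can only keep recall the same or raise it, which is why results are reported at a stated position.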

Hawking et al. (1999) point out that experiments using the TREC-8 (Text Retrieval Conference) Small and Large Web Tracks did not find searching with anchor text effective. However, Craswell et al. (2001, 2003) explain that the TREC trials were based on subject search rather than site-finding tasks, the focus of their work. They define the difference this way: in a site search, the user is looking for a particular item, while a subject search returns as many relevant documents as possible. Eiron & McCurley (2003) explain the dichotomy as “related to the fact that the predominant use of web search engines is for the entry page search task, but that ad hoc queries appear in a very heavy tail of distribution of queries. Thus commercial search engines achieve their success by serving most of the people most of the time” (p. 459). Craswell et al. (2001, 2003) discovered that ranking based on anchor text was twice as effective for site finding as content ranking. Overall, anchor text is ideal for locating sites, but may underperform in some subject search experiments.

Traditional information retrieval systems use measures of precision and recall on large bodies of closed information systems (Salton & Buckley, 1990). The Internet, due to its democratic nature, is large, growing, and dynamic, making total recall impossible. Few identifying structures increase precision and recall. In this environment, anchor text analysis has proven to be a surprisingly valuable and versatile method to retrieve authoritative and relevant information.

Works Cited

Anick, P. G. (1994). Adapting a full-text information retrieval system to the computer troubleshooting domain. Proceedings of ACM SIGIR, 349-358.

Bharat, K., & Henzinger, M. (1998). Improved algorithms for topic distillation in a hyperlinked environment. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (102-111). New York: ACM Press.

Brin, S. & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks, 30(1-7), 107-117.

Chakrabarti, S., Dom, B., Gibson, D., Kleinberg, J., Raghavan, P., & Rajagopalan, S. (1998a). Automatic resource compilation by analyzing hyperlink structure and associated text. In Proceedings of the 7th International World Wide Web Conference, 1-7, 65-74.

Chakrabarti, S., Dom, B., Gibson, D., Kumar, S. R., Raghavan, P., Rajagopalan, S. & Tomkins, A. (1998b). Experiments in Topic Distillation. ACM SIGIR Workshop on Hypertext Information Retrieval on the Web. 13-21.

Chakrabarti, S., van den Berg, M., & Dom, B. (1999). Focused crawling: a new approach to topic-specific web resource discovery. In The Eighth International World Wide Web Conference, Toronto, Canada, May 1999.

Craswell, N., Hawking, D., & Robertson, S. E. (2001). Effective site finding using link anchor information. In Proceedings of the 24th ACM SIGIR. 250-257.

Craswell, N., Hawking, D., Wilkinson, R., & Wu, M. (2003). Overview of the TREC-2003 web track. In Proceedings of TREC 2003.

Croft, W. B., Cook, R., and Wilder, D. (1995). Providing government information on the internet: Experience with ‘THOMAS.’ University of Massachusetts Technical Report 95-96.

Dash, R. K. (2005). Increasing blog traffic: 10 factors affecting your search engine rankings. BlogSpinner-X. Retrieved on August 13, 2007 from:

Diligenti, M., Coetzee, F. M., Lawrence, S., Giles, C. L., & Gori, M. (2000). Focused crawling using context graphs. In Proceedings of the 26th VLDB Conference, 527-534.

Eiron, N., & McCurley, K. S. (2003). Analysis of anchor text for web search. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (459-460). New York: ACM Press.

Fagin, R., Kumar, R., McCurley, K.S., Novak, J., Sivakumar, D., Tomlin, J. A., & Williamson, D. P. (2003). Searching the workplace web. In Proceedings of WWW2003.

Fürnkranz, J. (1999). Exploiting structural information for text classification on the WWW. Proceedings of IDA-99, 3rd Symposium on Intelligent Data Analysis, 487-498.

Glover, E. J., Tsioutsiouliklis, K., Lawrence, S., Pennock, D., & Flack, G. W. (2002). Using web structure for classifying and describing web pages. In Proceedings of WWW2002.

Hawking, D., Voorhees, E., Bailey, P., & Craswell, N. (1999). Overview of TREC-8 Web Track. In Proceedings of TREC-8.

Hiler, J. (March 3, 2002). Google time bomb. Microcontent News. Retrieved December 2, 2007 from:

Jin, R., Hauptmann, A. G., & Zhai, C. (2002). Title language model for information retrieval. In Proceedings of the 25th ACM SIGIR, 42-48.

Kleinberg, J. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604-632.

Kraft, R., & Zien, J. (2004). Mining anchor text for query refinement. WWW2004, 666-674.

Kumar, R., Raghavan, P., Rajagopalan, S. & Tomkins, A. (2001). On semi-automated web taxonomy construction. In Proceedings of the 4th ACM WebDB. New York: ACM Press, 91-96.

Kumar, R., Raghavan, P., Rajagopalan, S., & Tomkins, A. (2006). Core algorithms in the CLEVER System. ACM Transactions on Internet Technology, 6(2), 131-152.

Langville, A. N. & Meyer, C. D. (2006). Google’s PageRank and beyond: The science of search engine rankings. Princeton, NJ: Princeton University Press.

Li, J., Furuse, K., & Yamaguchi, K. (2005). Focused crawling by exploiting anchor text using decision tree. In International World Wide Web Conference, Chiba, Japan, 1190-1191.

Lu, W. H., Chien, L. F., & Lee, H. J. (2002). Translation of web queries using anchor text mining. ACM Transactions on Asian Language Information Processing, 1(2), 159-172.

Lu, W. H., Chien, L. F., & Lee, H. J. (2004). Anchor text mining for translation of web queries: A transitive translation approach. ACM Transactions on Information Systems, 22 (2), 242-269.

Maarek, Y. & Smadja, F. (1989). Full text indexing based on lexical relations. In Proceedings of the 12th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM Press, 198-206.

McBryan, O. (1994). GENVL and WWWW: Tools for taming the web. In First International World Wide Web Conference, Geneva, Switzerland.

Salton, G. & Buckley, C. (1990). Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 41(4), 288-297.

Westerveld, T., Kraaij, W., & Hiemstra, D. (2002). Retrieving web pages using content, links, URLs and anchors. In Tenth Text Retrieval Conference, 663-672.

Monday, September 10, 2007

Anchor Text Analysis Part 3 of 4

This post is part of a series about anchor text analysis. Please click on the links below to explore further:

Part 1
Part 2
Part 4

Eiron & McCurley (2003) observe that queries are often very short and contain few terms on average. Some researchers have noted that users create queries of one to three words without thought of query construction (Anick, 1994; Croft et al., 1995). Anchor text is remarkably similar to queries because it is short and summarizes what the user hopes to find. It also matches queries in term distribution: they are both either a noun phrase or a noun with adjectives, and they rarely contain verbs (Eiron & McCurley, 2003). The vocabulary and grammatical form of queries and anchor text are more similar to each other than either is to full-text documents (Eiron & McCurley, 2003). When the query terms and anchor text match, the found documents are relevant to the query (Eiron & McCurley, 2003).

The similarity between anchor text and query formation can be exploited by creating an algorithm that suggests refinements. Kraft & Zien (2004) suggest that better quality refinement originates from the anchor text than from mining the document’s content. They found that anchor text used with algorithms provided high quality refinements for queries of one or two words, the typical length of everyday search engine queries. They propose that further research should explore how anchor text can be used to broaden or change the direction of the search or how extended anchor text can refine queries (Kraft & Zien, 2004).

Theoretically, an algorithm for query refinement would find anchor texts that are similar to the text used in the query. However, there are problems using anchor text alone (Kraft & Zien, 2004). There may be too many anchor texts to match a single term query, and some anchor texts are automatically generated by Web authoring tools (Kraft & Zien, 2004). Ranking methods, in addition to anchor text, are employed instead. With this method, queries are easier to process because the algorithm only has to process anchor texts, not full documents. Additionally, destination documents with a large number of anchor texts pointing to them are, on the whole, more relevant and popular than pages with fewer anchor texts.
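As a rough illustration of refinement by anchor text, the sketch below suggests longer anchor texts that contain every word of a short query, ranked by how many links use them. This is a simplified stand-in and not Kraft & Zien's algorithm. The anchor texts and the function name are hypothetical.

```python
from collections import Counter


def suggest_refinements(query, anchor_texts, limit=3):
    """Suggest anchor texts that contain all query words plus at least one more."""
    query_terms = set(query.lower().split())
    counts = Counter(a.lower().strip() for a in anchor_texts)
    candidates = [
        (count, anchor)
        for anchor, count in counts.items()
        if query_terms < set(anchor.split())  # strict superset: more specific
    ]
    # Most frequently used anchors first; ties broken alphabetically.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [anchor for _, anchor in candidates[:limit]]


# Hypothetical anchor texts harvested from a crawl.
ANCHORS = [
    "jaguar cars", "jaguar cars", "jaguar habitat", "Jaguar",
    "click here", "jaguar cars", "jaguar habitat", "big cats",
]
print(suggest_refinements("jaguar", ANCHORS))
# ['jaguar cars', 'jaguar habitat']
```

Ranking by link count is one simple way to push automatically generated or one-off anchors toward the bottom, in the spirit of the ranking methods mentioned above.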

Anchor text can also act synonymously. Eiron & McCurley (2003), in their experiments with querying the IBM intranet with anchor text, point out that the common term “layoff” is often called “restructuring,” “downsizing,” “rightsizing,” “outsourcing” or a dozen other euphemisms in the corporate world. Anchor text could assist in locating documents about layoffs that do not contain the term in their title, text, or keywords.

Similarly, anchor text, with link structures, can be used to construct multilingual lexicons and find translation equivalents of query terms. Since bi- or multilingual human editors create anchor text, they can differentiate between the subtleties of connotation better than any machine translation. Exploiting the anchor text is less time-consuming and expensive than building multilingual lexicons or thesauri. Lu et al. (2002) created an experimental system to perform English-Chinese web searches, which proved efficient for translating queries containing new terminology or proper names. Later experiments by Lu et al. (2004) combined anchor text with a bilingual dictionary to suggest helpful query terms to refine the search.

Anchor text works the same way as folksonomy. Instead of using traditional subject indexing or a controlled vocabulary, users employ freely chosen keywords to describe the content of the destination page. Often the anchor text is more descriptive and concise than the full text of the destination page. For example, the website for General Motors is about the automobile manufacturer, yet “automobile manufacturer” does not appear on the page. The anchor text “automobile manufacturer” on a web page pointing to the General Motors site provides this information.

Anchor text is also important when the destination pages do not have text that can be crawled by a search engine. For instance, a destination page of magnolia illustrations can only be qualified by anchor text because the illustrations are jpegs, which do not provide textual information about what they represent.

Anchor text, combined with document titles, provides subject information about the document (Jin et al., 2002). Both express a concept of what the document is about, although they are constructed differently linguistically. However, anchor text is more useful than titles, because a document can have many anchor texts pointing to it but only one title.

Thursday, September 6, 2007

Anchor Text Analysis Part 2 of 4

This post is part of a series about anchor text analysis. Please click on the links below to explore further:

Part 1
Part 3
Part 4

Oliver McBryan first discussed the usefulness of anchor text in Internet searching in 1994. He discussed the operations of GENVL (GENerate Virtual Library), an interactive hierarchical virtual library cataloged by subject, and WWWW, the WWW Worm, a search engine. He proposed that a search engine could be constructed on anchor texts alone.

Anchor text supports the ranking of websites, built on the premise that if an author creates a web page and uses anchor text to describe another page, then the anchor text is a good indicator of the destination page’s content. Google, as well as many other search engines, uses link analysis to create PageRank values (Brin & Page, 1998).
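
To give a rough sense of how link analysis turns hyperlinks into scores, here is a simplified PageRank-style calculation in Python. It is a sketch under stated assumptions, not Google's implementation: the three-page graph, the damping factor of 0.85, and the fixed number of iterations are illustrative choices.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its links.
                share = damping * rank[page] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for dst in pages:
                    new_rank[dst] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web of three pages: "a" and "b" both link to "c".
web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
}
rank = pagerank(web)
```

In this toy graph, page "c" ends up with the highest score because two pages link to it, and "a" outranks "b" because its one incoming link comes from the well-linked "c".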

When other sites point to the destination page with the same anchor text, the page is more likely to be relevant to users querying with terms within the anchor text. Anchor text, especially when combined with text-based methods, improves ranking quality (Bharat and Henzinger, 1998; Chakrabarti et al., 1998b; Kleinberg, 1999; Kumar et al., 2001).

Kleinberg (1999) suggests that anchor text can be used to determine authoritative content on the web, writing, “Hyperlinks encode a considerable amount of latent human judgment, and we claim that this type of judgment is precisely what is needed to formulate a notion of authority” (204). In the HITS (Hyperlink-Induced Topic Search) algorithm, Kleinberg distinguishes two kinds of pages. An authority page contains high-quality information about a subject. A hub contains many links to high-quality web pages. Authority pages and hubs exist in a mutually reinforcing relationship in which anchor texts are embedded and point to reliable information.
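
The mutually reinforcing relationship between hubs and authorities can be sketched as a short iteration in Python. This is a minimal illustration of the idea rather than Kleinberg's full procedure (which first builds a query-specific subgraph); the page names and the iteration count are assumptions.

```python
import math


def hits(links, iterations=50):
    """Iterate hub and authority scores over a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score: sum of the authority scores of the pages linked to.
        hub = {p: sum(auth[d] for d in links.get(p, [])) for p in pages}
        # Normalize both score sets so the values do not grow without bound.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical graph: two link pages point at two content pages.
graph = {
    "directory": ["manual", "tutorial"],
    "blogroll": ["manual"],
    "manual": [],
    "tutorial": [],
}
hub, auth = hits(graph)
```

Here "manual" receives the top authority score because both link pages point to it, and "directory" is the strongest hub because it links to both content pages.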

However, some anchor text may not be useful at all. “Click here” is a popular anchor text that conveys nothing about the source or destination pages. Anchor texts can also be exploited through black hat practices, in which spammers create multiple pages with anchor text that points to their website and artificially inflates its rank. PageRank, and other well-engineered systems, minimize the practice by requiring pages to be cited by important pages.

Destination pages may also suffer from inaccurate or spurious anchor texts (Hiler, 2002). A well-known example of anchor text mischief occurred in 2003 when the highest-ranking result of the query terms “miserable failure” was George W. Bush’s official biography. Of the more than 800 links to Bush’s page, only 32 of them used the phrase (Langville & Meyer, 2006). In the years since, search engines have improved their algorithms to defeat the Googlebombing phenomenon.

Saturday, September 1, 2007

Anchor Text Analysis Part 1 of 4

This post is part of a series about anchor text analysis. Please click on the links below to explore further:

Part 2
Part 3
Part 4

Due to its enormity, diversity, and lack of formal structure, the Internet presents a unique dilemma for information retrieval. To fulfill informational needs, users translate their questions into short queries through a search engine. Using an algorithm, the search engine scours the web to locate and rank relevant documents.

Several factors make it difficult to evaluate the effectiveness of the search. Relevancy is subjective, gauged by the user. A document that may be judged relevant by evaluative techniques may not satisfy the user’s needs. Additionally, users often do not know exactly what they are looking for and experiment with different queries to refine their search. The growth and popularity of the Internet has also made ascertaining the authoritativeness of a document complicated. Unlike the closed corpus used in traditional methods of information retrieval, the web presents a massive collection with varying degrees of reliability.

Anchor text, however, is one of the few structures on the Internet that can improve the effectiveness of a search. It enhances a query’s quality because it assigns concise, human-generated terms to the web page it describes. Kumar et al. (2006) explain it best:

The networking revolution made it possible for hundreds of millions of individuals to create, share, and consume content on a truly global scale, demanding new techniques for information management and searching. One particularly fruitful direction has emerged here: exploiting the link structure of the Web to deliver better search, classification and mining. The idea is to tap into the annotation implicit in the actions of content creators: for instance when a page-creator incorporates hyperlinks to the home pages of several football teams, it suggests that the page-creator is a football fan (132).

This paper presents an overview of information retrieval using anchor text analysis from the past decade, which has proven to be a versatile and efficient method of finding information.

On the web, a hyperlink consists of two parts: the destination page and the anchor text on the source page. The author of the source page determines the anchor text, which describes the destination page. Anchor text is defined as the highlighted clickable text that is displayed for a hyperlink on a web page.

An illustrative example is the anchor text for this link:

information retrieval

“Information retrieval” is associated with the document “index.html.” From this configuration, it is assumed that a human editor of the web page believed that “index.html” was about information retrieval. When a web crawler follows links in a document to crawl additional documents, it indexes the anchor text “information retrieval” with not only the source document it is embedded in, but also the destination document, index.html.
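
The crawler behavior described above can be sketched with Python's standard html.parser module. This is an illustrative sketch only: the class name AnchorCollector, the function index_anchor_text, the file name source.html, and the sample markup are assumptions, and a real crawler would also resolve relative links and handle many more edge cases.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._chunks = []    # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._chunks = []

    def handle_data(self, data):
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            text = " ".join("".join(self._chunks).split())
            self.links.append((self._href, text))
            self._href = None


def index_anchor_text(source_url, html, index):
    """Credit each anchor term to both the source and the destination page."""
    parser = AnchorCollector()
    parser.feed(html)
    for href, text in parser.links:
        for term in text.lower().split():
            index[term].add(source_url)
            index[term].add(href)
    return parser.links


index = defaultdict(set)
page = '<p>Read about <a href="index.html">information retrieval</a> here.</p>'
links = index_anchor_text("source.html", page, index)
```

After this runs, the terms "information" and "retrieval" are each associated with both source.html and index.html, while words outside the link, such as "read", are not indexed as anchor text.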

Extended anchor text consists of the words surrounding the anchor text as well as the anchor text itself. Fürnkranz (1999) defined extended anchor text as the headings immediately before the anchor text and the paragraph containing it. Glover et al. (2002) narrowed the definition to the 25 words before and after the anchor text. Extended anchor text displays lexical affinity, which states that if there is relevant text close to a link, the link is more likely to be relevant (Maarek & Smadja, 1989).
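
The word-window definition from Glover et al. (2002) is easy to sketch in Python. The function name extended_anchor_text, the whitespace tokenization, and the sample sentence are assumptions made for illustration; the window is shortened to two words in the example so the result is easy to read.

```python
def extended_anchor_text(words, start, end, window=25):
    """Return the anchor words plus up to `window` words on each side.

    `words` is the page text as a list of tokens, and words[start:end]
    is the anchor text itself.
    """
    left = max(0, start - window)            # do not run past the page start
    right = min(len(words), end + window)    # or past the page end
    return words[left:right]


text = "the course page lists readings on information retrieval for graduate students"
words = text.split()
# The anchor text "information retrieval" occupies tokens 6 and 7.
context = extended_anchor_text(words, 6, 8, window=2)
```

With a window of two words, the extended anchor text becomes "readings on information retrieval for graduate"; the default of 25 words follows the definition cited above.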

Thursday, August 23, 2007

Quality Digital Repositories: Reviews

Book Shelf

Historical Voices

Historical Voices was created to be a robust, fully searchable digital library of 20th century spoken word collections, the first significant repository of its kind. Using audio files as a key component, Historical Voices provides storage, online exhibits, and educational sources for students, teachers, researchers and the public.

Historical Voices is part of the Digital Library Initiative II funded in part by the National Science Foundation and the National Endowment for the Humanities. It is housed at Michigan State University’s MATRIX: the Center for Humane Arts, Letters, and Social Sciences. Historical Voices continues to build its collection by working with universities, historical societies, archives, and research partners around the world. Its goal is to create a trans-institutional resource and a federated archive. Additionally, it strives to create best practices in sound digitization and develop a digital archive structure and a metadata format.

Currently, it provides links to featured galleries, such as “Studs Terkel: Conversations with America.” The search feature and best practices are being developed. However, Historical Voices provides PDFs of research it has conducted, such as “Digitization Projects: Key Issues for Archivists, Curators and Librarians.” In its Education section, it has a small collection of sample lessons built around audio selections.

On-Line Books Page

The On-Line Books Page offers access to thousands of free online books and serials in English and foreign languages. Substantial links to directories, archives, and featured exhibits, such as Banned Books Online, are available. Users range from scholars to the intellectually curious.

The design of the page is spartan, yet effective. With a simple search interface, the user can browse books by Library of Congress Subject Headings or by Library of Congress call number. Titles and authors are both searchable and listed alphabetically. When clicking on an icon in front of the listing, the user can learn about a particular book, find other books with the same author, title, or subject, or find out how to make a stable link to the detailed book description.

The transparency of the On-Line Books Page is helpful. For example, the criteria for a listing are explicit: it must be available at no charge, a full text of a significant book in English, and a stable, well-formatted text in a standard format. Lists of requested books and books in process show what will be available soon.

The repository was founded in 1993 by John Mark Ockerbloom, a digital library planner and researcher at the University of Pennsylvania, where the page is hosted. Its development, maintenance, and content are all volunteered.

The Modern English Collection

This collection, arranged for browsing by the author’s last name or by category of interest, contains fiction, nonfiction, poetry, drama, letters, serials, manuscripts, and illustrations from 1500 to the present. There are 10,000 texts and 164,000 images publicly accessible. Users access a search engine for results, which can be viewed at different levels of detail, such as by summary, chapter, or page.

Categories include texts by African Americans, texts by women writers, texts about the American Civil War, texts by and about Thomas Jefferson, texts for young readers, literature in translation, and items from the University of Virginia’s special collection. Each text is encoded in either SGML or XML and includes a bibliographic header with administrative data about the creation of the electronic text and its print source.

The Modern English Collection, founded in 1992, is part of the Electronic Text Center at the University of Virginia. The goals of the center are to create an online humanities archive of standards-based texts and images and to foster a community of users as a traditional library would. The Center strongly promotes the standardization of data in XML because it can reshape itself so it never becomes outdated and useless.

Project Bartleby

Project Bartleby claims to be the most comprehensive reference publisher on the web, meeting the needs of students and educators. Named after Herman Melville’s character Bartleby the Scrivener, this site features public domain literature in the classics, nonfiction, and reference. It is searchable within categories, such as fiction, nonfiction, verse, reference, and by author, title, or subject. The user can also browse by clicking on titles, authors, or similar works or authors to find more information of interest. Poetry is indexed chronologically or by first line. Biographic information on most of the authors is also available.

Unlike the other digital repositories featured in this paper, Project Bartleby is a commercial site and, probably not by coincidence, the most comprehensive. Ads frame each page but are usually ignored due to the user’s banner blindness.

Project Bartleby began in January 1993 at Columbia University as a project to convert public domain resources into digital archives. It was constructed according to four principles: accurate editions created by professional editorial standards; free public access; careful, well-researched selection of works with public domain or copyright license; and up-to-date presentation using electronic-publishing methods as they become standardized.

Monday, August 20, 2007

Quality Digital Repositories: Attributes

A quality digital repository is comprised of standardized data delivered online through a common interface, as opposed to a CD-ROM collection with multiple interfaces. Common metadata standards across collections are already allowing repositories to achieve cross-database access, and this will continue in more and more sophisticated forms.

Digital repositories succeed to the extent that they can re-articulate traditional library skills (cataloging, collection development, acquisitions, preservation, reference, user services, and special collections) in a new medium. Activity in the digital medium allows librarians to state the social, pedagogical, and intellectual roles of the library in an academic institution, and to serve as a rich content provider for many thousands of other users worldwide. Nothing brings people back to an online service like a well-selected, reliable, integrated, and cataloged set of data, especially if it is searchable and can be delivered quickly through a common interface across collections.

Digital repositories should provide reliable, long-term access to managed digital resources to their designated communities, now and in the future. Trusted digital repositories may take different forms. For example, the On-Line Books Page was developed and maintained without cost, while Project Bartleby is a commercial site. However, they are quality repositories because they share the following attributes. They accept responsibility for the long-term maintenance of digital resources on behalf of their depositors and for the benefit of current and future users. They have an organizational system that supports not only the long-term viability of the repository, but also that of the digital information for which it has responsibility. They also design their systems in accordance with commonly accepted standards to ensure the ongoing management, access, and security of materials deposited within them.

The central challenge to preservation in a digital repository is the ability to guarantee the interpretability of digital objects for future users. This includes a guarantee of integrity, authenticity, confidentiality and accessibility to the digital data, which can be compromised by aging storage media and technical advancements. Although the preservation of digital works is in its infancy, the four repositories I reviewed will be taking steps to improve and maintain their data over time by developing strategies to cope with the continuous change of information technology in a responsible—and lossless—way.

Friday, August 17, 2007

Library of Congress Global Gateway

Stack of books

The Library of Congress Global Gateway is a publicly accessible digital library, offering more than 80 thousand digital items in both English and native languages. It is a portal to international research centers, collections, and other resources available at the Library of Congress and through its website.

Introduced in 2000, Global Gateway includes collaborations with Bibliothèque Nationale de France; the National Library of the Netherlands; the Russian State Library, the National Library of Russia, and other libraries located in Siberia; the National Library of Spain; and the National Library of Brazil. Each project is funded by the foreign library involved, as well as the Library of Congress. The bilingual, multimedia presentations at Global Gateway focus on documenting ties between these countries and American culture.

For example, the collaborative digital libraries include expositions like The Atlantic World: America and the Netherlands; France In America; Meeting of Frontiers: Siberia, Alaska, and the American West; Parallel Histories: Spain, the United States, and the American Frontier; and United States and Brazil: Expanding Frontiers, Comparing Cultures.

Their digital collections converging on history and cultures from around the world include Cuneiform Tablets: From the Reign of Gudea of Lagash to Shalmanassar III, Islamic Manuscripts from Mali, the Kraus Collection of Sir Francis Drake, the Lewis Carroll Scrapbook, Polish Declarations of Admiration and Friendship for the United States, and Selections from the Naxi Manuscript Collection.

In their Research Guides and Databases division, users can access country studies, the Handbook of Latin American Studies, the Global Legal Information Network, the Vietnam-Era POW/MIA Database, and the U.S.-Russian Joint Commission Database.

Additionally, the Library of Congress has archive and library catalogs and user guides for foreign destinations, as well as advice from reference specialists concerning research services and facilities and requirements for use. Users can access information through Area Studies divisions for international collections. Similarly, Portals to the World provides access to digital information of country profiles.

Through Global Gateway, the Library of Congress promotes its holdings, encourages better access through digitization, and forges important ties with foreign libraries. The intended audience of these collections is researchers, especially those traveling internationally. Before they arrive at their destinations, they can learn what resources are available to better plan their studies.

Global Gateway’s web presentations summarize and highlight their respective themes without overwhelming the user with myriad primary sources (Library). However, some collections exist as primary resources, such as the Polish Declarations of Admiration and Friendship for the United States, which is helpful for genealogical, historical and sociological research because it includes the signatures of nearly one-sixth of the population of Poland in 1926 (Lamolinara).

Some Global Gateway collections were created when the Library of Congress requested proposals in 2002 for the digital conversion of underutilized or unknown collections whose use and popularity would increase if they were available online (Language). Because digital representations of materials of interest are offered, the originals remain archived, available only if researchers must examine them in person. In the case of ancient Arabic calligraphy manuscripts with pages that have been dispersed throughout the world, the online representations of the pages will encourage other institutions to digitize their collections, allowing researchers to reconstruct the manuscripts (Language).

Global Gateway also makes extremely rare collections available online. For instance, Selections from the Naxi Manuscript Collection presents the writing of a Chinese minority with the only living pictographic language in the world. When their priests die, the ceremonial books containing these writings are buried or burned, making them exceptionally rare (Words). Now they are offered online to interested parties.

My criticism of the project is that its focus is on the interests of the United States, rather than having an international focus. However, Global Gateway is explicit about this concentration in its collection description. Global Gateway, along with American Memory, is a forerunner of the World Digital Library project announced in 2005. With three million dollars from Google, World Digital Library builds upon previous digitization projects to “celebrat[e] the depth and uniqueness of different cultures in a single global undertaking” (Fineberg). This new project will have a much-needed international scope.

Global Gateway differs from traditional libraries or archives because the collections have been curated by librarians. Instead of offering indexed primary and secondary sources for researchers, Global Gateway created comprehensive, edited collections for maximum usefulness. However, it is a “real” digital library as defined by our readings. It is a managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network (Arms).

Global Gateway will continue to be useful to researchers for many years to come, and its success has encouraged larger digital projects by the Library of Congress and by foreign libraries.

Works Cited

Arms, W. Y. (2000). Digital libraries. Cambridge, MA: MIT Press, 15.

Fineberg, G. (2006). Promoting transcultural understanding: Librarian proposes World Digital Library [Electronic version]. Library of Congress Information Bulletin, 65, 1.

Lamolinara, G. (24 March 2005). Selected volumes of “Polish Declarations of Admiration and Friendship for the United States” now online [Electronic version]. Library of Congress News Releases.

Library of Congress. (2006). Language of beauty and meaning: Ancient Arabic calligraphy available online [Electronic version]. Library of Congress Information Bulletin, 65, 9.

Library of Congress. (2005). Library of Congress and National Library of France launch joint web site [Electronic version]. Library of Congress Information Bulletin, 64, 5.

Library of Congress. (2004). Words of the Naxi: Rare pictographic Chinese writings available online [Electronic version]. Library of Congress Information Bulletin, 63, 9.