
D-Lib Magazine
November/December 2008

Volume 14 Number 11/12

ISSN 1082-9873

Social Annotations in Digital Library Collections

Rich Gazan
Department of Information and Computer Sciences, University of Hawaii
1680 East-West Road, Honolulu, HI 96822
<gazan@hawaii.edu>


Abstract

In order to incorporate Web 2.0 functionality effectively, digital libraries must fundamentally recast users not just as content consumers, but as content creators. This article analyzes the integration of social annotations – uncontrolled user-generated content – into digital collection items. The literature review briefly summarizes the value of annotations and finds that there is conceptual room for user-generated content in digital libraries, that digital libraries have been imagined as forums for social interaction since their inception, and that encouraging a collaborative approach to knowledge discovery and creation might allow digital libraries to serve as boundary objects that increase participation and engagement. The results of an ongoing case study of a Web 2.0 question and answer site that has made a similar transition from factual to social content are analyzed, and eight decision points for digital libraries to consider when integrating social annotations with digital collection items are proposed.

1. Introduction

At the beginning of a term, many college students are faced with the choice of whether to buy a new or used textbook. While used textbooks are obviously less costly, they often carry another benefit new textbooks don't: highlights, underscores and other annotations by their previous owners. Even though the author of, and rationale for, the annotations may be unknown, the fact that somebody found particular sections of the book important enough to emphasize tends to make the eye linger. Ideally, annotations can make learning and knowledge discovery feel less like a solitary pursuit and more like a collaborative effort.

From the earliest legal and religious texts, marginal annotations and glosses have informed, challenged and often confused subsequent readers. At first glance, the trustworthiness of an unknown individual who has interpreted or appended an author's work would seem questionable, but several reasonable assumptions contribute to the perceived authority of an unknown annotator. At the very least, the annotator read the work and took the time to make the annotations, which may question or clarify statements in the text and create links to other works, authors or ideas. The subsequent reader of an annotated work then has one or more additional perspectives from which to evaluate the usefulness of the text and annotations, and more implied permission to add his or her own interpretations than an unannotated text would offer. Published scholarly works are objects for discussion in an ongoing conversation among a community of knowledge seekers, and whether via formal citation in later publications or annotations in existing ones, all are designed to advance the generation and exchange of ideas.

Social computing, or Web 2.0, operates in much the same way. Whether via links, tags, social bookmarks, comments, ratings or other means, providing users the means to create, share and interact around content typifies the Web 2.0 approach. Most instances of Web 2.0 operate from a model of aggregate peer authority. For example, no single expert tags (essentially categorizes) photographs on a site like flickr.com, but tags from an aggregation of non-experts can make a photograph 'findable enough.' Similarly, hotel ratings or movie reviews from a large enough number of non-experts can provide a general sense of quality or trustworthiness. Most critically, knowledge discovery and transfer are no longer restricted to a model of one expert creator to many consumers. In Web 2.0, consumers are creators, who can add their voices to both expert and non-expert claims. Users get the benefit of multiple perspectives and can evaluate claims in the best tradition of participative, critical inquiry.

Though designed as systems for knowledge discovery, the majority of digital libraries operate from the traditional expert model. Subject experts create content, digital library experts provide access to it, and individual users consume it. Very few systems have been built with an architecture that encourages users to create content, associate it with collection items, or share their impressions with other users. Providing digital library users read-access to collections is the traditional finish line. Providing them write-access – the ability to append content to that in a digital collection – is something else entirely.

Ordinarily, we would no sooner invite user alteration of digital collection items than we would distribute crayons with illustrated children's books, but this is the way of the Web. It is rare to find an online article, blog or product review that does not now have a space for user ratings, comments or both. In this way, conversations spring up and ideas are exchanged, resulting in an added dimension of engagement with both the text and fellow readers. I use the term 'social annotation' to refer to uncontrolled user comments and interactions around a digital resource, to distinguish it from more formal senses of content annotation. This article argues that integrating Web 2.0-type social annotations into digital libraries can serve the larger goals of supporting users' information seeking needs and practices, and encourage increased exploration and engagement.

This article addresses what social annotations could look like in digital libraries, with examples drawn from Answerbag, a Web 2.0 social question and answer site that has confronted some of the same challenges of both encouraging and harnessing uncontrolled content. Eight decision points for digital libraries considering social annotations are proposed.

2. Background

Digital libraries are complex sociotechnical artifacts that are much more than searchable electronic collections. Even initial definitions in the literature were fairly broad; Borgman (1999) bisects the conceptions of digital libraries into those of researchers (content collected on behalf of user communities) and librarians (digital libraries as institutions or services). At that time, digital library literature was understandably concerned with mapping the boundaries of the field, and Lesk (1999) identifies an inadequate focus on user needs in digital library research.

In the Answer Garden project, Ackerman (1994) called for elements of social interaction to be included in digital libraries, with user-user communication and exploration of collections identified as important components of digital library architecture. Ackerman specifically mentioned leveraging the collected wisdom of others, rather than launching a search cold, but it is difficult to find a modern system that embraces this idea. However, the most recent DELOS Digital Library Reference Model (DELOS 2007) adopts a much broader view of digital libraries, one with room for users as both content creators and interactors:

"The DELOS Network of Excellence on Digital Libraries now envisions a Digital Library as a tool at the centre of intellectual activity having no logical, conceptual, physical, temporal, or personal borders or barriers on information. It has moved from a content-centric system that simply organizes and provides access to particular collections of data and information, to a person-centric system that aims to provide interesting, novel, personalized experiences to users. Its main role has moved from static storage and retrieval of information to facilitation of communication, collaboration, and other forms of interaction among scientists, researchers, or the general public on themes that are pertinent to the information stored in the Digital Library." (DELOS 2007, p. 14)

The idea of allowing users to annotate digital library collection items is not new, but most previous efforts have been concerned primarily with annotations as supplemental avenues for retrieval (see, for example, Frommholz 2006; Golovchinsky, Price and Schilit 1999). However, the Digital Library for Earth System Education (DLESE) collaborators claim that digital library annotations can "engage the community" by allowing users write-access to the collection, and thus "capture diffuse and ephemeral information" (Arko et al. 2006). DLESE's focus is on the educational uses of digital libraries, and its collaborators used annotations to capture effective pedagogical strategies surrounding collection items, as well as feedback about the system for iterative evaluation. Also, in both DLESE and the OpenDLib system (Agosti and Ferro 2003), annotations adhere to a formal metadata structure. This contrasts with a Web 2.0 approach, where the content and process of user contributions are much less restricted, but the core issues – increasing user engagement and capturing user-generated content for the benefit of future users – are the same.

Previous studies of digital collection item annotations have tended to focus on task-based environments such as academic collaborations, where social convention tends to keep annotations formal and content-focused. However, as with annotations in paper books, sometimes the value of an annotation goes beyond its content. Marshall (1998) suggests that the very act of evaluating a handwritten annotation's relevance creates a level of critical engagement that would not happen while reading a clean copy of a book. Marshall studied university students' annotations in textbooks, and found that students preferred books that had been marked by previous readers, as long as the marks were intelligible. She also found that annotations serve many functions beyond formal analysis of content and concluded that digital library annotation functions should support:

  • Naturally placed annotations, distinguishable from the source item
  • Non-interpretative markings
  • Fluidity of form
  • Informal codings
  • Smooth transitions between public and private annotations
  • Integration with reading

Similarly, Sherman (2008) studied marginalia in English Renaissance texts and found that students of the time were routinely taught that simply reading a book was insufficient. In order to have a "fruitful interaction" (p. 4) with a text, marking it up with one's thoughts and reactions was considered essential. Marginalia and other signs of engagement and use – even such apparently content-neutral additions as food stains – Sherman sees as valuable evidence of reader reaction, and the place of the physical information object in people's lives. Providing users the ability to annotate digital content also creates new streams of naturalistic evaluation data, evidence of engagement stronger than a page view or a link to the collection item from another page.

In a study of flickr.com, Ames and Naaman (2007) created a taxonomy of motivations for annotation along two dimensions: sociality and function. The latter dimension echoes people's motivation to annotate printed textbooks: the function of making important or interesting passages more easily findable for later review. The sociality dimension is enabled by the Web infrastructure – making photographs findable for others, and creating shared tagsets for people with similar interests, so they might collaborate more easily. In this sense, photographs are boundary objects (Star and Griesemer 1989), around which diverse individuals can interact and communities can build (Gal, Yoo and Boland 2004). Digital collection items can also be boundary objects, even if the conversations around them take place asynchronously.

Can social annotations fit into current digital library architecture? Two concept maps in the DELOS 2007 reference model, in the Resource (Figure 1) and User (Figure 2) domains, suggest that they can. Giving users write-access to collections essentially means they would be creating a new resource type, one that need not append content directly to the item record, but may populate a separate table with an associative link. Figure 1 shows that according to the DELOS conceptual model, a Resource can have a "hasAnnotation" relationship with an Information Object that is conceptually equivalent to other metadata.


Figure 1: DELOS Digital Library Resource Domain Concept Map (DELOS 2007)
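
The separate-table approach can be made concrete with a minimal sketch. The schema and field names below are illustrative assumptions, not part of the DELOS model or any particular digital library implementation; the point is simply that annotations attach to an item by an associative key, leaving the curated item record untouched:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (
            item_id INTEGER PRIMARY KEY,
            title   TEXT NOT NULL         -- controlled, curated metadata
        );
        CREATE TABLE annotations (
            annotation_id INTEGER PRIMARY KEY,
            item_id       INTEGER NOT NULL REFERENCES items(item_id),
            user_id       INTEGER NOT NULL,
            body          TEXT NOT NULL,  -- uncontrolled user content
            created_at    TEXT DEFAULT CURRENT_TIMESTAMP
        );
    """)

    # The item record itself is never modified; the annotation points to it.
    conn.execute("INSERT INTO items VALUES (1, 'Resource domain concept map')")
    conn.execute(
        "INSERT INTO annotations (item_id, user_id, body) VALUES (?, ?, ?)",
        (1, 42, "The hasAnnotation relationship clarified this for me."))

    for (body,) in conn.execute("SELECT body FROM annotations WHERE item_id = 1"):
        print(body)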

Similarly, the DELOS 2007 User domain concept map (Figure 2) shows that an End-user can have roles of Librarian, Content Consumer or Content Creator. Starting from the more general Actor node, there is a direct conceptual relationship with the Resource node; the Actor "isa" Resource, in the same sense as a collection item.


Figure 2: DELOS Digital Library User Domain Concept Map (DELOS 2007)

One promising application of Web 2.0-type collaborative annotation of digital collection items can be found in the PennTags project (Day 2006). Designed as both a personal resource and as a toolkit for collaboration around digital objects, PennTags allows social bookmarking and annotation of OPAC items as well as Web URLs and journal article database items (though the latter are available only within the University of Pennsylvania community).

This brief review suggests that there is conceptual room for users as both interactors and content creators in digital libraries, and that annotations have been a historically valid form of user-generated content. Web 2.0 has provided an infrastructure within which users can participate, and when given the chance, they have done so enthusiastically. This leads to the research question driving this article: Can social computing functionality in the form of social annotation translate well to a digital library? To address this question, the results of a long-term participant observation of a Web 2.0 social question and answer site are analyzed, resulting in eight decision points that should be considered when deciding how or whether to incorporate social annotation in a digital library environment.

3. Research setting and method

Answerbag (http://www.answerbag.com) is a social question and answer site designed around a "one question – multiple answers" architecture. Launched in 2003, Answerbag became the author's research testbed in 2004 and is now a thriving Web site with over 7 million unique visitors per month. Administrator-level access to all site data is available, and research is conducted as a participant observation. Users submit and rate questions and answers, and the highest-rated answers are listed first, serving a collaborative filtering function that still allows people to view the full range of answers to any question. Most relevant to this discussion is the answer comment function, which allows users to append uncontrolled content to any answer, essentially a social annotation.
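
The collaborative filtering described here can be sketched in a few lines. This is a simplified assumption about how rating-based ordering might work, not Answerbag's actual ranking code: every answer remains visible, but aggregate ratings determine display order.

    from statistics import mean

    # Hypothetical answers, each carrying the ratings users have assigned it.
    answers = [
        {"text": "Answer A", "ratings": [5, 4, 5]},
        {"text": "Answer B", "ratings": [2, 3]},
        {"text": "Answer C", "ratings": []},        # not yet rated
    ]

    def score(answer):
        # Unrated answers sort last; otherwise use the mean rating.
        return mean(answer["ratings"]) if answer["ratings"] else 0.0

    # Highest-rated first, but the full range of answers is still shown.
    for answer in sorted(answers, key=score, reverse=True):
        print(f"{score(answer):.2f}  {answer['text']}")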

As with most digital libraries, Answerbag was not initially designed to be a social site. At first, only factual Q&A were permitted, and there was no user comment function, only answer ratings to fuel collaborative filtering. However, as so often happens, people began to use the site in ways beyond those the designers had intended. They used answer fields to conduct discussions about the finer points of questions and answers, many of which took the form of opinion. Once they had discovered other users with similar interests (or those with divergent views), they also used answer fields to communicate with one another on a purely social level.

Social content was thought to dilute the value of the site and was removed by site moderators. But soon, moderators were removing more content than they were allowing. As site traffic grew, human review of every piece of content submitted to the site could no longer be maintained, and social questions and answers were allowed by default. When answer comments were introduced soon after, in mid-2005, page views and site traffic nearly doubled in the following four months.

The primary goal of Answerbag – allowing users to integrate diverse perspectives through multiple, collaboratively rated answers to a single question – was now taking place at the level of individual answers as well. Answer comments allowed users to interact freely, build on one another's ideas and link to other content. More importantly, the comment function created an environment where users could see the engagement of others with the question at hand and were encouraged to join the conversation. Figure 3 shows an excerpt from a typical Answerbag answer comment page. Both the high answer rating and the combined effect of other users' annotations add to the perceived authority of the original answer, creating a collaborative, participative response instead of a single person's opinion.


Figure 3. Excerpt from an Answerbag answer comment page

While Figure 3 is a rather whimsical example of social annotation, it provides clear evidence of engagement with the content beyond mere page views. Registered users now submit as many answer comments – social annotations – as they do answers. Applying this Web 2.0 model to digital libraries generates a number of design questions discussed in the following section.

4. Analysis and discussion

Comparing the results of a long-term participant observation of the development and use of Answerbag's answer comment function with the digital library annotation efforts summarized in the literature review reveals eight major decision points:

  • Display
  • Ease of annotation
  • Anonymity
  • Control of content
  • Harvesting annotation content
  • Ease of retrieval
  • Traffic and network effects
  • Notification and sharing

While this list is neither exhaustive nor applicable to every situation, these decision points reflect some of the tensions that can result from applying an information sharing model from one environment to another. It should also be noted that these decision points are interdependent – choices in one area will expand or constrain options in another.

4.1 Display

Most digital library interfaces have been professionally crafted for optimum usefulness and engagement for their target audience. The display represents and enhances the collection, the institution(s) providing access, and the individuals responsible for developing and maintaining the resource. Recalling Marshall (1998), how social annotations are placed and navigated in relation to the associated collection items is critical to identifying and balancing controlled and uncontrolled content. At a minimum, three options should be visible: one to "View X previous annotations," where X is the number of annotations all previous users have associated with the item, another to "Add annotation," and a third to "View my annotations." In this way, users must opt in to view annotations, providing a filter for users and designers who may feel uncomfortable with uncontrolled content. However, when Answerbag introduced answer comments, it was a new feature, unfamiliar to many longtime users. Simply providing a "View comments" link did not immediately result in clickthroughs. Defaulting to displaying all comments inline beneath the answer (see Figure 3) made users more aware of the comment feature and increased participation, but long comment threads made answer pages extremely long and difficult to scroll through. The compromise solution has been to show only the most recent comments inline, with a "View all comments" link available to access the entire thread (in Figure 3, the "View all comments" link has already been clicked).
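
The compromise can be expressed as a small display rule. The following is a minimal sketch, assuming an oldest-first comment list and an arbitrary inline cutoff; both are assumptions for illustration, not Answerbag's actual parameters.

    MAX_INLINE = 3  # assumed cutoff; the real value is a design decision

    def inline_comments(comments, show_all=False):
        """Return (comments to render, label for the 'view all' link)."""
        if show_all or len(comments) <= MAX_INLINE:
            return comments, None
        # Show only the most recent comments until the user opts in.
        return comments[-MAX_INLINE:], f"View all {len(comments)} comments"

    thread = [f"comment {i}" for i in range(1, 8)]
    visible, link = inline_comments(thread)
    print(visible)  # ['comment 5', 'comment 6', 'comment 7']
    print(link)     # View all 7 comments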

4.2 Ease of annotation

Annotating digital collection items should be as easy as making marginal comments in a physical book. Forcing users to populate multiple fields or to use pull-down menus for controlled data entry risks the reaction that annotating is more effort than it's worth. Answerbag invites users to "Add a comment" beneath any answer or existing comment thread via a simple button. The button opens a text entry box, positioned such that the answer and previous comments are viewable for context. Social annotation in a digital library environment should be equally low-effort.

4.3 Anonymity

How users are given the option to identify themselves directly influences the quality of content they will contribute. If the use of real names is enforced, many people will be more hesitant to post content. For example, an early version of Answerbag was imagined for use in corporate Intranets, using an organization's policy manual as a question and answer framework, with staff comments around each answer as a way to capture and share tacit knowledge relevant to each procedure. Unsurprisingly, few employees were willing to submit critiques of current policy with their names attached. Conversely, a purely anonymous environment removes checks on behavior and can result in everything from unprofessional content to virtual vandalism. In a digital library environment, annotations may or may not need to be associated with a real name, or even a pseudonym, to convey authority. Whether or not names or pseudonyms are made viewable to other users, some identifying information must be collected if a "View my annotations" function is desired. As the example in Figure 3 shows, Answerbag users are identified by pseudonyms of their choosing, and some small measure of accountability is enforced by making their user profiles public. The DLESE annotation metadata (Arko et al. 2006) includes fields for an annotator's name and/or organizational affiliation ("Contributor") and whether or not their contact information is to be displayed in the user interface ("Share").

4.4 Control of content

While allowing users to annotate without restriction can lead to meaningful knowledge discovery, it also invites profanity, chat conversations, spamming, and similar digital graffiti. While a personal or institutional philosophy of radical acceptance may seek to value and preserve all contributions, in most cases some oversight will be preferred. Allowing users themselves to perform this function typifies the Web 2.0 approach, either by allowing them to rate content on a scale of usefulness (however the rater defines it), or to flag inappropriate content, akin to the "Flag this answer" link in Figure 3. As Arko et al. (2006) report in DLESE, any feedback function necessitates regular human review, but the job of overseeing flag submissions allows librarians to see how the collection is being used, which can be a valuable source of naturalistic data.
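
A flag-driven review queue of the kind described above might look like the following sketch; the threshold and data structures are assumptions for illustration only.

    from collections import Counter

    FLAG_THRESHOLD = 3        # assumed: flags required before human review
    flag_counts = Counter()   # annotation_id -> number of user flags
    review_queue = []         # annotations awaiting a librarian's decision

    def flag_annotation(annotation_id):
        """Record one user flag; queue the annotation at the threshold."""
        flag_counts[annotation_id] += 1
        if flag_counts[annotation_id] == FLAG_THRESHOLD:
            review_queue.append(annotation_id)

    for _ in range(FLAG_THRESHOLD):
        flag_annotation(101)
    print(review_queue)  # [101]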

4.5 Harvesting annotation content

The study by Golovchinsky, Price and Schilit (1999) is an example of an attempt to use annotations as search queries. Annotation content can be appended to, or associated with, a digital collection item record in order to generate additional keyword hits at search time. In some cases, especially where the collection content or audience is highly specialized, annotations may be a source of supplemental content-bearing key terms and thus worthwhile to include. Since Answerbag has no restrictions on content or audience, and a significant percentage of comments are purely social (e.g., "thanks for your answer"), comments have not been made searchable.
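
Where annotation harvesting is worthwhile, the mechanics can be as simple as folding annotation terms into the same inverted index as the curated metadata. A toy sketch follows; the item and its terms are invented for illustration.

    from collections import defaultdict

    index = defaultdict(set)  # term -> ids of items containing it

    def index_text(item_id, text):
        for term in text.lower().split():
            index[term.strip(".,")].add(item_id)

    index_text(7, "Herbarium specimen, Hawaiian endemic flora")  # curated metadata
    index_text(7, "useful for my botany unit on endemism")       # user annotation

    print(index["endemism"])  # {7} -- a hit the metadata alone would miss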

4.6 Ease of retrieval

People spend time annotating now to save time reviewing later. A digital annotation system should allow users to quickly access and review their previous annotations. The solution used by Answerbag is to provide each user a profile page from which their activity can be viewed at any time, by anyone. In a digital library environment this has implications for both privacy and activity tracking that may be at odds with institutional policies, but which can be addressed by adding a permissions layer.
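
A permissions layer of the kind mentioned here reduces to a visibility filter at query time. A minimal sketch, with invented field names:

    def visible_annotations(annotations, viewer_id):
        """Public annotations are visible to everyone; private ones only
        to their author. 'public' and 'user_id' are assumed fields."""
        return [a for a in annotations
                if a["public"] or a["user_id"] == viewer_id]

    annotations = [
        {"user_id": 1, "public": True,  "body": "shared note"},
        {"user_id": 1, "public": False, "body": "private note"},
    ]
    print(visible_annotations(annotations, viewer_id=1))  # both notes
    print(visible_annotations(annotations, viewer_id=2))  # shared note only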

4.7 Traffic and network effects

A critical mass of traffic and participation is required to make any social computing function useful. For example, in the very early days of the Answerbag rating system, most answers on the site were unrated, rendering the collaborative filtering function essentially useless. If a digital library's traffic consists of a few users per year accessing collection items, the social part of the social annotation function will essentially fall away, leaving something closer to annotation for personal use only. Interaction will be one-way, from past annotators to future readers, following the used textbook model, with no timely discussion around collection items. This is particularly true in an academic environment, where library usage tends to be linked to coursework and research projects with definite end dates. However, in an academic setting, interactions could be encouraged or orchestrated around an assignment, and possibly integrated with courseware. Social annotations could also be a forum for outreach following the book club model: non-traditional audiences could interact around digital collection items in a guided or semi-guided activity. A formal or informal cost-benefit analysis would need to reflect the goals of the institution, and the extent to which traffic reflects success.

4.8 Notification and sharing

Two of the primary engines of Web 2.0 are the ability to create update notifications and to share content across different sites. Users can set up RSS feeds and receive alerts when certain conditions are met, such as when content they have created draws a response. Providing tangible evidence that the effort they took to post was not in vain encourages people to return and continue the conversation. Similarly, articles on many sites include a link and icon inviting users to, for example, "Digg this article," which registers it on Digg.com, a social article popularity ranking site. If digital libraries invited users to share collection item pages via email and other social bookmarking sites, usage would predictably increase – as would instances of people harvesting and repurposing content. In a Web 2.0 environment, users expect the freedom to share content across sites and to use it for their own purposes. Digital library content would very likely find its way onto MySpace pages or YouTube videos, in original or customized form. The core issue that needs to be confronted is one of prioritization: maximize usage, or maximize control? Though institutional policy and intellectual property rights will dictate that decision for some collection items, a user-centered focus requires that truly open access be considered.
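
The response-notification pattern is straightforward to sketch. The delivery mechanism (email, RSS, on-site alert) is left abstract here; everything below is an illustrative assumption rather than any site's actual implementation.

    notifications = []  # stand-in for an email/RSS/alert delivery channel

    def add_comment(thread, author_id, body):
        """Append a comment and notify earlier participants of the reply."""
        earlier_authors = {c["author_id"] for c in thread} - {author_id}
        thread.append({"author_id": author_id, "body": body})
        for user_id in earlier_authors:
            notifications.append((user_id, "Your annotation drew a response"))

    thread = []
    add_comment(thread, author_id=1, body="Initial annotation")
    add_comment(thread, author_id=2, body="A reply")
    print(notifications)  # [(1, 'Your annotation drew a response')]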

5. Conclusion

Imagining students, researchers and the public interacting around a digital library collection item via social annotations is an attractive idea. An unobtrusive list of social annotations associated with digital library collection items would allow alternative views of digital content, and create a sense of collaborative endeavor. Taking advantage of the Web 2.0 infrastructure and embracing a philosophy of releasing control of collection items would open the conversation to an even wider audience.

However, the technological barriers are not as considerable as the institutional barriers. Digital libraries, as institutions, services and as a research field, have created professional standards that have resulted in innovative systems, high expectations of service quality, and codified best practices. It is natural for digital library professionals to believe that we already know what users value. These assumptions need to be continually questioned, especially in light of other information models, like Web 2.0, whose sheer popularity necessitates consideration in more traditionally formal information environments.

In a review and synthesis of digital library evaluation concepts, Saracevic (2000) concludes that digital libraries provide for interaction among people, and that the ultimate question for evaluation of digital libraries is "How are digital libraries transforming research, education, learning and living?" (p. 368). Encouraging users to engage as freely as possible with both collection items and one another seems a logical strategy to address this question. Just like a dog-eared textbook or a toy-strewn living room, social annotations may make a digital library look messy, but there is value and life in a physical or virtual space that has a lived-in, well-used and well-loved appearance.

References

Ackerman, Mark S. (1994). Providing Social Interaction in the Digital Library. Proceedings of Digital Libraries '94: First Annual Conference on the Theory and Practice of Digital Libraries (College Station, TX), 198-200.

Agosti, Maristella and Nicola Ferro (2003). Annotations: Enriching a Digital Library. In Panos Constantopoulos and Ingeborg T. Sølvberg, editors, Research and Advanced Technology for Digital Libraries: Proceedings of the European Conference on Digital Libraries (ECDL 2003), Lecture Notes in Computer Science. Heidelberg: Springer, 88-100.

Ames, Morgan and Mor Naaman (2007). Why We Tag: Motivations for Annotation in Mobile and Online Media. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, San Jose, CA, 971-980.

Arko, Robert A., Kathryn M. Ginger, Kim A. Kastens and John Weatherley (2006). Using Annotations to Add Value to a Digital Library for Education. D-Lib Magazine, 12(5) <doi:10.1045/may2006-arko>.

Borgman, Christine L. (1999). What are Digital Libraries? Competing Visions. Information Processing and Management, 35(3), 227-243.

Day, Annette (2006). Using Social Bookmarks in an Academic Setting – PennTags. <http://units.sla.org/division/dche/2006/day.pdf>.

DELOS (2007). The DELOS Digital Library Reference Model: Foundations for Digital Libraries, version 0.96, (November 2007). <http://www.delos.info/files/pdf/ReferenceModel/DELOS_DLReferenceModel_096.pdf>.

Frommholz, Ingo (2006). What did the Others Say? Probabilistic Indexing and Retrieval Models in Annotation-based Discussions. TCDL Bulletin, 2(2). <http://www.ieee-tcdl.org/Bulletin/v2n2/frommholz/frommholz.html>.

Gal, Uri, Youngjin Yoo and Richard J. Boland, Jr. (2004). The Dynamics of Boundary Objects, Social Infrastructures and Social Identities. Sprouts: Working Papers on Information Environments, Systems and Organizations. 4(4), 193-206. <http://sprouts.aisnet.org/123/1/040411.pdf>.

Golovchinsky, Gene, Morgan N. Price and Bill N. Schilit (1999). From Reading to Retrieval: Freeform Ink Annotations as Queries. Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, CA, 19-25.

Lesk, Michael (1999). Expanding Digital Library Research: Media, Genre, Place and Subjects. Proceedings of the International Symposium on Digital Libraries 1999: ISDL'99, Tsukuba, Ibaraki, Japan, September 1999, 51-57.

Marshall, Catherine C. (1998). The Future of Annotation in a Digital (Paper) World. Paper Presented at the 35th Annual GSLIS Clinic: Successes and Failures of Digital Libraries, University of Illinois at Urbana-Champaign, March 24, 1998.

Saracevic, Tefko (2000). Digital Library Evaluation: Toward an Evolution of Concepts. Library Trends, 49(3), 350-369.

Sherman, William H. (2008). Used Books: Marking Readers in Renaissance England. Philadelphia: University of Pennsylvania Press.

Star, Susan Leigh and James R. Griesemer (1989). Institutional Ecology, 'Translations' and Boundary Objects: Amateurs and Professionals in Berkeley's Museum of Vertebrate Zoology. Social Studies of Science, 19, 387-420.

Copyright © 2008 Rich Gazan

doi:10.1045/november2008-gazan