
D-Lib Magazine
September 2001

Volume 7 Number 9

ISSN 1082-9873

HILT - High-Level Thesaurus Project

Building Consensus for Interoperable Subject Access across Communities

 

Susannah Wake
HILT Research Assistant
susannah.wake@strath.ac.uk

Dennis Nicholson
HILT Project Director
d.m.nicholson@strath.ac.uk

Centre for Digital Library Research
http://cdlr.strath.ac.uk/
University of Strathclyde, Glasgow, UK.


Abstract

This article provides an overview of the work carried out by the HILT Project <http://hilt.cdlr.strath.ac.uk> in making recommendations on interoperable subject access, that is, cross-searching and browsing by subject across distributed services in the archives, libraries, museums and electronic services sectors. The article details the consensus achieved at the HILT Workshop of 19 June 2001 and discusses the HILT Stakeholder Survey.

1. The Problem

In 1999 Péter Jacsó [1] wrote that "savvy searchers" are asking for direction. Two years later, the scenario he describes, in which searchers cross-search databases that each use a different subject vocabulary, still rings true. Jacsó states that, in many cases, databases do not offer the aids needed to use the "preferred terms of the subject-controlled vocabulary" [1]. He was writing about Dialog and DataStar, but the situation he describes applies equally to the area that HILT is researching: cross-searching and browsing by subject across the databases and catalogues of archives, libraries, museums and online information services. How does a user find information on a particular subject when it is indexed across a multitude of services under different, though often similar, subject terms? And if experienced searchers are having problems, what about novices? As information professionals, it is our role to investigate such problems and recommend solutions.

Although there is no hard empirical evidence either way, HILT participants agree that the problem facing users who search across databases is real. Users are very likely disadvantaged by the use of different subject terminologies, combined with the multitude of different practices in the archive, library, museum and online information communities. Arguably, failure to address this interoperability problem undermines the value of cross-searching and browsing facilities and wastes public money, because relevant resources are 'hidden' from searchers.

HILT is charged with analysing this broad problem through qualitative methods, with the main aim of presenting a set of recommendations on how to make it easier to cross-search and browse distributed services. Because this is a very large problem composed of many strands, HILT recognizes that any proposed solutions must address a host of issues. Recommended solutions must be affordable, sustainable, politically acceptable, useful, future-proof and international in scope. It also became clear to the HILT team that progress toward finding solutions to the interoperability problem could only be achieved through direct dialogue with other parties keen to solve this problem, and that the problem was as much about consensus building as it was about finding a solution.

This article describes how HILT approached the cross-searching problem; how it investigated the nature of the problem, detailing results from the HILT Stakeholder Survey [2]; and how it achieved consensus through the recent HILT Workshop [3].

2. HILT Project Overview

By definition, HILT Project funders [4], partners [5] and stakeholders [6] have a common interest in facilitating user searching and browsing by subject, whether this be within a single service (e.g., the SLAINTE service [7]), across a group of similar services (e.g., the Resource Discovery Network (RDN) [8]), or across a group of services spanning different sectors, domains, regions, professions, languages, countries, or any mixture of these. The HILT Project aims to determine how this requirement to offer users subject searching and browsing -- or, more commonly, cross-searching and browsing -- can best be met when the various communities, services and initiatives involved (archives, Higher Education (HE), Further Education (FE), libraries, museums, projects, online information services and hubs) frequently have different requirements, take different approaches and, more often than not, use different subject schemes.

Having reached a consensus at the HILT Workshop (described later in this article), the HILT team is now in the process of finalising its interim report, which will be disseminated for feedback before being evaluated and then published to the wider community.

3. HILT Stakeholder Survey

At the beginning of the project, the HILT team debated the nature of the cross-searching problem and even considered whether a real problem existed. As a user study was outside the remit of the HILT Project, the team decided that the most appropriate course would be to survey the HILT stakeholders and draw on their practical experience of the problem in their working environments.

HILT devised a questionnaire that was sent to 46 HILT stakeholders, comprising projects, services, organisations and institutions that index resources by subject and that represent all of the communities mentioned above. Most of the surveyed stakeholders are in the UK; to gain an international perspective, however, HILT also surveyed parties in Australia, Canada and the USA. Forty-two stakeholders completed and returned questionnaires, a response rate of just over 91%. Results from key areas of the questionnaire are presented below.

3.1 Summary of Major Findings

In order to determine where the cross-searching problems lie, the questionnaire was structured in four broad sections: subject access, data access, staff and user requirements, and problems and issues.

3.1.1 Subject Access

The section on subject access was designed to discover how stakeholders organise their resources and what practices they follow. Results of the survey showed that:

  • Library of Congress Subject Headings (LCSH) is the most commonly used subject scheme, followed by Dewey Decimal Classification (DDC) and the United Nations Educational, Scientific and Cultural Organization (UNESCO) Thesaurus [9]:
    • LCSH is used mainly by library stakeholders and, to a lesser extent, by stakeholders in online information services, archives, museums and special collections;
    • DDC is most commonly used by library and online information service stakeholders and, to a much lesser extent, by museums (in their libraries);
    • the UNESCO Thesaurus is used primarily by archives stakeholders and, to a lesser extent, by online information services and government agencies.
  • A number of stakeholders from various sectors use custom, in-house schemes, some of which are being discovered and used by the wider community.
  • 64% of stakeholders currently use schemes they have adapted for local use. Reasons given for adapting the schemes include:
    • to overcome weaknesses in subject areas;
    • to represent new concepts not yet accounted for in a scheme;
    • to represent differences in concepts/collections;
    • to represent a defined geographic location;
    • to complement an established scheme until an overriding solution to combine them can be found.
  • Retroconversion and legacy data present a large problem.

3.1.2 Data Access

The aim of this section of the questionnaire was to gain an overall idea of how users access stakeholder resources:

  • Exactly half of the stakeholders can have their databases/catalogues cross-searched with those of others:
    • 36% of stakeholders facilitate cross-searching through the Z39.50 protocol;
    • 17% of stakeholders facilitate cross-searching through the Web;
    • other options include Telnet and Internet search engines.

3.1.3 Staff and User Requirements

Any viable solution has to benefit the user. This section of the questionnaire aimed to discover whether cross-searching is needed by stakeholders and their users:

  • 95% of stakeholders said that, to the best of their knowledge, their users require subject access to their resources;
  • 93% of stakeholders said that, to the best of their knowledge, their staff require subject access to their resources;
  • 83% of stakeholders said that, to the best of their knowledge, their users would find it useful to be able to cross-search their catalogues with other catalogues;
  • 79% of stakeholders said that, to the best of their knowledge, their staff would find it useful to be able to cross-search their catalogues with other catalogues.

3.1.4 Problems and Issues

Recommending solutions for the interoperability problem of cross-searching and browsing by subject raises many terminological as well as political issues. Survey responses to this section of the questionnaire revealed the following:

  • 62% of stakeholders said they would seriously consider adopting a new high-level terminological scheme if it were widely used by other services of interest to their users;
  • 36% of stakeholders said their response on whether they would seriously consider adopting a new high-level terminological scheme would depend on the following factors:
    • whether the new scheme would benefit the individual stakeholder, the wider community, and the user;
    • whether the scheme offered additional benefits over and above their existing approach;
    • what the degree of difference would be between the new scheme and their existing scheme;
    • whether the change to a new scheme fitted with organizational objectives;
    • whether it was feasible and worthwhile to commit the time, effort, and expense required to make the change to a new scheme;
    • whether or not existing subject data could be mapped to the scheme being proposed.

The overall response to the HILT stakeholder survey was very positive. Most stakeholders recognise that serving their users well means looking beyond subject access to their own collections towards access to the collections of the wider community, which, depending on the topic being searched, could span sectors. However, stakeholders also know that change implies costs for which resources must be found. Because most funding is allocated to projects rather than to institutions or services, meeting the cost of changing to a new scheme could be problematic [10]. Change can also, potentially at least, cause considerable disruption for users. Stakeholders must weigh these costs against the benefits and strike a balance between the two.

For more details, see the HILT Stakeholder Survey results [2].

4. HILT Workshop

Having determined the various practices used by the different communities, the HILT team, along with the HILT Project Management Group and Steering Group, devised a range of Option Sets distilled from a set of collaborative hypotheses. The Option Sets were then put forward at the HILT Workshop held on 19 June 2001. With a view to achieving consensus on which of the options offered the best route forward, HILT brought together 50 delegates representing archives, libraries, museums, and online information services.

4.1 Workshop Structure: Process to Consensus

Workshop delegates were assigned to one of four breakout groups, balanced to ensure fair representation of the different communities. The breakout sessions used the document HILT Workshop Breakout Sessions: Discussion Issues [11] as the basis for their discussions. The document detailed the Option Sets and gave arguments for and against each option so that delegates could make informed decisions. Each group was asked to reach consensus on the following choices: a preferred Option Set; an option from that Option Set; an additional option from any other Option Set; and an alternative option. After the selection process, each group reported and discussed its choices at the plenary session.

4.2 Option Sets

[Option Set 1] Do nothing:

  1. Artificial Intelligence will solve it in time.
  2. Big business -- Microsoft or similar -- will solve it.
  3. It is not important.
  4. No solution is necessary.
  5. The problem cannot be solved.

[Option Set 2] Set up a human process intended to lead to a solution in time:

  1. Set up a Terminologies Agency, perhaps based on National Libraries/mda/National Council on Archives.
  2. Set up an inter-domain, inter-sectoral Task Force to move the communities towards a solution.
  3. Set up a Terminologies Agency and a Task Force.

[Option Set 3] Adopt a base-level, gradual approach, with an eye on future developments:

  1. Adopt a single scheme such as DDC and apply it to all collection level descriptions in the UK.
  2. Gradually create inter-service and inter-community terminology 'cross-walks', eventually building up to a partial but adequately broad solution.
  3. Aim to solve the problem for electronic services only, perhaps via the semantic web vision.
  4. Provide more flexible retrieval facilities for users.
  5. One or more of these four together (please specify).

[Option Set 4] Adopt a single scheme:

  1. Adopt: LCSH/UNESCO/DDC/UDC/AAT/Another scheme (say which)/A New Scheme in addition to the existing scheme used by any given service [Please specify which scheme]
  2. Adopt: LCSH/UNESCO/DDC/UDC/AAT/Another scheme (say which)/A New Scheme instead of the existing scheme used by any given service [Please specify which scheme]
  3. Adopt a single scheme: without retroconversion of legacy metadata/with retroconversion funded by the host organisation/with retroconversion funded centrally [Specify which]

[Option Set 5] Mapping service alternatives:

  1. Set up a mapping service, ideally with international participation and support, and gradually build towards a complete mapping of LCSH, UNESCO, UDC, and AAT to a DDC backbone. Include local adaptations and extensions from major services such as the National Libraries. Use the international service with the mapping of UK adaptations and extensions as a model for other countries. Determine and implement the best international funding and maintenance model.
  2. Set up a two-year mapping service pilot to measure costs against benefits of both a full-scale service and all of the alternative responses detailed above.
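The 'DDC backbone' in option 5.1 is a hub-and-spoke design: each scheme is mapped once to the central spine, and equivalences between any two schemes are then found by going through the spine rather than through a separate crosswalk for every pair (compare the gradual 'cross-walks' of option 3.2). The Python sketch below illustrates the idea under stated assumptions: the scheme headings and the placeholder spine codes `C1` and `C2` (standing in for DDC class numbers) are hypothetical, and `build_spine_index`, `equivalents` and `mappings_needed` are illustrative names, not part of any HILT deliverable.

```python
from collections import defaultdict

# Hypothetical data: "C1" and "C2" are placeholders standing in for DDC
# class numbers on the spine; the headings are illustrative, not taken
# from any published mapping.
SPINE_MAPPINGS = {
    "LCSH": {"Lotus (Plant)": "C1"},
    "UNESCO": {"Aquatic plants": "C1"},
    "AAT": {"lotus (plant)": "C1", "lotus (motif)": "C2"},
}


def build_spine_index(mappings):
    """Invert scheme -> term -> class into class -> scheme -> [terms]."""
    index = defaultdict(lambda: defaultdict(list))
    for scheme, terms in mappings.items():
        for term, spine_class in terms.items():
            index[spine_class][scheme].append(term)
    return index


def equivalents(scheme, term, mappings, index):
    """Terms in other schemes mapped to the same spine class as `term`."""
    spine_class = mappings.get(scheme, {}).get(term)
    if spine_class is None:
        return {}  # term not (yet) mapped to the spine
    return {other: list(terms)
            for other, terms in index[spine_class].items()
            if other != scheme}


def mappings_needed(n_schemes):
    """Mappings needed to link n schemes: pairwise vs. one scheme as spine."""
    return {"pairwise": n_schemes * (n_schemes - 1) // 2,
            "via_spine": n_schemes - 1}


index = build_spine_index(SPINE_MAPPINGS)
print(equivalents("LCSH", "Lotus (Plant)", SPINE_MAPPINGS, index))
# {'UNESCO': ['Aquatic plants'], 'AAT': ['lotus (plant)']}
print(mappings_needed(5))  # {'pairwise': 10, 'via_spine': 4}
```

With five schemes (LCSH, UNESCO, DDC, UDC and AAT) and DDC as the spine, four mappings replace ten pairwise crosswalks. The trade-off is that two terms are treated as equivalent whenever they share a spine class, so matches can be only as precise as the spine's classes.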

4.3 HILT Workshop Conclusions

There was clear consensus from all four breakout groups that the best way forward for HILT was the pilot mapping service (option 5.2), combined to some extent with a terminologies task force or agency as outlined in Option Set 2. Workshop delegates favoured a pilot mapping service because a mapping service was viewed as the best, and possibly the only, basis for consensus between the stakeholder communities. A pilot was also expected to address a greater part of the problem more quickly than any other option, while allowing HILT to gather information on costs against benefits and on a range of other issues (e.g., user needs, user terminology as 'spine', how best to incorporate semantic web and artificial intelligence developments, and design) before making a long-term commitment.

The service envisaged, probably based on an existing commercial approach such as Wordmap [12] or Semio [13], would map key schemes such as LCSH, UNESCO, DDC, Universal Decimal Classification (UDC) and the Art and Architecture Thesaurus (AAT), possibly together with user and regional terminologies and local adaptations of standard schemes (perhaps using one of the schemes as the central spine of the approach). Users would be able to: enter the term or terms that describe their query in the terminology most meaningful to them; narrow the query, if necessary, by choosing a context (e.g., lotus the flower, lotus the software, or lotus the position); and obtain a list of equivalent or near-equivalent terms with which to cross-search or cross-browse the various services [14]. Given the variety of technologies in use across the different communities, the technical issues should not be underestimated: much work would be required to interface such a system with the many services, Z39.50 services in particular. However, work in CAIRNS [15], which interfaces web-based Z39.50 clients with an SQL-compliant collection descriptions database, suggests that this is quite feasible.
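The user-facing steps of the envisaged service (enter a term, choose a context, receive equivalent terms, search with them) can be sketched in Python. This is a minimal illustration, not a description of how Wordmap, Semio or CAIRNS work: the `VOCABULARY` table and the functions `contexts`, `expand` and `or_query` are hypothetical, and a real service would derive its term lists from the mapped schemes and translate the query into each target service's own syntax (Z39.50 or Web). Combining the terms with OR follows the example given in note [14].

```python
# Hypothetical context vocabulary: each user term maps to one or more
# contexts, each listing equivalent or near-equivalent search terms.
# Illustrative data only; a real service would draw it from mapped schemes.
VOCABULARY = {
    "lotus": {
        "flower": ["lotus", "sacred lotus", "Nelumbo nucifera"],
        "software": ["Lotus 1-2-3", "Lotus Notes"],
        "position": ["lotus position", "padmasana"],
    },
}


def contexts(term):
    """Contexts a user can choose from for `term` (empty if unknown)."""
    return sorted(VOCABULARY.get(term.lower(), {}))


def expand(term, context=None):
    """Equivalent terms for `term`, narrowed to `context` where needed."""
    senses = VOCABULARY.get(term.lower())
    if not senses:
        return [term]  # unknown term: search with the user's own wording
    if context is None and len(senses) == 1:
        context = next(iter(senses))  # only one sense, so no need to ask
    if context not in senses:
        raise ValueError(f"'{term}' needs a context: one of {sorted(senses)}")
    return list(senses[context])


def or_query(terms):
    """Join terms with OR, quoting any term that is not purely alphanumeric."""
    return " OR ".join(t if t.isalnum() else f'"{t}"' for t in terms)


print(contexts("lotus"))  # ['flower', 'position', 'software']
print(or_query(expand("lotus", "position")))  # "lotus position" OR padmasana
```

Unknown terms fall back to the user's own wording rather than failing, and an ambiguous term with no context raises an error so that the interface can ask the user to pick one of the contexts returned by `contexts`.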

The plenary discussion raised a range of issues and suggestions, indicating that the proposed pilot service should aim to:

  • have a strong user focus;
  • be implemented quickly to stop the problem worsening;
  • produce, as a key deliverable, reliable data on costs and on costs against benefits at various service and mapping levels;
  • use existing machine-readable mappings wherever possible;
  • use contexts, relationships, clustering etc.;
  • investigate how best to integrate semantic web and artificial intelligence developments;
  • consider user terminologies, as an alternative to DDC, as the central spine to which other schemes are mapped;
  • involve major international players in funding and management;
  • involve a broad range of target services;
  • be closely linked to a cross-sectoral and cross-domain task force at a practical level;
  • define terms such as 'mapping', 'task force', and 'terminologies agency' more closely.

It was also clear from the plenary discussion that everyone agreed a problem exists and that doing nothing, as proposed in Option Set 1, is not acceptable. Additionally, no one showed any enthusiasm for Option Set 4, the adoption of a single scheme, even as a fallback option. Even adopting DDC, which HILT regarded as the scheme most likely to attract consensus, was not seen as a way forward in this respect (although DDC was seen as a key part of a mapping service).

5. The Future

The option of a two-year pilot mapping service described above is currently being canvassed with a wider group of stakeholders. Given the degree of consensus achieved at the HILT Workshop, there is reason to hope that a recommendation to proceed with follow-up research based on a pilot terminologies mapping service will meet with general approval. The next stage is crucial for the development of the global information landscape: it is the step that could give direction to Jacsó's 'savvy searchers' and turn consensus into practical progress.

Notes and References

[1] Péter Jacsó. "Savvy Searchers Do Ask For Direction." Online and CD-ROM Review, 23(2), 1999, pp. 99-102.

[2] Susannah Wake. HILT Stakeholder Survey: Results, March 2001. <http://hilt.cdlr.strath.ac.uk/Reports/SurveyResults.html>.

[3] Dennis Nicholson and Susannah Wake. HILT Workshop: Report and Conclusions, June 2001. <http://hilt.cdlr.strath.ac.uk/Dissemination/WorkshopNew.html>.

[4] HILT Funders are the Research Support Libraries Programme (RSLP) <http://www.rslp.ac.uk/> and the Joint Information Systems Committee (JISC) <http://www.jisc.ac.uk/>.

[5] Partners are: CDLR (lead), mda, National Council on Archives, National Grid for Learning: Scotland, Online Computer Library Center <http://www.oclc.org/>, Scottish Library and Information Council (SLIC) <http://www.slainte.org.uk/Slic/slichome.htm>, Scottish University for Industry (SUfI) <http://www.scottishufi.co.uk> and UK Office for Library and Information Networking (UKOLN) <http://www.ukoln.ac.uk/>.

[6] HILT Stakeholders. <http://hilt.cdlr.strath.ac.uk/AboutHILT/stakeholders.html>.

[7] SLAINTE. <http://www.slainte.org.uk/slainte.html>.

[8] Resource Discovery Network (RDN). <http://www.rdn.ac.uk/>.

[9] UNESCO Thesaurus. <http://www.ulcc.ac.uk/unesco/>.

[10] Sarah Currier. HILT Focus Group Report. <http://hilt.cdlr.strath.ac.uk/Reports/Focus2602.html>.

[11] Dennis Nicholson. HILT Workshop Breakout Sessions: Discussion Issues. <http://hilt.cdlr.strath.ac.uk/Dissemination/WorkshopNew.html>.

[12] Wordmap. <http://www.wordmap.com/>.

[13] Semio. <http://www.semio.com/>.

[14] For an example of this type of service, see <http://www.wordmap.com/>: type in 'lotus', then choose the last context option (which starts science > biology), and the additional terms will be displayed. The software lets you control the search that is sent; for example, it can combine terms with the Boolean operator OR, so that synonym A OR synonym B OR synonym C would be one possible search.

[15] Co-operative Academic Information Retrieval Network for Scotland: CAIRNS. <http://cairns.lib.gla.ac.uk/> and <http://cairns.lib.strath.ac.uk/>.

Copyright 2001 Susannah Wake and Dennis Nicholson

DOI: 10.1045/september2001-wake