D-Lib Magazine
July/August 2003

Volume 9 Number 7/8

ISSN 1082-9873

In Brief


Building a More Meaningful Web: From Traditional Knowledge Organization Systems to New Semantic Tools

Contributed by:
Gail Hodge
Senior Information Scientist
Information International Associates, Inc.
Havertown, PA USA
<gailhodge@aol.com>

The 6th Networked Knowledge Organization Systems/Sources (NKOS) Workshop, organized by Gail Hodge, Dagobert Soergel, and Marcia Zeng and held on May 31, 2003 in Houston, TX as part of JCDL 2003, focused on transforming traditional knowledge organization systems (KOSs), such as classification schemes and thesauri, into new forms of knowledge representation, such as ontologies, topic maps, and semantic Web components, that meet the requirements of computer processing. Participants discussed how principles from more traditional practices can be applied to the design of new knowledge organization systems and how the extensive intellectual capital available in traditional KOSs can be exploited.

Newer tools for computer-based analysis and reasoning require richer constructs, more explicit relationships, standard syntax to support decomposition, and knowledge of the domain (Dagobert Soergel). The most complex traditional tools generally support only the standard broader term (BT), narrower term (NT), and related term (RT) relationships. These relationships subsume a variety of more specific relationships, but they are insufficiently differentiated, particularly among RT relationships, for rule-based reasoning by computer. Specific examples from the Food and Agriculture Organization's AGROVOC Thesaurus showed how some relationships can be successfully extracted, but also how the lack of specificity and irregularities in the original construction limit the thesaurus's usefulness as a basis for developing an ontology (Frehiwot Fisseha).
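To make the differentiation problem concrete, the toy comparison below contrasts a generic RT link with the labeled relations an ontology needs for rule-based reasoning. The terms and relation names are invented for the example and are not drawn from AGROVOC.

    # Illustrative only: a generic thesaurus RT link collapses distinct
    # relations that a computer cannot tell apart, whereas an ontology
    # must name each relation explicitly before rules can be applied.
    thesaurus_links = [
        ("milk", "RT", "cow"),       # actually a source/product relation
        ("milk", "RT", "cheese"),    # actually an ingredient relation
    ]
    ontology_axioms = [
        ("milk", "obtainedFrom", "cow"),
        ("cheese", "madeFrom", "milk"),
    ]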

Despite these limitations, several traditional tools have been used in more semantically meaningful applications. Adding the NASA Thesaurus to a general vocabulary in a commercial tool for video indexing significantly improved search results; the hierarchical structure of the terms supports query expansion and improves the identification of retrieval intervals, segments of video considered applicable to a user's query (Gail Hodge and Janet Ormes). Thesaurus constructs and principles of faceted analysis and concept maps support the reuse and navigation of learning objects (images, text, simulations, and models) in a geography course by providing structured information about scientific concepts in the domain (Marcia Zeng for Terence Smith). Semantic networks, ontologies, and concept maps can benefit from traditional controlled-terminology constructs such as thesauri and classification schemes to improve organization, retrieval, and visualization for digital libraries (Marcos André Gonçalves).

The workshop concluded that the intellectual capital contained in traditional tools can be mined for use in newer tools, but the process cannot be fully automatic. One possibility is to consider terminology Web services, particularly as part of knowledge management initiatives (Adam Farquhar). By using Web service technology, the terminology can be incorporated as a component of various corporate systems. Attention to standards, including the revision of NISO Z39.19, the standard for monolingual thesauri, is important (Amy Warner).

The participants identified several follow-on activities:

  • Identify a core set of relationships that would further define the traditional RT with extensibility for specific domain relationships.
  • Develop data exchange/interchange formats in XML/RDF.
  • Develop use cases to better define terminology Web services.
  • Define a tool suite for converting traditional tools into intermediate formats that can support the development of new semantic tools.
  • Evaluate the costs and benefits of reusing traditional tools versus building new tools "from scratch", particularly with regard to machine-to-machine applications in the knowledge management (KM) environment.
  • Investigate licensing versus public domain business models for terminology.

NKOS is a community of researchers, developers and practitioners seeking to enable knowledge organization systems, such as classification systems, thesauri, gazetteers, and ontologies, as networked interactive information services to support the description, retrieval, and presentation of diverse information resources through the Internet. Presentations are available from the NKOS Web site (http://nkos.slis.kent.edu).


Information Visualization Interfaces for Retrieval and Analysis (IVIRA) Workshop Summary

Contributed by:
Javed Mostafa
Laboratory for Applied Informatics Research
<http://lair.indiana.edu>
Indiana University, Bloomington
Bloomington, Indiana, USA
<jm@indiana.edu>

The IVIRA workshop was held on Saturday May 31st in conjunction with the Joint Conference on Digital Libraries 2003 on the Rice University campus in Houston, Texas. Katy Börner and Javed Mostafa jointly organized the workshop.

Javed Mostafa opened the half-day with a brief introduction in which he summarized the significance of three key areas in information retrieval visualization: functions, infrastructure, and evaluation.

  • Functions of IV/IR systems include "prototypical" ones such as browse, search, refinement, and presentation. Increasingly, however, users demand more: features and tools to help them interactively mine data, generate patterns, and conduct analysis on data.
  • In IV/IR systems infrastructure can play a critical role by providing access to different standard corpora for training and testing systems, pre-established results for validation, and well-tested modules to reduce development time.
  • Evaluation is a related area requiring attention, especially in an emerging field such as information visualization, because evaluation can guide development and ensure that new techniques are based on sound empirical studies.

Javed concluded his introduction by observing that IV/IR systems are now being developed and used in a wide variety of application domains, ranging from music to proteomics; hence, it is imperative that researchers and practitioners reflect on the three general areas (i.e., function, infrastructure, and evaluation) in terms of domain-level implications.

Seven papers were presented at the workshop, each lasting approximately 20 minutes. The visualization approaches discussed can be grouped into three broad categories: user-centered, metadata-driven, and data mining applied on raw data.

The first paper, "Collection understanding through streaming collage", was presented by Michelle Chang. She discussed a technique that is unusual in that its main aim is to support comprehensive understanding of an image collection rather than retrieval of a narrow subset from it. The presentation method takes advantage of Andruid Kerne's "Collage Machine" to generate streaming collages from the collection. Users can review randomly produced collages from the collection, manipulate collages by moving around the individual images displayed, or filter collages by specifying metadata values associated with the images. This is an example of a user-centered approach, with significant control given to the user for manipulating the presentation of information.

The second presentation, by Yueye Fu, focused on visualization of email boxes with the main goal of identifying patterns in communication behavior. A set of emails related to a collaborative software development project was used as the "collection" to illustrate the application of the Treemap visualization algorithm. Two Treemap visualizations were produced: one coded individual senders differently based on the number of emails each contributed, and the other grouped emails by project. The Treemap approach presented is an example of a basic data mining technique applied to raw data to produce visualizations.
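To make the Treemap idea concrete, the sketch below lays out one level of a simple slice-and-dice treemap from a toy mailbox, weighting each sender by the number of emails contributed. The data and function names are illustrative and are not taken from the presented system.

    from collections import Counter

    def slice_and_dice(items, x, y, w, h, vertical=True):
        # Basic one-level slice-and-dice treemap: each (label, weight) pair
        # gets a rectangle whose area is proportional to its weight.
        total = float(sum(weight for _, weight in items))
        rects, offset = [], 0.0
        for label, weight in items:
            frac = weight / total
            if vertical:   # split the enclosing rectangle left to right
                rects.append((label, (x + offset, y, w * frac, h)))
                offset += w * frac
            else:          # split the enclosing rectangle top to bottom
                rects.append((label, (x, y + offset, w, h * frac)))
                offset += h * frac
        return rects

    # Hypothetical mailbox: (sender, project) pairs; grouping by project
    # instead of sender would yield the second visualization described.
    emails = [("alice", "ui"), ("bob", "ui"), ("alice", "core"),
              ("alice", "core"), ("carol", "docs"), ("bob", "core")]
    by_sender = Counter(sender for sender, _ in emails)
    for label, (rx, ry, rw, rh) in slice_and_dice(sorted(by_sender.items()), 0, 0, 100, 100):
        print(label, rx, ry, rw, rh)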

Eric Isaacson was the third presenter; his paper dealt with content visualization of a music digital library. Eric discussed several techniques that aid in visualizing musical scores and a technique that permits display of different homogeneous sections of music over time. An important aspect of the visualization techniques Eric discussed is that they permit the user to annotate or "edit" the presentation, thus providing a significant degree of control over the visualization.

Following a brief break after the third presentation, the workshop continued with a presentation by Carlos Monroy. He described a system to create, visualize, and analyze series of art objects. For a given collection of art object images, the system permits the user to browse and select a subset, group them as a series, and save the series. Thus the user can build a catalog of self-created series. Additional functions offered include analysis of series content by creation time-line, animation of series content, and comparison of up to four series in parallel. This system is also a good example of a user-centered approach.

Next, Rao Shen described the GetSmart project, a collaborative effort between Virginia Tech and the University of Arizona to integrate digital library tools in education. For this project, a knowledge visualization method called "concept maps" has been implemented: the method displays prominent concepts and their associations as a network graph, with connections among nodes representing associations among concepts. Rao mentioned that such maps have been helpful in summarizing the main themes of a domain, allowing students to check their own understanding, and promoting collaboration.

The next paper was on the development of a search system named Oncosifter that allows users to locate information on cancer using multiple means. One particular function supported by this system is search based on visualization of key concepts. The system maintains a database of "consumer health vocabularies" that are presented to the user as a hyperbolic tree structure; the user can begin at any of the dozen or so top-level concept nodes and gradually refine the information needed by selecting lower-level concepts that are also presented as clustered branches in the visualization. Based on the concepts selected, the MedlinePlus and CancerGov databases are queried to retrieve related documents. Oncosifter is an example of a system that generates visualizations from metadata that are then used to support retrieval functions.

The final paper was presented by James Cooper of IBM, who discussed the development of a concept-relation discovery system in the biomedical domain. It uses a noun and technical term discovery library called JTalent, the output of which is filtered using the Medical Subject Headings (MeSH) dictionary. The MeSH dictionary is used to identify biomedical terms, and a mutual information formula is employed to calculate association weights among terms. Finally, using a spring graph algorithm, the terms and term associations are displayed for the user to browse. This research falls clearly in the category of data mining applied to raw text.
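The association weight referred to is commonly computed as a (pointwise) mutual information score. A standard formulation, given here as an assumption since the paper's exact formula is not reproduced in this summary, is

    MI(t_i, t_j) = \log \frac{P(t_i, t_j)}{P(t_i)\, P(t_j)}

where P(t_i, t_j) is the probability that terms t_i and t_j co-occur (for example, within the same document or a fixed text window) and P(t_i) and P(t_j) are their individual occurrence probabilities; term pairs with high scores are then linked in the spring graph.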

The workshop concluded with a brief discussion of the highlights of the presented techniques and how they relate to the basic taxonomy of user-centered, metadata-driven, and data mining approaches. Javed pointed out that the workshop was being held in conjunction with one of the major digital library forums, and that it is not unrealistic to expect a large digital library to cover the whole range of basic data formats, i.e., text, image, audio, and video. Hence, a challenge for researchers and practitioners is to develop scalable information visualization techniques that can also be generalized across heterogeneous data formats.

Papers as well as presentation slides of all talks are online at <http://vw.indiana.edu/ivira03>.

Acknowledgments

The organizers are grateful to the program committee members who helped in reviewing papers. They are:

Kevin Boyack, Sandia National Laboratory, USA
Robin Burke, DePaul University, USA
Chaomei Chen, Drexel University, USA
Martin Dodge, University College London, UK
James French, University of Virginia, Charlottesville, USA
Xia Lin, Drexel University, USA
André Skupin, University of New Orleans, USA
Kiduk Yang, Indiana University, Bloomington, USA

Related Sites:

SPIE Conference on Visualization and Data Analysis, San Jose, CA, USA, January 18-22, 2004. <http://vw.indiana.edu/vda2004>.

Special Issue on Collaborative Information Visualization Environments in PRESENCE: Teleoperators and Virtual Environments. Submission Deadline: September 1st, 2003. <http://vw.indiana.edu/cive03/03-cive-presence.pdf>.

Symposium on Collaborative Information Visualization Environments, IV 2003, London, UK, July 16-18, 2003. <http://vw.indiana.edu/cive03>.


Report on the "OAI Metadata Harvesting Workshop" at JCDL 2003

Contributed by:
Simeon Warner
Research Associate
Cornell University
Ithaca, New York, USA
<simeon@cs.cornell.edu>
<http://www.cs.cornell.edu/people/simeon>

The "OAI Metadata Harvesting Workshop" was held on Saturday 31 May as part of JCDL 2003. There were 11 participants, including OAI service provider implementers, data provider implementers, and researchers from both the US and Europe. The participants were: Donatella Castelli, Naomi Dushay, Ed Fox, Tom Habing, Kat Hagedorn, Terry Harrison, Xiaoming Liu, Michael Nelson, Heinrich Stamerjohanns, Jewel Ward and Simeon Warner. Most participants made short presentations to highlight interesting topics or issues, and time was allocated for discussion following each presentation.

Several recurring themes emerged as the workshop progressed. These were: the need for better documentation, issues of metadata quality, and ideas for additional services. I will report on each theme separately.

The need for additional and improved documentation was mentioned many times. Apart from minor corrections and clarifications, few issues were raised regarding the OAI-PMH specification itself. However, as OAI-PMH servers and services are now being deployed using the various programs and toolkits available, it is apparent that we need documentation suitable for these users. Both the OAForum and the NSDL are working on OAI documentation and tutorials; these and other resources need to be linked from the OAI website. At an even higher level, some very basic education and dissemination are required to dispel a number of persistent misunderstandings about exactly what the OAI framework does and does not provide. In particular, the OAI must address the common misunderstanding that "OAI is just about Simple Dublin Core". The OAI-PMH actually supports any metadata format that can be described with an XML schema; Dublin Core is mandated only to provide a baseline for OAI-wide interoperability.
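As a minimal illustration of how the protocol separates harvesting from any particular metadata format, the sketch below issues a single OAI-PMH ListRecords request. The base URL is hypothetical, and a production harvester would also handle resumption tokens and error responses.

    from urllib.parse import urlencode
    from urllib.request import urlopen

    BASE_URL = "http://www.example.org/oai"   # hypothetical repository

    def list_records(metadata_prefix="oai_dc", **kwargs):
        # Build a ListRecords request; metadataPrefix selects the format.
        # oai_dc is the mandated baseline, but any format described by an
        # XML schema (e.g. a richer domain-specific one) can be requested.
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        params.update(kwargs)                  # e.g. set="physics", from_="2003-01-01"
        query = urlencode({k.rstrip("_"): v for k, v in params.items()})
        with urlopen(BASE_URL + "?" + query) as response:
            return response.read().decode("utf-8")

    xml = list_records("oai_dc")               # raw XML to be parsed downstream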

Representatives of harvesting projects, especially the NSDL and PhysDoc, reported widely varying metadata quality and said that metadata normalization is essential. Both have developed heuristics for data cleaning and find it necessary to hand-customize these algorithms on a per-repository basis. While these problems are not new to the OAI, there is clearly considerable scope for the development of tools and improved practices to support the creation of services on metadata from many sources and of varying quality.

There were several suggestions for additional infrastructure components to provide specialized services. In all cases, discussion led us to believe that these could be provided at the service-provider level of the OAI-PMH, without need to change the protocol. The suggestions included services to aid the identification of duplicate records; to create maps of repositories and proxies; to track deleted records in upstream repositories; and to classify repositories based on content, size, update schedule, availability, etc.

Participants mentioned numerous private and intranet OAI-PMH implementations, making it apparent that the OAI-PMH is used more widely than the number of registered repositories and services suggests. The community-building work of the Open Language Archives Community (OLAC) was admired, and it was agreed that more community-specific development is required within the e-prints community. Along with the development of higher-level documentation, the draft static repository specification was considered an important way to encourage participation by further lowering the barrier to OAI interoperability.

Further details, including slides from the presentations, are available from the workshop web site at <http://www.cs.cornell.edu/people/simeon/workshops/JCDL2003>.


Cross-cultural Considerations in Digital Library Research: Report from the JCDL 2003 Workshop

Contributed by:
Nadia Caidi
University of Toronto, Canada
<caidi@fis.utoronto.ca>

Anita Komlodi
University of Maryland, Baltimore County
<komlodi@umbc.edu>

The scope and reach of digital libraries (DLs) is truly global, spanning geographical and cultural boundaries, yet few scholars have investigated the influence of culture as it pertains to the design and use of digital libraries. The aim of the workshop on Cross-Cultural Usability for DLs, organized at JCDL 2003, was to identify areas of research at the intersection of digital libraries and culture. The topics and questions outlined below provide a roadmap for research in this area.

1. Digital libraries and cultural heritage

The advent of digitization has transformed what it means to acquire, control, deliver and use information resources in society. Communities are able to publish and disseminate information about their cultural heritage and create their own digital libraries. Beyond the potential for diversity, plurality of voices, and empowerment of these communities, workshop participants identified new issues, such as:

  • Whose culture is being re-presented online (e.g., biases, content selection and relevance, self identification vs. perception by others)?
  • Is it possible (or desirable) to digitally 'backup' a culture?
  • What are the implications and dangers of objectifying culture (representation vs. experience)?
  • What are the access vs. ownership issues over digital cultural heritage?
  • What are the implications for the role of the heritage and information professions (e.g., libraries, archives, and museums)?

Culture is a complex and shifting notion that is very hard to operationalize. Workshop participants discussed whether it would be more relevant, in the context of DL development, to replace 'culture' with 'cultures' or even 'identities' (e.g., national, cultural, ethnic, religious, or sexual identities). There was a call for more research on defining culture in the context of information processing and management, as a means to assess whether and to what extent culture can be considered a design variable in the construction of DLs.

2. Cross-cultural usability of DLs

Cross-cultural usability is concerned with the usability and comprehensibility of user interfaces by users from different national cultural backgrounds. Cross-cultural usability methods are applied in user interface design of software products and websites and they should be considered when creating DL resources. The following questions arise when considering cross-cultural usability issues:

  • To what extent does culture (national, organizational, domain) influence information-seeking behavior?
  • How do other factors (contextual, individual, etc.) mediate culture, and how do these factors affect use?
  • What research methods are appropriate to investigate cross-cultural differences? What measures should be used? What would be the best way to characterize the cultural/ethnic belonging of participants?
  • Do cross-cultural studies of behavior lead to new design dimensions and guidelines (i.e., relevance, interactivity, involvement, community, novelty, etc.) for DLs?

One clear outcome of the day-long workshop is the strong need for research on cross-cultural issues in the design and implementation of DLs (e.g., general user interface design dimensions [language, images, content organization, etc.], studies of the behavior of culturally diverse users, etc.).

We would like to thank our participants and the workshop program committee. More information is available at <http://www.fis.utoronto.ca/faculty/caidi/JCDL03.html>.


New Preservation Metadata Working Group

Contributed by:
Priscilla Caplan
Assistant Director for Digital Library Services
Florida Center for Library Automation
<pcaplan@ufl.edu>

OCLC and the Research Libraries Group are sponsoring PREMIS, a working group charged with developing recommendations and best practices for implementing preservation metadata in digital preservation systems. Preservation metadata is the information necessary to carry out, document, and evaluate the processes that support the long-term retention and accessibility of digital materials.

PREMIS (PREservation Metadata: Implementation Strategies) will build on work completed in 2002 by an earlier OCLC/RLG-sponsored working group on preservation metadata, which established a metadata framework in the context of the Open Archival Information System model. The new group is charged with:

  • Developing an implementable set of "core" preservation metadata elements, with broad applicability within the digital preservation community
  • Drafting a data dictionary to support the core preservation metadata element set
  • Examining and evaluating alternative strategies for the encoding, storage, and management of preservation metadata within a digital preservation system, as well as for the exchange of preservation metadata between systems
  • Establishing pilot programs for testing the group's recommendations and best practices in a variety of system settings
  • Exploring opportunities for the cooperative creation and sharing of preservation metadata

The effort is intended to help move the community from theoretical frameworks such as OAIS to practical implementation by addressing issues surrounding the actual use of preservation metadata and its application to real digital objects. The group will consider questions such as: How can complex objects be managed both as logical entities and as sets of individual data files? How should metadata be stored in relation to the objects it describes? What are the significant properties of various types of objects and how can they be documented so that these properties are preserved across reformattings? What is the cost of obtaining needed metadata and how much can be provided automatically?

PREMIS membership is divided between a Working Group and an Advisory Committee drawn internationally from the library, academic, museum, arts, government, and commercial communities. Most members are actively working on digital preservation projects. For more information, see <http://www.oclc.org/research/pmwg/>.


MetaTest: Putting Metadata to the Test

Contributed by:
Elizabeth D. Liddy
Director; Professor
Center for Natural Language Processing, School of Information Studies, Syracuse University
Syracuse, New York, USA
<liddy@mailbox.syr.edu>

Syracuse University's Center for Natural Language Processing, with Cornell University's Human-Computer Interaction Group, is in the midst of a two-year evaluation of metadata. Through funding from the National Science Foundation's National STEM Education Digital Library Program, the MetaTest project is evaluating the usefulness of metadata to the user and the quality of automatically generated metadata.

Though much effort has gone into developing and implementing metadata standards within the library community, virtually no evaluative studies have been done. MetaTest is endeavoring to evaluate metadata holistically by looking at the entire cycle of metadata: from creation to information retrieval to the use of metadata by the user.

Using the domain of STEM lesson plans, MetaTest is measuring the degree to which metadata can be generated automatically and is assessing the differences in the retrieval results of queries when the index uses: (1) automatically generated metadata, (2) manually generated metadata, and (3) no metadata but only the free text of the resource. Currently we are designing this Information Retrieval Experiment, which will use the open-source Lucene search engine, the engine that NSDL has recently adopted for its library.
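The outline below sketches that three-way comparison in schematic form. The toy term-overlap scoring simply stands in for Lucene, and the field and function names are hypothetical rather than taken from the MetaTest implementation.

    def build_index(documents, condition):
        # Index each lesson plan under one of the three experimental conditions.
        index = {}
        for doc in documents:
            if condition == "automatic":
                text = " ".join(doc["auto_metadata"].values())
            elif condition == "manual":
                text = " ".join(doc["manual_metadata"].values())
            else:  # "fulltext": no metadata, only the free text of the resource
                text = doc["fulltext"]
            index[doc["id"]] = set(text.lower().split())
        return index

    def run_queries(index, queries):
        # Naive term-overlap ranking, standing in for a real search engine.
        results = {}
        for q in queries:
            terms = set(q.lower().split())
            results[q] = sorted(index, key=lambda d: len(terms & index[d]), reverse=True)
        return results

    # The same query set is run against all three indexes, and the ranked
    # results are then compared with standard retrieval-effectiveness measures.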

A preliminary comparison of 35 lesson plans, looking at which metadata elements were present in the automatic and manual metadata descriptions, found that the Grade Level and Subject elements were more likely to be missing for the automatic metadata, while the Materials, Relations, & Pedagogy elements were more likely to be found in the automatic metadata than the manual metadata. It would be expected that abstract elements that require human inference and judgments are more likely to be present in manual metadata descriptions. Elements that require more tedious entry, such as Relations and Materials, understandably are more likely to be found in automatic metadata descriptions. The unexpected finding is that Pedagogy, perhaps the most abstract element, was found in more of the automatic metadata descriptions. This may indicate that elements such as Pedagogy that require more understanding of the metadata schema are omitted because they are too taxing for the cataloger to complete. In developing CNLP's natural language system we focused on the Pedagogy element because a prior study had informed us that teachers find Pedagogy quite useful. It is still too early in our study to make broad generalizations as this was a small sample of one particular digital library collection, and we will have more understanding when the project is complete.

In addition to the study on retrieval effectiveness, MetaTest is conducting a Metadata Quality Study starting at the end of July. This study will have STEM educators evaluate how well the automatic and manual metadata represent STEM lesson plans. Through this Web-based study we hope to have geographic diversity in our respondents. If you are a STEM educator and interested in participating in this or other studies, please email us at <metatest@cnlp.net>.

The Human-Computer Interaction Group has completed pre-testing of the User Study, in which they observe how actual educators view and use metadata for searching and browsing. Using eye-tracking and post-session interviews, they are observing the connection between how users seek out informative elements of a Web-based resource and how they judge whether the resource meets their specific information needs. In the coming months, as more users are studied, the Human-Computer Interaction Group will gain more insight into how educators use metadata.

For more information on the MetaTest project, please visit the MetaTest Project Description.

Acknowledgments

This project is funded by the National Science Foundation under Grant No. 0226312.


Cinema Context Collection (CCC)

Contributed by:
Renze Brandsma
Head Digital Production Centre
University Library, University of Amsterdam
Amsterdam, The Netherlands
<r.brandsma@uva.nl>

The Universiteit van Amsterdam has received funding from the Netherlands Organisation for Scientific Research (NWO) which will allow the university to develop a database documenting the history of film culture in the Netherlands. Dr. Karel Dibbets from the department of Media Studies of the Faculty of Humanities will lead the content side of the project. The database will be made available via the Internet.

The CCC will use open, international standards and will be encoded in XML. To index the data and provide fast and precise information retrieval, a powerful search engine that supports XML element and attribute searching will be used. The technical infrastructure will be realized by the DPC (http://www.uba.uva.nl/dpc). The DPC offers scientists and knowledge organizations support to create and make available electronic publications and databases. The DPC is a department within the Division of Electronic Services of the University Library of Amsterdam. The Filmmuseum and the Dutch Institute for Image and Sound are also participating in this project.

The database will make it possible for the researcher to conduct qualitative and quantitative research into the distribution and showing of films.

The evidence available for digitization consists of textual data regarding film distribution and exhibition in a number of Dutch towns between 1896 and 1940. The information reveals which films were exhibited where and when. The corpus contains data on almost 65,000 film showings, specifying in which cinemas a given film was shown and which films were shown in a particular cinema, with rich details about each cinema's history and management and about every single film, such as its censorship classification, country of origin, year of production, and makers. At the moment, the evidence is scattered over many archives and has to be united through digitization. Several sources of data have been selected for digitization; for instance, the archive of the "Centrale Commissie voor de Filmkeuring" (National Archives) contains about 60,000 files on all films reviewed by the Dutch board of film censors between 1928 and 1977.

By providing new and rich context data about films, the project aims to become an invaluable supplement to existing digital film archives, catalogues, and filmographies in The Netherlands and abroad. At the same time, the provision of context data serves as a bridge between heterogeneous collections and catalogues of artifacts, creating links between hitherto disconnected data and institutions. The system sits like a spider at the center of a web.


Seminar on Application of the Dublin Core Metadata Model in Spain

Contributed by:
Eva Méndez
Library and Information Sciences Department
Carlos III University of Madrid
Getafe (Madrid) Spain
<emendez@bib.uc3m.es>

On 5 June in Madrid, the National Library of Spain hosted a Seminar on the "Application of the Dublin Core Metadata Model in Spain". This important meeting for the Spanish digital information world was co-organized by the Research Institute of Information Science and Information Management "Agustín Millares" of the Carlos III University in Madrid; SEDIC, the Spanish Society of Information Science and Scientific Information; and the National Library of Spain.

The seminar comprised three main sessions (each of which was conceived as an open forum workshop) about metadata and the Dublin Core Metadata Initiative (DCMI). The purpose of the seminar was to articulate the approach toward a Spanish national policy on digital content organization and retrieval. The most relevant session included the DCMI Spanish Mirror presentation and concerned the implementation of Dublin Core in Spain. (The Agustín Millares Institute is the host for the DCMI mirror.) The mirror presentation was inaugurated by the Spanish Ambassador on Special Mission for New Information Technologies, and the three sessions were chaired by the current director of the DCMI, Makx Dekkers.

Seminar attendance numbered 160 people, and many more from digital libraries and the electronic information community in Spain and Latin America were able to view the seminar via Internet broadcast on the UC3M-TV service. The event marked one step forward in the process toward semantic web research and development in the Spanish language.

For more information about the seminar, please see the following web sites:

<http://www.uc3m.es/uc3m/inst/IAM/seminarios.htm>
<http://es.dublincore.org/es/eventos/dcmi-es1>
<http://es.dublincore.org>.


VTT Publications Register is now OAI Compatible

Contributed by:
Timo Hellgren
OAI Project Manager
VTT Information Service
Espoo, Finland
<timo.hellgren@vtt.fi>

The VTT Publications Register, <http://www.otalib.fi/vtt/jure/search.html>, has implemented the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), <http://www.openarchives.org>. The base URL for the repository is <http://cgi.vtt.fi/progs/inf/OAI>.

The public research results of VTT (Technical Research Centre of Finland) are published in various international and Finnish scientific journals and publication series as well as monographs, conference papers, dissertations and in VTT's own publication series (see http://www.vtt.fi/inf/pdf). The publications register contains references to reports, journal articles and conference proceedings authored by the personnel of VTT. Reports published in the publication series of VTT are included from the year 1943. References to other publications are included from 1984. Currently the database contains over 47,000 records and is updated daily.

The supported metadata format is OAI's Dublin Core. Sets are also supported, so harvesting can be limited to certain publication types, to publications by certain research institutes within VTT, or to full-text documents. Some of the full-text documents may have access restrictions, e.g., articles published by commercial publishers.

Anyone harvesting scientific and/or technical literature is encouraged to use this implementation of OAI-PMH. Paper copies of the VTT Publication series can be ordered through the VTT Publications Register (see instructions at http://www.vtt.fi/inf/serviceforms/index.htm).


In the News

Excerpts from Recent Press Releases and Announcements

New Advocacy Pages Are Available on the Digital Preservation Coalition (DPC) Web Site

Announced 8 July 2003 by Neil Beagrie, Secretary, Digital Preservation Coalition:

"A new advocacy area has been added to the DPC public website at:
<http://www.dpconline.org/graphics/advocacy>. "

"At present it concentrates on the Media and PR work of the Coalition but we hope to expand it with other materials in coming months."

"The new section includes a register of press and media coverage obtained by the DPC since launch including some 20 articles in the national and specialist press and TV and radio since August 2002 as a result of this year's work. We will keep this register up to date with new coverage over the next year. The DPC advocacy campaign is phased with the balance of our efforts gradually shifting from general awareness raising to mobilising and supporting specific actions over time."


PRONOM 3 User Requirements

Announced 4 July 2003 by Robert Taylor, Communications Officer, The National Archives, United Kingdom:

"PRONOM is an application developed by the Digital Preservation Department of the UK National Archives for managing information about the file formats used to store electronic records, and the software applications used to create and render those formats. Although originally developed as a practical tool to support our own preservation activities, we are aware that this information will potentially be of great value to others. As the next phase of our ongoing development of PRONOM, we will therefore be developing a web-accessible version of the system. This will allow users to search PRONOM and generate reports for viewing on screen, printing, or exporting in XML and CSV formats."

"In order to maximise the public value of the system, we are actively seeking comments on the user requirements for this phase of the project. The user requirement document is available to view or download (PDF or MS Word 2000 format) from <http://www.pro.gov.uk/about/preservation/digital>. Comments can be submitted by email to <digital-archive@pro.gov.uk>. The deadline for comment is 25th July 2003."

"We plan to launch the web version of PRONOM in the autumn 2003, with information on selected software products. This will quickly be expanded to cover over 300 applications. Further information about PRONOM and its future development is also available at <http://www.pro.gov.uk/about/preservation/digital>."

For further information, please contact <robert.taylor@nationalarchives.gov.uk>.


Managing Change: Public Libraries Get Toolkit to Exploit Online Opportunities

"London, 4 July 2003 — People's Network Change Management Toolkit helps public libraries develop new services for the information age."

"Public libraries are changing — fast. New broadband online facilities, installed in libraries under the People's Network project, can allow them to develop online content and information services for their communities undreamed of only a few years ago. But delivering new services means more than just wiring up the libraries and putting in the computers. It also means helping people change the way they work, deciding priorities for action, and winning over hearts and minds."

"That's why Resource: The Council for Museums, Archives and Libraries, CILIP: the Chartered Institute of Library and Information Professionals and the Networked Services Policy Task Group are launching the People's Network Change Management Toolkit, at the Umbrella conference in Manchester this week. Freely available on the web from Friday 4 July, and developed in consultation with over 30 library authorities, the toolkit will help libraries grapple with the challenges they face as they develop programmes using the People's Network to deliver new online services. Funding for the toolkit has come from the New Opportunities Fund, and it has been developed by the leading consultancy Information Management Associates."

For further information, please see the full press release at <http://www.resource.gov.uk/news/press_article.asp?articleid=575>.


The People's Network: A National Online Library

"London, 3 July 2003 – The People's Network, which links every public library in the country to the internet, has been awarded £500,000 lottery funding for a new phase of its development: as a national online library and a major provider of access to e-learning and e-government services."

"Already, thanks to the People's Network, 30,000 computers with internet access have been installed in over 4,000 public libraries throughout the UK, and every member of public library staff is being trained in ICT skills to give the public help and advice. This £120 million lottery-funded project is already the largest ICT investment of its kind, and the biggest boost ever in the 150-years of the public library service. The project, which is managed by Resource: The Council for Museums, Archives and Libraries, has been delivered on time, and within budget."

"The new £500,000 funding from the New Opportunities Fund will be used to exploit the Network and the new ICT skills of library staff to the maximum, building on their proven strengths to provide new online services such as a national electronic enquiry service and virtual reference shelf, and access to e-government services, e-learning resources, and community information."

For further information, please see the full press release at <http://www.resource.gov.uk/news/press_article.asp?articleid=574>.


Carl Grant Elected NISO Vice-Chair and Pesch, Greenstein, and Ramsey Join the NISO Board

"Bethesda, Md., USA (June 24, 2003) - The Voting Members of the National Information Standards Organization (NISO) have elected Carl Grant Vice Chair/Chair Elect and have elected Daniel Greenstein, Oliver Pesch, and Ed Ramsey to the NISO Board of Directors, NISO's governing body. Carl Grant will serve as Vice Chair/Chair Elect from July 1, 2003 until June 30, 2005 and will be NISO Chair from July 1, 2005 through June 30, 2007. Greenstein, Pesch and Ramsey will each serve three year terms beginning July 1, 2003."

"Carl Grant is President and COO of VTLS, Inc. Grant has worked in libraries, or companies automating libraries, for almost three decades. Grant represents VTLS to the Coalition for Networked Information (CNI); Grant is also active in the American Library Association and has served on the Board of the ALA's Exhibits Round Table (ERT), the Public Library Association (PLA) Exhibitor's Advisory, and the Library and Information Technology Association (LITA). Grant holds an MLS from the University of Missouri at Columbia. He is a frequent speaker at conferences in the U.S. and around the world, and has authored a number of journal articles focusing on technology and library automation. Grant has been a NISO Board member since 2000."

"Daniel Greenstein is University Librarian for Systemwide Library Planning and Scholarly Information at the University of California Office of the President and Executive Director of the California Digital Library. Prior to joining the University of California in May 2002 he was Director of the Digital Library Federation and founding director of two national networked services serving the academic community in the UK. Greenstein is a graduate of the University of Pennsylvania and received his DPhil from Oxford University. He has published numerous books and articles about the development and delivery of networked information resources, digital preservation, and digital libraries and has been active in the development of information standards since the early 1990s."

"Oliver Pesch is Chief Architect and Senior Vice President of EBSCO Publishing and has been designing and developing products for the library market for more than 20 years. As EBSCO's Chief Architect, he has been directly involved in standards-compliant product development and has published and presented on the topics of linking, usage statistics, and MARC records for aggregator databases. Pesch has been active on NISO standards committees since 2001 and now co-chairs NISO's Metasearch Initiative planning committee."

"Ed Ramsey is Director, Corporate Applications at Random House Inc. and is responsible for all of the corporate Electronic Data Interchange (EDI) systems. Ramsey has been with Random House for 21 years; he represents Random House on the Book Industry Study Group (BISG) Board of Directors, and is Chairman of the BISG Book Industry Standards and Communication (BISAC) committee. Ramsey is a member of the U.S. delegation to the International Standards Organization (ISO) Working Group revising the International Standard Book Number (ISBN)."

"Effective July 1, 2003 the officers of the NISO Board are: Chair Jan Peterson, Vice President of Content Development, Infotrieve; Vice Chair/Chair Elect and Treasurer Carl Grant, President, VTLS, Inc.; Immediate Past Chair Beverly P. Lynch, Professor, UCLA Graduate School of Education and Information Studies; and Secretary Patricia Harris, Executive Director, NISO. Other NISO Board directors in addition to the three newly elected members are: Pieter S.H. Bolman, Vice President, Director of STM Relations, Elsevier, Brian Green, Director, Book Industry Communications and Manager EDItEUR; Jose-Marie Griffiths, Doreen E. Boyce Chair and Professor & Director, Sara Fine Institute, University of Pittsburgh; Deborah Loeding, Vice President of Sales and Marketing, H.W. Wilson; Richard E. Luce, Research Library Director, Los Alamos National Laboratory; Sally H. McCallum, Chief, Network Development and MARC Standards Office, Library of Congress; and Patricia Stevens, Director, Cooperative Initiatives, OCLC, Inc."

For further information, please contact Patricia Harris <pharris@niso.org>.


Commerce Secretary Evans Releases Major Report Examining Information Technology Education and Training Landscape in the 21st Century

"June 19, 2003 – Commerce Secretary Don Evans today released "Education and Training for the Information Technology Workforce: Report to Congress from the Secretary of Commerce," which was mandated by the American Competitiveness in the 21st Century Act of 2000."

"The 225-page report provides an extensive exploration of employer demand for information technology (IT) workers, the IT education and training landscape, and the role of employers and workers in IT education and training. The report's executive summary contains dozens of specific findings and five broad findings:

  1. The IT education and training infrastructure has grown significantly in size and scope over the past decade. Today, there is a vast array of IT education and training opportunities, with different types of programs and curricula serving different purposes.
  2. Jobs in the IT field are varied, complex, and specialized, as are the knowledge, skills, and experience required to perform them.
  3. Employers seek workers who possess a specific combination of technical skills and experience, often coupled with a college degree, soft skills, and business or industry knowledge. Typically, employers prefer job candidates with the exact skill set who require no additional training.
  4. There is no single path to prepare a worker for a professional IT job.
  5. The training landscape is complex, rapidly evolving and therefore challenging to navigate."

"...The report is available for download on the website of the Commerce Department's Technology Administration, www.technology.gov/reports. "

For further information, please see the full press release at <http://www.technology.gov/PRel/p_pr030619.htm>.


OpenURL Standard Trial Implementation Launched

"Bethesda, Md., USA (June 18, 2003) - The National Information Standards Organization (NISO) has released The OpenURL Framework for Context-Sensitive Services standard (version 1.0) for a trial use period ending November 1, 2003. The OpenURL standard allows a user who has retrieved an information resource citation to obtain immediate access to the most "appropriate" copy of the full resource through the implementation of extended linking services. The selection of the best source for the full resource is based on the user's and the organization's preferences related to location, cost, contractual or license agreements in place with information suppliers, etc.-all done transparently to the user. The transparency is accomplished by storing context sensitive metadata with the "OpenURL" link from the source citation, and linking it to a "resolver" server where the preference information and links to the source material are stored."

"The initial development of OpenURL was targeted at the electronic delivery of scholarly journal articles. In version 1.0 of the Standard the framework is generalized to enable communities beyond the original audience of scholarly information users to adopt extended linking services and to lower the entry barrier for new implementers."

"An impressive international group of trial users including data providers constructing OpenURL metadata, providers of OpenURL resolvers, and libraries providing end user services using OpenURL resolution are testing the standard. The goal of the trial period is to test the standard's framework using a variety of data sources and resolver services to ensure that users can seamlessly receive and process OpenURLs and to solicit feedback on the proposed standard. Participating in the trial as data providers are:

CABI Publishing (U.K.)
Edinburgh University Data Library (U.K.)
Informatics India Ltd (India)
Grupo Stela (Brazil).
MIMAS, University of Manchester (U.K.)
MuseGlobal, Inc. (U.S.)
ProQuest Information and Learning (U.S.)
RLG-Eureka (U.S.)"

"Participating resolver services include:

ArXiv.org (U.S.)
Auto-Graphics, Inc. (U.S.)
Edinburgh University Data Library (U.K.)
Endeavor Information Systems (U.S.)
Ex Libris USA, Inc. (U.S.)
Innovative Interfaces, Inc. (U.S.)
OhioLINK (U.S.)
Potiron Tecnologia para Bibliotecas (Brazil)
ProQuest Information and Learning (U.S.)
Sirsi Corporation (U.S.)
MuseGlobal, Inc. (U.S.)
Openly Informatics (U.S.)
RLG-Eureka (U.S.)
Vrije Universiteit Brussel (Belgium)."

"Libraries providing end-user access include: Université Libre de Bruxelles Libraries (Belgium), RLG-Eureka (U.S.), and The Getty Research Institute Library (U.S.)."

"The Standard has been issued in two parts and is available as a free download at <http://library.caltech.edu/openurl/Public_Comments.htm>. The activities of the OpenURL standards committee and its trial implementers can be followed on the committee's website at <http://library.caltech.edu/openurl/default.htm> or by subscribing to the committee's listserv by sending an email message to <majordomo@caltech.edu> with "subscribe openurl" (without quotation marks) in the body of the message."

"The trial use period is being coordinated and managed by the California Digital Library. To sign-on as a trial implementor contact Karen Coyle (email: karen.coyle@ucop.edu)."


UK Research, Accessible for Free, for Everyone

UK leads world in publishing revolution to provide open access to scientific research

"17 June 2003 - More than 80,000 biology and medical researchers working at UK universities can now share their research findings freely with fellow researchers, funding bodies, students, journalists, and the general public worldwide. Making the results of science and medical research openly available will aid the global advancement of science and healthcare. Publishing in freely accessible online journals will also make the UK higher education system more cost-effective, by reducing the amount of money spent on journal subscriptions."

"The landmark deal announced today by The Joint Information Systems Committee (JISC), a joint committee of HEFCE and other UK further and higher education funding bodies, and open access publisher BioMed Central places the UK at the forefront of the drive to make scientific research freely available on the Internet. The BioMed Central membership agreement commences on the 1 July. From this date article-processing charges will be waived - for all UK higher education staff — when publishing in any of BioMed Central's 90+ peer-reviewed journals in which all research content is freely accessible...."

"...This is the first step of many that funding bodies are taking to ensure the success of open access. For the academic and clinical research communities working in UK Higher Education institutions, one of the biggest hurdles to publishing in open access journals — cost — has been removed. Funding bodies are now moving to acknowledge that authors who publish in open access journals are providing a service to the scientific community."

"The JISC deal means that 180 universities in the UK will now become BioMed Central members. Together with the recent NHS England membership agreement, the vast majority of research produced in the UK could be published in open access journals at no cost to the individual author."

For further information, please see the full press release at <http://www.biomedcentral.com/info/about/pr-releases?pr=20030617>.


ReadingListDirect & Resource Discovery Network Partnership

"Sentient and the Resource Discovery Network (RDN) have formed a partnership to provide the academic community with the ability to integrate high quality web resources within a searchable digital resource management system to enhance learning and teaching."

"Sentient launched ReadingListDirect (soon to be Sentient DISCOVER), to facilitate resource discovery, resource management and enhance access to learning resources...."

"...The Resource Discovery Network is the UK's free national gateway to Internet resources for the learning, teaching and research community. It is funded by the Joint Information Systems Committee (JISC) with support from the ESRC and AHRB. The network is a collaboration of over seventy educational and research organizations, including the Natural History Museum and the British Library. Unlike search engines, Web sites are carefully selected and verified by specialist information professionals and academics."

"In recognising the value of RDN, which consists of a database of over 70,000 high quality Web resource descriptions, the service can now be searched within the ReadingListDirect interface. With one click, RDN resources can be embedded into and accessed from ReadingListDirect. Instantly, all web resource descriptions and web links are added to resource lists. This enables the student to easily locate and access a greater variety of essential learning materials."

For further information, please contact Cheri McCall, Sentient <Ceri.mccall@sentient.it> or Helen Stokoe <Helen.stokoe@rdn.ac.uk>.


The National Library of Medicine Defines Standard Content Model for Electronic Archiving and Publishing of Journal Articles

"June 10, 2003 (Bethesda, Md.) - The National Library of Medicine (NLM) announces the creation and free availability of a standard model for archiving and exchanging electronically journal articles."

"Since the mid-1990s, scholarly journals have been striving to make their content available on the web for greater distribution, ease of searching and retrieval, or just to have a web presence. 'These electronic files are created to meet the needs of the Internet-usually without much thought given to long-term archiving of the content,' says Dr. David Lipman, Director of the Library's National Center for Biotechnology Information (NCBI). 'Today we release two Document Type Definitions (DTDs) that will simplify journal publishing and increase the accuracy of the archiving and exchange of scholarly journal articles.'"

"NCBI created the Journal Publishing DTD to define a common format for the creation of journal content in XML. The advantages of a common format are portability, reusability, and the creation and use of standard tools. Although the Publishing DTD was created for electronic production, the structures are robust enough to support print publication as well."

"Built using the same set of elements, the Archiving and Interchange DTD also defines journal articles, but it was created to provide a common format in which publishers, aggregators, and archives can exchange journal content."

"These DTDs and the Tagset from which they were created are in the public domain. Complete information and documentation can be found at <http://dtd.nlm.nih.gov>...."

For further information, please see <http://www.nlm.nih.gov/news/press_releases/dtd_ncbi03pr.html>.

 


Copyright 2003 © Corporation for National Research Initiatives



DOI: 10.1045/july2003-inbrief