Cornell University:
The Networked Computer Science Technical Reference Library (NCSTRL)
The NCSTRL architecture
The Networked Computer Science Technical Reference Library (NCSTRL -- pronounced "ancestral") is an international collection of computer science research reports made available for non-commercial use from over 100 participating organizations worldwide. The organizations that participate in NCSTRL include Ph.D. granting computer science departments, research laboratories, ePrint repositories, and electronic journals. The documents in NCSTRL are almost all textual, ranging in size from 100-plus page doctoral dissertations to short technical reports.
NCSTRL is a federated digital library: both the content and the services that provide access to it are distributed across North America, Europe, and Asia. The technical foundation for NCSTRL is Dienst, an architecture and protocol for distributed digital libraries. The Dienst architecture has several key features:
- Defined Digital Library Services
- Dienst divides total digital library functionality into a set of logical services. The Repository Service stores documents (defined by the Dienst document model, described below) and provides access to them. The Index Service processes queries on a set of documents. The Name Service provides unique naming and name resolution for documents. The Collection Service provides the metadata necessary for a set of services to cooperate as a uniform digital library collection. The User Interface Service provides a human-friendly gateway to the remainder of the services.
- Open Protocol
- All of these services communicate with each other via the Dienst protocol. Each verb in the Dienst protocol is associated with one of the services defined above. For example, the repository verbs include List-Contents, which returns the identifiers (handles) of the documents in a repository; Structure, which returns structural information about an individual document; and Disseminate, which returns the requested dissemination of a document. Because the protocol is open and exposes the individual services, external clients and services can flexibly exploit the full functionality of a Dienst-based digital library. For example, one could combine a set of Dienst repositories with a "home-grown" search service, or construct a new user interface front end that gives a different view of a set of Dienst services. The interaction of Dienst services through the protocol is illustrated in the following figure.
- Structured Document Model
- Documents in Dienst are structured objects and can be disseminated in a variety of ways. This structure is realized in the Dienst protocol as follows. The Structure verb, issued against a specific document, returns information about the logical components of the document. While the nature of components is fully extensible, in the current implementation documents are structured in the form of document content and document metadata, physical pages, and logical partitions (such as chapters and tables). The Disseminate verb then processes requests based on this document structure; for example, a client may request a specific page of a document, or the descriptive metadata for a document.
- Distributed Collection Model
- The Dienst collection service makes it possible to organize a set of Dienst services in a variety of configurations. For example, it is possible to create separate sub-collections (consisting of only a few publishing organizations) and to configure the use of servers to match network connectivity characteristics.
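The repository verbs and the structured document model described above can be illustrated with a small in-memory sketch. This is not the Dienst implementation or its wire protocol: the class names, method signatures, handle format, and sample document are all hypothetical, chosen only to show how List-Contents, Structure, and Disseminate relate to a document made of metadata and pages.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A structured document: descriptive metadata plus physical pages."""
    handle: str                       # unique identifier (hypothetical format)
    metadata: dict
    pages: list = field(default_factory=list)


class Repository:
    """In-memory stand-in for a Repository Service (illustrative only)."""

    def __init__(self, documents):
        self._docs = {doc.handle: doc for doc in documents}

    def list_contents(self):
        """List-Contents: return the handles of all stored documents."""
        return sorted(self._docs)

    def structure(self, handle):
        """Structure: describe the components of one document."""
        doc = self._docs[handle]
        return {"metadata": sorted(doc.metadata), "pages": len(doc.pages)}

    def disseminate(self, handle, part, page=None):
        """Disseminate: return the requested view of one document."""
        doc = self._docs[handle]
        if part == "metadata":
            return dict(doc.metadata)
        if part == "page":
            # Pages are numbered from 1, as a reader would expect.
            if page is None or not 1 <= page <= len(doc.pages):
                raise ValueError(f"no such page: {page!r}")
            return doc.pages[page - 1]
        raise ValueError(f"unknown dissemination: {part!r}")


# A made-up document, used only to exercise the three verbs.
repo = Repository([
    Document(
        "example/TR-1",
        {"title": "A Sample Report", "author": "A. Author"},
        ["page one text", "page two text"],
    ),
])

handles = repo.list_contents()
outline = repo.structure("example/TR-1")
second_page = repo.disseminate("example/TR-1", "page", page=2)
```

A client would typically call the verbs in this order: list the handles, inspect the structure of one document, then request only the page or metadata it needs.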
The collections
As of January 1999, the NCSTRL collection consists of over 30,000 documents. Because of the flexibility of the Dienst document model, the items in the NCSTRL collection are available in a variety of formats. Some documents, for example, are available as scanned page images (TIFFs), browser-viewable images (GIFs), and printable PostScript.
User interfaces
As supported by the Dienst architecture, there are a number of user interface gateways to the NCSTRL collection. However, most users access the collection through the main gateway at http://www.ncstrl.org/. As shown in the following figure, the main page in this user interface allows searches of the collection and provides access to other functions in NCSTRL including browsing and subscribing (submitting persistent queries to the collection).
As shown in the following figure, the selection of a specific document (from, for example, the list of results returned by a query) in the collection provides access to bibliographic information and to the content of the document in a variety of formats. The example document is available as screen viewable pages, screen browseable thumbnails, and as downloadable PostScript.
The following screen image shows the result of a user pressing the "Browse Document" button for the above document. As shown, the user sees a set of thumbnail images of the document pages and can then choose to view a specific page.
Research opportunities
The open architecture on which the NCSTRL collection is built makes it possible to use it as a laboratory and testbed for a variety of digital library research activities, for example:
- Distributed searching. Investigating methods for (optimally) routing queries among multiple search sites.
- Distributed systems reliability. Investigating techniques for ensuring the availability of loosely federated distributed systems.
- Development of value-added services. Including payment, certification, summarization, and the like.
- Delivery of other document genres. Including software, data, and multimedia documents.
- Construction of alternate user interfaces. Including the investigation of multi-linguality issues in digital libraries.
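To make the distributed searching item above concrete, the sketch below shows the simplest possible baseline: send the same query to every search site and merge the hits, dropping duplicates. It is not a method used by NCSTRL or Dienst. The function names, handles, and titles are hypothetical, and each "site" is just an in-memory dictionary. Smarter routing would replace the loop over all sites with a choice of which sites to ask.

```python
def make_index(entries):
    """Build a toy search function over {handle: title} entries."""
    def search(term):
        term = term.lower()
        return [
            handle
            for handle, title in sorted(entries.items())
            if term in title.lower()
        ]
    return search


def federated_search(term, indexes):
    """Send one query to every index and merge hits, dropping duplicates."""
    seen = set()
    merged = []
    for search in indexes:
        for handle in search(term):
            if handle not in seen:
                seen.add(handle)
                merged.append(handle)
    return merged


# Two made-up search sites; one document is indexed at both.
site_a = make_index({
    "a/TR-1": "Distributed Digital Libraries",
    "a/TR-2": "Compilers",
})
site_b = make_index({
    "b/TR-7": "Digital Library Protocols",
    "a/TR-1": "Distributed Digital Libraries",
})

hits = federated_search("digital", [site_a, site_b])
```

Here `hits` contains each matching handle once, in the order the sites were queried.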
Further information
For general information about NCSTRL, see the web site: http://www.ncstrl.org/ or send email to help@ncstrl.org.
Researchers with a serious interest in using the testbed should contact Naomi Dushay, naomi@cs.cornell.edu.
Copyright © 1999 Carl Lagoze