D-Lib -- In Brief (December 2001)


D-Lib Magazine
December 2001

Volume 7 Number 12

ISSN 1082-9873

In Brief

RQL: A Declarative Query Language for RDF

Contributed by:
Greg Karvounarakis, Vassilis Christophides, and Dimitris Plexousakis
Institute of Computer Science, FORTH
Vassilika Vouton, P.O.Box 1385, GR 711 10
Heraklion, Greece
<gregkar, christop, dp @ics.forth.gr>

In the next evolution step of the Web, termed the Semantic Web, vast amounts of information resources (i.e., data, documents, programs) will be made available along with various kinds of descriptive information (i.e., metadata). This evolution opens new perspectives for Digital Libraries (DLs). Community Web Portals, E-Marketplaces, etc. can be viewed as the next generation of DLs in the Semantic Web era. Better knowledge about the meaning, usage, accessibility or quality of web resources will considerably facilitate automated processing of available Web content/services. The Resource Description Framework (RDF), <http://www.w3.org/TR/REC-rdf-syntax>, enables the creation and exchange of resource metadata like any other Web data. To interpret metadata within or across user communities, RDF allows the definition of appropriate schema vocabularies (RDFS), <http://www.w3.org/TR/rdf-schema>. The most distinctive feature of the RDF model is its ability to superimpose several descriptions for the same Web resources in a variety of application contexts (e.g., advertisements, recommendations, copyrights, content ratings, push channels, etc.), using different DL schemas (many of which are already expressed in RDF/RDFS; see <http://139.91.183.30:9090/RDF/Examples.html>). Yet declarative languages for smoothly querying both RDF resource descriptions and related schemas are still missing.

The ability to query descriptions and schemas together is particularly useful for next generation DLs that require the management of voluminous RDF description bases, and can provide the foundation for semantic interoperability between DLs. For instance, in knowledge-intensive Web Portals, various information resources such as sites, articles, etc. are aggregated and classified under large hierarchies of thematic categories or topics. These descriptions are exploited by push channels aiming at personalizing Portal access (e.g., on a specific theme), using standards like the RDF Site Summary <http://purl.org/rss/1.0/spec>. Furthermore, the entire catalog of a Portal can be exported in RDF, as in the case of Open Directory <http://dmoz.org>. The same is true for DL services (e.g., notification/filtering, annotation/personalization, recommendation based on user/group profiles). There is an ongoing effort to express service descriptions and schemas in RDF (e.g., see the RDF version of WSDL <http://www106.ibm.com/developerworks/library/ws-rdf>) to benefit from existing RDF support (e.g., query engines) in service matchmaking (i.e., matching service offers with service requests).

Motivated by the above issues, we have designed RQL, a declarative query language for RDF descriptions and schemas. RQL is a typed language, following a functional approach (as in ODMG OQL or W3C XQuery). RQL relies on a formal graph model (as opposed to other triple-based RDF query languages) that captures the RDF modeling primitives and permits the interpretation of superimposed resource descriptions by means of one or more schemas. The novelty of RQL lies in its ability to smoothly combine schema and data querying. In this way, DL applications have to specify in a high-level language only the schema and/or data resources needed for access (e.g., to implement browsing, personalization, etc.), leaving the task of determining how to efficiently store or access their descriptions to the underlying RDF database engine. More precisely, RQL has been implemented on top of a persistent RDF Store that exploits, as much as possible, available RDF schema information in order to efficiently load and query resource descriptions in an Object-Relational DBMS (SQL3), while preserving the flexibility of RDF in refining schemas and enriching descriptions at any time. The results of our experiments (using the Open Directory catalog) illustrate that our approach yields considerable performance gains in query processing and storage volumes as compared to other triple-based RDF Stores.
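To make the idea of combined schema and data querying more concrete, the following is a minimal Python sketch, not RQL syntax and not the RQL implementation. It assumes a tiny, made-up vocabulary (the class names, resource names, and the helper functions `subclasses` and `instances` are all hypothetical) and shows how a query over the schema (the class hierarchy) can drive a query over the data (typed resources).

```python
# Hypothetical schema: each class maps to its direct superclass.
SUBCLASS_OF = {
    "Painter": "Artist",
    "Sculptor": "Artist",
    "Cubist": "Painter",
}

# Hypothetical data: each resource maps to the class it is typed with.
TYPE_OF = {
    "r1": "Cubist",
    "r2": "Sculptor",
    "r3": "Museum",
}


def subclasses(cls):
    """Schema query: all direct and indirect subclasses of cls."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for child, parent in SUBCLASS_OF.items():
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def instances(cls):
    """Data query that relies on the schema: resources typed with cls
    or with any of its subclasses, in sorted order."""
    wanted = subclasses(cls) | {cls}
    return sorted(res for res, typ in TYPE_OF.items() if typ in wanted)


print(subclasses("Artist"))
print(instances("Artist"))
```

In this sketch the application only names the class it cares about; how the hierarchy and the typed resources are stored is left to the two helper functions, which loosely mirrors the division of labour described above between the query language and the underlying store.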

RDF Suite has been partially supported by EU projects C-Web (see <http://cweb.inria.fr> [IST-1999-13479]) and MesMuses (see <http://cweb.inria.fr/Projects/Mesmuses> [IST-2001-26074]). For more information, related publications, downloads and the online RQL demo, see <http://139.91.183.30:9090/RDF/RQL/>.

DP9 Service Provider for Web Crawlers

Contributed by:
Xiaoming Liu
Computer Science Department
Old Dominion University
Norfolk, Virginia, USA
<liu_x@cs.odu.edu>

The Open Archives Initiative (OAI) team (K. Maly, M. Zubair, M. Nelson, X. Liu) of the Old Dominion University (ODU) Digital Library group <http://dlib.cs.odu.edu> announces DP9 <http://dlib.cs.odu.edu/dp9> -- a new OAI service provider for web crawlers. DP9 is an open source gateway service that allows general search engines (e.g., Google, Inktomi, etc.) to index OAI-compliant archives. DP9 does this by providing a persistent URL for each repository record and converting it to an OAI query against the appropriate repository when the URL is requested. This allows search engines that do not support the OAI protocol to index the "deep web" contained within OAI-compliant repositories.

Indexing OAI collections via an Internet search engine is difficult because web crawlers cannot access the full contents of an archive, are unaware of OAI, and cannot handle XML content very well. DP9 solves these problems by defining persistent URLs for all OAI records and dynamically creating a series of HTML pages according to a crawler's requests. DP9 provides an entry page, and if a web crawler finds this entry page, the crawler can follow the links on this page and index all records in an OAI data provider. DP9 also supports a simple name resolution service: given an OAI Identifier, it responds with an HTML page, a raw XML file, or forwards the request to the appropriate OAI data provider.
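The name resolution step described above can be sketched roughly as follows. This is an illustrative Python sketch, not DP9's code (DP9 is built on JSP/Servlets). It assumes identifiers of the form `oai:<repository>:<local id>` and uses the OAI `GetRecord` verb with its `identifier` and `metadataPrefix` arguments; the registry contents, the `example.org` base URL, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical registry mapping repository names to OAI base URLs.
# The entry below is made up for illustration only.
DATA_PROVIDERS = {
    "example.org": "http://example.org/oai",
}


def parse_oai_identifier(identifier):
    """Split 'oai:<repository>:<local id>' into (repository, local id)."""
    # Unpacking raises ValueError if fewer than three parts are present.
    scheme, repository, local_id = identifier.split(":", 2)
    if scheme != "oai" or not repository or not local_id:
        raise ValueError(f"not an OAI identifier: {identifier!r}")
    return repository, local_id


def get_record_url(identifier, metadata_prefix="oai_dc"):
    """Build the GetRecord request a gateway would forward to the
    data provider responsible for this identifier."""
    repository, _ = parse_oai_identifier(identifier)
    base_url = DATA_PROVIDERS[repository]
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


print(get_record_url("oai:example.org:rec/1"))
```

A gateway of this kind keeps no record data of its own: it only needs the identifier and a table of data provider base URLs to decide where a request should go.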

DP9 consists of three main components: a URL wrapper, an OAI handler and an XSLT processor. The URL wrapper accepts the persistent URL and calls the internal JSP/Servlet applications. The OAI handler issues OAI requests on behalf of a web crawler. The XSLT processor transforms the XML content returned by the OAI archive to an HTML format suitable for a web crawler. XSLT allows DP9 to support any XML metadata format simply by adding an XSL file. DP9 is based on Tomcat/Xalan/Xtag technology from Apache.
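The transformation step can be illustrated with a small Python sketch. DP9 itself uses XSLT (via Xalan); because the Python standard library has no XSLT processor, this sketch swaps in `xml.etree.ElementTree` to do a comparable XML-to-HTML conversion. The sample record is a simplified, made-up Dublin Core fragment rather than a real OAI response, and `record_to_html` is a hypothetical helper name.

```python
import html
import xml.etree.ElementTree as ET

# Dublin Core element namespace, in ElementTree's "{uri}" notation.
DC = "{http://purl.org/dc/elements/1.1/}"

# Simplified, made-up metadata record used only for illustration.
SAMPLE_RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A Sample &amp; Simple Title</dc:title>
  <dc:creator>Doe, J.</dc:creator>
</record>"""


def record_to_html(xml_text):
    """Render the Dublin Core elements of a record as a plain HTML page
    that a web crawler could index."""
    root = ET.fromstring(xml_text)
    title = root.findtext(f".//{DC}title", default="(untitled)")
    rows = []
    for element in root.iter():
        # Keep only Dublin Core elements that carry text.
        if element.tag.startswith(DC) and element.text:
            name = element.tag[len(DC):]
            value = element.text.strip()
            rows.append(f"<dt>{html.escape(name)}</dt><dd>{html.escape(value)}</dd>")
    return (
        f"<html><head><title>{html.escape(title)}</title></head>"
        f"<body><dl>{''.join(rows)}</dl></body></html>"
    )


print(record_to_html(SAMPLE_RECORD))
```

With a real XSLT processor, supporting another metadata format would mean adding a stylesheet rather than changing code, which is the advantage the paragraph above attributes to DP9's design.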

DP9 is a gateway service; it does not cache the OAI records and only forwards any request to a corresponding OAI data provider. Its quality of service is highly dependent on the availability of OAI data providers. DP9 now uses the data providers list from the OAI website <http://www.openarchives.org/Register/ListFriends.pl>.

The DP9 code is available for installation by any interested OAI-compliant repository. This work was supported by the ODU Digital Library group and the LANL Research Library <http://lib-www.lanl.gov>.

RDN-include takes the RDN on to University and College Web Sites

Contributed by:
Dr. Philip Pothen
RDN Communications Manager
JISC DNER Office
King's College London
London, United Kingdom
<philip.pothen@KCL.AC.UK>

The Resource Discovery Network (RDN) at <http://www.rdn.ac.uk> has launched a new service called RDN-include (RDN-i). This allows ResourceFinder, the RDN search engine, to be added free of charge to higher and further education institution Web sites.

The great value of this new service is that students and staff can now use the RDN search facilities to discover high-quality Web resources while remaining within the familiar look-and-feel of their university or college Web site.

Over the last few years, the RDN has built a strong reputation within further and higher education through the quality of its work in discovering, cataloguing and making available high-quality Web resources for the benefit of learning and teaching. RDN currently provides access to approximately 35,000 Web resources, all of which are hand-picked by subject specialists, are quality-assured and are freely available to everyone within further and higher education in the UK. Research and experience suggest that this service is needed more than ever as students and staff become increasingly familiar with information technology and the Internet, and with how they can be used to support their work.

The RDN has endeavoured to make the service as simple as possible to install and configure. There are two versions available: one which uses a cgi script; and an even simpler JavaScript-based version, which simply requires Webmasters to paste a few lines of code into their pages. We are also able to offer versions of RDN-i that only search results from selected hubs. In addition, we plan to further develop ways in which the Webmaster can configure the service to suit local needs using RDN-i (e.g., the ability to search EEVL and Psigate data alone).

RDN-include is an important development. It will bring the RDN and its services to educational institutions at their own sites, enabling users to access quality-assured resources within a Web environment familiar to them. Manchester Metropolitan University currently uses RDN-include (see <http://www.mmu.ac.uk/services/library/rdni/rdnisearch.html>) on its institutional Web site, and the university's staff and students have found RDN-include of great benefit in accessing high-quality Web resources.

Over the next few months, the RDN will be making more of its content available for inclusion on UK university and college Web sites. First and foremost in our plans is the Behind the Headlines service. Behind the Headlines offers easy access to background information on the latest news headlines via preset searches of RDN data. The service was developed following analysis of RDN search logs from the height of the Foot-and-mouth crisis, which showed large numbers of users looking for related current affairs information. More recent topics featured in the service have included Osama Bin Laden, deep vein thrombosis, and assisted suicide. Behind the Headlines has proved very popular, particularly in the further education sector, and we are keen to make it available through institutional Web sites as well as on the RDN's own pages.

Further details about RDN-include are available from:
<http://www.rdn.ac.uk/rdn-i/>
Behind the Headlines can be viewed from the RDN home page:
<http://www.rdn.ac.uk/rdn-i>

Or contact:
Dr. Philip Pothen
RDN Communications Manager
<rdn-support@rdn.ac.uk>

The EDC Gender and Science Digital Library Project

Contributed by:
Sarita Nair
Project Director
Education Development Center, Inc.
Newton, Massachusetts, USA
<snair@edc.org>

The Gender & Diversities Institute at the Education Development Center, Inc. (EDC), is a global institute dedicated to improving the well-being of individuals and communities, especially women and girls, through innovative, gender-healthy approaches to life-long learning. The institute serves as a global center for innovation and exploration by individuals and projects, generating, collecting, synthesizing, disseminating and advancing knowledge with the goal of building practices internationally that empower individuals and communities through a more inclusive and expansive education for the 21st century.

The EDC Gender and Science Digital Library (GSDL) is a collaborative project between the Gender & Diversities Institute and the Eisenhower National Clearinghouse (ENC) at Ohio State University. The project is funded by the National Science Foundation as part of its National Science Digital Library initiative. The GSDL has two primary purposes: first, it will assist educators and researchers in promoting and implementing gender-equitable STEM (Science, Technology, Engineering, and Mathematics) education in both formal and informal settings, for both male and female students, and in increasing female involvement in the sciences; and second, it will provide resources to researchers and others working to understand the link between gender and science, including how gender influences the development of science and the role of women within science. Specifically, the GSDL will:

Identify, evaluate, and classify new and existing materials that apply to the broad spectrum of gender and science (e.g., books, research publications, web links, video, curricula, etc.)

Create a user-friendly library interface, an efficient usage and submission protocol, and the accompanying support systems

Ensure the sustainability of the library, and its fit within the federated National Science, Technology, Engineering, and Mathematics Education Digital Library (NSDL)

During the initial stage of development, the GSDL will develop a collection structure that can be expanded over time, starting with K-16 information. Focussing on the sciences as the core, we will develop a number of related categories such as: gender-fair science curriculum, teacher guides for integrating gender equitable instruction into existing curricula, resources on women in science, and strategies to bridge gender and racial divides in the sciences.

The expected significance of the GSDL project is manifold. It will provide a dissemination outlet for key, gender-fair and inclusive, science resources, while linking science educators, students, researchers and gender-equity specialists to mentors, experts and others in cross-disciplinary examinations of gender and science. As part of the federated system, the GSDL will provide links to a variety of discipline-specific online libraries, thereby providing a gender framework to their materials.

At this time, the GSDL project is actively seeking focus group feedback on a variety of topics including how users interact with digital resources, how searches might be conceived, and how to best provide guidance and materials to different audiences. In addition, we are seeking content submissions and accepting applicants for peer reviewers. If you would like to contribute to our research in any of these areas, or for additional information about the GSDL, please contact:

Katherine Hanson, PI
EDC, 55 Chapel Street, Newton, MA 02458
Email: <khanson@edc.org>
<http://www.edc.org/GDI/>

Sarita Nair, Project Director
EDC, 55 Chapel Street, Newton, MA 02458
617-618-2357
Email: <snair@edc.org>
<http://www.edc.org/GDI/GSDL.htm>

The CEDARS Project Website

Contributed by:
Maggie Jones
Cedars Project Manager
The University of Leeds
Leeds, United Kingdom
<libmjj@leeds.ac.uk>

The Cedars project is a JISC-funded project on digital preservation, which began in April 1998 and will end in March 2002. The first three years of the project focused primarily on the challenging task of conducting research into the technological, organizational, and legal issues confronting research libraries as they move forward to create and deliver digital resources. For the final year of the project, the primary focus of the project has been on documentation and dissemination, as opposed to new research.

The Cedars website, <http://www.leeds.ac.uk/cedars/>, had evolved primarily as a tool to facilitate communication between the Cedars partners, who were initially based at three different sites (Leeds, Oxford, and Cambridge). With the focus in the final year on dissemination of project findings, a high priority was redesigning the Cedars website to transform it from a project communication tool into one that would transmit information about digital preservation in general, and the Cedars project in particular, to a much wider audience.

This proved a more difficult exercise than we had initially envisaged. We had certain criteria in mind. For example, we wanted the home page to fit onto a single screen (never quite managed it!). We wanted a simple design, both aesthetically and from the perspective of ease and speed of use and ongoing maintenance, bearing in mind that the project ceases in March 2002. We also wanted it to be easy to navigate, whether the primary purpose for visiting the site was curiosity about the project itself, a desire to find information of relevance for specific aspects of digital preservation, or the project management's need to locate any administrative information they required.

In the end, we settled on four color-coded categories: three that indicate the primary research interests of the project (and are also likely to be of interest to a wide audience), and one linked to conferences and articles prepared by project staff. The four key color-coded categories are:

Publications and Conference, <http://www.leeds.ac.uk/cedars/pubconf/pubconf.html>
Collection Management, <http://www.leeds.ac.uk/cedars/colman/colman.html>
Technical Strategies, <http://www.leeds.ac.uk/cedars/technical/technical.html>
Distributed Archiving Prototype, <http://www.leeds.ac.uk/cedars/archive/archive.html>

Hard copy publications, which will be produced by the end of the project, will match (as closely as possible) these colors.

Library Research Seminar II: A Research Conference for Scholars, Practitioners, and Students

Contributed by:
Bill Edgar
Assistant Professor, School of Information Resources and Library Science
University of Arizona
Tucson, Arizona, USA
<bedgar@u.arizona.edu>

On November 2-3, 2001, the Library Research Seminar II convened at the University of Maryland at College Park. Attended by approximately 100 people, this research conference attracted scholars, practitioners, and students of Library and Information Science from throughout the United States and several foreign nations. Chaired by Dr. Lynn Westbrook of Texas Woman's University, this conference had a diverse set of sponsors, including SIRSI, EBSCO, ISI, Sage Publications, Emerald Corporation, ALISE, the Friends of the University Libraries, IMLS, Beta Phi Mu, the Library History Round Table, and the Library Research Round Table.

The conference provided participants with several venues for the presentation of ideas: juried papers, invited speakers, panels and discussions, and advisory workshops. In the 25 juried papers, chosen from over 50 submissions, all four of the major categories of libraries -- academic, public, school, and special -- were addressed. Also addressed in the presented research were several of the major library functions, including reference services, collection development, online searching, children's services, and library administration. However, in response to the organizers' intention for intellectual diversity, other information-related topics were also addressed, including information-seeking behavior, classroom instruction, digital libraries, and the relationship between power and knowledge. In addition, a wide variety of methods were used in the juried papers including: case studies, oral history, content analysis, multiple regression analysis, field observation, focus groups, interviews, survey research, and the Delphi method.

The conference featured three excellent invited presentations. Phyllis Dain and Kathleen M. Molz of Columbia University addressed research done on U.S. public library history. A major scholar of qualitative research methods, Yvonna Lincoln from Texas A&M University, reviewed some insights these methods can provide for library services. Ben Shneiderman, from the University of Maryland, presented an overview of a current project involving user visualization of digital library collections. Panel discussions were held on interdisciplinary information needs, the need for interdisciplinary work connecting Library and Information Science (LIS) to other fields of knowledge, ethnographic approaches to LIS research, and the details of getting published. Finally, Martha Crawley of the Institute of Museum and Library Services (IMLS) presented a workshop reviewing that agency's procedures for awarding grants.

Library Research Seminar II's stated goals were the following: to facilitate research-based knowledge for the profession, to encourage interdisciplinary and international research, to promote networking among practitioners and professionals, to showcase the work of doctoral students, and to encourage new methodological approaches. As the current chair of the American Library Association's Library Research Round Table, Dr. Tom Nisonger, said in his closing remarks at the conference: "I would say that the Library Research Seminar II was a success because its stated objectives were admirably achieved."

The successor to this conference, Library Research Seminar III, is tentatively scheduled for 2006 at a location to be determined. Further information on Library Research Seminar II can be found at the conference web site at <http://www.dpo.uab.edu/~folive/LRSII>.

Note: Special thanks for help in preparing this article go to Dr. Tom Nisonger for his closing remarks at the conference.

In the News

Recent Press Releases and Announcements

In Memory of Robert France

D-Lib Magazine has learned that Robert Karl France, 49, digital library researcher and co-author of a recent D-Lib article, died on November 29, 2001. He worked for many years at Virginia Tech as a computer programmer and research associate. He was chief architect and developer of the MARIAN search system and worked most recently with the Digital Library Research Laboratory. In Robert's memory, contributions can be made to Oberlin College (Oberlin, Ohio 44074) or to the Roanoke Valley Preservation Foundation (P.O. Box 1366, Roanoke, VA 24027).

Feasibility of an Archival Article DTD

(December 10, 2001, announcement from Marilyn Geller, Project Manager, Harvard University.) In the Fall of 2001, under the auspices of a Mellon Grant to explore ejournal archiving, Harvard University Library contracted with Inera, Inc. to review a variety of DTDs from selected publishers. The study focused on two key questions: Can a common DTD be designed and developed into which publishers’ proprietary SGML files can be transformed to meet the requirements of an archiving institution? If such a structure can be developed, what issues will be encountered when transforming publishers’ SGML files into the archive structure for deposit into the archive? The requirement of the archival article DTD was defined as the ability to represent the intellectual content of journal articles. This study is now freely available at: <http://www.diglib.org/preserve/hadtdfs.pdf>.

The following ten publishers and hosting services contributed their DTDs, documentation and samples for use in this study:

American Institute of Physics
BioOne
Blackwell Science
Elsevier Science
Highwire Press
Institute of Electrical and Electronics Engineers
Nature
Pubmed Central
University of Chicago Press
John Wiley & Sons

All interested parties are encouraged to read and comment on this study. Comments may be sent to Marilyn Geller, Project Manager <marilyn.geller@mindspring.com> and Bruce Rosenblum, primary author <bruce@inera.com>.

JISC Information Environment Development Strategy 2001-2005: Draft now available

(December 10, 2001, announcement from Catherine Grout, Assistant Director, Development, JISC/DNER Office.) Access to high quality on-line information and learning resources is now essential to all engaged in education, whether as students, teachers, or researchers. But finding and using the right resources is not easy - for example, exploiting multimedia materials can be particularly demanding. It is clear also that without the enabling role of a managed environment, information seekers cannot always be confident about the quality of the information they encounter. The development of a coherent information environment is an important means of helping users to maximise the value of the Internet, by making best use of its bewildering profusion of information resources.

Building on existing services and technology applications, the JISC Information Environment will provide tools to help users to:

*access quality resources;
*enable meaningful links between on-line information and learning resources;
*ease access to restricted resources while maximising security;
*download information and learning resources for incorporation into broader learning materials, essays, and research papers, while protecting copyright and other forms of IPR;
*enable holders of on-line resources to make them more widely available and demonstrate their applicability to others.

A copy of this document is now available on the JISC website at <http://www.jisc.ac.uk/dner/development/IEstrategy.html>.

Comments are welcomed and should be sent to:
<information.environment@kcl.ac.uk> or
<catherine.grout@kcl.ac.uk>.

DLF Members Renew Support

"November 30, 2001, Washington, D.C.— The Steering Committee of the Digital Library Federation (DLF) voted unanimously to continue supporting the organization for five more years, citing the organization's "significant positive impact on digital library development." DLF Chairperson Nancy Eaton announced the decision following the DLF Steering Committee meeting in Washington, D.C., on November 14."

"The decision followed a five-year review of the organization, which had been mandated when the DLF was formed in 1995. In June 2001, the DLF Steering Committee approved the creation of a DLF Review Panel to evaluate the progress of the DLF in achieving its goals and to consider DLF's future."

For the full news release, see <http://www.clir.org/pubs/press/dlfrenew.html>.

OCLC Makes Offer to Purchase Assets of netLibrary

"DUBLIN, Ohio, November 15, 2001—Subject to the approval of the bankruptcy court, OCLC Online Computer Library Center announces that it has made an offer to purchase substantially all the assets of netLibrary and assume certain netLibrary liabilities. netLibrary is a leading provider of eBooks, eTextbooks and Internet-based content/collection management services."

"Concurrently, netLibrary announced that it has voluntarily filed a petition with the U.S. Bankruptcy Court for the District of Colorado for relief under Chapter 11 of the U.S. Bankruptcy Code. The transaction includes a loan from OCLC, to be repaid upon the consummation of the asset sale, to fund netLibrary's on-going operations through the transition period. OCLC's purchase of netLibrary's assets and its operating-funds loan to netLibrary are both subject to approval of the bankruptcy court."

For more information, see the full press release at <http://www.oclc.org/oclc/press/20011115.shtm>.

VTLS Inc. and AMICO Sign Distribution Agreement

"November 15, 2001, AMICO Headquarters; Pittsburgh, PA - The Art Museum Image Consortium (AMICO) and VTLS Inc. have signed a distribution agreement to deliver The AMICO Library™. VTLS Inc., a leader in client-server integrated library automation solutions, is the latest in a series of providers, announced in recent months, making The AMICO Library available for a variety of user types, from small art institutes to large public library systems, K-12 schools to state universities, at reasonable rates with different functional and interface flexibility. The objective is to make The AMICO Library widely available and provide users with a choice of service providers so they may select one that particularly suits their unique needs."

"To begin distribution of their presentation of The AMICO Library VTLS has agreed to be a distributor option for the cooperative purchasing program newly available to Nylink members. Nylink is a not-for-profit membership organization providing services to libraries throughout New York State and beyond. Services include training, support and consulting services for libraries on a broad range of technology-related issues and facilitating group purchasing opportunities to help libraries obtain cost-effective access to electronic information. A preview trial period will be available to Nylink members from Nov. 15 to Dec. 31, 2001."

The full press release is at <http://www.amico.org/docs/press/pr.011115.VTLS.html>.

DOI: 10.1045/december2001-inbrief
