Digital Research Library 5/95--PG
URL:http://aultnis.rutgers.edu/texts/DRC.html

July 18, 1995


(Approx. 4700 words including footnotes, which are active in this document; as submitted, after revision, to College & Research Libraries in February, 1995, with further revisions in proof as of 5.12.95. Please quote or cite only from the published version of July, 1995, pp. 331-339.)

REQUIREMENTS FOR THE DIGITAL RESEARCH LIBRARY

There is a separate page of References and Notes
http://aultnis.rutgers.edu/texts/drc_fn.html.

Peter S. Graham
Associate University Librarian for
Technical and Networked Information Services
Rutgers University Libraries
169 College Ave.
New Brunswick, NJ 08903
(908) 445-5908; fax (908) 445-5888
psgraham@rci.rutgers.edu (updated 7/26/96)


ABSTRACT

A Digital Research Library is a collection of electronic information organized for use in the long term. To meet user needs, the founders of a DRL must accomplish two general tasks: establishing the repository of electronic scholarly materials, and implementing the tools to use it. More important, long-term commitments are needed if scholarly information is to be available over periods longer than a human life: organizational commitments, fiscal commitments and institutional commitments. The establishment of DRLs is directly related to the changes taking place in the library profession.


Acknowledgements

Peter S. Graham is Associate University Librarian for Technical and Networked Information Services at the Rutgers University Libraries, New Brunswick, New Jersey. He wishes to thank Marianne Gaunt and Robert T. Warwick, both of the Rutgers University Libraries, and Czeslaw Jan Grycz of the University of California, for attentive readings of earlier drafts. Preliminary forms of this material were presented at the ALCTS Institute: The Electronic Library (October, 1993) and at a Task Force meeting of the Coalition for Networked Information (November, 1993).


USER REQUIREMENTS FOR THE DIGITAL RESEARCH LIBRARY

What will users require of a digital research library? The answer merges the histories, capabilities and missions of research librarianship and computing science to produce a new service meeting long-defined needs.

The mission of research libraries is to acquire information, organize it, make it available and preserve it. This has been their significant, distinctive and successful role with print and other artifactual materials for the past several hundred years.[1] An implicit mission of computing science has been to make the benefits of technology available to society at large. Missions, needs and capabilities now come together so that information users can have added assistance in performing research and in assuring the continuity of scholarship, today and in the future. But it will take conscious, planned efforts by both librarianship and computing to make this happen.

Many libraries are now trying to provide the increasing volume of scholarly electronic information to their clienteles. Current information needs are being met in electronic form, with varying success, in public, college and research libraries around the country and the world. As yet, however, no research library has taken on the provision, organization and preservation of electronic information with the same long-term commitment we have made for print materials.[2] It is an expensive, uncharted and difficult task.

But until the long-term commitments are undertaken, many currently proposed solutions will have only temporary effects. For example, discussion of cataloging network resources must remain tentative, for until resources being cataloged have a permanent network presence (whether at fixed or virtual locations), the cataloging that points to them must also have an ephemeral quality. (Cataloging for some transitory electronic materials will always be necessary.) Similarly, the expensive products of recent valuable digitizing demonstration projects, from microfilm to digital form and vice versa, will be at risk after only a few years if tools and commitments are not in place for the preservation of what has been achieved.[3]

Most important, the ability of the scholarly community to give serious weight to electronic information depends upon its trust that such information will be dependably available, with authenticity and integrity maintained. Looked-for changes in scholarly publishing to help alleviate the serials crisis, for example, are usually thought to be bound up with the prestige of electronic journals in the academic tenure process. The ability of the academy to count on the long-term, secure existence of electronic scholarly work will be an important determinant of the success of academic electronic publishing. Libraries and universities have a stake in helping electronic publishing to succeed, and therefore have an interest in establishing secure digital research libraries.

Users' needs will continue to be what they long have been. Users will want information reliably locatable, so that when they go there (whether personally or on the net) they can expect to find what they're looking for. Users will want information easily accessible: the cataloging must be clear and accurate, and the information must be promptly retrievable. In the electronic environment the need for access tools will be more evident, and users will expect appropriate and standard software to be readily available. Users will expect that information placed in the library's care long ago will still be available, and they will expect that the integrity of the information they get from the library will be assured.

This article sets out what must be done for a digital research library to be successful in meeting these user needs. The primary requirement for a digital research library (DRL) is that from the start it be committed to organizing, storing and providing electronic information for periods of time longer than human lives.[4] Implementation of a Digital Research Library will require two kinds of tasks (establishing the repository itself and implementing the tools for use with it), and three kinds of new commitments. In what follows the tasks are given the most space, yet as technical problems they probably are the easiest to solve. The institutional commitments described in the final section will be much more difficult to achieve.

All the issues are described here in cursory form. Each could be developed in great detail, but at the moment the outline and overall program are most important. Early implementations will test many of these assumptions and will add more requirements to the list. Work needs to begin.

I. TASKS

THE ELECTRONIC STORAGE REPOSITORY

The Digital Research Library will be manifest to users as collections of information existing in various places (not always evident) and accessible through the use of widely available tools. The locus of information may be called the electronic storage repository; the access tools will be described below.

Over time, we will learn how collection development plays out in an access environment as well as in an ownership environment. It is sometimes loosely proposed (not by librarians) that libraries need not acquire electronic information, for it will be available somewhere on the network. Such proposals ignore the obvious truth that some institution must still, in the end, take responsibility for the information. Taking that responsibility has always been part of the definition of a library.

There will be many electronic storage repositories, responding both to requirements of redundancy and to the individual needs of institutions. In contrast to print collections, it is unlikely that there will be a high degree of content duplication across many electronic repositories, since for most purposes existence in a single place allows world-wide access. Aside from their actual contents, however, repositories that are part of a DRL will have many common characteristics. Some of these are described here; in some cases, open questions are noted that need to be explored in early implementations.

Megadocument contents: Even an initial repository should comprise many gigabytes of information, growing quickly to millions of electronic documents. The medium itself (disk storage) is cheap and the possible resources are plentiful.

Sources and potential participants: It is easy to cite many electronic scholarly resources that now exist. A few are noted here only as examples:

* Johns Hopkins Medical Library medical image data base and its e-Journal of Medical Imaging;

* Texts maintained by the Center for Electronic Texts in the Humanities at Rutgers/Princeton (e.g. those of the Women Writers Project);

* Texts at the Georgetown electronic text center, such as those of C.S. Peirce, Hegel and Feuerbach, under varying licensing arrangements;

* Survey research data from the Interuniversity Consortium for Political and Social Research (ICPSR);

* Aviador, the Columbia University Libraries architecture image resource;

* Commercial publications, either profit or non-profit (from a university press? publications of a scholarly society, such as IEEE? a partnership with a commercial press?); a repository could be a commercial alternative to local storage or no storage;

* Los Alamos National Laboratory Physics Preprint Data Base;

* National Archives and Records Administration materials;

* e-journals now established on the network, especially if peer reviewed (e.g. Psycoloquy, Bryn Mawr Classical Review, Journal of Fluids Engineering, Modal Analysis, OCLC's Online Journal of Current Clinical Trials (with attendant copyright issues), Scientist, Solstice);[5]

* Early network activity as examples of ephemera, e.g. selected alternate (alt.x) newsgroups, information located at temporary ftp sites, samples of early advertisements, etc.;

* Listserv and newsgroup electronic archives;

* Commercial information bases which will not be made widely available, e.g. Biosis Previews or Chadwyck-Healey's English Poetry, where it can be recognized that long-term preservation is necessary even though access might be licensed or otherwise constrained.

All these are only examples. None, of course, should automatically be selected; collection development policies should be adapted and followed. The continuing substantial costs of providing electronic information will require that electronic collection decisions be made at least as carefully and parsimoniously as those for print.

Backup mechanisms: Backup/restore procedures must be in place and must be automated and economical, for libraries are never likely to have expensive labor available in quantity. Backups must be multi-generational, using remote storage, with regular disaster simulations and tests.
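One common way to make backups multi-generational is a "grandfather-father-son" rotation, which keeps recent daily copies plus progressively sparser weekly and monthly ones. The sketch below only selects which dated backups to retain; the counts (7 daily, 4 weekly, 12 monthly) are hypothetical policy values, and copying each retained generation to remote storage is assumed to happen elsewhere.

```python
from datetime import date, timedelta


def gfs_keep(backup_dates, daily=7, weekly=4, monthly=12):
    """Return the backup dates retained under a grandfather-father-son
    rotation: the newest `daily` backups, the newest backup in each of the
    last `weekly` ISO weeks, and the newest in each of the last `monthly`
    months. All counts are hypothetical policy values."""
    dates = sorted(set(backup_dates), reverse=True)  # newest first
    keep = set(dates[:daily])                        # "sons"
    weeks_seen, months_seen = set(), set()
    for d in dates:
        week = d.isocalendar()[:2]                   # (ISO year, ISO week)
        if week not in weeks_seen and len(weeks_seen) < weekly:
            weeks_seen.add(week)
            keep.add(d)                              # "fathers"
        month = (d.year, d.month)
        if month not in months_seen and len(months_seen) < monthly:
            months_seen.add(month)
            keep.add(d)                              # "grandfathers"
    return sorted(keep)


if __name__ == "__main__":
    start = date(1995, 1, 1)
    history = [start + timedelta(days=i) for i in range(60)]
    for d in gfs_keep(history):
        print(d.isoformat())
```

Because the rule is mechanical, it can run unattended, which matters where, as noted above, expensive labor will not be available in quantity.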

Staged Access: In computing jargon, "staging" refers to the prioritized use of different mechanical methods of storing data as it waits to be recalled. Not all data needs to be immediately available on the most expensive and fastest storage media. Alternatives to immediate online access for the enormous potential volume of scholarly information need to be provided. What can be off line, and how can it be retrieved? Present alternatives include magnetic disks, optical disks and jukeboxes, optical disks on shelves, magnetic tapes on site, tapes in remote storage, and automated data warehouses of magnetic tapes.
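A staging policy can be as simple as assigning each item to a tier according to how long it has gone unrequested. The following is a minimal sketch under that assumption; the tier names echo the media listed above, but the idle-time limits and the function names are hypothetical.

```python
# Hypothetical staging hierarchy, fastest and most expensive first. Each
# tier holds items requested within its idle-time limit (in days).
TIERS = [
    ("magnetic disk", 30),
    ("optical jukebox", 365),
    ("magnetic tape on site", 5 * 365),
]
LAST_RESORT = "magnetic tape in remote storage"


def choose_tier(days_idle: int) -> str:
    """Return the first (fastest) tier whose idle-time limit covers the item."""
    for name, max_idle in TIERS:
        if days_idle <= max_idle:
            return name
    return LAST_RESORT


def plan_moves(placement: dict, idle_days: dict) -> dict:
    """Compare where each item is stored now with where the policy says it
    belongs; return {item: (current tier, target tier)} for items to move."""
    moves = {}
    for item, current in placement.items():
        target = choose_tier(idle_days[item])
        if target != current:
            moves[item] = (current, target)
    return moves
```

A real repository would also weigh item size and retrieval cost, but even this rule makes the question "what can be off line?" answerable item by item.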

Data structure standards: In a repository, does information simply exist as is (as first created) or is complementary information (metadata) associated with it? Widely differing examples include SGML (Standard Generalized Markup Language) headers, ICPSR codebooks, picture captions, hypertext links and early software versions for use with data files. There is an increasing need to link bit-mapped page images to ASCII text versions of the page contents. If there is an association, is it through use of header portions of a file or through supplemental files? How are they indicated and connected?[6]
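To make the "supplemental file" alternative concrete, here is a minimal sketch that links a bit-mapped page image to its ASCII text through a small metadata file stored beside it, leaving the image itself unaltered. The naming convention (`page0001.tif` paired with `page0001.meta.json`) and the choice of JSON are assumptions for illustration only; an SGML header would be the "header portion" alternative.

```python
import json
import tempfile
from pathlib import Path


def sidecar_path(image: Path) -> Path:
    # Hypothetical convention: page0001.tif -> page0001.meta.json
    return image.with_name(image.stem + ".meta.json")


def write_sidecar(image: Path, text: Path, **metadata) -> Path:
    """Associate a page image with its ASCII transcription (and any other
    metadata) through a supplemental file, leaving the image untouched."""
    record = {"image": image.name, "text": text.name, **metadata}
    path = sidecar_path(image)
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path


def read_sidecar(image: Path) -> dict:
    return json.loads(sidecar_path(image).read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        img = Path(tmp) / "page0001.tif"
        img.write_bytes(b"\x00")
        txt = Path(tmp) / "page0001.txt"
        txt.write_text("page text", encoding="utf-8")
        write_sidecar(img, txt, caption="Title page")
        print(read_sidecar(img))
```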

Refreshing mechanisms: Refreshing is agreed to be necessary for long-term preservation across advances in computing technology, media and software.[7] There will be organizational and bureaucratic issues in addition to the simply technical. If information is copied from magnetic to optical disk, copyright issues must be recognized. Automation will be necessary to reduce labor costs. Other issues include workflow and record-keeping, migration techniques, and standards and techniques that will apply independently of technology. It may be possible to link refreshment to backup techniques for expedience and economy.
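The automated core of a refreshing step might look like the sketch below: copy a file to new storage, confirm with a cryptographic hash that the copy is bit-identical, and append a record of the move. The function names and log fields are hypothetical, and the organizational and copyright questions raised above are outside its scope.

```python
import hashlib
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def refresh(src: Path, dest_dir: Path, log: list) -> Path:
    """Copy one file to new storage, confirm the copy is bit-identical,
    and append a record of the move to `log`."""
    before = sha256_of(src)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    if sha256_of(dest) != before:
        dest.unlink()  # never keep a copy that fails the fixity check
        raise OSError(f"fixity check failed copying {src}")
    log.append({
        "file": src.name,
        "sha256": before,
        "from": str(src.parent),
        "to": str(dest_dir),
        "when": datetime.now(timezone.utc).isoformat(),
    })
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "old_medium"
        old.mkdir()
        (old / "text.sgml").write_text("<text>...</text>", encoding="utf-8")
        records = []
        refresh(old / "text.sgml", Path(tmp) / "new_medium", records)
        print(records[0]["sha256"])
```

The same digest could serve a backup job as well, which is one way refreshment and backup might share machinery.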

Authentication and Integrity: Intellectual preservation goes beyond preservation of the medium and the technology to assure the protection of the intellectual structure of information as it was recorded by its author.[8] To meet user expectations DRLs must implement authentication and integrity techniques that combine mathematical security with ease of use, public trustworthiness and privacy protection. For example, the integrity of the bit patterns of texts, sounds and images may be assured through cryptographic hashing and encoding methods such as the digital time-stamping technique.[9] Standards and conventions for use and citation will be necessary.
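Linked time-stamping, in the manner described by Haber and Stornetta, chains each record to the one before it, so that no past record can be altered without disturbing everything after it. The sketch below is a deliberately simplified, local version of that idea using SHA-256; it omits the independent time-stamping service, the wide publication of links and the signatures a trustworthy scheme would need, and its class and method names are hypothetical.

```python
import hashlib

GENESIS = "0" * 64  # placeholder link before the first entry


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class LinkedTimestampLog:
    """Each entry commits to a document's digest, a time, and the previous
    entry's link, so altering any past entry breaks every later link."""

    def __init__(self):
        self.entries = []  # list of (document digest, timestamp, link)

    def _link(self, prev: str, doc: str, ts: int) -> str:
        return sha256_hex(f"{prev}|{doc}|{ts}".encode())

    def stamp(self, data: bytes, ts: int) -> str:
        prev = self.entries[-1][2] if self.entries else GENESIS
        doc = sha256_hex(data)
        link = self._link(prev, doc, ts)
        self.entries.append((doc, ts, link))
        return link

    def verify_chain(self) -> bool:
        prev = GENESIS
        for doc, ts, link in self.entries:
            if self._link(prev, doc, ts) != link:
                return False
            prev = link
        return True

    def was_stamped(self, data: bytes) -> bool:
        """Check whether these exact bits were recorded in the log."""
        doc = sha256_hex(data)
        return any(d == doc for d, _, _ in self.entries)
```

A reader holding a copy of a text can hash it and check it against the log; any change of even one bit yields a different digest.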

Redundancy: It will be important to establish standards for the number of repository locations necessary to assure long-term existence of specific electronic information and access to it. One location won't do for a particular major electronic document or set; will two, or three? How many? Major institutions may separately or consortially establish repositories. It is not yet clear how much redundancy of their components will be desirable among them.

Aside from assuring longevity, other issues come to bear on decisions to provide multiple permanent copies of electronic information. Geographic location, nationalism and regionalism will still play a role (at least intercontinentally, and probably intracontinentally); so will informed decisions about the dynamic interplay between costs of network bandwidth, response time and costs of storage. Many library consortia will be formed on the basis of joint contracts with information vendors, leading to further redundancy.
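The question of how many locations are enough becomes sharper when copies are audited against one another. In the minimal sketch below, each repository reports a digest of its copy, the majority value is taken as authentic, and disagreeing or missing copies are flagged for repair; the location names and the default requirement of three good copies are hypothetical.

```python
from collections import Counter


def audit_replicas(copies: dict, required: int = 3):
    """copies maps repository location -> digest of its copy (None if the
    copy is missing). Returns (agreed digest, locations needing repair,
    whether enough good copies survive)."""
    counts = Counter(d for d in copies.values() if d is not None).most_common()
    if not counts or (len(counts) > 1 and counts[0][1] == counts[1][1]):
        # No copies, or a tie: nothing says which version is authentic.
        return None, sorted(copies), False
    agreed, votes = counts[0]
    repair = sorted(loc for loc, d in copies.items() if d != agreed)
    return agreed, repair, votes >= required
```

With only two locations, a disagreement is a tie that the audit cannot resolve; that is one practical argument for at least three.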

ACCESS TOOLS AND POLICIES

Usage and Retrieval Mechanisms: The full panoply of present access tools must be supported by a Digital Research Library (e.g. online catalogs and OPACs, FTP, gopher, World Wide Web and its multiple clients) with provision for the new access tools that are likely to appear regularly. The "granularity" of documents needs to be addressed: how may one retrieve only part of a document (e.g. a chapter of Moby-Dick or of a legal code; or a particular chart from a presentation) when the full document may be of substantial size? Must documents be pre-coded (or pre-marked) to allow such granular access, or can access-time mechanisms be made available?[10]
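An access-time mechanism need not be elaborate. The following minimal sketch locates a chapter in an unmarked plain-text document by scanning for headings when the request arrives; it assumes, hypothetically, that chapters begin with lines of the form "CHAPTER n", which is exactly the kind of fragile convention that pre-coding (for example with SGML) would make unnecessary.

```python
import re

# Hypothetical heading convention: a line beginning "CHAPTER <number>".
HEADING = re.compile(r"^CHAPTER (\d+)\b.*$", re.MULTILINE)


def extract_chapter(text: str, number: int):
    """Return the text of chapter `number`, located at access time by
    scanning for headings, or None if no such chapter exists."""
    matches = list(HEADING.finditer(text))
    for i, m in enumerate(matches):
        if int(m.group(1)) == number:
            end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
            return text[m.start():end].rstrip()
    return None
```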

Techniques for document update and consequent archiving and labeling need to be developed, as well as flags indicating obsolescence or supersession (or conversely indicating status as an authorized version), e.g. for ANSI standards, monthly statistical reports or draft versions. A form of SGML may be appropriate in some cases, for example the format proposed by the TEI (Text Encoding Initiative).[11]
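The flags themselves are simple to model. In the sketch below (all names hypothetical), publishing a new version marks it as the authorized one and flags the previous authorized version as superseded, pointing to its successor, while keeping it in the archive rather than discarding it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Version:
    doc_id: str
    label: str                       # e.g. "1995-05" for a monthly report
    status: str = "draft"            # "draft", "authorized" or "superseded"
    superseded_by: Optional[str] = None


def publish(history: List[Version], new: Version) -> None:
    """Make `new` the authorized version; earlier authorized versions are
    flagged as superseded but kept in the archive."""
    for v in history:
        if v.status == "authorized":
            v.status = "superseded"
            v.superseded_by = new.label
    new.status = "authorized"
    history.append(new)


def authorized(history: List[Version]) -> Optional[Version]:
    return next((v for v in history if v.status == "authorized"), None)
```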

Cataloging: Providing access to voluminous information is an intellectual problem that historically has been solved in the print environment by abstracting and indexing services and by library cataloging, with attendant rules and procedures to ensure consistency and accuracy. These tools, adapted to suit new needs, will work for electronic information as well.[12] They should be linked to the new retrieval mechanisms so that users can smoothly navigate from location of information to retrieval of it without having to shift their mode of use. Early mechanisms will probably link catalog records to documents using tools such as the WWW, the Uniform Resource Identifier (and Locator), or URI/URL, and the recently proposed MARC 856 field.[13] SGML may offer other possibilities for linking of certain documents through its document description techniques. In any case, there eventually will need to be consensus both for the representation of physical electronic locations in bibliographic records and for representation of virtual locations.
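As an illustration of linking a record to a document, the sketch below builds a display form of an 856 field from a URL. It follows the field as later defined in MARC 21 (first indicator 0 for e-mail, 1 for FTP, 2 for remote login, 4 for HTTP; second indicator 0 for the resource itself; subfield $u for the URI; $z for a public note), which differs from the subfields of the 1995 proposal; the function name and the "#" display convention for a blank indicator are assumptions.

```python
from urllib.parse import urlparse

# MARC 21 field 856, first indicator (access method) by URL scheme.
ACCESS_METHOD = {"mailto": "0", "ftp": "1", "telnet": "2", "http": "4", "https": "4"}


def field_856(url: str, note: str = "") -> str:
    """Return a display form of an 856 field linking a record to `url`."""
    scheme = urlparse(url).scheme.lower()
    ind1 = ACCESS_METHOD.get(scheme, "#")  # "#" displays a blank: no information
    ind2 = "0"                             # 0 = the resource itself
    field = f"856 {ind1}{ind2} $u {url}"
    if note:
        field += f" $z {note}"             # $z = public note
    return field
```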

If the DRL's catalog system works well, users will be able to search for information, locate bibliographic records for desiderata, and use those records directly to draw the desired information to their workstation.[14] Where an authentication technique is used (see above), means for including and testing the certification must be provided. Standards for such cataloging and remote access still need to be developed, particularly for providing catalog access to non-owned materials. The present review of AACR2 Chapter 9 is to be applauded, as is the recent OCLC study on the cataloging of non-book materials.[15]

Remote Access: A DRL should from the outset be intended for access from multiple remote locations. Internet-wide access should generally be possible. In early pilot implementations it may initially be advisable for a few libraries to plan and develop catalog and access mechanisms that integrate the individual libraries' collections with that of the DRL. Procedures for dissemination of such catalog records will be needed; it will be not only a technical matter but a policy matter for libraries associated with the DRL to provide non-local access to their local patrons. Presumably the bibliographic utilities will play their accustomed role.

Fees and freedom: In practice these are often linked issues. Standards and techniques will be necessary to solve a knot of interconnected problems surrounding access and ownership, including

* Privacy preservation for users, while also protecting

* Copyright protection for intellectual property holders, while also protecting

* Fair use mechanisms, and also providing

* Fee-charging techniques, including billing, where relevant.

II. COMMITMENTS

Much of what has been described so far is merely technical, and the outlines of solutions are clear even if the details remain to be worked out in practice (set aside here are the non-trivial matters of cost). More difficult will be the social compacts, that is, the agreements on standards, intellectual property and access modes. But most difficult of all to achieve, if electronic preservation and access are to be accomplished on any significant scale, will be the long term commitments to these goals by institutions.[16] Nothing makes clearer that a library is an organization, rather than a building or a collection, than the requirement for institutional commitment if electronic information is to have more than a fleeting existence.

ORGANIZATIONAL COMMITMENT

The organization of libraries is already changing as electronic information increasingly becomes part of their charge. Most research libraries now have substantial systems departments. Some libraries locate the responsibility for electronic information separately from that for print. Other libraries see the forms as inseparable and include electronic responsibilities along with artifactual responsibilities in assignments for collection development, cataloging and public service.

What is new will be the permanent assignment of staff responsibility for the long-term maintenance of electronic information within a library. There is no obvious artifactual parallel for this responsibility: circulation, stack maintenance, preservation and physical plant departments now share it for print. Nor are there present parallels in academic computing centers, where staffs typically focus on technological advance and availability, leaving data to the users. The electronic preservation responsibility will be concentrated, because it will require technical expertise likely to be located in a single functional area.

It is by no means clear that this functional area will simply be what we used to call the library's systems department. As libraries move more into the electronic environment the historic tripartite division of libraries into public services, technical services and collection development continues but in more fluid arrangements. People who combine bibliographic understanding, problem-solving abilities and process orientation have often been found in technical services as well as elsewhere in libraries. Similar librarians will take on the demanding new technical, collection and service responsibilities for long-term support of digital collections. At the same time, it is becoming clear that the traditional computing community is fertile with ideas, analyses and skills that are important to electronic library goals.[17]

FISCAL COMMITMENT

The permanent existence of a digital research library will require assured continuity in operational funding. Almost any other library activity can survive a funding hiatus of a year or more. Acquisitions, building maintenance, and preservation can be suspended, or an entire staff can be dispersed and a library shut down for several years, and the artifactual collections will more or less survive. But digital collections, like the online catalog, require continual maintenance if they are to survive more than a very brief interruption of power, environmental control, backup, migration and related technical care.

Online catalog maintenance costs have reached a rough steady state, and the capital costs for new OPACs are decreasing relative to the capabilities provided. The catalog size will continue to increase, but catalog records are small relative to the information to which they refer. DRLs, however, as a proportion of the library's supply of information, will grow for the foreseeable future, and the quantity of information requiring care will become considerable (and much larger than the catalog). Unit costs of storage are likely to continue falling for some time, which may make the financial burden manageable. (Staffing costs are not expected to increase, as most libraries now recognize that overall staff growth for any reason will not be allowed for some time; reassignments, however, are likely.)
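Whether falling unit costs keep pace with a growing collection is a matter of simple arithmetic: if volume grows by a fraction g each year and the unit price of storage falls by a fraction d, the yearly cost changes by the factor (1 + g)(1 - d), and it stays level when that factor is 1. The sketch below makes the comparison explicit; it covers storage media only, and any figures passed to it are hypothetical.

```python
def yearly_storage_costs(volume0, unit_cost0, growth, decline, years):
    """Cost of holding the collection in each year t = 0..years-1 when its
    volume grows by `growth` and the unit price of storage falls by
    `decline` per year (both fractions). Staff and migration are excluded."""
    return [
        volume0 * (1 + growth) ** t * unit_cost0 * (1 - decline) ** t
        for t in range(years)
    ]


def trend(growth, decline):
    """Year-over-year cost ratio; below 1 the burden shrinks."""
    return (1 + growth) * (1 - decline)
```

For example, a collection that doubles each year while prices halve (growth 1.0, decline 0.5) costs the same every year.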

Long term funding will be required to assure long term care. Libraries and their parent institutions will need to develop new fiscal tools and use familiar fiscal tools for new purposes. Public institutions, usually constrained to annual funding, will have particular difficulties; existing procedures for capital or plant funding may provide precedents. One familiar technique is the endowment. It has been difficult to obtain private funding for endowments of concepts and services rather than books and mortar, but it is possible. Institutions might also build endowments out of operating funds over periods of time.
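To suggest the scale an endowment would need, the standard growing-perpetuity formula gives the principal P that covers a first-year cost C indefinitely when the fund earns a return r and the cost grows at g: P = C / (r - g), valid only when r exceeds g. The sketch below simply evaluates that formula; the rates used with it are assumptions, not recommendations.

```python
def endowment_needed(first_year_cost, return_rate, cost_growth=0.0):
    """Principal that funds a cost forever: C / (r - g), the present value
    of a payment starting at C and growing at g, discounted at r."""
    if return_rate <= cost_growth:
        raise ValueError("return_rate must exceed cost_growth")
    return first_year_cost / (return_rate - cost_growth)
```

Under these assumptions, an annual operating cost of 50,000 with a 5 percent net return calls for about a million in principal.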

Some revenue streams associated with Digital Research Libraries may be practical. Consortial arrangements may allow for lease or purchase of shares in a DRL. Shorter-term access might be provided to other institutions on a usage basis. Access could be sold to certain classes of users, e.g. businesses, non-local clienteles, or specific information projects. New relations with publishers, presently difficult to perceive through the mists rising from intellectual property, might result in fee income for storage of electronically published materials during the copyright lifetime in which publishers collect usage fees. With commitment and imagination long term fiscal tools will be found.

INSTITUTIONAL COMMITMENT

All these are instrumental means of accomplishing the greatest requirement, that of conscious, planned institutional commitment to preserve that part of human culture which will flower in electronic form. While museums preserve artifacts, often beautiful, that embody information, libraries preserve information that -- until now -- has been embedded in artifacts (only occasionally of aesthetic interest in themselves). The advent of electronic information will accentuate the difference between these roles as libraries take the responsibility for the preservation of information in non-artifactual forms.

For the past century most research libraries have been associated with universities, and this connection seems likely to continue in the immediate future.[18] Whatever the governance structure, an institution wishing to benefit from electronic information will have to make a conscious commitment to providing continuing resources. Michael Buckland, of the University of California at Berkeley, has distinguished between a library's role and its mission. Where the role of a library is to facilitate access to information, its mission is to support the mission of its parent institution.[19] One can extend this to understand that if a university wishes to continue gaining support for its mission from its library, it will have to make commitments to the library's role. In the electronic environment, this means new longstanding financial commitments which the library and university together must identify and gain.

The commitment will have to be clearly and publicly made if scholars and other libraries are to have confidence that a given DRL is indeed likely to exist for the long term. It will probably be desirable for guidelines or standards to be established defining what is meant by a long term commitment, and defining what electronic repositories of data can qualify to be called a digital research library. Just as donors of books, manuscripts and archives look for demonstration of long term care and commitment, so too will scholars and publishers as electronic information is created and requires a home.

CONCLUSION

Establishing a Digital Research Library continues the research library role. To do so should be considered as natural as acquiring the next book or cataloging the next journal. Not to do so would be an abdication of that role. The tasks call not so much for new knowledge or new techniques as for informed commitment; that is, for will. For librarians wondering what is to come of their profession in the electronic age, here is their challenge.


