CPA/Lynn/Structured Glossary of Technical Terms/Aug 1990

                           THE COMMISSION ON



                        PRESERVATION AND ACCESS



                                 REPORT



                   PRESERVATION AND ACCESS TECHNOLOGY



               THE RELATIONSHIP BETWEEN DIGITAL AND OTHER

                      MEDIA CONVERSION PROCESSES:



                A STRUCTURED GLOSSARY OF TECHNICAL TERMS



                                   by

                             M. STUART LYNN

                                  and

              The Technology Assessment Advisory Committee

              to the Commission on Preservation and Access



                              August 1990



    [Note for electronic version:  The CPA address as given in the

    printed text (1785 Massachusetts Avenue, N.W., Suite 313,

    Washington, D.C. 20036-2117 (202) 483-7474) is obsolete.  The

    current address is



        1400 16th Street, NW, Suite 740

        Washington, DC 20036-2217

    ]



The Commission on Preservation and Access was established in 1986 to

foster and support collaboration among libraries and allied

organizations in order to ensure the preservation of the published and

documentary record in all formats and to provide enhanced access to

scholarly information.



<newpage id=verso-front-cover>



    [The following paragraph applies to the printed text and is

    reproduced in the electronic version for documentary purposes only:



    This document was printed by Cornell University with the same

    technology that is being used for printing books in connection with

    Cornell's digital book preservation study. This study is being

    supported in part by the Commission on Preservation and Access and

    in part by the Xerox Corporation. Since this printing technology is

    restricted to black-and-white, an exception has been made to the

    Commission's normal practice of using blue ink.]





                              Published by

               The Commission on Preservation and Access

                1785 Massachusetts Avenue, NW, Suite 313

                          Washington, DC 20036



                              August, 1990



Additional copies available for $5.00 from the above address. Orders must

be prepaid, with check made payable to "The Commission on Preservation

and Access." Payments must be in U.S. funds. Please do not send cash.



                 This publication has been submitted to

            the ERIC Clearinghouse on Information Resources.



    [The following paragraph applies to the printed text and is

    reproduced in the electronic version for documentary purposes only:



<phi> The paper used in this publication meets the minimum

    requirements of the American National Standard for Information

    Sciences -- Permanence of Paper for Printed Library Materials

    ANSI Z39.48-1984]

</phi>

 

Copyright 1990 by the Commission on Preservation and Access. Copying

without fee is permitted provided that copies are not made or

distributed for direct commercial advantage and credit to the source is

given. Abstracting with credit is permitted. To copy otherwise, or to

republish, requires a fee and specific permission.



<newpage id=i>



                           COMMITTEE PREFACE



In 1989, the Technology Assessment Advisory Committee (TAAC) of the

Commission on Preservation and Access was asked by the Commission to

consider the potentials of various new technologies for the capture of

printed and other information now at risk, and the storage and retrieval

of preserved materials. This report is one in a series alerting the

Commission and others to developments and possibilities within the

context of national and international initiatives for preservation of

and access to information printed on disintegrating paper and other

substrates. During its first meetings, the Committee found the need for

a framework within which to discuss the use of emerging technologies for

preservation purposes -- a framework that could also be shared with

professionals working in the preservation and related fields.



The resulting "structured glossary", which represents the views and

thinking of the full TAAC membership, was principally authored by M.

Stuart Lynn with assistance from colleagues in the libraries and

information technologies divisions at Cornell University. This paper has

also been subjected to a pre-publication review by selected members of

the library and information technologies professions at large. The

Committee hopes that this Glossary will contribute to a common

understanding of how preservation and access needs can be addressed by

emerging technologies, in order to take full advantage of appropriate

opportunities.



    Rowland Brown, Chair

    Technology Assessment Advisory Committee



TAAC membership consists of representatives of the computer and

communications industries, as well as corporate and higher education

institutional consumers of advanced technologies. The members are: Adam

Hodgkin, Managing Director, Cherwell Scientific Publishing Limited;

Douglas van Houweling, Vice Provost for Information Technologies,

University of Michigan; Michael Lesk, Division Manager, Computer

Sciences Research, Bellcore; M. Stuart Lynn, Vice President for

Information Technologies, Cornell University; Robert Spinrad, Director,

Corporate Technology, Xerox Corporation; Robert L. Street, Vice

President for Information Resources, Stanford University; and Rowland

C.W. Brown, Chair, President, OCLC (retired).



<newpage id=ii>



                              ACKNOWLEDGEMENTS



The Committee is particularly grateful to John Dean, Conservation

Librarian, Cornell University Library; and to Lynne K. Personius,

Assistant Director for Scholarly Information Technologies, Cornell

Information Technologies, for their assistance in the preparation of

this paper. The Committee also owes a special debt of gratitude for

their careful review of the paper to Margaret Byrnes, Head, Preservation

Section, National Library of Medicine, and to Gay Walker, Head Librarian

for Preservation, Yale University Library. Invaluable additional

comments were also provided by Millicent Abell, University Librarian,

Yale University; Richard De Gennaro, Roy E. Larsen Librarian, Harvard

University; James F. Govan, University Librarian, University of North

Carolina at Chapel Hill; Paula Kaufman, Dean of Libraries, University of

Tennessee; and Michael Keller, Associate University Librarian for

Collections Development, Yale University.



<newpage id=iii>



                                FOREWORD



This document is offered as a <e1>structured</> glossary of terms

associated with the technologies of document preservation, with

particular emphasis on document media conversion technologies (often

called "reformatting technologies"), and even more particularly on the

use of digital computer technologies. The Glossary also considers

technologies associated with access to such preserved documents. Such a

glossary is intended for communication among people of different

professional backgrounds, especially since in recent years there has

been a proliferation of such technologies and associated technical

terms, technologies and terms that cut across many disciplines.



The use of digital technologies, however, has implications for libraries

that extend far beyond the boundaries of preservation and of access to

preserved materials. Some of these implications are summarized in the

discussion in the Introduction of "The Impact of Digital Technologies,"

and are indicated throughout the Glossary. Thus this Glossary may serve

a wider purpose than the title itself would imply.



The Glossary is a <e1>structured</> glossary, in the sense that the

defined terms have been hierarchically grouped. The term "taxonomy" was

used to describe earlier drafts of the manuscript, but that term was

dropped since it might imply a degree of completeness and form beyond

that envisaged, or even possible. The Glossary is not intended to be

complete with respect to preservation technology as a whole, but is

highly selective (and even highly subjective) in its choice of terms to

include, and very much slanted towards the use and impact of digital

technologies. Other preservation technologies are sketched in for

contextual purposes only. Within these constraints, the Glossary is

intended to be comprehensive but not exhaustive.



The Glossary is not intended to be so comprehensive as to satisfy the

technologist only concerned with technologies, or the librarian

exclusively concerned with librarianship and preservation. It is

intended to satisfy the intersection of their concerns. On the other

hand, issues of preservation and access raise concepts that have

implications for librarianship as a whole, so that, in that sense, this

Glossary has consequences that are not limited to the preservation arena

alone.



<newpage id=iv>

<newpage id=v>



                   PRESERVATION AND ACCESS TECHNOLOGY



                  THE RELATIONSHIP BETWEEN DIGITAL AND

                   OTHER MEDIA CONVERSION PROCESSES:

                A STRUCTURED GLOSSARY OF TECHNICAL TERMS





                            TABLE OF CONTENTS



COMMITTEE PREFACE                                                    i



ACKNOWLEDGEMENTS                                                    ii



FOREWORD                                                           iii



TABLE OF CONTENTS                                                    v



INTRODUCTION                                                         1



    The Impact of Digital Technologies                               1



    Scope of the Glossary                                            5



    Structure of the Glossary                                        7



1. THE ORIGINAL DOCUMENT                                             9



    1.1.  Document Medium                                           10

        1.1.1.  Paper                                               10

        1.1.2.  Microform                                           11

        1.1.3.  Video                                               11

        1.1.4.  Film                                                12

        1.1.5.  Audio                                               12

        1.1.6.  Digital Electronic                                  12

            1.1.6.1 Magnetic Disk                                   13

            1.1.6.2 Magnetic Tape                                   13

            1.1.6.3 Optical Disk                                    13

            1.1.6.4 Optical Tape                                    13

            1.1.6.5 Magneto-Optical Disk                            13

        1.1.7.  Multi-Media                                         13



    1.2.  Document Format                                           14

        1.2.1.  Manuscript                                          15

        1.2.2.  Book                                                15

        1.2.3.  Pamphlet                                            15

        1.2.4.  Newspaper                                           15

        1.2.5.  Printed Sheet                                       15



<newpage id=vi>

        1.2.6.  Periodical                                          15

        1.2.7.  Cartographic Materials                              15

        1.2.8.  Music                                               16

        1.2.9.  Graphic Materials                                   16

            1.2.9.1 Art Original                                    16

            1.2.9.2 Filmstrip                                       16

            1.2.9.3 Photograph                                      16

            1.2.9.4 Picture                                         16

            1.2.9.5 Technical Drawing                               16

            1.2.9.6 Miscellaneous                                   16

        1.2.10. Data File                                           16

            1.2.10.1 Table                                          17



    1.3.  Document Periodicity                                      17

        1.3.1. Monograph                                            17

        1.3.2. Serial                                               17



    1.4.  Document Properties                                       18

        1.4.1.  Tone                                                18

            1.4.1.1 Monotone                                        18

                1.4.1.1.1 Two-Tone                                  18

                1.4.1.1.2 Greyscale                                 18

            1.4.1.2 Highlight Color                                 19

            1.4.1.3 Two Color                                       19

            1.4.1.4 Full Color                                      19

        1.4.2.  Object Type                                         19

            1.4.2.1 Text Objects                                    19

            1.4.2.2 Data Objects                                    19

                1.4.2.2.3 Table                                     19

            1.4.2.3 Graphic Objects                                 19

                1.4.2.3.1 Line Art                                  20

                    1.4.2.3.1.1 Graphs                              20

                1.4.2.3.2 Halftone                                  20

                1.4.2.3.3 Discrete Tone                             20

                1.4.2.3.4 Continuous Tone                           20



    1.5.  Document Condition                                        20

        1.5.1.  Archival                                            21

        1.5.2.  Non-Archival                                        21

        1.5.3.  Acidic                                              21

        1.5.4.  Brittle                                             21

        1.5.5.  Other                                               22



    1.6.  Document Content                                          22

        1.6.1.  Intellectual Content                                22

        1.6.2.  Copyright                                           22

        1.6.3.  Structure                                           23

            1.6.3.1 Abstract (see 3.4.1.2)                          24

            1.6.3.2 Title Page                                      24

            1.6.3.3 Table of Contents (see 3.4.1.3)                 24

            1.6.3.4 List of Figures, Tables, Maps or

                Other Illustrations                                 24

            1.6.3.5 Preface (see 3.4.1.5)                           24

            1.6.3.6 Introduction (see 3.4.1.6)                      24

            1.6.3.7 Body                                            24

            1.6.3.8 Index (see 3.4.1.7)                             24

            1.6.3.9 Other                                           24



<newpage id=vii>

2. THE SELECTION PROCESS                                            25



    2.1.  By Title                                                  26



    2.2.  By Category                                               26



    2.3.  By Bibliography                                           27



    2.4.  By Use                                                    27



    2.5.  By Condition                                              27



    2.6.  By Scholarly Advisory Committee                           27



    2.7.  By Conspectus                                             27



3. THE PRESERVED COPY                                               29



    3.1.  Preservation and Media Conversion Technologies            30

        3.1.1.  Conservation Treatment                              31

        3.1.2.  Paper Deacidification and Strengthening             31

        3.1.3.  Photocopying                                        32

        3.1.4.  Microform Recording                                 33

        3.1.5.  Electronic Digitization                             34

            3.1.5.1 Image Document                                  36

            3.1.5.2 Text Document                                   37

                3.1.5.2.1 Unformatted Text                          37

                3.1.5.2.2 Formatted Text                            37

            3.1.5.3 Compound Document                               37

        3.1.6.  Rekeying of Text                                    37

            3.1.6.1 Unformatted Text                                38

            3.1.6.2 Formatted Text                                  38

        3.1.7.  Reprinting or Republication                         38



    3.2.  Capture Technology                                        38

        3.2.1.  Photocopier                                         39

        3.2.2.  Microform Recorder                                  39

        3.2.3.  Digital Image Scanner                               39

        3.2.4.  Optical Character Recognition Scanner               40

        3.2.5.  Internal Character Recognition                      41

        3.2.6.  Intelligent Character Recognition                   42

        3.2.7.  Page Recognition                                    42

        3.2.8.  Rekeying of Text                                    42

        3.2.9.  Enhancement                                         42



    3.3.  Storage Technology                                        43

        3.3.1.  Storage Medium                                      43

            3.3.1.1 Paper (see 1.1.1)                               43

            3.3.1.2 Microform (see 1.1.2)                           43

            3.3.1.3 Video (see 1.1.3)                               43

            3.3.1.4 Film (see 1.1.4)                                43

            3.3.1.5 Audio (see 1.1.5)                               43

            3.3.1.6 Digital Electronic                              43

                3.3.1.6.1 Magnetic Disk                             44

                3.3.1.6.2 Magnetic Tape                             45

                3.3.1.6.3 Optical Disk                              45

                3.3.1.6.4 Optical Tape                              45

                3.3.1.6.5 Magneto-Optical Disk                      46

        3.3.2.  Compression                                         46



<newpage id=vii>

            3.3.2.1 Uncompressed                                    46

            3.3.2.2 Reversibly Compressed                           47

                3.3.2.2.1 CCITT Group Compression                   47

                3.3.2.2.2 Reversible Textual Compression            47

                3.3.2.2.3 Page Description Language

                    Compression (PDL)                               47

                3.3.2.2.4 Other Compression Standards or Algorithms 47

            3.3.2.3 Irreversibly Compressed                         47

                3.3.2.3.1 Irreversible Textual Compression          47

        3.3.3.  Storage Format                                      47

        3.3.4.  Encoding Method                                     48

            3.3.4.1 No Encoding                                     48

            3.3.4.2 Textual Encoding                                48

            3.3.4.3 Markup Language Encoding                        49

            3.3.4.4 Page Description Language Encoding              49

        3.3.5.  Useful Life                                         49



    3.4.  Access Methodology or Technology                          50

        3.4.1.  Indexed Access                                      51

            3.4.1.1 Via Catalog                                     51

            3.4.1.2 Via Abstract                                    51

            3.4.1.3 Via Table of Contents                           51

            3.4.1.4 Via List of Figures, Tables, Maps or

                Other Illustrations                                 52

            3.4.1.5 Via Preface                                     52

            3.4.1.6 Via Introduction                                52

            3.4.1.7 Via Index                                       52

            3.4.1.8 Via Citation                                    52

        3.4.2.  Full (or Partial) Document Access                   52

            3.4.2.1 Via Inverted Text File Index                    53

        3.4.3.  Compound Document Access                            53



    3.5.  Distribution Technology                                   54

        3.5.1.  Distribution Medium                                 54

            3.5.1.1 Paper (see 1.1.1)                               54

            3.5.1.2 Microform (see 1.1.2)                           54

            3.5.1.3 Video (see 1.1.3)                               54

            3.5.1.4 Film (see 1.1.4)                                54

            3.5.1.5 Audio (see 1.1.5)                               54

            3.5.1.6 Digital Electronic (see 1.1.6)                  54

        3.5.2.  Messenger Services                                  55

        3.5.3.  FAX                                                 55

        3.5.4.  Print-on-Demand                                     55

        3.5.5.  Data Networks                                       56

            3.5.5.1 Local Area Network                              56

            3.5.5.2 Wide Area Network                               56

            3.5.5.3 National Network                                57

        3.5.6.  Voice Networks                                      57

        3.5.7.  Cable Networks                                      57



    3.6.  Presentation Technology                                   57

        3.6.1. Presentation Medium                                  58

            3.6.1.1 Paper (see 1.1.1)                               58

            3.6.1.2 Microform (see 1.1.2)                           58



<newpage id=-x>

            3.6.1.3 Video (see 1.1.3)                               58

            3.6.1.4 Film (see 1.1.4)                                58

            3.6.1.5 Audio (see 1.1.5)                               58

            3.6.1.6 Digital Electronic (see 1.1.6)                  58

        3.6.2.  Presentation or Viewing Device                      58

            3.6.2.1 Paper Document                                  58

            3.6.2.2 Microform Reader                                58

            3.6.2.3 Video Projector (Television Set)                58

            3.6.2.4 Film, Slide, or Other Projectors                59

            3.6.2.5 Audio Devices                                   59

            3.6.2.6 Computer Workstation                            59

                3.6.2.6.1 Display Monitor                           59

                3.6.2.6.2 Local Printer                             60

                3.6.2.6.3 Remote Printer                            60

                3.6.2.6.4 Other Local Media Output Device           60

            3.6.2.7 Multi-Media Workstation                         60



4. SOURCES OF INFORMATION                                           61



INDEX                                                               63

</toc>

<newpage id=x>

<newpage id=1>



                   PRESERVATION AND ACCESS TECHNOLOGY



                  THE RELATIONSHIP BETWEEN DIGITAL AND

                   OTHER MEDIA CONVERSION PROCESSES:

                A STRUCTURED GLOSSARY OF TECHNICAL TERMS



                              A Report of

              The Technology Assessment Advisory Committee

                                   of

               The Commission on Preservation and Access



INTRODUCTION



This document is offered as a structured glossary of terms associated

with the technologies of document <e1>preservation</>, with particular

emphasis on document <e1>media conversion</> technologies (often called

"reformatting technologies"), [1] and even more particularly on the use

of <e1>digital computer</> technologies. The Glossary also considers

technologies associated with access to such preserved documents. Such a

glossary is intended for communication among people of different

professional backgrounds, especially since in recent years there has

been a proliferation of such technologies and associated technical

terms, technologies and terms that cut across many disciplines.



The use of digital technologies, however, has implications for libraries

that extend far beyond the boundaries of preservation and of access to

preserved materials. Some of these implications are summarized in the

following discussion of "The Impact of Digital Technologies," and are

indicated throughout the Glossary. Thus this Glossary may serve a wider

purpose than the title itself would imply.





The Impact of Digital Technologies



The digital computer technology revolution continues to open up

concepts, many of which are only just beginning to be understood or

accepted. These concepts are critically important to librarianship in

general and preservation in particular. In a world historically

dominated by paper, the same medium is used for document capture

(creation, recording),



<newpage id=1>

storage, access, distribution and use, and there has been no compelling

need to consider these as separate entities. There has also been no

compelling need to distinguish between the format of a document and the

medium in which it is embodied, since there is only one dominant choice

of medium. Indeed, the terms have traditionally been used somewhat

interchangeably and indiscriminately. The introduction of non-paper

forms such as phonograph recordings and films has modified this

straightforward view somewhat, but traditional cataloging makes every

effort to foster the constraint that there is a one-to-one

correspondence between the format and the medium, with the objective of

identifying the combined format-medium with some physical shelf

location.



Further efforts to foster this constraint increasingly break down when

digital technologies enter the picture. Digital technologies open a

world that paradoxically is simultaneously more complex and, in some

ways, simpler. It is more complex because now the same document or

document format may intrinsically be represented in different media for

different purposes, forcefully motivating the need to distinguish

carefully between the format and the medium. Furthermore, different

media may be used interchangeably for different stages of document

handling, that is, for capture, storage, access, distribution, and use.

To complicate the situation even more, the documents may be encoded in a

myriad of ways at each of these stages.



And yet, separation of the format and the medium -- and treating each

stage of document handling separately -- may open up a more logical

structure free from traditional constraints. In this sense, digital

technologies may simplify certain aspects of librarianship.



Digital technologies present many new challenges, however, that must be

considered. For example, although these varying formats may be decoded

and translated back and forth among each other, many fear that the means

of decoding may become lost as a result of technological obsolescence,

conceivably making digitally stored documents inaccessible. There are

also many who question the longevity of the physical media used in

digital technologies. Others suggest that the appropriate way to address

both of these problems -- as well as to take advantage of the declining

costs of computer storage and of increasing storage densities -- may

well be to copy stored documents periodically onto new media.



Indeed the main advantage of the world of digital technologies, namely

that they represent a kind of "esperanto" of mutually comprehensible and

interchangeable formats, may, if not properly managed, also represent

their biggest weakness, because of the rapidity of change and

obsolescence, and because of the wide range of choices available at any

given time. Their



<newpage id=2>

very attractiveness could lure the unwary or the uninformed into

dangerous territory.



Periodic recopying onto new media represents a whole new approach for

libraries to the operation and financing of "inventory management"

(although such practices are quite common in data centers). The

implications could be quite extensive. Librarians tend to think in terms

of periods of centuries rather than having (or wanting) to recopy every

few years. Such considerations may either hinder the adoption of digital

technologies for preservation or other purposes or eventually cause some

rethinking of the underlying economics of librarianship.



The incentive for such potential reevaluation, however, is not limited

to the preservation of older materials, nor is the influence of

technology the only driving factor. The underlying stimulus is a gradual

transition over the centuries -- perhaps spurred by the exponential

growth of recorded knowledge and information -- from documents with

associated physical or conceptually useful lifetimes, times between new

editions, or, more generically, times between "instances", that can be

measured in decades or centuries; to documents with associated times

between instances measured in much shorter units of time -- even, in the

case of "active documents" (see below), measured in minutes or seconds.



In essence, this represents a transition from "batch processing" to

"continuous processing." [2] The financial and other implications of

this could undoubtedly be far-reaching for libraries (a full discussion

is beyond the scope of this Glossary), introducing into the library

milieu unfamiliar (or, at least, largely unused) concepts associated

with continuous processes or processes with relatively short lifetimes,

such as "depreciation" and "lifecycle costing." These are concepts that

are familiar to the world of digital electronic processing and quite

normal outside of universities, but that have been avoided in worlds --

such as research libraries -- that depend to a greater or lesser extent

upon irregular gifts or grants of varying or unpredictable size,

donations directed to the purchase and immediate storage of documents,

but not to their maintenance. Indeed, one of the most serious questions

facing librarians in the future may be how to effect a match between the

changing economic demands of "continuous processes" and the traditional

nature of many funding sources. Will donors, for example, be as willing

to support the continuous demands of technological processing as they

have historically and generously supported the periodic construction of

library buildings? What



<newpage id=3>

implications does the financing of continuous processes have for the

"free" and openly accessible library? [3]



Yet the potential of digital technologies and of the flexibility they

offer is boundless. Over the coming decades, these technologies may open

up vistas of ever-increasing storage densities to where entire libraries

can be electronically stored in the space of a single room; of blinding

access and distribution speeds allowing whole documents to be moved

almost instantly across the nation's (and indeed the world's) data

networks, leading to the concept of the "distributed library;" of ease

of replication at very modest cost (another cause for alarm,

particularly to those concerned with protection of intellectual

property); of "print-on-demand" where paper copies of documents are

printed only "just in time" and not inventoried in advance of need; of

accessibility at a distance away from where the "digital document" or

preservation copy was created or is stored; and of intelligent automated

document analysis. Indeed, the means of creation and production of

documents have already been revolutionized by these technologies.



These technologies also open up horizons for totally new document

formats, such as <e1>active</> documents whose contents may combine

different media such as text, sound, video or voice; or whose contents

may change dynamically with time, what Harvey Wheeler called "the

fungible book." [4] The preservation of these new "active" formats is

not of direct interest to the subject of preservation of more

traditional formats (and therefore beyond the scope of this Glossary),

but is of indirect interest because digitally preserved traditional

documents can be incorporated into such active documents. Furthermore,

contemporary active documents will become a subject of future

preservation interest.



Some view the introduction of digital technologies into the world of

libraries as likely to cause a revolution as far-reaching as that caused

by the printing press: a massive paradigm shift. Others view the

introduction with concern (one cannot help but recall that the monks at

first also viewed the introduction of the printing press with equal

concern), an intimidating perturbation that disturbs an equilibrium and

modalities of scholarship that have served well for many decades or even

for centuries.



Either way, digital technologies cannot be ignored. They are already

with us. The question is not whether they will have a presence, but the

pace



<newpage id=4>

and degree to which that presence will grow and exert influence. The next

twenty years are likely to be times of extraordinary change. Our

libraries -- indeed our universities, colleges, and our scholarly

communities -- may well be remade by the consequences of this

technological revolution.



And yet -- in spite of technology's impact and of the revolutionary

consequences of that impact -- it must be recognized that technology

itself is not the ultimate driving force. That force is the inexorable pressure

caused by the exponential growth of recorded knowledge, and the

ever-increasing complexity, costs, and other problems associated with

the storage and distribution of, and access to, such information.

Technology can provide some solutions: it is not an end in itself.



Furthermore -- for many reasons too numerous to detail here -- the

"digital library" is not about to replace the "paper library." Both will

need to coexist in a shifting environment, at least for the foreseeable

future. This in itself will present librarians with many economic,

organizational, social, technical, and other challenges.



Between the eager apostles of technology and those who approach change

with extreme caution lies the mass of professionals who are trying to

understand and grapple with the potential of this shifting environment,

many of them implementing prototype activities designed to yield

greater insight, [5] many working to close the gap between promise and

reality.



It is to these professionals -- from all fields -- that this Glossary

is dedicated, to provide a common language for dialogue and mutual

understanding, particularly as is required to address the problems of

preservation, and the potential application of digital technologies to

those problems. The Glossary is not intended to be so comprehensive as

to satisfy the technologist only concerned with technologies, or the

librarian exclusively concerned with librarianship and preservation. It

is intended to satisfy the intersection of their concerns. On the other

hand, issues of preservation and access raise concepts that have

implications for librarianship as a whole, so that, in that sense, this

Glossary has consequences that are not limited to the preservation arena

alone.





Scope of the Glossary



This document is a <e1>structured</> glossary, in the sense that the

terms have been hierarchically grouped. The term "taxonomy" was used to

describe earlier drafts of the manuscript, but that term was dropped

since it might



<newpage id=6>

imply a degree of completeness and form beyond that envisaged, or even

possible, for such a document. This document is not intended to be

complete with respect to preservation and access technologies as a

whole, but is highly selective (and even highly subjective) in its

choice of terms to include, and very much slanted towards the use and

impact of digital technologies. Other preservation technologies are

sketched in for contextual purposes only. Within these constraints, the

Glossary is intended to be comprehensive but not exhaustive.



The Glossary is not intended to solve all issues associated with the

definition of technological and other terms associated with preservation

and access. It is a conceptual document. Not all terms are defined with

equal precision; indeed, the degree of precision is largely directed by

the extent to which it is necessary to distinguish among these terms.

The Glossary is intended to be adequate to support further research and

development on the subject. Indeed, one measure of success of the

Glossary will be the extent to which it stimulates additional work in

the field, including refinements of the Glossary itself.



For the conceptual reasons outlined above, the Glossary departs from

many well-established norms. Furthermore, the Glossary does not cover in

any detail terms primarily associated with <e1>conservation</>, such as

paper deacidification or hand conservation, where every effort is made to

preserve the documents in their original physical form. [6] The focus, as

stated, is on preservation through <e1>media conversion</>

(traditionally known as "reformatting", a term which we do not favor in

this Glossary -- see 3.1), where the objective is to preserve the

intellectual content of the original document on some other medium, and

also if desired to produce at some later stage a close physical

facsimile of the original, at least to the extent allowed by the

technology.



The focus is also for the most part on <e1>paper</> documents requiring

preservation. These represent the principal (but not the only) area of

national and international attention: paper documents have the longest

history and exist in the greatest numbers. They are also in urgent need

of preservation because of the "embrittlement" (see 1.5.4) caused by the

high acid content of paper manufactured since the mid-nineteenth century

and by improper storage environments. In the years to come, the focus

may well shift to other media. There is already, for example,

considerable attention paid to film preservation, and video recordings

are already deteriorating at an alarming rate.



<newpage id=7>

Different technologies are more or less suitable to preserve different

classes of documents or for achieving different access or other

objectives. One of the main applications intended for this Glossary is

for the classification of ranges of activity that can be used to

describe different investigations into preservation and access

methodologies. The level of detail varies throughout the Glossary

according to what we believe is necessary to make the Glossary most

pertinent to this intended application.





Structure of the Glossary



The Glossary is divided into three main sections: the Original Document,

the Selection Process, and the Preserved Copy. The latter is dealt with

in the most detail; in turn it is divided into a number of subsections:

the first defines the actual preservation or media conversion

technologies that may be employed; and the remaining subsections are

devoted to the various technologies employed in the different stages of

preservation and access -- capture, storage, access, distribution, and

presentation.



The reader will observe that there is some repetition of discussion of

certain concepts throughout the Glossary. This is intentionally

introduced, since it is expected that most readers will not choose to

read the Glossary from cover to cover.



The overall structure of the Glossary is presented in Figure 1.





    <fig id=fig1>

            ------------------- -------------------

           |                   |                   |

           |                   |                   |

      THE ORIGINAL       THE SELECTION       THE PRESERVED

       DOCUMENT             PROCESS               COPY

           |                                       |

           |                                       |

     1.1 Medium                           3.1 Preservation Technology

     1.2 Format                           3.2 Capture Technology

     1.3 Periodicity                      3.3 Storage Technology

     1.4 Properties                       3.4 Access Technology

     1.5 Condition                        3.5 Distribution Technology

     1.6 Content                          3.6 Presentation Technology



             Figure 1: Overall Structure of Glossary

    </fig>



<newpage id=8>

<graphic status=omitted>

1.  THE ORIGINAL DOCUMENT



Different preservation or media conversion technologies are appropriate

to different kinds of original material. This section, therefore, is

devoted to a classification of terms used in describing the original

document to be preserved, particularly those terms that need to be

referenced in the context of media conversion.



              The term document is used generically throughout this

              Glossary to include all forms of books, manuscripts,

              records and other classes of material containing

              information or other matter of intellectual content,

              regardless of the actual medium (1.1) or format (1.2)

              employed.



The Glossary takes free license with terms that have taken on a

traditional meaning in the context of cataloging and other library

activities, and in fact frequently departs from traditional norms used

in this area. As stated in the Introduction, the reason for this is that

such traditional definitions often confuse the <e1>format</> and

<e1>content</> of the document with the <e1>medium</> used to record it,

terms that have traditionally been used somewhat interchangeably and

indiscriminately. This made sense when paper was the primary medium used

for document capture, storage, distribution, and use. With newer

technologies, however, and particularly with those used for media

conversion (3.1), different media can be used for each of these stages,

and, in fact, different media can be used for different instances of

each stage. In this context, therefore, it makes taxonomic sense to

separate format from medium.



For example, a traditional classification is "Motion pictures and video

recordings." In our Glossary, the document format would be "motion

pictures." The medium could be "film" or "videotape" or even "digital

electronic" (such as with digital video). Even a book (document format)

could be embodied in different media: "paper," "audio" (the "talking

book"), "microform," or "digital electronic." To extend the example,



<newpage id=9>

<newpage id=10>

the book could be <e1>stored</> in a digital electronic medium, and

subsequently <e1>distributed</> electronically, and <e1>used</> by

"printing-on-demand" on paper or microform, or by presentation at a

digital computer workstation.
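

The separation of format from medium described above can be sketched as a

small data model. This is an illustration only, written in Python; the

names Medium, Document, and convert, and the sample title, are

hypothetical and do not appear in the Glossary.

```python
from dataclasses import dataclass
from enum import Enum


class Medium(Enum):
    """Document media, following the headings of section 1.1."""
    PAPER = "paper"
    MICROFORM = "microform"
    VIDEO = "video"
    FILM = "film"
    AUDIO = "audio"
    DIGITAL_ELECTRONIC = "digital electronic"


@dataclass(frozen=True)
class Document:
    title: str        # hypothetical field, for illustration only
    doc_format: str   # e.g. "book" or "motion pictures" (section 1.2)
    medium: Medium    # the material carrying this instance (section 1.1)


def convert(doc: Document, target: Medium) -> Document:
    """Media conversion: the format is kept, only the medium changes."""
    return Document(doc.title, doc.doc_format, target)


original = Document("Example Title", "book", Medium.PAPER)
stored = convert(original, Medium.DIGITAL_ELECTRONIC)  # storage stage
used = convert(stored, Medium.MICROFORM)               # use stage
```

Because format and medium are separate fields, the same "book" can be

carried on a different medium at each stage without being reclassified.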



<graphic status=omitted>

  1.1.    Document Medium



  Document Medium refers to the material upon which the original

  document was recorded.



    1.1.1.  Paper



    <e1>Paper</> is a medium traditionally used for printed books and

    other documents that are the most frequent target of preservation

    efforts. Paper is defined to be sheets usually made of vegetable

    fibers laid down on a fine screen from a water suspension. Marks are

    imprinted on the paper using any of a number of techniques including

    <e1>handwriting</> or <e1>drawing</> using a variety of media such

    as pencil, pen and ink, or pastel; <e1>various forms of printing</>

    using inks (numerous technologies are used to accomplish this);

    <e1>photographic printing</>, where paper coated with

    light-sensitive emulsion is exposed to various intensities of

    light; <e1>xerographic printing</>, where an electrically charged

    photoconductive insulating surface is selectively exposed to light

    and the latent image is developed with a resinous powder;

    <e1>thermographic printing</>, where the paper is exposed to a

    directed heat source that selectively modifies parts of the surface

    that may have been pre-treated with a heat-sensitive powder; and

    <e1>chemical transfer printing</>, where the surface of the paper is

    chemically coated and selectively modified by pressure or other

    means.



<newpage id=11>

    <e1>Parchment</> and <e1>vellum</> are not paper since they are made

    from the skins of sheep, goats, or calves. <ntr rid=nt7> We note

    them here for completeness.



    <e1>Hard Copy</> is a term often used to denote any document

    produced on paper.



    1.1.2. Microform



    <e1>Microform</> refers to a document medium for producing or

    reproducing printed matter. It records <e1>microimages</>, that is,

    images too small to be read without some form of magnification. In a

    general sense, microforms may be on film (1.1.4) or paper (1.1.1),

    but for purposes of this Glossary the definition is restricted to

    film. Reading a microform requires the assistance of a microform

    reader (3.6.2.2). Microform comes in different styles including

    <e1>microfilm</> (a film roll that contains microimages arranged

    sequentially) and <e1>microfiche</> (sheets of film in which many

    microimages are arranged in a grid pattern). Both usually contain a

    header that can be read without magnification.



    Microforms are an economical and compact form of document

    representation for archival storage, but are inconvenient to read

    when compared with a printed book. Microform technology is used as a

    preservation medium (3.1.4), as a means of saving space (such as for

    the convenient storage of newspapers), or as a means of duplicating

    scarce or unique documents, that is, microreproductions of other

    original documents. However, microform is sometimes used for

    original documents, for example, those created on a computer and

    directly printed out onto a <e1>computer-output-on-microfiche

    (COM)</> device; and for microreproductions of material assembled

    for the purposes of releasing an original edition in microform.



    1.1.3. Video



    Video is normally an analog (see definition under 1.1.6) electronic

    technology for recording still or moving images, usually combined

    with sound (cf. 1.1.5). Following standards (which vary across the

    world) defined for television playback and broadcasting, the images

    are normally recorded on magnetic tape (3.3.1.6.2), in which case the

    recording is known as <e1>videotape</>, but also on other physical

    media such as optical disk (3.3.1.6.3) (<e1>videodisk</>).



<newpage id=12>

    Playback is usually achieved through a television set or video

    projector (3.6.2.3), although it is now possible and becoming common

    to play video recordings back through a computer (3.6.2.6) or

    multimedia workstation (3.6.2.7).



    1.1.4.  Film



    <e1>Film</> is a recording medium consisting of thin sheets or

    strips of transparent or translucent material, such as polyester or

    acetate, coated with a light-sensitive emulsion. Recording occurs by

    exposing the film to the light emitted or reflected by the entity

    being recorded. Film is also the medium used for microfilm recording

    (1.1.2). A <e1>photograph</> (1.2.9.3) is produced using essentially

    the same technology, except that normally the light-sensitive

    emulsion is adhered to paper or some other opaque medium.



    1.1.5. Audio



    <e1>Audio</> documents are recordings made on a variety of (usually)

    magnetic media (see 3.3.1.6) of sounds only (as contrasted with

    video recordings (1.1.3) that also combine images). The evolution of

    such audio recordings has traversed a large number of different

    formats and physical media, including <e1>phonograph</> disks

    (records) of varying size (78 rpm's, 45 rpm's, 33 rpm's) and

    <e1>tape cassettes</> (of different formats), both of which are

    analog (see 1.1.6) recording technologies; and, more recently,

    <e1>compact disks</> and <e1>digital audio tapes (DATs)</>, which

    are digitally (1.1.6) encoded.



    1.1.6. Digital Electronic



    <e1>Digital Electronic Technologies</> [8] are technologies used to

    capture (3.2.3), store (3.3.1.6), transform (3.3.2, 3.3.4),

    distribute (3.5.1.6) or present (3.6.1.6, 3.6.2.6, 3.6.2.7)

    information in quantized electronic form (normally as a sequence of

    0's and 1's known as <e1>bits</>). <e1>Digital</>, in which

    information is quantized discretely, is to be contrasted with

    <e1>Analog</>, in which information is not quantized but maintained

    in a continuous format. [9] A video



<newpage id=13>

    recording (1.1.3) is an example of an electronic technology that is

    analog [10].
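

    The contrast between analog and digital can be sketched as follows.

    This is a minimal illustration, assuming an analog value normalized

    to the range 0.0 to 1.0; the function name quantize and the choice

    of equally wide levels are assumptions, not part of the Glossary.

```python
def quantize(value: float, bits: int) -> str:
    """Map an analog value in [0.0, 1.0] to an n-bit binary code."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0.0, 1.0]")
    levels = 2 ** bits                             # number of discrete values
    index = min(int(value * levels), levels - 1)   # clamp 1.0 to the top level
    return format(index, f"0{bits}b")              # sequence of 0's and 1's


code = quantize(0.5, 3)
```

    With 3 bits only eight values can be distinguished, so nearby analog

    values collapse to the same code; adding bits narrows each level.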



    For a variety of reasons, digital technologies are gradually

    replacing analog technologies. Reasons of importance to this

    Glossary are the convertibility of digital technologies among each

    other and into and from other technologies (such as paper and

    voice), so that digital technologies become a kind of <it>lingua

    franca</> of communication and storage; and the ease of transmission

    of information by digital technologies across networks (3.5.5) to

    facilitate communication at a distance.



    Original documents that are of concern for library preservation

    purposes are not normally encoded in a digital electronic medium.

    [11] Since this may become a subject of future concern, the category

    is included for completeness. Definitions, however, are more

    appropriately included under Storage Technology Medium (3.3.1.6).



      1.1.6.1 Magnetic Disk (see 3.3.1.6.1)



      1.1.6.2 Magnetic Tape (see 3.3.1.6.2)



      1.1.6.3 Optical Disk (see 3.3.1.6.3)



      1.1.6.4 Optical Tape (see 3.3.1.6.4)



      1.1.6.5 Magneto-Optical Disk (see 3.3.1.6.5)



    1.1.7. Multi-Media



    <e1>Multi-Media</> is a term used to denote documents created using

    a number of different media simultaneously, usually those with an

    electronic technological basis: for example, a digital electronic

    recording (1.1.6) that also combines video (1.1.3) and audio

    (1.1.5), and that may, as part of the document, intrinsically

    produce paper (1.1.1) outputs.



<newpage id=14>

<graphic status=omitted>

  1.2.   Document Format



  <e1>Document Format</> refers to the class of document with respect to

  its <e1>style, arrangement, or layout</>.



  Although this Glossary emphasizes the distinction between format and

  medium, some formats are more closely associated with a given medium.

  Thus, formats such as documentary, short, feature, and newsreel are

  most closely associated with the medium of film. Consistent with the

  main thrust of this Glossary, we emphasize those formats that are

  mostly associated with the medium of paper, even though several of

  these formats may also be embodied in other media (the "talking book,"

  for example, recorded, say, on tape cassettes).



  The term "format" itself may be too all-encompassing. There may be a

  need to further distinguish between the "type" of a document, such as

  "book," and the arrangement or layout of the book -- such as formatted

  text on pages, or simply linear text that is not formatted into pages

  (as in the "talking book" where pages are not distinguished). However,

  this Glossary does not make this distinction, partly because of its

  focus on the paper milieu, where such a distinction may not be

  necessary, and partly because in the emerging world of digital

  technologies it may be premature to attempt such a distinction.



  The use of the term "format" should not be confused with its use in

  the context of "reformatting." The latter, as described in 3.1, is

  best replaced by the term "media conversion."



<newpage id=15>

    1.2.1. Manuscript



    For purposes of this Glossary, an original, unpublished document

    directly created by its author(s), usually on paper or parchment,

    and often in the author's own hand.



    1.2.2. Book



    A monograph (1.3.1) publication containing more than 49 pages,

    usually on paper. [12]



    1.2.3. Pamphlet



    A complete monograph (1.3.1) of at least 5 but not more than 49

    pages, usually on paper (see Footnote 12).
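

    The page-count thresholds given for books (1.2.2) and pamphlets

    (1.2.3) can be written as a simple rule. This is a sketch only; the

    function name monograph_format is hypothetical, and the Glossary

    does not name a format for monographs of fewer than 5 pages, so the

    sketch returns None in that case.

```python
from typing import Optional


def monograph_format(pages: int) -> Optional[str]:
    """Classify a monograph by page count, per 1.2.2 and 1.2.3."""
    if pages > 49:
        return "book"        # more than 49 pages
    if 5 <= pages <= 49:
        return "pamphlet"    # at least 5 but not more than 49 pages
    return None              # below 5 pages: not defined in the Glossary
```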



    1.2.4. Newspaper



    A serial (1.3.2) publication issued at stated, frequent intervals

    containing news, opinions, advertisements, and other topical

    material, usually on paper (see Footnote 12).



    1.2.5. Printed Sheet



    A single printed sheet such as a poster (but see 1.2.9.4),

    broadside, folded leaflet, or memorandum, usually on paper.



    1.2.6. Periodical



    A serial publication (1.3.2) appearing at regular or stated

    intervals, generally more frequently than annually, usually on paper

    (see Footnote 12). Includes <e1>magazines</> and <e1>journals</>.



    1.2.7. Cartographic Materials



    Representations of a selection of abstract features of the universe,

    most often in relation to the surface of the earth, often on paper

    but also on other substrates.



<newpage id=16>

    1.2.8. Music



    In this context, printed representation of musical notation for

    instrumental, chamber, orchestral, and vocal scores, usually on

    paper (see Footnote 12).



    1.2.9. Graphic Materials



      1.2.9.1 Art Originals, Prints, and Reproductions



      Illustrated works, such as drawings, engravings, and lithographs,

      issued separately from books.



      The following terms are included for completeness, but without

      definition [13]:



      1.2.9.2 Filmstrips



      1.2.9.3 Photographs, Slides, Transparencies, and Stereographs



      1.2.9.4 Pictures, Postcards, and Posters



      1.2.9.5 Technical Drawings (including Architectural Plans)



      1.2.9.6 Miscellaneous



              The Miscellaneous category includes flash cards,

              radiographs, study prints, and wall charts.



    1.2.10. Data File



    The term <e1>Data File</> is used generically to denote a document

    consisting of a collection of data, normally organized in some

    logical fashion so as to facilitate access (3.4). Such data may

    consist of factual information, statistics, numbers, textual, or

    composite records to be used as a basis for reasoning, discussion,

    or calculation. An entity within a data file is known as a

    (<e1>data</>) <e1>record</>. A collection of data files is sometimes

    known as a <e1>databank</>, particularly when the data files are

    electronically encoded (1.1.6).



    Although data files may be encoded in any medium (a paper card

    index file, for example, is a data file), the term has

    most often come to be used in connection with data files that are

    electronically encoded and stored in digital electronic form

    (3.3.1.6).



<newpage id=17>

      1.2.10.1 Table



      A data file arranged into two-dimensional form, normally

      consisting of rows and columns together with headings or labels to

      depict the contents of the rows and columns. Tables may themselves

      contain other tables as elements resulting in a "latticed"

      arrangement of data. A <e1>spreadsheet</> is a special form of

      table originally used for accounting purposes and containing

      financial data, but which now includes a wide variety of complex

      reports arranged in tabular form, often with the aid of computer

      workstations (3.6.2.6).
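

      The definition above, including the "latticed" case of tables

      nested within tables, can be sketched as a small data structure.

      This is an illustration under stated assumptions: the class name

      Table, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union

# A cell holds a plain value or, for a "latticed" arrangement, a table.
Cell = Union[str, int, float, "Table"]


@dataclass
class Table:
    column_headings: List[str]
    row_headings: List[str]
    rows: List[List[Cell]]

    def __post_init__(self) -> None:
        # Enforce the two-dimensional shape implied by the headings.
        if len(self.rows) != len(self.row_headings):
            raise ValueError("one row heading is required per row")
        for row in self.rows:
            if len(row) != len(self.column_headings):
                raise ValueError("each row needs one cell per column")

    def cell(self, row: str, column: str) -> Cell:
        """Look up a cell by its row and column headings."""
        r = self.row_headings.index(row)
        c = self.column_headings.index(column)
        return self.rows[r][c]


# Hypothetical sample data: a table nested inside another table.
inner = Table(["1989", "1990"], ["copies"], [[3, 5]])
outer = Table(["title", "holdings"], ["item-1"], [["Example Title", inner]])
```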



<graphic status=omitted>

  1.3.   Document Periodicity



  <e1>Periodicity</> refers to the number of parts into which the

  document is divided and the manner or sequence in which those parts

  are or have been published.



    1.3.1. Monograph



    A <e1>Monograph</> is a published work, collection, or other

    document that is not a serial (1.3.2).



    1.3.2. Serial



    A <e1>Serial</> is a publication issued in successive parts, bearing

    numerical or chronological designations, at regular or irregular

    intervals and intended to continue indefinitely.



<newpage id=18>

<graphic status=omitted>

  1.4.    Document Properties



  <e1>Document Properties</> refers to a classification of various

  components of documents as to their different tonal or color content

  and as to the types of objects [14] they contain. Emphasis is placed

  on those properties most closely associated with documents produced on

  paper.



    1.4.1. Tone



    <e1>Tone</> refers to the color quality or color content of the

    document or parts of the document regardless of form or material

    content.



      1.4.1.1 Monotone



      <e1>Monotone</> documents (or parts of documents) are printed or

      otherwise produced using one color hue <ntr rid=nt15> only, most

      often black or near-black.



        1.4.1.1.1  Two-Tone



        Those parts of a monotone document that are represented in only

        two contrasting tones (regardless of the hue of the color,

        although the term is most often associated with black hues),

        with no intermediate shades. Thus, for purposes of this

        Glossary, a book printed with red ink on yellow paper would be

        considered two-tone. When one of the shades is black or

        near-black, and the other white or near-white, the document is

        described as being produced in <e1>black-and-white</>.



        1.4.1.1.2       Greyscale



        Those parts of a monotone document that are presented using a

        range of tones (regardless of the hue of the underlying color).

        The range of tones may either be <e1>continuous</> (such as in a



<newpage id=19>

        photograph), where all possible values may essentially be taken

        on, or <e1>discrete</>, where only a finite set of values may be

        taken on.



      1.4.1.2    Highlight Color



      A two-tone (1.4.1.1.1) document, parts of which additionally

      contain areas highlighted with a second single color of uniform

      shade.



      1.4.1.3 Two Color



      A document containing two colors, intermixed to create

      intervening hues, and two extreme tones (normally black and

      white) used to create a continuous or discrete (see 1.4.1.1.2)

      range of shades.



      1.4.1.4 Full Color



      A document containing or attempting to contain a full range of

      colors, normally of all hues, tones, and shades.



    1.4.2. Object Type



    <e1>Object Type</> (see also Footnote 14) is a descriptor that

    conveys information about a given sub-area (<e1>object</>) of the

    document with regard to the manner in which it conveys data or

    information.



      1.4.2.1 Text Objects



      <e1>Text Objects</> are document objects consisting of written or

      printed (or otherwise displayed) stored words or ideograms.



      1.4.2.2 Data Objects



      <e>Data Objects</> are document objects consisting of factual

      information normally arranged into data files (1.2.10) or tables

      (1.2.10.1) which are used as a basis for reasoning, discussion, or

      calculation.



        1.4.2.2.3  Table



        See 1.2.10.1.



      1.4.2.3 Graphic Objects



      <e>Graphic Objects</> are document objects containing image

      information consisting of artwork, photographs, technical drawings,

      etc., perhaps containing limited amounts of text, usually as

      captions or for labelling purposes.



<newpage id=20>

        1.4.2.3.1       Line Art



        <e>Graphic objects</> created entirely from the use of text,

        dots, and straight or curved lines.



          1.4.2.3.1.1    Graphs



          Line art objects consisting of representations of the

          interrelationships of data in pictorial form.



        1.4.2.3.2      Halftone



        A representation of a greyscale (1.4.1.1.2) or color graphic

        object as a series of dots obtained, for example, by

        photographing or scanning an image through a mesh screen. By

        limiting the dots to, say, black and white (for example, by

        using high-contrast film), the illusion of greyscale may be

        created in a two-tone or black-and-white document (1.4.1.1.1).
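

        How black and white dots alone can give the illusion of

        greyscale can be sketched in digital form. The example below

        uses ordered dithering with a 2x2 threshold pattern as a

        stand-in for the mesh screen; that substitution, the function

        name halftone, and the convention 0.0 = white, 1.0 = black are

        assumptions made for illustration, not the Glossary's method.

```python
from typing import List

# 2x2 ordered-dither thresholds, playing the role of the mesh screen.
SCREEN = [[0.125, 0.625],
          [0.875, 0.375]]


def halftone(grey: List[List[float]]) -> List[List[int]]:
    """Convert greyscale values (0.0 white .. 1.0 black) to dots.

    Returns 1 for a black dot and 0 for white (a two-tone result).
    """
    return [
        [1 if value > SCREEN[y % 2][x % 2] else 0
         for x, value in enumerate(row)]
        for y, row in enumerate(grey)
    ]


mid_grey = [[0.5, 0.5],
            [0.5, 0.5]]
dots = halftone(mid_grey)
```

        For a uniform mid-grey patch, half of the dots come out black,

        so the proportion of black dots tracks the original tone.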



        1.4.2.3.3       Discrete Tone



        A greyscale or color (1.4.1.4) graphic object where the tones

        take on discrete (normally equispaced) values within a range.



        1.4.2.3.4       Continuous Tone



        A greyscale (1.4.1.1.2) or color (1.4.1.4) graphic object where

        the tones fall continuously across an entire range of values,

        such as in a photograph (1.1.4, 1.2.9.3).



<graphic status=omitted>

  1.5.         Document Condition



  <e>Condition</> refers to the physical state of the document compared

  with its state when originally published. The following presents only

  those characteristics of the physical state of a document that



<newpage id=21>

  are pertinent to the main thrust of this Glossary, that is, to the

  paper milieu.



    1.5.1. Archival



    A document that can be expected to be kept permanently as close as

    possible to its original form. An <e1>archival document medium</> is

    one that can be "expected" to retain permanently its original

    characteristics (such expectations may or may not prove to be

    realized in actual practice). A document published in such a medium

    is of <e>archival quality</> and can be expected to resist

    deterioration.



    <e1>Permanent</> paper is manufactured to resist chemical action so

    as to retard the effects of aging as determined by precise technical

    specifications. <e1>Durability</> refers to certain lasting

    qualities with respect to folding and tear resistance.



    See also 3.3.5.



    1.5.2. Non-Archival



    A document that is not intended or cannot be expected to be kept

    permanently, and that may therefore be created or published on a

    medium (1.1) that cannot be expected to retain its original

    characteristics and resist deterioration.



    1.5.3. Acidic



    A condition in which the concentration of hydrogen ions in an

    aqueous solution exceeds that of the hydroxyl ions. In paper, the

    strength of the acid denotes the state of deterioration that, if not

    chemically reversed (3.1.2), will result in embrittlement (1.5.4).

    Discoloration of the paper (for example, <e1>yellowing</>) may be an

    early sign of deterioration in paper.
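

    The defining condition can be written compactly. The pH scale is not

    mentioned in the Glossary; it is introduced here as the conventional

    measure of acidity, and the threshold of 7 assumes an aqueous

    solution at about 25 degrees Celsius.

```latex
\[
  [\mathrm{H^+}] > [\mathrm{OH^-}]
  \quad\Longleftrightarrow\quad
  \mathrm{pH} = -\log_{10}[\mathrm{H^+}] < 7
  \qquad (\text{at } 25\,^{\circ}\mathrm{C})
\]
```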



    1.5.4. Brittle



    That property of a material that causes it to break or crack when

    depressed by bending. In paper, evidence of deterioration usually is

    exhibited by the paper's inability to withstand one or two

    (different standards are used) double corner folds. A <e1>corner

    fold</> is characterized by bending the corner of a page completely

    over on itself, and a <e1>double corner fold</> consists of

    repeating the action twice.



<newpage id=22>

    1.5.5. Other



    There are many other characteristics that describe the condition of a

    document. Bindings of books, for example, may have deteriorated for

    a variety of reasons. Non-paper documents may exhibit a variety

    of conditions (see, for example, 3.3.5 for a discussion of the

    concept of "Useful Life"). However, with the focus on paper original

    documents and on media conversion technologies for preservation, a

    full analysis of document condition would be beyond the scope of

    this Glossary.



<graphic status=omitted>

  1.6.    Document Content



  Document Content refers to the substance of the material or

  information within the document that is intended to be communicated.



    1.6.1. Intellectual Content



    <e>Intellectual Content</> refers to the ideas, thought processes,

    artistic expressions, etc., contained within the document.



    1.6.2. Copyright [16]



    <e1>Copyright</> refers to a means of legal protection provided to

    the author(s) of original published and unpublished works that have

    been "fixed in a tangible form of expression," in order to afford

    such authors the exclusive right of <e1>exploitation</>, in

    particular the right to control the reproduction, distribution,

    performance, or display of the work, or to control the



<newpage id=23>

    preparation of derivative works. [17] Often, exploitation of the

    work by others requires the consent of the author(s) and the payment

    of a <e1>royalty</> to the author(s), usually in the form of a fixed

    sum of money for each copy made, shown, or distributed.



    For works copyrighted in the United States after January 1, 1978,

    protection afforded to the author(s) or the author(s)' estate is

    usually for the author(s)' lifetime plus 50 years. For works created

    prior to that date, the copyright period was 28 years from the date

    of publication (or the date of registration of copyright for

    unpublished works), plus an additional period of 47 years for works

    whose copyright was renewed during the last year of the first term.
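
    The term arithmetic in the preceding paragraph can be sketched as
    follows. This is only an illustration of the rules as stated in this
    Glossary, not legal guidance; the function and constant names are
    hypothetical.

```python
# Terms as stated in this Glossary (hypothetical names, illustration only).
FIRST_TERM_YEARS = 28             # first term, counted from publication
RENEWAL_TERM_YEARS = 47           # optional second term, if renewed
YEARS_AFTER_AUTHOR_DEATH = 50     # works copyrighted after January 1, 1978


def pre_1978_term_end(publication_year: int, renewed: bool) -> int:
    """Approximate year in which protection ends for a pre-1978 work."""
    if publication_year >= 1978:
        raise ValueError("use post_1978_term_end for works from 1978 onward")
    term = FIRST_TERM_YEARS + (RENEWAL_TERM_YEARS if renewed else 0)
    return publication_year + term


def post_1978_term_end(author_death_year: int) -> int:
    """Approximate year in which protection ends for a post-1978 work."""
    return author_death_year + YEARS_AFTER_AUTHOR_DEATH
```

    Under these assumptions a work published in 1920 and never renewed
    would have left protection after 28 years, while a renewed work would
    be protected for 28 + 47 = 75 years in all.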



    Works published in the United States may be afforded protection in

    countries that are members of the Universal Copyright Convention or

    of the Berne Convention for the Protection of Literary and Artistic

    Works. Conversely, works published in such member countries are

    protected within the United States.



    Most works that are the subject of preservation interest were

    published before 1978. The copyrights on the majority of those works

    were not renewed for the optional second term. Thus, the copyrights

    have expired on most of the works of current preservation interest

    that were subject to United States copyright protection. However,

    since this is not true of all such works, the normal practice is to

    check copyright ownership to verify clearance.



    1.6.3. Structure



    <e1>Structure</> refers to the divisions within a document provided

    for ease of access, reference, and other purposes. The broad

    structure of a given document is likely to vary according to its

    format (1.2), and there is also not necessarily any standard

    structure for a given format. With its long history, the structure

    of the printed book (1.2.2) has evolved towards a somewhat standard

    structure. Because of the focus of this Glossary on the preservation

    of the printed book, a typical book structure is presented here and

    structures for other formats are omitted.



<newpage id=24>

      1.6.3.1 Abstract (see 3.4.1.2)



      1.6.3.2 Title Page



      The <e1>Title Page</> of a work normally contains the title of the

      work, its author(s), and the name of the publisher.



      1.6.3.3 Table of Contents (see 3.4.1.3)



      1.6.3.4 List of Figures, Tables, Maps or Other Illustrations (see

      3.4.1.4)



      1.6.3.5 Preface (see 3.4.1.5)



      1.6.3.6 Introduction (see 3.4.1.6)



      1.6.3.7 Body



      The <e1>Body</> of a document refers to the main corpus of the

      work. It may be divided into <e1>characters</>, <e1>chapters</>,

      <e1>articles</>, or other segments.



      1.6.3.8 Index (see 3.4.1.7)



      1.6.3.9 Other



      This category includes publisher's notes, credits, frontispieces,

      and other minutiae of publication.



<newpage id=25>

<graphic status=omitted>

2.  THE SELECTION PROCESS [18]



The <e1>Selection Process</> refers to the means whereby original

documents are selected for preservation purposes. The choice of

selection strategy may be intrinsically affected by the choice of

preservation or media conversion technology used (see 3.1), since the

latter may well affect costs and other parameters associated with the

former. Thus, the total costs of preservation will be a complex

combination of the effects of selection strategy and choice of

technology.



Thus, for example, with the use of microform (3.1.4), it is highly

desirable (if not imperative) to obtain a complete copy of the document

to be preserved prior to recording. This may require replacing missing

or damaged pages from the prime copy being microfilmed, and the expense

of obtaining these pages from copies held in other libraries.

Microfilming also places a premium on recording only once. With the use

of digital technologies (3.1.5), on the other hand, such replacement

pages could be scanned at a later date and electronically "edited" into

the main electronic document: with digital technologies, it may in fact

be cheaper to scan more than one copy to facilitate such "editing"

rather than to expend excessive manual labor on assembling the most

perfect paper copy possible prior to microfilming.



The following is a brief -- and very over-simplified -- classification

of selection methodologies. It is only intended to sketch the range of

possibilities and not to do full justice to the complexity of this

subject. It merely indicates some of the main lines of strategy or

process used in selecting documents for preservation. Furthermore, often

a combination of approaches is used rather than any single approach,

with the actual condition of the document being the dominant factor in

the choice.



<newpage id=26>

In all cases, the "universe" of documents to which the selection

strategies outlined in this Section are applied is those documents that

are deteriorating or are likely to deteriorate, such as brittle books

or, more generally, books printed on acidic paper. "Preservation",

however, may also be applied to the conversion onto other media of

materials that, while in quite good condition, are scarce or unique,

thus allowing patrons to handle facsimiles instead of the precious

originals.



The term "essentially all documents" is used below to define documents

from within the former universe that fit within the indicated selection

strategy, while allowing that a number of these selected documents may

yet be rejected following review for various reasons (such as having

deteriorated to the point that preservation is not possible, or because

it has been determined that the document has already been preserved

elsewhere).



  2.1.    By Title



  Selection is made from among individual works, perhaps by professional

  bibliographers who, possibly working in consultation with others, make

  a determination of the value of the selected work to a given

  collection, discipline, or field of study.



  2.2.    By Category



  Selection is made by choosing essentially all documents from within

  a given category, such as within a given time period, or of a given

  format (for example, all newspapers), subject classification, special

  collection, or, say, American imprint. The essence of this approach is

  that all documents within the category be readily and conveniently

  definable and accessible, without having to resort to time-consuming

  selection processes.



  Colloquially, this approach is sometimes erroneously termed the

  <e1>"vacuum cleaner approach"</>, an appellation that is overly

  pejorative insofar as some prior review is almost always made to

  reject materials within a category that for various reasons are not

  suitable or desirable for preservation. In particular, a check is made

  to ensure that the material has not already been preserved.



  Selection, for example, by time period permits the focus of effort on

  those periods of highest risk of deterioration with respect to

  paper-manufacturing processes.



<newpage id=27>

  2.3.    By Bibliography



  Selection is made by choosing essentially all documents specified in a

  published bibliography.



  2.4.    By Use



  Selection is made by choosing essentially all documents in poor

  condition that are actually used by patrons as judged by some

  criterion such as, for example, frequency of circulation.



  2.5.    By Condition



  Selection is made by preserving the documents in the worst physical

  condition.



  The foregoing are examples of selection according to certain

  established <e1>criteria</>. Selection may also be made according to

  established <e1>procedures</>:



  2.6.    By Scholarly Advisory Committee



  Selection is made with the assistance of a committee of scholars

  knowledgeable in a particular field who choose the material they

  consider to be of most importance to that field.



  2.7.    By Conspectus



  Selection is made from institutional collections determined in a

  program initiated by the Research Libraries Group (RLG) [19] and

  described in the RLG Conspectus. The Conspectus describes collections

  on various levels from Level 0 (Out-of-Scope, a level which is in fact

  non-existent), through Level 4 (Research), to Level 5 (Comprehensive).

  Collection development officers (selectors) in about 50 major research

  libraries in the U.S. have evaluated their own collections to provide

  such brief descriptions. The Conspectus can be used as one of several

  means to determine "Great Collections."



<newpage id=28>

<newpage id=29>

<graphic status=omitted>

3.  THE PRESERVED COPY



This section addresses technologies employed in the preservation

process. The first section broadly classifies different kinds of

preservation processes. The remaining sections focus on the different

technological stages associated with preservation processes dependent

upon media conversion technologies. These are: capture technologies,

storage technologies, access technologies, distribution technologies,

and presentation technologies.



The divisions among these various stages of technology may, at first,

seem artificial, particularly to those used to working with paper. For

example, we distinguish between the storage medium (3.3.1), the

distribution medium (3.5.1), and the presentation medium (3.6.1). In the

world of paper, as stated in the Introduction, these are usually all one

and the same, even though the same paper book, say, may play different

roles at different times. When it is on the library bookshelf, it is a

<e1>storage</> medium; when it is being messengered through interlibrary

loan, it is the <e1>distribution</> medium; and when it is being read by

the patron, it is the <e1>presentation</> medium. In the world of

convertible technologies, the separation becomes more than convenient

sophistry  -- it becomes essential, since different media may well be

used at any stage of the process. Consider, for example, a table from a

scientific journal article (paper: the storage medium), which is FAXed

across the nation using a data network (digital electronic: the

distribution medium), and printed out directly onto photographic slides

(film: the presentation medium) for projection in a lecture.



Indeed, in the preservation milieu, this conceptual separation also

offers considerable flexibility. It offers the flexibility of separating

the act of preservation itself from the ultimate means of storage and

delivery. Thus, for example, microfilming may be used as a preservation

process (3.1.4), but the microfilm contents may be printed later onto

paper for user presentation purposes. Or the microfilm may be digitally

scanned



<newpage id=30>

and the contents stored on computer files for subsequent distribution

across networks. As another example of this flexibility, images scanned

and stored using digital preservation techniques (3.1.5) may later be

interpreted using intelligent character recognition (3.2.5) or page

recognition (3.2.6) technologies.



The point is that the ultimate use of the preserved document may not be

well-articulated at the time of preservation. Thus, preservation

technologies that offer the greatest flexibility are to be preferred to

those (such as photocopying (3.1.3)) that offer less flexibility,

although lack of funds and patron preference often dictate the use of

the latter.



The distinction between the various technology stages is maintained

throughout this Glossary.



<graphic status=omitted>

  3.1.    Preservation and Media Conversion Technologies



  Many different technologies have been proposed to address the problems

  of preservation. These can be divided into three broad categories:

  those directed at preserving both the content and physical embodiment

  of the original, those directed at preserving the content and copying

  the physical embodiment, and those directed at preserving the content

  only, without concern for the physical embodiment. Conservation and

  paper deacidification fall into the first category. The remaining

  technologies described below fall into the other categories.



  In the second category every effort is made to copy the physical

  embodiment or format of the original as faithfully as possible,

  normally onto another medium. The term <e1>media conversion</>

  technologies is thus used for this class (<it>note</>: this does not

  exclude copying a paper document onto another paper document: media

  conversion has still occurred). Media Conversion includes photocopying

  (3.1.3), microform recording (3.1.4), and the use of electronic

  digitization techniques (3.1.5).




<newpage id=31>



  The third category makes no attempt to preserve or copy the physical

  embodiment of the original. For example, merely rekeying the text (see

  3.2.8) of a document composed entirely of text preserves only content

  and nothing else if no attempt is made to capture font and other

  formatting information.



  Among librarians, the term "reformatting" has traditionally been used

  for "media conversion." The former term is not used in this Glossary

  because of possible confusion with the concept of Document Format

  (1.2). Furthermore, "reformatting" does not do justice to the concept

  of copying onto microform (3.1.4) or of digital scanning (3.1.5). [20]



  This necessarily brief glossary of different preservation approaches

  also summarizes some of the key issues involved in comparing the

  various alternatives.



    3.1.1. Conservation Treatment [21]



    The treatment of a document to preserve it in its original form, in

    recognition that the original medium, format, and content are all

    important for research and other purposes. Pure conservation

    approaches are normally hand-tailored to the individual document

    and, as such, may be relatively expensive. Use is normally,

    therefore, limited to those situations where such expensive

    treatment is justified by the research requirements.



    3.1.2. Paper Deacidification and Strengthening [22]



    The treatment by chemicals to stabilize a document (in paper, by

    alkalization to neutralize the acid content) and/or to strengthen it

    (in paper by the use of a support coating or by impregnation). The

    alkalization treatment also usually entails depositing an alkaline

    reserve to buffer against further acidification.



    Deacidification or strengthening can be applied to individual

    documents or, with some treatment processes, to a large number



<newpage id=32>

    of documents at once (<e1>mass</> or <e1>bulk deacidification</>).

    The latter is a relatively cheap approach, and pilot plants have

    been or are being established in a number of countries to support

    different processes. There is, however, no standard approach at this

    time even though there appear to be a number of promising

    alternatives. There are also a number of unanswered questions at

    this time regarding the longevity of chemical stabilization

    processes, toxicity, the feasibility of scaling processes to full

    production requirements, the potential continuing "offgassing"

    implications to patrons resulting from the storage of thousands of

    treated volumes in confined library spaces, and other issues. Recent

    research appears to be addressing many of these concerns.



    Deacidification is essentially a stabilization process that arrests

    deterioration. It does not turn brittle books back to their original

    state, although coating or impregnation can strengthen the paper to

    extend its useful life. Its greatest utility may lie in arresting

    embrittlement in books that are not too far gone, or for

    prophylactic protection of new or old books that have not yet

    started to turn brittle. Deacidification may also "buy time" in

    anticipation of later preservation by other processes.



    3.1.3. Photocopying



    <e1>Photocopying</> refers to the process of preserving the document

    by making a full-size (usually bound similarly to the original)

    facsimile copy on archival (1.5.1) paper by creating a photographic

    copy of the images of the pages contained in the document, possibly

    using a <e1>photocopier</> (3.2.1). As used here, photocopying

    refers to an in-line process where the original is scanned and one

    or more photocopies made all in one pass, with no form of retained

    intermediate storage being automatically generated (as contrasted

    with <e1>microform recording</> (3.1.4)) so that more copies can be

    made in the future. In actual practice, however, when photocopying

    is used for preservation it is customary to make a second photocopy

    that is retained in unbound form, so that further copies can readily

    be made in the future from this master copy.



    A distinction is made between straight photocopying, which does not

    necessarily involve the use of archival paper (1.5.1), and

    <e>preservation photocopying</>, which does require the use of

    archival paper.



    The advantages of making such a facsimile are that normally a single

    paper facsimile is produced that is quite faithful to the



<newpage id=33>

    original, there is no machine interface required other than the

    photocopier itself, the medium (1.1) and format (1.2) of the

    original are retained, and the cost is usually less than other

    processes, particularly if the original is a monochrome document.

    Furthermore, library patrons prefer paper facsimiles to the use of,

    say, microforms (3.1.4), except where bulky documents, such as

    newspapers, are involved. One disadvantage, as compared with

    microform recording (3.1.4) and electronic digital preservation

    (3.1.5), is that normally second copies made from the master copy

    are of poorer quality than, say, prints of microforms made from

    master microforms. Furthermore, the cost of making subsequent

    copies is higher than the cost of printing microforms. Another

    disadvantage, shared to a greater or lesser extent with microforms,

    is that photocopying does not precisely reproduce all the

    information in the original, and there is some loss of information,

    especially for graphic objects (1.4.2.3) involving other than line

    art (1.4.2.3.1).



    3.1.4. Microform Recording



    <e>Microform Recording</> refers to the process of preserving the

    document by filming the original document onto a microform film

    negative (1.1.2), that is, storing microimages of the pages or

    segments of the document on film. Positive film copies, which can be

    produced inexpensively, are made from this original film negative or

    <e1>master</>. Such a positive copy is both a storage (3.3) and

    distribution (3.5) technology, and is normally viewed using a

    <e1>microform reader</> (3.6.2.2), or paper positive prints may be

    made from the positive microform using printing devices designed for

    the purpose. Access to microfilm (1.1.2) using such a reader is

    serial (cf 3.3.1.6), whereas access to microfiche (1.1.2) is random

    (cf 3.3.1.6) like a book.



    The advantages of microform are that the process is economically

    competitive with other processes; that film has a long useful life

    (3.3.5); and that microform copies  -- made from a second negative

    [23] (known as the <e1>printing master</>) copied from the original

    negative --  may be made cheaply and distributed among other

    institutions, so that access is not limited to a single facsimile.

    Microform preservation is a well-tried, tested, and accepted method

    of preservation.



<newpage id=34>

    The disadvantages are that there is usually a loss of information in

    the recording process, particularly in recording continuous tone

    imagery (1.4.2.3.4), since the film used is usually of high

    contrast; [24] and that readers dislike using microform readers

    compared with, say, reading books.



    Microform-preserved documents can subsequently be converted to other

    media besides paper. They can be scanned (3.2.3) and converted to

    digitally-encoded documents (3.1.5) to take advantage of the

    benefits of digital encoding for storage, distribution, and access.

    However, any loss of information in the original recording process

    will be perpetuated in the subsequent digital recording.



    3.1.5. Electronic Digitization



    <e>Electronic Digitization</> refers to the capture of the document

    in electronic form through a process of scanning (see 3.2.3) and

    digitization. The scanned image is stored electronically, usually on

    magnetic (see 3.3.1.6.1 and 3.3.1.6.2) or optical (see 3.3.1.6.3 and

    3.3.1.6.4) storage media. The electronically stored image may be

    further <e1>transformed</> for reasons such as compression (see

    3.3.2) or information interpretation (see 3.3.3); and subsequently

    selected through the use of access technologies (see 3.4),

    distributed through the use of distribution technologies (see 3.5),

    or viewed through the use of presentation technologies (see 3.6).



    When originally scanned, or as a result of subsequent

    transformations, the document may in whole or in part be stored in

    <e1>image</> (3.1.5.1), <e1>unformatted text</> (3.1.5.2.1),

    <e1>formatted text</> (3.1.5.2.2), or <e1>compound</> (3.1.5.3)

    form. The distinction is important insofar as it affects <it>inter

    alia</> the extent to which information such as text in the scanned

    document may be interpreted (3.2.5, 3.2.6, 3.2.7) and used for

    purposes of information access (3.4, in particular 3.4.2, but see

    also 3.1.5.1, 3.1.5.2, 3.2.4). An <e1>image</> representation is an

    electronic pictorial representation composed of dots (black and

    white, greyscale, or color) much like a halftone (1.4.2.3.2) printed

    photograph, and no distinction is made between text and other

    information (such as graphs, pictures, and so forth) contained in

    the document -- in other words, the letter "b" is not stored as a

    character per se, but as a <e1>"digital picture"</> of the letter

    "b", and the series of numbers stored to represent the picture would

    be quite distinct



<newpage id=35>

    among different typestyles used. Text representations, on the other

    hand, represent text as text, with a specific code used to denote

    the letter "b" independent of what typestyle is used.
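
    The contrast between the two representations can be sketched as
    follows. The two small dot patterns are hypothetical stand-ins for the
    letter "b" scanned in two typestyles; real scanned characters would
    contain far more dots.

```python
# Two hypothetical 5x7 dot patterns for the letter "b" in different
# typestyles. '#' is a black dot, '.' is a white dot.
PLAIN_B = [
    "#....",
    "#....",
    "###..",
    "#..#.",
    "#..#.",
    "#..#.",
    "###..",
]
BOLD_B = [
    "##...",
    "##...",
    "####.",
    "##.##",
    "##.##",
    "##.##",
    "####.",
]


def to_bits(rows):
    """Flatten a dot pattern into a list of 1s (black) and 0s (white)."""
    return [1 if dot == "#" else 0 for row in rows for dot in row]


image_plain = to_bits(PLAIN_B)   # image representation, typestyle 1
image_bold = to_bits(BOLD_B)     # image representation, typestyle 2
images_differ = image_plain != image_bold

# Text representation: one code for "b", whatever typestyle was printed.
text_code = ord("b")
```

    Here the two image representations are different sequences of
    numbers, whereas the text representation is the single code 98 (the
    ASCII code for "b") in both cases.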



    Image representations cannot be searched for words or phrases: text

    representations can. Image representations of text may be converted

    into formatted or unformatted text representations using OCR (3.2.4)

    or ICR (3.2.5) techniques, but with loss of accuracy. In the context

    of preservation, image representations are likely to dominate, since

    the cost of transforming image into text representations with

    sufficient accuracy may be prohibitively high, at least in the

    immediate future. Thus full-text searching, for example, is not

    likely to be a feature of digitally-preserved documents. This is

    unlike the situation that exists with documents where the text

    already exists in digital electronic form, such as if the publisher

    had preserved the original tapes used in typesetting.



    If and when OCR techniques are able to convert image format to text

    format with sufficient accuracy and performance, then the archives

    of digitally-preserved material in image format can be converted to

    text format using ICR (3.2.5) techniques, provided the original

    material was scanned with sufficiently high resolution (3.2.3).

    Furthermore, promising research has been done recently on the

    searching of documents for retrieval purposes using the "corrupted"

    (erroneous) text derived from the OCR or ICR scanning of image

    documents at existing levels of OCR/ICR accuracy and performance.
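
    As a rough illustration of how retrieval can tolerate "corrupted"
    text, the sketch below matches a query word against misrecognized
    words by approximate string similarity. It is not the method used in
    the research mentioned above; the sample words, the function name,
    and the 0.75 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_find(query, ocr_words, min_ratio=0.75):
    """Return the OCR words that are sufficiently similar to the query."""
    query = query.lower()
    return [
        word
        for word in ocr_words
        if SequenceMatcher(None, query, word.lower()).ratio() >= min_ratio
    ]


# Hypothetical OCR output containing two misrecognized words.
ocr_words = ["presewation", "of", "brittle", "b0oks", "and", "paper"]

preservation_hits = fuzzy_find("preservation", ocr_words)
books_hits = fuzzy_find("books", ocr_words)
```

    An exact search for "preservation" would find nothing in this sample,
    while the approximate search still locates the damaged word.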



    The advantage of electronic digitization is that it potentially

    combines the advantages of photocopying and microform recording

    while eliminating some of the disadvantages. Paper facsimiles can be

    produced at will by <e1>printing-on-demand</> (3.5.4) on paper (or

    writing the appropriate signals on whatever might be the appropriate

    output medium, in the case of video, film, or sound), thus

    eliminating the need for awkward microform readers. Alternatively,

    the stored images can be reconstructed and viewed at computer

    workstations (3.6.2.6). Furthermore, the stored digital images can

    be distributed essentially at will across data networks (3.5.5) for

    sharing among institutions. The content of the stored images can

    also be interpreted at any time (3.2.5, 3.2.6, 3.2.7) after

    recording (whenever it might become economically desirable to do so)

    for purposes of, say, creating indices for access purposes (3.4.1).



<newpage id=36>



    Another key advantage is the robustness of digital encoding. Further

    copies, including copies made in new formats (3.3.3) on other

    digital electronic storage media (3.3.1.6) for purposes of extending

    the useful life of the digital copy (see Introduction and 3.3.5),

    can be made without loss of information, as contrasted with

    photocopying (3.1.3) or microform recording (3.1.4). Furthermore,

    scanned images can be digitally enhanced (3.2.9) to improve the

    image quality.
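
    The claim that digital copies can be made without loss of information
    can be checked mechanically. The following minimal sketch, which
    assumes a modern checksum (SHA-256) that is not part of this Glossary,
    copies a stand-in image through ten generations and compares the
    result with the original.

```python
import hashlib


def digest(data: bytes) -> str:
    """Fingerprint of a stored document, used to compare copies."""
    return hashlib.sha256(data).hexdigest()


def copy_generations(data: bytes, generations: int) -> bytes:
    """Copy the data repeatedly, each copy being made from the previous one."""
    current = data
    for _ in range(generations):
        current = bytes(bytearray(current))  # a fresh byte-for-byte copy
    return current


original = bytes(range(256)) * 4          # stand-in for a scanned image
tenth_copy = copy_generations(original, 10)
unchanged = digest(tenth_copy) == digest(original)
```

    The tenth-generation copy is identical to the original, in contrast
    with photocopies or film duplicates, where each generation loses some
    information.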



    The disadvantages are that this is a new and relatively untried

    technology, and the cost and other trade-offs are uncertain at this

    time. There are also concerns about the useful life (3.3.5) of

    present storage media, both in terms of the physical properties of

    the media and in terms of the robustness of the recording format

    (3.3.3) and of the means of access. Some, however, take the view

    that it will be both functionally and economically imperative in any

    event to recopy the data from storage medium to storage medium every

    few years to take advantage of the rapidly declining storage costs

    and increasing storage capacities of the technology, and that the

    useful life of a given medium is not the relevant issue (see

    Introduction and 3.3.5).



      3.1.5.1 Image Document



      A representation of the document <e1>image</> is electronically

      captured (usually with the aid of a digital image scanner -- see

      3.2.3) or created without interpretation of its actual

      <e1>content</>. This is stored as a sequence of 1's or 0's (known

      as <e1>bits</>), a "digital photograph" as it were. In certain

      image representations, a "1" indicates "black" and a "0" indicates

      "white" (<e1>Binary Encoding</>), but usually the representation

      is encoded in more complex representations (see 3.3.4 Encoding

      Method). In some representations, for example, the average grey

      level of a small area of the page, termed a "pixel", is encoded

      (<e1>Greyscale Encoding</>. See also 1.4.1.1.2). Such a pixel is a

      grey dot. The number of dots per inch is termed the <e1>pixel

      resolution</>. This pixel resolution may range from 100 per inch

      to several thousand per inch.
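
      The effect of pixel resolution and encoding on storage can be
      estimated with simple arithmetic. The sketch below assumes an
      uncompressed image and a hypothetical 6 x 9 inch page; the function
      name is illustrative.

```python
def bitmap_size_bytes(width_in, height_in, dots_per_inch, bits_per_pixel=1):
    """Uncompressed size of a scanned page image, in bytes."""
    width_px = round(width_in * dots_per_inch)
    height_px = round(height_in * dots_per_inch)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical 6 x 9 inch page scanned at 300 dots per inch.
binary_page = bitmap_size_bytes(6, 9, 300)                       # 1 bit per pixel
greyscale_page = bitmap_size_bytes(6, 9, 300, bits_per_pixel=8)  # 256 grey levels
```

      Under these assumptions the binary-encoded page occupies 607,500
      bytes and the greyscale-encoded page eight times as much, before
      any compression (3.3.2) is applied.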



      It is not unusual, for reasons of storage economy, to convert a

      greyscale-encoded image document into a binary-encoded image

      document of higher resolution at the time an image document is

      stored. Compression techniques (3.3.2) are used to achieve this.

      The resultant stored image represents a compromise between

      scanning resolution, image fidelity, and storage space.
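
      One simple way to derive a binary-encoded image from a
      greyscale-encoded one is a fixed threshold, sketched below. This is
      an assumption for illustration only: it keeps the same resolution
      rather than increasing it, and the convention that 0 is black and
      255 is white is likewise assumed.

```python
def binarize(grey_rows, threshold=128):
    """Map each grey level (0 = black, 255 = white) to 1 (black) or 0 (white)."""
    return [[1 if level < threshold else 0 for level in row] for row in grey_rows]


# Hypothetical 2 x 4 patch of grey levels, 8 bits per pixel (64 bits in all).
grey = [
    [250, 40, 30, 245],
    [240, 35, 200, 250],
]
binary = binarize(grey)  # 1 bit per pixel (8 bits in all)
```

      The binary patch needs one eighth of the storage, at the cost of
      discarding the intermediate grey levels (for example, the value 200
      simply becomes white).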



      The electronically-encoded sequence of 1's and 0's that represents

      an Image Document is also known as a <e1>Bitmap</>.



      Image Documents are generally accessed by associating an index

      entry, such as a page number, with a segment of the Image

      Document. See



<newpage id=37>



      discussion following under 3.1.5.2 regarding other issues

      associated with searching and retrieving Image Documents.
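
      The association between index entries and segments of an Image
      Document can be sketched as a simple lookup table. The page labels,
      the byte strings standing in for page images, and the function
      names are all hypothetical.

```python
def build_index(page_labels, page_images):
    """Map each page label to the (offset, length) of its image in the store."""
    index = {}
    offset = 0
    for label, image in zip(page_labels, page_images):
        index[label] = (offset, len(image))
        offset += len(image)
    return index


def fetch_page(store, index, label):
    """Retrieve the image segment associated with an index entry."""
    offset, length = index[label]
    return store[offset:offset + length]


# Stand-ins for three scanned page images stored end to end.
page_images = [b"\x00\xff\x0f", b"\xf0\xf0", b"\x01\x02\x03\x04"]
labels = ["title", "1", "2"]
store = b"".join(page_images)
index = build_index(labels, page_images)
page_one = fetch_page(store, index, "1")
```

      The index lets a page be retrieved by its label, but it says nothing
      about the words on the page; that requires a text representation.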



      3.1.5.2 Text Document



      The text of the document only is captured as <e1>character</>

      representations, that is, each alphabetic character has a unique

      representation (see discussion above) following a standard means

      of encoding, such as the <e1>ASCII</> standard. With electronic

      digital storage, the amount of space taken to store a

      <e1>representation</> of a character is generally far less than

      the amount of space taken to represent a character in image form.

      Usually, each character representation of a letter of, say, the

      Roman alphabet takes 8 bits (1 byte) of storage space. When stored

      in image form, the representation may take several orders of

      magnitude more storage space, depending upon the size of the

      character, the scanning resolution, and the degree of compression

      (see 3.3.2) used. See also 3.3.4.2.
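
      The difference in storage can be illustrated with a rough
      calculation. The character cell of 30 x 45 dots (about what a
      printed letter might occupy at 300 dots per inch) is an assumed
      figure, and no compression is applied.

```python
BITS_PER_CHARACTER_CODE = 8  # one byte per character, as described above


def image_bits(width_px, height_px, bits_per_pixel=1):
    """Uncompressed storage for the image of a single character."""
    return width_px * height_px * bits_per_pixel


def image_to_text_ratio(width_px, height_px, bits_per_pixel=1):
    """How many times more storage the image form needs than the code."""
    return image_bits(width_px, height_px, bits_per_pixel) / BITS_PER_CHARACTER_CODE


binary_ratio = image_to_text_ratio(30, 45)                        # binary encoding
greyscale_ratio = image_to_text_ratio(30, 45, bits_per_pixel=8)   # greyscale
```

      With these assumed figures the image form needs roughly 170 times
      the storage of the character code in binary encoding, and 1,350
      times in 8-bit greyscale encoding.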



      Storing a document as a text document facilitates full-text or

      partial-text retrieval (see 3.4.2), where documents or parts of

      documents can be selected and retrieved by searching for the

      occurrence of keywords or strings of text. This is not possible

      with Image Documents (3.1.5.1), unless they have been wholly or

      partially converted to Text Documents using Optical Character

      Recognition (OCR) techniques (3.2.4, 3.2.5), a process that is not

      sufficiently accurate for most preservation purposes (see,

      however, 3.2.4 for a discussion of the use of such techniques for

      the construction of indices).
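
      A minimal sketch of full-text retrieval over a Text Document
      follows. The page texts and the function name are hypothetical, and
      the search is a plain case-insensitive substring match rather than
      any particular retrieval system.

```python
def find_pages(pages, phrase):
    """Return the page numbers whose text contains the phrase."""
    needle = phrase.lower()
    return [
        number
        for number, text in sorted(pages.items())
        if needle in text.lower()
    ]


# Hypothetical Text Document, stored page by page.
pages = {
    1: "Permanent paper is manufactured to resist chemical action.",
    2: "Deacidification arrests deterioration of acidic paper.",
    3: "Microform recording stores microimages on film.",
}

paper_pages = find_pages(pages, "paper")
film_pages = find_pages(pages, "Film")
```

      No comparable operation is available on an Image Document, where
      the same pages would be stored only as patterns of dots.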



        3.1.5.2.1  Unformatted Text



        The character representation of the text contains no information

        to indicate font style, font size, or page layout. In this

        sense, unformatted character text representations are an example

        of irreversible compression (see 3.3.2.3).



        3.1.5.2.2  Formatted Text



        The character representation of the text also contains

        sufficient information to describe one or more of font type,

        font size, or page layout. In this sense, formatted text may, if

        the document segment contains only textual material, represent a

        form of reversible compression (see 3.3.2.2).



      3.1.5.3 Compound Document



      The document is captured as a combination of image and formatted

      or unformatted text.



      3.1.6. Rekeying of Text



      <e1>Rekeying of Text</> refers to a preservation technology where

      the text in a document is literally reentered by hand into a



<newpage id=38>

      composition or other device for republication or reproduction

      purposes, often with the use of a digital computer. See also

      3.2.8.



      3.1.6.1 Unformatted Text



      In the rekeying of the text, no attempt is made to key sufficient

      information to indicate font style, font size, or page layout.



      3.1.6.2 Formatted Text



      In the rekeying of text, information is captured to indicate one

      or more of font style, font size, or page layout.



    3.1.7. Reprinting or Republication



    The document is preserved by producing a new edition or reprint,

    possibly by reprinting from retained intermediate forms of the

    document, such as reprinting a book from photocomposition tapes.

    Alternatively, the document may be recreated from scratch.



<graphic status=omitted>

  3.2.   Capture Technology



  <e>Capture Technology</> refers to the technology used to transform

  the images or information contained in the original document into some

  other form, the form dependent upon the overall <e>media conversion

  technology</> being used. This term is not relevant to Conservation

  (3.1.1) or Deacidification (3.1.2), which are <e1>conservation</>

  technologies, and do not employ media conversion techniques. Printing

  (see 1.1.1) on paper is, of course, also a capture technology.



<newpage id=39>

    3.2.1. Photocopier



    A <e1>Photocopier</> is a device for making photographic copies of

    graphic images. A common form of the photocopier involves the use of

    the <e1>xerographic</> process, where light reflected from the

    original document is focused onto an electrically charged insulated

    photoconductor, and the latent image is developed using a resinous

    powder. For the purposes of this Glossary, the term

    <e1>photocopier</> is restricted to devices that use <e1>analog</>

    technologies, such as the use of light lens technology.

    <e1>Digital</> technologies are incorporated separately (see 3.2.3).

    With photocopiers so defined, the image is normally scanned and

    printed essentially in a single operation, and an intermediate

    scanned latent image is not normally stored for re-use at a later

    stage -- although the two-stage processes of photography, which

    indeed may be used for photocopying, do permit the use of the

    photographic negative as an intermediate storage device (a

    particular case of which is the use of microform recording

    technology -- see 3.2.2).



    3.2.2. Microform Recorder



    A <e1>Microform Recorder</> is a camera or other photographic device

    for photographing the original document and printing it onto one of

    several forms of microform (1.1.2). The microform film in essence

    becomes both a storage medium (see 3.3.1.2) and a presentation

    medium (see 3.6.1.2 and 3.6.2.1). Other film copies and paper copies

    may also be made from the microform negatives for presentation (see

    3.6.1.2).



    3.2.3. Digital Image Scanner



    A <e1>Digital Image Scanner</> is a device for scanning the images

    contained on pages of a document and transforming the scanned image

    into digital electronic signals corresponding to the physical state

    at each part of the search area, that is, into image documents

    (3.1.5.1). These signals are most often stored (see 3.3) for

    subsequent interpretation (see 3.2.5, 3.2.6, 3.2.7, and 3.3.2,

    3.3.4), access (3.4), distribution (3.5), or presentation (3.6). A

    single small element of the document (known as a "pixel") is thus

    encoded quantitatively by a digital number, where the number

    contains sufficient information to represent the <e1>image</>

    content of the pixel (see 3.1.5.1). A digital image scanner on its

    own does not interpret the image information. The number of pixels

    per linear inch is considered to be the <e1>resolution</> of the

    scanner. Typical resolutions with current technology range from 100



<newpage id=40>

    pixels per linear inch to over 1,000 pixels per linear inch, but

    there are trade-offs between resolution, speed, cost, and quality.
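


    The arithmetic relating resolution to pixel count can be sketched

    as follows. The page size (8.5 by 11 inches) and the bit depth are

    assumed values for illustration; no compression is applied.

```python
def page_pixels(width_in, height_in, pixels_per_inch):
    """Total pixels on a page scanned at the given linear resolution."""
    return round(width_in * pixels_per_inch) * round(height_in * pixels_per_inch)


def uncompressed_bytes(width_in, height_in, pixels_per_inch, bits_per_pixel):
    """Bytes needed to hold the page image with no compression."""
    bits = page_pixels(width_in, height_in, pixels_per_inch) * bits_per_pixel
    return (bits + 7) // 8  # round up to whole bytes


# Assumed page: 8.5 x 11 inches, 1 bit per pixel (black and white).
at_300 = uncompressed_bytes(8.5, 11, 300, 1)
at_600 = uncompressed_bytes(8.5, 11, 600, 1)
print(at_300, at_600)
```

    Under these assumptions a page scanned at 300 pixels per linear

    inch needs about one megabyte, and doubling the resolution

    quadruples that figure.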



    Digital Image Scanners may scan in one or more different modes,

    depending upon their capability and depending upon whether they are

    scanning monotone or color (1.4.1), or whether they are scanning

    line art, greyscale, halftone, or continuous tone objects (1.4.2.3,

    3.1.5.1). Performance, in terms of speed, accuracy, and resolution,

    depends upon the degree to which these attributes can be

    accommodated. The speed of digital image scanners ranges from one or

    two pages per minute to around fifty per minute.



    A FAX machine (3.5.3) is a special form of digital image scanner.

    Other special forms of digital image scanners exist for scanning

    from media other than paper, such as digital image scanners that

    scan directly from microfilm (1.1.2). Such images scanned from

    microfilm, however, can be no better than the original microfilm

    image itself (see 3.1.4).



    Digital image scanners may come equipped with different physical

    devices for accommodating the original documents. These may include

    flatbed platens equipped with manual feeds, semi-automatic feeds

    (one page at a time is fed into an automatic hopper), or

    fully-automatic feeds. Manual feeds offer the greatest safety from

    potential jamming, a point of importance in the scanning of unique

    documents. Flatbed scanners generally require either books to be

    disbound and one page at a time placed on the platen, or require

    books to be laid open face-down on the platen, which may cause some

    distortion. They may also come equipped with edge-scanners, which

    scan right up to the binding of the book, avoiding this distortion;

    or with cradle scanners, where the book is opened in a cradle (such

    devices are also used in some microform recording devices) and two

    angled scanning heads are lowered into the open, cradled book. In

    all cases, quality control of scanning is an issue with respect to

    fidelity of the scanned image and registration of the scanned image

    with respect to a defined standard.



    3.2.4. Optical Character Recognition Scanner



    An <e1>Optical Character Recognition (OCR) Scanner</> is a digital

    image scanner that in addition interprets the textual portion of the

    images and converts it to digital codes representing formatted or

    unformatted text (3.1.5.2). The less sophisticated such devices can

    only "recognize" one or a few fonts of a fixed



<newpage id=41>

    size, and can only interpret such information as unformatted text.

    The more sophisticated devices can represent multiple fonts of

    different sizes, and can interpret limited information as formatted

    text. At either extreme, no device achieves 100% recognition

    accuracy: accuracy of the better devices typically ranges between

    95% and 98%, depending upon manufacturer-imposed trade-offs between

    the sophistication of the device, its speed, and its intended range

    of applicability.
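


    To give a sense of scale, the sketch below converts a recognition

    accuracy into an expected number of errors per page. The page

    length of 2,000 characters is an assumed figure, not one taken

    from this Glossary.

```python
def expected_errors(characters_per_page, accuracy_percent):
    """Expected number of misrecognized characters on one page."""
    return characters_per_page * (100 - accuracy_percent) // 100


PAGE_CHARACTERS = 2000  # assumed page length, for illustration only

errors_at_95 = expected_errors(PAGE_CHARACTERS, 95)
errors_at_98 = expected_errors(PAGE_CHARACTERS, 98)
print(errors_at_95, errors_at_98)
```

    On such a page, accuracy of 95% to 98% corresponds to between 100

    and 40 erroneous characters.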



    OCR devices are most often used where scanning errors and

    unformatted text are acceptable limitations, such as, for example,

    where the input material can be subsequently proofread and

    corrected, or where redundant information is scanned and the

    redundant information used to correct any inconsistencies arising

    from scanning errors (typically in certain commercial applications).

    In the context of document preservation, most uses of OCR devices

    are limited to where text information only suffices, and the form of

    the original document is not an important aspect of preservation. An

    important application is for use in the construction of indices for

    access and distribution (see 3.4 and 3.5), or for full contextual

    searching of information (3.4.2). Promising research has been done,

    for example, on the searching and retrieval of documents

    using the "corrupted" (erroneous) text derived

    from the OCR scanning of documents. The techniques utilized in this

    approach exploit the redundant information contained in the

    corrupted text.
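


    The idea of searching corrupted text can be illustrated with

    approximate string matching. The research techniques themselves

    are not described here, so the sketch substitutes the

    general-purpose similarity ratio from Python's standard difflib

    module; the sample words and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher


def fuzzy_find(words, query, threshold=0.8):
    """Return the words whose similarity to the query meets the threshold."""
    query = query.lower()
    return [word for word in words
            if SequenceMatcher(None, word.lower(), query).ratio() >= threshold]


# Hypothetical OCR output containing recognition errors.
ocr_words = "the presenvation of brittle b0oks".split()

hits = fuzzy_find(ocr_words, "preservation")
print(hits)
```

    The misrecognized word "presenvation" is still retrieved by a

    query for "preservation", because most of its characters survive

    the recognition error.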



    Handwriting recognition devices, an extreme form of OCR devices, are

    not included in this Glossary. At this time, such devices are

    limited in capability.



    3.2.5. Internal Character Recognition



    Internal Character Recognition is the term sometimes used when the

    same interpretation technology that is used in OCR devices (3.2.4)

    is applied to an already stored digital image at a later date. This

    separates the functions of scanning the images (3.2.3) digitally,

    and of interpreting the images. Interpreting the scanned and stored

    images at a later date also allows for using different recognition

    technologies in the tradeoffs between accuracy, speed, and function.

    In the context of preservation and media conversion, it also allows

    for the immediate focus to be placed on scanning and storage (and

    possibly media conversion), deferring the option of character

    recognition and its applications (see 3.2.4) to a later date -- at

    such time, massive-volume



<newpage id=42>

    character recognition and information interpretation is likely to be

    more economically feasible at higher levels of accuracy than with

    present technology.



    3.2.6. Intelligent Character Recognition



    <e>Intelligent Character Recognition</> is the term sometimes given

    to Optical or Internal Character Recognition where the scanned and

    recognized information is further interpreted to take advantage of

    contextual information, that is, words, phrases, and so forth,

    rather than simply treating the text as a string of independent

    characters. Intelligent Character Recognition, for example, may be

    used by sophisticated computer programs to construct concordances

    automatically, or to create highly sophisticated indexes. At this

    stage, intelligent character recognition is a field of research,

    rather than production, interest.



    3.2.7. Page Recognition



    <e>Page Recognition</> is the term given to the automatic

    interpretation of features contained within the printed page such as

    titles, subheads, columns, paragraphs, figures, figure captions,

    footnotes, and so forth. Additional capabilities of sophisticated

    page recognition algorithms include the ability to determine fonts

    and font sizes. In essence, Page Recognition "reverse engineers" the

    image into marked-up copy.



    3.2.8. Rekeying of Text



    As an alternative or complement to OCR (3.2.4), textual information

    can be encoded by directly keying alpha-numeric text into computer

    files manually. This has some advantage in accuracy over OCR, but is

    slower. It may also be used in situations where the brittleness of

    acidic documents makes them so fragile that scanning technologies

    cannot safely be used. See also 3.1.6.



    3.2.9. Enhancement



    <e1>Enhancement</> refers to the use of mathematical algorithms to

    improve the quality of digitally scanned images (3.2.3), such as by

    computationally adjusting the contrast or brightness of the scanned

    image. The term also includes techniques that may be used to modify

    the scanned image for structural reasons, such as <e1>bordering</>

    to remove any unwanted scanned areas surrounding



<newpage id=43>

    the actual document pages, <e1>de-skewing</> to rectify the scanned

    image to correct for any skew in the placement of the document on

    the scanner, or <e1>margin adjustment</> to ensure that pages are

    properly aligned with each other.
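


    Two of these operations can be sketched on a small greyscale

    image held as a list of rows, with 0 as black and 255 as white.

    The function names and the linear contrast stretch are

    illustrative choices, not methods prescribed by this Glossary.

```python
def stretch_contrast(image):
    """Linearly rescale grey values so the darkest becomes 0 and the lightest 255."""
    flat = [pixel for row in image for pixel in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in image]
    return [[(pixel - lo) * 255 // (hi - lo) for pixel in row] for row in image]


def crop_border(image, background=255):
    """Remove outer rows and columns containing only background ("bordering")."""
    rows = [i for i, row in enumerate(image)
            if any(pixel != background for pixel in row)]
    cols = [j for j in range(len(image[0]))
            if any(row[j] != background for row in image)]
    if not rows:
        return []
    return [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]


stretched = stretch_contrast([[100, 150], [200, 125]])
cropped = crop_border([[255, 255, 255], [255, 0, 255], [255, 255, 255]])
print(stretched, cropped)
```

    De-skewing and margin adjustment require geometric

    transformations and are not shown.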



    A full glossary of terms associated with enhancement is beyond the

    scope of this document.



<graphic status=omitted>

  3.3.    Storage Technology



  <e>Storage Technology</> refers to the technology used to store the

  images or information obtained through the use of some form of Capture

  Technology (3.2). This includes the medium used for storage (3.3.1),

  the <e1>compression</> methodology used to minimize the amount of

  storage medium employed (3.3.2), the <e1>format</> used to program the

  image or information onto the medium (3.3.3), the <e1>encoding

  methods</> used to represent any interpretation of the stored

  information (3.3.4), and the <e1>useful life</> of the storage medium

  (3.3.5).



    3.3.1. Storage Medium



      3.3.1.1 Paper (see 1.1.1 )



      3.3.1.2 Microform (see 1.1.2)



      3.3.1.3 Video (see 1.1.3)



      3.3.1.4 Film (see 1.1.4)



      3.3.1.5 Audio (see 1.1.5)



      3.3.1.6 Digital Electronic



      A family of storage devices where information or data are

      represented by a series of quantized changes to the surface of the

      storage medium, where such quanta are recorded or modified using

      electronic means. There are two main classes in this category:

      <e1>magnetic</> devices where, in recording, the magnetic state of

      a coated surface is altered by the electronic digital signal, and,

      in reading, the surface is sensed using reading heads conceptually

      similar to those used in common tape recorders; and <e1>optical</>

      devices where the optical properties of a coated



<newpage id=44>

      surface are altered (in one such technology, submicrometer-sized

      holes are recorded and read by laser beams focused by electronic

      means onto the area of the spot). The recorded quanta normally

      correspond to a recorded "1" or a recorded "0", that is, to

      <e1>bits</> (derived from "binary digits"), all data and

      information being constructed from these basic building blocks.



      Such devices are further classified according to whether they are

      <e>read/write</> devices (that is, information may be written onto

      the device and read from the device, and the information can be

      modified as many times as desired), <e1>read only memory (ROM)</>

      devices (that is, prerecorded information can be read from the

      device, but the information cannot be modified), or

      <e1>write-once-read-many (WORM)</> devices (that is, information

      may be written once by the consumer onto the device, but

      thereafter it can only be read). Most optical devices are either

      read only or WORM devices, but a class of devices that combine

      both magnetic and optical technologies (<e1>magneto-optical

      devices</>) are indeed read/write devices.



      Typically, magnetic devices are of higher performance in terms of

      <e>access time</> to a given segment of recorded information and

      <e1>transfer time</> of such accessed information to the host

      device. Optical devices, however, are generally more economic in

      terms of storage capacity. Magnetic technologies have a longer

      history than optical technologies, and more is known about their

      useful life, for example (see 3.3.5). Both technologies seem to be

      following similar cost/performance curves with performance

      parameters doubling in capability approximately every two to three

      years (except for access times which are improving much more

      slowly), and cost per bit halving about every two to three years.
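


      The halving rule can be written as a simple exponential. The

      sketch below assumes the trend holds steadily over the period

      considered, which is an extrapolation rather than a guarantee.

```python
def relative_cost_per_bit(years_elapsed, halving_period_years):
    """Cost per bit relative to today, if cost halves every halving period."""
    return 0.5 ** (years_elapsed / halving_period_years)


# Six years ahead, for the two ends of the "two to three years" range.
fast = relative_cost_per_bit(6, 2)
slow = relative_cost_per_bit(6, 3)
print(fast, slow)
```

      On this assumption, storage bought six years from now would cost

      between one eighth and one quarter as much per bit as it does

      today.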



      Both devices are further classified as to whether they are

      <e1>random access</> devices (such as <e1>disk storage devices</>)

      or <e1>serial access</> devices (such as <e1>tape storage</>

      devices). With random access devices, information stored at any

      point can be directly accessed (much as is accomplished by placing

      the playing-arm of a phonograph at any point on the phonograph

      record); with serial access devices, information can only be

      accessed by passing through information that may be recorded ahead

      of it on the medium (as in winding through a tape on a tape

      recorder to arrive at a particular passage).



        3.3.1.6.1  Magnetic Disk



        A rotating circular plate having a magnetized surface on which

        information may be stored as a pattern of polarized spots on

        concentric or spiral recording tracks. These plates or platters

        are usually stacked in <e1>disk drives</>, several to a drive.

        These platters may either be <e1>removable</> or not, although

        in high performance disk drives, the platters are usually not

        removable. They are, however, read/write devices (3.3.1.6). Some

        removable magnetic disks of lower capacity are known as

        <e>floppy disks</>, since originally the recording medium was

        made of a flexible plastic.



<newpage id=45>

        3.3.1.6.2  Magnetic Tape



        A plastic, paper, or metal tape that is coated or impregnated

        with magnetizable iron oxide particles on which information is

        stored as a pattern of polarized spots. These are read using

        magnetic tape drives. Access times with magnetic tapes are

        slower than those associated with correspondingly priced disks,

        since they are serial access devices, but the tapes are almost

        always removable so that the information can be stored

        <e>off-line</>, thus making tapes [25] useful for archival

        storage (but see 3.3.5).



        3.3.1.6.3  Optical Disk



        A rotating circular plate on which information is stored as

        submicrometer-sized holes and is recorded and read by laser

        beams focused on the disk. This includes the class of

        <e1>CD-ROM</> devices, which embodies the same 4 3/4" diameter

        format used for CD recordings. CD-ROMs are usually read by

        inserting the CD-ROM disk into a <e1>CD-ROM player</>. Other

        typical formats involve 12" or 14" diameter formats, but there

        is a dearth of standards. The latter are usually read by

        inserting them into <e>optical jukebox</> devices, which perform

        the role suggested by their name. Even when mounted, access

        times for optical disks are typically relatively slow, because

        of the lag time needed to "spin up" the disk. However, the cost

        per stored bit is extremely low. Error rates may also be higher

        than for magnetic technologies. As such, optical disks are most

        useful where there is an abundance of redundant information

        contained in the stored data, such as would be the case with the

        storage of scanned document pages. On viewing the data, the eye

        would not likely be troubled by a tiny dot among an ocean of

        dots being the wrong shade of grey. See also the discussion of

        magneto-optical devices (3.3.1.6.5). Conversely, magnetic

        devices excel in the recording of encoded text (see 3.3.4.2),

        but may be expensive to use for the storage of images even when

        compressed (3.3.2).



        3.3.1.6.4  Optical Tape



        An emerging class of technology that combines the advantages and

        disadvantages of tape (3.3.1.6.2) with those of optical

        recording technology (3.3.1.6.3). Their chief advantage may lie

        in very cheap cost per bit storage, but at this time they suffer

        from relatively high error rates.



<newpage id=46>

        3.3.1.6.5  Magneto-Optical Disk



        Disks that combine the use of magnetic and optical technologies.

        To record data, elements of the crystal structure of the

        substrate are aligned by using a laser to heat the element in

        the presence of an applied magnetic field. When the magnetic

        field is aligned one way, a "1" is recorded; when the magnetic

        field is reversed, a "0" is recorded. The data are read by

        reflecting a lower-intensity laser beam off the surface; the

        polarization of the reflected light varies according to the

        crystal alignment of the element of the substrate. Unlike

        regular optical disks, magneto-optical disks are read/write, and

        have performance characteristics somewhere between those of

        magnetic disks and optical disks in terms of access times,

        transfer rates, and storage capacity.



    3.3.2. Compression



    <e1>Compression</> refers to the extent to which the encoded form of

    the preserved or reformatted document has been modified to reduce

    the amount of storage space required by the storage medium. The

    technique takes advantage of the great redundancy that is present in

    much recorded data, particularly in image documents (3.1.5.1).

    Storage savings of a factor of ten or more may readily be achieved

    depending upon the scanning resolution and methodology employed

    (3.2.3), the type of material being scanned, and the particular

    compression method used. Although without compression the storage

    requirements grow rapidly as the square of the scanning resolution

    (3.2.3), with effective compression methods the storage requirements

    can be constrained to grow almost linearly with the scanning

    resolution. This is because advantage is taken of the greater data

    redundancy accruing from the increase of scanning resolution --

    compression effectively eliminates or reduces this data redundancy.

    Thus, the greater the redundancy of information contained in the

    scanned material, the more compression is possible -- continuous

    tone photographs, for example, often contain large amounts of

    redundant information. Compression is an important factor in the

    economics and efficacy of digital preservation.
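


    How redundancy is exploited can be illustrated with run-length

    encoding, a simple technique chosen here only as an example and

    not put forward as the method of any particular standard. The

    100-pixel scan line is a hypothetical sample.

```python
def rle_encode(bits):
    """Collapse a sequence of pixels into (value, run length) pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1
        else:
            runs.append([bit, 1])
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, run length) pairs back into the original pixels."""
    return [value for value, count in runs for _ in range(count)]


# Hypothetical scan line: white margin, a short black stroke, white margin.
scan_line = [0] * 40 + [1] * 5 + [0] * 55
runs = rle_encode(scan_line)
print(runs)
```

    One hundred pixels reduce to three runs, and decoding the runs

    restores the scan line exactly.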



      3.3.2.1 Uncompressed



      No compression has occurred.



<newpage id=47>

      3.3.2.2 Reversibly Compressed



      Compression has occurred so that the process can, if required, be

      reversed so that the original can be recovered without loss of

      information. Also known as "lossless".
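


      Reversibility can be checked directly by compressing and then

      restoring some data. The sketch uses Python's standard zlib

      module, a general-purpose lossless method that is not one of

      the standards listed below; the sample bytes are hypothetical.

```python
import zlib

# Hypothetical sample: long runs of identical bytes, as in a scanned page.
original = b"\x00" * 1000 + b"\xff" * 24 + b"\x00" * 1000

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed), restored == original)
```

      The restored data are identical to the original, byte for byte,

      which is the defining property of reversible compression.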



        3.3.2.2.1  CCITT Group Compression



        Compression standards defined by the International Telegraph and

        Telephone Consultative Committee (Comite Consultatif

        International Telephonique et Telegraphique, or CCITT).



        3.3.2.2.2  Reversible Textual Compression



        If sufficiently complete, the representation in whole or in part

        of documents as formatted text (3.1.5.2.2) may represent a form

        of reversible compression. The use of a markup language

        (3.3.4.3) is also a form of reversible textual compression. See

        also 3.3.4.



        3.3.2.2.3  Page Description Language Compression (PDL)



        See 3.3.4.4.



        3.3.2.2.4  Other Compression Standards or Algorithms



        Refers to other compression standards, <it>de facto</>

        standards, or algorithms.



        3.3.2.3 Irreversibly Compressed



        Compression has occurred so that the process cannot be precisely

        reversed. The original cannot be recovered without loss of

        information.
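


        A small illustration of irreversibility is the reduction of

        greyscale pixels to pure black and white. The cutoff of 128

        and the sample pixel values are assumptions made for the

        example.

```python
def threshold(pixels, cutoff=128):
    """Reduce 0-255 grey values to pure black (0) or pure white (255)."""
    return [0 if pixel < cutoff else 255 for pixel in pixels]


# Two different hypothetical originals.
first = [10, 120, 130, 250]
second = [0, 100, 200, 255]

reduced_first = threshold(first)
reduced_second = threshold(second)
print(reduced_first, reduced_second)
```

        Both originals reduce to the same result, so neither can be

        recovered from it: the grey-level information has been lost.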



        3.3.2.3.1  Irreversible Textual Compression



        The representation in whole or in part of a document as

        unformatted or partially formatted text (3.1.5.2) may represent

        a form of irreversible compression. The content of the text may

        be obtained but not one or more of its font style, font size, or

        positioning on the page.



    3.3.3. Storage Format



    As used in information storage and retrieval, <e1>Format</> or

    <e1>Storage Format</> refers to the actual representation of the

    stored data on the storage medium, that is, the specific way in

    which it is encoded or programmed onto the medium. Classifying such

    methodologies is beyond the scope of this document. Indeed, for the

    most part -- and particularly as applied to digital electronic



<newpage id=48>

    storage technologies -- there are few general standards that are

    accepted by all or most manufacturers. The implication is that

    access to the information stored on the medium depends upon specific

    software or computer programs supplied by the manufacturer, software

    that may become obsolete with the passage of time. One result may be

    that stored information may need to be reformatted or transferred to

    newer storage media periodically in order for the information to

    remain accessible with current software and technology.



    3.3.4. Encoding Method



    <e>Encoding Method</> refers to the <e1>extent</> to which the

    information <e1>content</> of the document has been interpreted and

    encoded, rather than merely recorded. Such interpretation may be

    beneficial for a number of reasons including as a means of achieving

    reversible compression (3.3.2.2); for the construction of document

    indices to facilitate searching and access (3.4.1); or for efficient

    distribution of the information across data networks (3.5.5). For

    example, a document that has been merely scanned as a bit-mapped

    image (3.1.5.1) has not been encoded (3.3.4.1), even though faithful

    "digital pictures" of the pages of the document have been obtained.

    If the images of the document text are later interpreted through

    internal character recognition (3.2.5), then the digital

    representation has been <e1>textually encoded</> (3.3.4.2).



      3.3.4.1 No Encoding



      No interpretation of the information contained in the original

      document has occurred. If the document were originally scanned

      using a digital image scanner (3.2.3), then the document in this

      instance is generally stored in some image format (3.1.5.1),

      compressed or not (3.3.2). If portions of the document were

      originally scanned using optical character recognition (3.2.4),

      then those portions will be stored as either formatted or

      unformatted text (3.1.5.2).



      3.3.4.2 Textual Encoding



      The text contained in the original document has been interpreted

      so that each character has a separate representation (see

      3.1.5.2). Such interpretation may have occurred at the time of

      scanning if an optical character recognition device is used

      (3.2.4), or later using internal character recognition (3.2.5)

      programs applied to documents in image format (3.1.5.1). Such

      textual interpretation may result in either unformatted or

      formatted text, depending upon the degree of sophistication of the

      device or program. Recognition accuracy may also be limited.



<newpage id=49>

      3.3.4.3 Markup Language Encoding



      A computer markup language is a means for describing, for an

      electronically stored document, the complete positioning, format,

      and style of text and image segment representations (3.1.5) within

      the document. When combined with textual representation, it is a

      means for achieving fully formatted text (3.1.5.2.2). When

      combined with relevant image information about document graphics

      material (if any), it may be a means of achieving fully reversible

      compression (3.3.2.2) of the document. An example of a markup

      language is <e1>SGML (Standard Generalized Markup Language)</>,

      which has been adopted by the United States Government and by many

      publishers as a pseudo-standard.



      3.3.4.4 Page Description Language Encoding



      A computer language in which segments of text and images are

      economically described with respect to form, orientation, size,

      density, and other characteristics for purposes of economic

      transmission across networks and between host devices and output

      devices such as printers. Page Description Languages are another

      form of compression (3.3.2), as well as a form of encoding.



      3.3.5. Useful Life



      <e>Useful Life</> refers to the archival quality of the storage

      medium. It usually refers to the period of time during which there

      is no unacceptable loss of information stored on the medium; and

      during which the storage medium remains usable for its intended

      purpose.



      The longevity of paper varies considerably depending upon its

      method of manufacture and conditions of storage (see 1.5). Unless

      the paper is produced to meet permanent standards (1.5.1), paper

      may last from a few years or so to hundreds of years. Most paper

      produced since the middle of the nineteenth century has a useful

      life of less than 100 years. Paper produced to meet archival

      standards should last several hundred years. Film, provided it is

      manufactured, processed, and stored according to archival

      standards, appears to have a useful life well in excess of 500

      years. Videotape appears to be extremely vulnerable and to have a

      relatively short life of a few decades.



      Digital electronic storage media have a varying useful life

      projected to range from a few years to over 100 years. The latter

      has not been formally tested by experience, but is projected based

      on laboratory stress tests. Such media, however, become obsolete

      for other reasons long before their physical properties render



<newpage id=50>

      them useless (see, for example, 3.3.3). It becomes economically

      and functionally infeasible to maintain the information stored on

      the original medium of capture, since it becomes far cheaper to

      transfer the information periodically to higher density and

      cheaper newer technologies. Concerns also exist regarding the

      possibility of modifying digitally-encoded documents, particularly

      when "read/write" (3.3.1.6) devices are used (this is essentially

      not possible with "read only" or "write once, read many"

      technologies (3.3.1.6)); and regarding other issues of security.



      The implications of periodic recopying for libraries are quite

      far-reaching. Libraries are not used to having to maintain their

      inventory by periodic recopying, even though such practices are

      quite common in data centers. Indeed, the recent impetus of

      preservation may have caused some librarians to rethink their

      position in this regard, although librarians still tend to think

      in terms of periods of centuries rather than having (or wanting)

      to recopy every few years. Such considerations may either hinder

      the adoption of digital technologies or eventually cause some

      rethinking of the underlying economics of librarianship.



      Further implications are discussed in the Introduction.

<graphic status=omitted>



  3.4.     Access Methodology or Technology



  <e>Access Methodology</> or <e1>Technology</> refers to the means of

  selecting information from among all the information that is stored.



<newpage id=51>

    3.4.1. Indexed Access



    A <e1>Document Index</> is a systematically ordered file of objects

    [26] that refer to a collection of documents or to specific parts of

    those documents, organized in such a way as to facilitate searching

    the document collection for purposes of selection of single

    documents or groups of documents contained in the collection. Such

    document indices may be stored on different media depending upon how

    they are to be used.



      3.4.1.1 Via Catalog



      Access via a file of bibliographic records, created according to

      specific and uniform principles of construction and under the

      control of an <e1>authority file</>, which describes the documents

      contained in a collection. The file is usually organized in a

      systematic manner to facilitate access and document selection.

      Catalogs historically have been implemented in card files, but

      increasingly such card files are retroactively and prospectively

      giving way to computerized data files (1.2.10) which may be

      accessed and searched by patrons with the use of computer

      workstations (3.6.2.6) and data networks (3.5.5). Such

      computer-based catalogs are increasing in sophistication to

      support complex queries, including <e1>Boolean</> queries, which

      support logical searching (e.g., all the works of fiction written

      in Albania published between 1890 and 1919 by authors whose last

      name begins with the letter "L").
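
      The Boolean query in the example above can be sketched as a

      conjunction of simple tests on each bibliographic record. This is a

      hypothetical illustration only: the record fields, the sample

      records, and the function name are invented here and do not describe

      any particular catalog system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRecord:
    author_last_name: str
    title: str
    genre: str
    country: str
    year: int


def matches(record: CatalogRecord) -> bool:
    """Boolean AND of four conditions, mirroring the example in the text."""
    return (
        record.genre == "fiction"
        and record.country == "Albania"
        and 1890 <= record.year <= 1919  # dates assumed to be inclusive
        and record.author_last_name.upper().startswith("L")
    )


# Invented sample records, for illustration only.
catalog = [
    CatalogRecord("Lleshi", "Sample Novel", "fiction", "Albania", 1905),
    CatalogRecord("Lleshi", "Later Novel", "fiction", "Albania", 1925),
    CatalogRecord("Marku", "Another Novel", "fiction", "Albania", 1900),
    CatalogRecord("Lleshi", "A History", "history", "Albania", 1910),
]

hits = [record.title for record in catalog if matches(record)]
print(hits)
```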



      3.4.1.2 Via Abstract



      Access via a summary of the document. Most often, the summary is

      of a contribution to a journal (1.2.6) or other periodical

      (1.3.2). Such a summary is usually without interpretation or

      criticism, and may contain a bibliographic reference (or

      <e1>pointer</>) to the original document. A collection of document

      abstracts may be used for purposes of search and selection (e.g.,

      <e>Chemical Abstracts</>, published by the American Chemical

      Society and also available in digital electronic form).



      3.4.1.3 Via Table of Contents



      Access via a list of parts contained in a document, such as

      chapter titles or articles in a periodical, with references by

      page number or other locator to the starting point of the

      particular part, usually grouped and sequenced in the

      order of appearance. Collections of tables of contents may also be

      used for search and selection purposes.



<newpage id=52>

      Other parts of documents that may be used for search and selection

      purposes include:



      3.4.1.4 Via List of Figures, Tables, Maps or Other Illustrations



      Access via a list of those parts of a document that are either

      figures, tables, maps or other illustrations, respectively, with

      location reference by page number or other locator, usually

      ordered by location of appearance within the document. Figures,

      tables, maps, etc. may be listed separately. Usually, in a

      document, these lists follow the Table of Contents in some order.



      3.4.1.5 Via Preface



      Access via a note preceding the body of a document that usually

      states the origin, purposes, and scope of the work(s) contained in

      the document and may include acknowledgements of assistance. When

      written by someone other than the author(s) of the document, the

      preface is more properly termed a <e1>foreword</>.



      3.4.1.6 Via Introduction



      Access via the material that heads the body of a document and that

      provides an overview of the work that follows, or other

      introductory material to the text.



      3.4.1.7 Via Index



      Access via a systematically ordered collection of words or other

      terms or objects [27] contained within a document, with references

      by page number or other locator to the placement of the object

      within the document for purposes of accessing the object. The

      index is usually placed last in a document.



      3.4.1.8 Via Citation



      Access via <e1>reference</> to a document or to a part of a

      document, such as an <e1>article</> in a journal (1.2.6). A

      <e1>bibliography</> is a collection of citations directed to a

      specific purpose, such as a <e1>subject bibliography</> or a

      bibliography of citations appended to a journal article.



    3.4.2. Full (or Partial) Document Access



    In Full Document or full text searching, the full text of a

    collection of documents is stored, and the entire text of all or

    portions of the documents is searched for specific character

    strings, usually combined with some Boolean logical searching



<newpage id=53>

    capabilities. This requires that the document be textually encoded

    (3.3.4.2), either because it was initially created that way or,

    perhaps more likely in the context of preservation, because such

    textual encoding was obtained from scanned document images (3.1.5.1)

    with internal character recognition (3.2.5). Thus, for example, a

    search may consist of searching for all documents in the collection

    published by a given author or set of authors between certain dates

    containing the text "all that glitters." Full text searching is

    normally implemented on computers. For other than small collections

    of documents, a given search may be very costly in terms of computer

    processing time.
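
    A minimal sketch of such a search follows, assuming a small in-memory

    collection. The record layout, the function name, and the sample

    documents are hypothetical. Because every document is scanned on every

    search, the cost grows with the total size of the collection, which is

    the expense noted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredDocument:
    author: str
    year: int
    text: str


def full_text_search(collection, phrase, authors, first_year, last_year):
    """Scan every document: filter on author and date, then look for the
    character string anywhere in the text (case-insensitive)."""
    needle = phrase.lower()
    return [
        doc for doc in collection
        if doc.author in authors
        and first_year <= doc.year <= last_year
        and needle in doc.text.lower()
    ]


# Invented sample collection, for illustration only.
collection = [
    StoredDocument("Author A", 1900, "Not all that glitters is gold."),
    StoredDocument("Author A", 1950, "All that glitters again."),
    StoredDocument("Author B", 1900, "Nothing relevant here."),
]

results = full_text_search(collection, "all that glitters", {"Author A"}, 1890, 1919)
print([(doc.author, doc.year) for doc in results])
```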



      3.4.2.1 Via Inverted Text File Index



      <e1>Inverted Text Files</> (or other similar techniques) are often

      used as a compromise between indexed and full text searching. A

      file of words (<e1>Keyword</>), phrases (<e1>Key Phrase</>), or

      other text objects contained in a given collection

      of stored documents is created from an initial analysis of the

      full text together with locators as to where all instances of the

      word, phrase, or other object can be found within the file. In

      use, instead of the full text being searched for all occurrences

      of the object, [28] the inverted file itself efficiently gives

      pointers to the locations. The construction of such an inverted

      file, however, may be expensive for large collections of

      documents, as would adding new words or other objects [29] to the

      file at a later date. Furthermore, the use of the file is only as

      good as the care that has been given to the choice of objects to

      be contained within the file.
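
      The idea can be sketched as follows. This is an illustrative

      assumption of how such a file might be built, indexing every word

      rather than a selected set of objects; the function names and the

      sample documents are hypothetical.

```python
import re
from collections import defaultdict


def build_inverted_file(documents):
    """Map each word to the (document id, word position) pairs where it occurs."""
    inverted = defaultdict(list)
    for doc_id, text in documents.items():
        words = re.findall(r"[a-z0-9']+", text.lower())
        for position, word in enumerate(words):
            inverted[word].append((doc_id, position))
    return dict(inverted)


def lookup(inverted, word):
    """Return the stored locators for a word without rescanning any text."""
    return inverted.get(word.lower(), [])


# Invented sample documents, for illustration only.
documents = {
    "doc1": "All that glitters is not gold",
    "doc2": "Gold is where you find it",
}

inverted = build_inverted_file(documents)
print(lookup(inverted, "gold"))
```

      Adding a document later means updating the entry for every word it

      contains, which reflects the maintenance cost mentioned above.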



    3.4.3. Compound Document Access



    <e1>Compound</> documents are documents that contain both textually

    and other forms of encoded information, including image (see 3.3.4).

    Techniques are being developed for expanding the concept of text

    searching to searching of full compound documents, including those

    containing image objects [30]. A full glossary of such techniques,

    however, is premature and beyond the scope of this document.



<newpage id=54>

<graphic status=omitted>

  3.5.    Distribution Technology



  <e>Distribution Technology</> refers to the technology used to

  distribute or deliver the stored encoded document from one point to

  another. Some form of <e1>delivery service</> may be used (3.5.2), or,

  if the medium is paper, it may be distributed using point-to-point or

  distributed FAX (3.5.3). On the other hand, if the medium is digital

  electronic, then either the document may be converted to paper, by

  <e1>"printing-on-demand"</> (3.5.4) and subsequently distributed using

  delivery services or FAX, or <e1>data networks</> (3.5.5) may be used

  for distribution to a <e1>computer workstation</> (3.6.2), possibly to

  be converted to another medium, such as paper, at the point of

  delivery (see 3.6.1).



    3.5.1. Distribution Medium



    The <e1>Distribution Medium</> is the medium used to transport the

    stored encoded document to the presentation or viewing device

    (3.6.2). The same media that can be used for original documents

    (1.1) can also be used as distribution media.



      3.5.1.1 Paper (see 1.1.1)



      3.5.1.2 Microform (see 1.1.2)



      3.5.1.3 Video (see 1.1.3)



      3.5.1.4 Film (see 1.1.4)



      3.5.1.5 Audio (see 1.1.5)



      3.5.1.6 Digital Electronic (see 1.1.6)



      Whichever technology is used for storage (3.3.1), digital

      technologies may usually be used as the medium of distribution, as

      contrasted with using delivery services (3.5.2) to deliver the

      document. Paper, for example, can be scanned and transmitted by

      FAX (3.5.3) or across data networks (3.5.5). The only exception to

      this at this time is video, which



<newpage id=55>

      is normally distributed by <e1>analog</> electronic distribution

      networks (as opposed to digital -- see 1.1.6), because of the high

      information capacity (<e1>bandwidth</>) required. As the bandwidth

      of data networks grows, however, it is anticipated by many

      technologists that analog transmission will yield to digital

      transmission even for video recordings. Films, too, are often

      transmitted by converting them to video recordings (with some loss

      of quality at this time), and transmitting them across analog

      video networks.



    3.5.2. Messenger Services



    <e>Messenger Services</> refers to the use of local, regional, or

    national messengering or mail services to hand-deliver documents

    from the point of inventory or storage to the patron or consumer.

    One special case of this includes the patrons performing the

    messengering services for themselves by viewing the document, or by

    directly acquiring it (purchasing or borrowing), at or from the

    location of the document's storage.



    3.5.3. FAX



    <e1>FAX</> or <e>Facsimile Transmission</> is a system of

    communication or delivery for paper documents or other graphics

    material in which a special digital image scanner (3.2.3) scans the

    pages of the document, compresses the scanned image using CCITT

    Group Compression (3.3.2.2.1), and transmits the digital signals by

    wire or radio to a FAX receiver at a remote point. The FAX receiver

    decompresses the signals received and prints the digital image on

    paper. FAX transmission is a point-to-point protocol that is

    normally conducted over voice (3.5.6) or data (3.5.5) networks.

    Usually, scanning and printing devices are relatively slow (about 5

    pages per minute), and the quality is limited. The popularity of FAX

    rests on its simplicity of use and the relatively low cost of the

    equipment. With the rapid growth of installed FAX equipment, FAX has

    recently been extensively used for inter-library loan purposes, and

    is also coming into use for intra-campus delivery purposes.
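
    The compression step exploits the fact that a scanned line of a

    typical page consists of long runs of white and black spots. The

    sketch below shows only that run-length idea; it does not reproduce

    the CCITT code tables (3.3.2.2.1), and the function names are

    hypothetical.

```python
def run_lengths(scan_line):
    """Return alternating run lengths for a row of pixels (0 = white,
    1 = black), starting with a white run, which may have length zero."""
    runs = []
    current = 0  # assumed convention: the first run counted is white
    count = 0
    for pixel in scan_line:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current = pixel
            count = 1
    runs.append(count)
    return runs


def expand(runs):
    """Rebuild the row of pixels from its alternating run lengths."""
    pixels = []
    colour = 0
    for length in runs:
        pixels.extend([colour] * length)
        colour = 1 - colour
    return pixels


# A 20-pixel line: 10 white, 3 black, 7 white.
line = [0] * 10 + [1] * 3 + [0] * 7
print(run_lengths(line))
```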



    3.5.4. Print-on-Demand



    <e>Print-on-Demand</> refers to the capability to print documents

    right at the time they are required by patrons and consumers, rather

    than following traditional norms of printing documents in advance of

    need and coping with the need to distribute and inventory printed

    documents in anticipation of demand. This approach to distribution

    mirrors the "just-in-time" approach to inventory control.

    <e1>Print-on-Demand</> techniques are normally



<newpage id=56>

    used in conjunction with digitally stored documents (3.3.1.6) and

    data networks (3.5.5). The approach offers the promise of closing

    the gap between the world of digital technologies and those who

    maintain the superiority of, or simply prefer the characteristics of,

    paper documents. Documents may be printed right in the patron's

    office or at a shared local facility from where they are delivered to

    or picked up by the patron.



    3.5.5. Data Networks [31]



    A <e1>Data Network</> is a communications network that transports

    data between and among computers and computer workstations

    (<e1>network nodes</>). Such networks may depend upon different

    physical media to transport the encoded digital signals (twisted

    pair copper wire, coaxial cable, fiber optic cable, satellite, and

    so forth); different protocols to encode the signals; and different

    ways in which the encoded signals are interpreted for use in

    applications. They also include bridges, routers, and gateways for

    connecting different media and for translating one protocol into

    another. Data networks vary considerably in speed and capacity,

    depending upon the physical media, the protocols used, and the

    particular architecture of the network. Network speeds and other

    performance characteristics appear to be more than doubling every

    two to three years.
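
    The effect of such a doubling rate can be illustrated as a worked

    formula: after t years, a quantity that doubles every d years has

    grown by a factor of 2 raised to the power t/d. The function below is

    a hypothetical sketch of that arithmetic and is not a forecast.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Factor by which a quantity grows if it doubles once per period."""
    if doubling_period_years <= 0:
        raise ValueError("doubling_period_years must be positive")
    return 2.0 ** (years / doubling_period_years)


# Compare the two ends of the "two to three years" range over six years.
for period in (2, 3):
    print(period, growth_factor(6, period))
```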



      3.5.5.1 Local Area Network



      A <e1>Local Area Network (LAN)</> is a data network used to connect

      nodes that are geographically close, usually within the same

      building. In a wider view of a local area network, multiple local

      area networks are interconnected in a geographically compact area

      (such as a university campus), usually by attaching the LANs to a

      higher-speed local backbone.



      3.5.5.2 Wide Area Network



      A <e1>Wide Area Network (WAN)</> is a data network connecting large

      numbers of nodes and LANs that are geographically remote, such as

      within a broad metropolitan area, or between widely-separated

      metropolitan areas. This would also include regional networks,

      such as NYSERNet, which interconnects research and educational

      institutions in New York State.



<newpage id=57>

      3.5.5.3 National Network



      A WAN, or a federation of interconnected WANs, that spans the

      nation, such as the NSFNet, BITNet, CSNet, CREN, and, more

      generally, the Internet and the anticipated NREN (National

      Research and Educational Network). These national networks often

      use a high-speed spanning national backbone to interconnect

      regional WANs. Protocols are established to facilitate routing of

      information across the national networks to users at connected

      nodes. The national networks often have international connections

      and outreach.



    3.5.6. Voice Networks



    <e>Voice Networks</> are local, national, or international networks

    used to carry voice or telephone traffic. They may be either analog

    or digital (see 1.1.6). Because of different technical requirements,

    the transmission of data and voice usually is conducted using

    different transmission protocols, although it is increasingly common

    to share the same wiring plant. In general, there is increasing

    integration between the voice and data milieus.



    3.5.7. Cable Networks



    <e>Cable Networks</> are local, regional, or national networks

    normally used for the transmission of analog (see 1.1.6) signals

    such as video (see 1.1.3) television signals.



<graphic status=omitted>

  3.6.    Presentation Technology



  <e>Presentation Technology</> is the term given to technologies that

  present the encoded document to the end user or patron, possibly

  following some conversion of one medium to another. If the storage

  medium is paper, for example, no conversion would be necessary, and

  the storage medium and the presentation medium are one and the same

  (unless the



<newpage id=58>

  distribution technology used were, say, FAX, in which case there are

  intervening conversion processes). If the storage medium, on the other

  hand, were digital electronic (3.3.1.6), for example, and data

  networks (3.5.5) were used as the means of distribution, then the

  presentation technology might be a computer workstation (3.6.2.6) or

  the distributed encoded document could be converted to some other form

  such as paper.



    3.6.1. Presentation Medium



    The <e1>presentation medium</> is the medium into which the stored

    document (3.3), which has been distributed over the distribution

    medium (3.5.1), is converted to facilitate viewing or reading by the

    end user.



      3.6.1.1 Paper (see 1.1.1)



      3.6.1.2 Microform (see 1.1.2)



      3.6.1.3 Video (see 1.1.3)



      3.6.1.4 Film (see 1.1.4)



      3.6.1.5 Audio (see 1.1.5)



      3.6.1.6 Digital Electronic (see 1.1.6)



    3.6.2.  Presentation or Viewing Device



    A <e1>Presentation or Viewing Device</> converts the distribution

    medium (3.5.1) into the presentation medium (3.6.1). This includes

    the class of <e1>computer workstations</> (3.6.2.6).



      3.6.2.1 Paper Document



      A paper document, such as a book, must itself be considered a

      viewing device in this context when the presentation medium is

      paper (3.6.1.1). See 1.2 for a classification of different formats

      for paper documents.



      3.6.2.2 Microform Reader



      A display device with a built-in screen and magnification so that

      a microform (1.1.2) can be read comfortably at normal reading

      distances. Such devices may be accompanied by <e1>microform

      printers</> that can produce full-size (generally low-quality)

      paper copies of the microforms.



      3.6.2.3 Video Projector (Television Set)



      A device used to project or play back videotapes (1.1.3 and

      3.6.1.3) onto a television screen. Normally this is accomplished

      through the use of a videorecorder (see below) and television set

      or <e1>television projection system</>. However, it is becoming

      increasingly common to play the video back through a computer

      workstation (3.6.2.6), possibly converting the analog signal to

      digital form (1.1.6).



<newpage id=59>

      The term <e1>videorecorder</> is often used to denote a device

      capable both of recording live television signals onto videotape

      and of reading recorded videotapes and transmitting the signal to

      a video projector or television set.



      3.6.2.4 Film, Slide, or Other Projectors



      A device to project motion picture films (1.1.4), still

      photographic slides (1.2.9.3), or other graphic materials (1.2.9)

      onto a screen, and, with some devices, to reproduce sound from the

      film soundtrack. <e1>Slide viewers</> enable the user to view the

      slides through background projection on a small screen. Other

      classes of projectors (such as <e>overhead projectors</>) are

      designed to project images recorded on transparencies onto a

      screen.



      3.6.2.5 Audio Devices



      Devices capable of playing back audio documents (1.1.5), such as

      phonograph record players, CD players, and tape cassette players.



      3.6.2.6 Computer Workstation



      A device capable of supporting the creation, storage, access,

      distribution, or presentation of digital electronic documents

      (1.1.6), ranging from special purpose devices such as electronic

      typewriters through microcomputers to high-performance engineering

      or desktop publishing workstations or even large mainframe

      computers. They may vary considerably in performance, as typically

      measured by the computer's internal processing speed, storage

      capacity, and ability to move data between its various devices.

      The traditional distinction between a <e1>personal computer

      (PC)</> and a <e1>high-performance workstation</> is blurring, and

      the term workstation is generically used to cover both.



        3.6.2.6.1  Display Monitor



        That portion of a computer workstation used to view digital

        electronic documents. This may consist of a display module built

        into the computer or it may be physically separated from the

        computer, but attached by cable. Display monitors may be

        black-and-white (1.4.1.1.1), greyscale (1.4.1.1.2), or color

        (1.4.1.4). They may also come in varying physical sizes

        typically ranging from about 8" on the diagonal to 23" or more.

        They may also display with varying resolution, with the higher

        (but not highest) performance monitors capable of displaying

        over 1,000 x 1,000 pixels (spots).
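
        The storage needed to hold one screen image follows directly from

        the resolution and the number of bits recorded per pixel. The

        sketch below assumes 1 bit per pixel for black-and-white, 8 for

        greyscale, and 24 for color; these depths and the function name

        are assumptions made for illustration.

```python
def frame_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Bytes needed to hold one uncompressed screen image."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed bit depths for the three classes of monitor named in the text.
assumed_depths = {"black-and-white": 1, "greyscale": 8, "color": 24}

for name, depth in assumed_depths.items():
    print(name, frame_bytes(1000, 1000, depth))
```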



<newpage id=60>

        3.6.2.6.2  Local Printer



        A device locally attached to a computer workstation capable of

        printing digital electronic documents stored in the computer

        (3.3.1.6) or distributed to the computer from across a data

        network (3.5.5). Such devices may utilize a range of

        technologies including <e1>impact printing</>, <e1>inkjet

        printing</>, <e1>thermal printing</> and <e1>laser printing</>.

        They may print at varying speeds ranging from 10 characters per

        second to some tens of pages per minute. They may print with

        resolutions varying from several dots per linear inch to several

        hundred dots per linear inch. They may print in black-and-white,

        greyscale, or color.
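
        These ranges can be compared with a little arithmetic. The sketch

        below assumes a page of 3,000 characters, a page size of 8.5 by

        11 inches, and a resolution of 300 dots per inch; all three

        figures and the function names are assumptions chosen for

        illustration.

```python
def pages_per_minute(chars_per_second: float, chars_per_page: int) -> float:
    """Convert a character-based print speed into pages per minute."""
    return chars_per_second * 60 / chars_per_page


def dots_per_page(width_in: float, height_in: float, dots_per_inch: int) -> int:
    """Number of addressable dots on a page at a given linear resolution."""
    return round(width_in * dots_per_inch) * round(height_in * dots_per_inch)


# Assumed page: 3,000 characters, 8.5 x 11 inches, printed at 300 dpi.
print(pages_per_minute(10, 3000))
print(dots_per_page(8.5, 11, 300))
```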



        3.6.2.6.3  Remote Printer



        A printer (3.6.2.6.2) that is accessible to a computer

        workstation remotely across a data network (3.5.5). These may

        typically be higher performance devices than local printers,

        particularly regarding speed or resolution. Such devices are

        typically shared among many uses and users. They may have

        special capabilities for "finishing" documents.



        3.6.2.6.4  Other Local Media Output Devices



        Computers capable of supporting multi-media (3.6.2.7) may

        support other "presentation" devices, such as television

        monitors for video recordings (although the trend is to combine

        the television video monitor and the computer display monitor

        into a single "head"), and audio playback devices for sound

        signals, including connections to "hi-fi" stereo equipment.



      3.6.2.7 Multi-Media Workstation



      A computer workstation (3.6.2.6) capable of supporting and

      combining multiple media such as digital electronic, video, sound,

      and paper.



<newpage id=61>

4. SOURCES OF INFORMATION



  Works referenced in the compilation of the Glossary include:



       The A.L.A. Glossary of Library and Information Science. Chicago:

       American Library Association, 1983.



       John Carter. A.B.C. for Book Collectors. New York: Alfred A.

       Knopf, 1980.



       John Dean. A Glossary of Library Technical Terms. Private

       Communication: October, 1989.



       Geoffrey Ashall Glaister. Glaister's Glossary of the Book.

       Berkeley: University of California Press, 1979.



       Nancy E. Gwinn (editor). Preservation Microfilming: A Guide for

       Librarians and Archivists. Chicago: American Library

       Association, 1987.



       Dennis Longley and Michael Shain. Dictionary of Information

       Technology, Second Edition. Oxford University Press, New York,

       1986.



       Ray Prytherch. Harrod's Librarians' Glossary, Fifth Edition.

       Gower Publishing Company, Brookfield, Vermont, 1984.



       Matt T. Roberts and Don Etherington. Bookbinding and the

       Conservation of Books: A Dictionary of Descriptive

       Terminology. Washington: Library of Congress, 1982.



       McGraw-Hill. Dictionary of Scientific and Technical Terms.

       Fourth Edition, 1989.



       Rosenberg, Jerry M. A Dictionary of Computers, Data Processing,

       and Telecommunications. John Wiley and Sons, 1983.



       Bohdan S. Wynar. Introduction to Cataloging and Classification.

       Littleton, Colorado: Libraries Unlimited, Inc., 1985.



       Webster's New Collegiate Dictionary. G. & C. Merriam Co., 1979.

<newpage id=62>

<newpage id=63>

<index status=omitted>



 Notes



1.  See Section 3.1 for a discussion of the use of the term "media

conversion" to replace the use of the term "reformatting." We also

follow the distinction that while media conversion is not a

<e1>conserving</> technology, it is a <e1>preserving</> technology.



2.  This analogy was pointed out by Douglas van Houweling.



3.  A glimpse of possible implications has already been seen in the

tendency of many libraries to charge patrons for searches of

electronic databases.



4.  Harvey Wheeler: "The Virtual Library: The Electronic Library

Developing Within The Traditional Library". Doheny Documents,

University of Southern California University Library, 1987.



5.  Some fields, particularly those propelled by the impetus of

commercial endeavors such as medicine, law, and finance, are beyond

the prototype stage and are into full production.



6.  Conservation may allow for only partial preservation of the original

document. The bindings, for example, may be replaced while the body

of the document is conserved.



7.  Originally, the term "vellum" was restricted to calfskin. The

distinction between parchment and vellum has eroded over the years.



8.  The term <e1>digital technologies</> is also used for brevity

throughout this Glossary.



9.  The non-technical reader may wish to compare the odometer of a car

(a <e1>digital</> device which quantizes in precise 1/10th of a mile

increments) with the speedometer (an <e1>analog</> device which

displays speed continuously but which can only be interpreted

approximately).



10.  However, <e1>digital</>(ly-encoded) <e1>video</> is now becoming

part of the panoply of technologies, where analog video signals are

converted to digital signals for purposes of storage, transmission

and playback through a computer (3.6.2.6) or multi-media (3.6.2.7)

workstation.



11.  This assertion, however, may not be true in the future. For

example, music is now recorded in digital electronic form, such as

DDD Compact Discs.



12.  Although an increasing number of books are published on other media

(see the Introduction to this Section). This remark also applies to

1.2.3, 1.2.4, 1.2.5, 1.2.6, and 1.2.8. Video magazines and journals,

for example, are beginning to appear. A few books are being

published only in digital form for playback on a computer

workstation.



13.  In keeping with the spirit noted in the Foreword that this Glossary

is intended to be comprehensive but not exhaustive.



14.  The term "object" is used here in a sense that is more familiar to

computer professionals than to librarians.



15.  Strictly speaking, monotone documents should be termed "monohue".



16.  Copyright law as it applies to the subject of preservation will be

the subject of a forthcoming paper by the Commission on Preservation

and Access.



17.  For a fuller explanation of copyright laws, see "Copyright Basics",

Circular No. 1, published by the Copyright Office of the U.S.

Library of Congress, Washington, DC 20559.



18.  See also "Selection for Preservation of Research Library

Materials," a Report of the Commission on Preservation and Access,

August 1989.



19.  The Research Libraries Group, Inc., is a not-for-profit corporation

owned and operated by its governing members: major universities and

research institutions in the United States.



20.  It is tempting to use the term "remediate" for "media conversion,"

a temptation that has been resisted in the formulation of this

Glossary.



21.  For a discussion of the importance of conservation see "On the

Preservation of Books and Documents in Original Form," by Barclay

Ogden, Report of the Commission on Preservation and Access, October,

1989.



22.  For more information see "Technical Considerations in Choosing Mass

Deacidification Processes," by Peter G. Sparks published by the

Commission on Preservation and Access, May 1990.



23.  The original, or preservation, negative should not be viewed with a

microform reader (3.6.2.2) because of potential damage to the

negative.



24.  Newer processes becoming available appear to remove the obstacle of

high-contrast recording.



25.  Removable disks, such as floppy disks, are also used for archival

storage. However, magnetic tapes are usually cheaper when large

volumes of data are to be archived.



26.  See Footnote 14.



27.  See Footnote 14.



28.  See Footnote 14.



29.  See Footnote 14.



30.  See Footnote 14.



31.  The Technology Assessment Advisory Committee of the Commission on

Preservation and Access is preparing a report on the implications of

data networks.


