Technology in Libraries, or Plus ça change, plus c'est la même chose


Middlebury College Abernethy Lecture
April 27th, 2001
Michael S. Lynch

My talk is going to focus on technology in libraries, partly because that is what I know, but mainly because technology will be an important, if not the dominant, theme in libraries for the foreseeable future. This is not really anything new; some form of technology has always had a significant effect on libraries. Ralph Parker noted that "At the time of the organization of the American Library Association in 1876, [...] the telephone and the typewriter were not yet accepted tools of library operation, and skepticism of their value was widely expressed." (Parker, p. 195)

Even if I limit myself to electronic technology, it's still not absolutely clear when that begins. The foundation of all automated information retrieval is often traced to the philosopher George Boole, who, in 1854, wrote An Investigation of the Laws of Thought, on Which Are Founded the Mathematical Theories of Logic and Probabilities. In that book, Boole introduced the eponymous and now famous (at least to librarians!) Boolean operators. Some eighty years after that, a researcher at MIT named Claude Shannon first applied Boolean principles to the design and programming of electrical circuits. And about 15 years after that, a government contractor named Mortimer Taube developed something called coordinate indexing, which allowed database searchers to use Boolean operators to retrieve specific document sets. (Smith, 1993)
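
To make coordinate indexing concrete, here is a minimal sketch in Python (mine, and purely illustrative; Taube's Uniterm system was a manual card file, not a program). Each term points to the set of documents in which it appears, and the Boolean operators become simple set operations:

    # A tiny inverted index: each term maps to the set of document
    # numbers in which it appears. (Invented example data.)
    index = {
        "libraries":  {1, 2, 5, 8},
        "computers":  {2, 3, 5, 9},
        "cataloging": {2, 5, 7},
    }

    # libraries AND computers: documents containing both terms.
    print(index["libraries"] & index["computers"])   # {2, 5}

    # libraries OR cataloging: documents containing either term.
    print(index["libraries"] | index["cataloging"])  # {1, 2, 5, 7, 8}

    # computers NOT cataloging: one set minus the other.
    print(index["computers"] - index["cataloging"])  # {3, 9}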

Another event that many people choose as a sort of starting point for technology in libraries is the landmark article by Vannevar Bush titled As We May Think, in which he describes the mythical MEMEX device. This article first appeared in Atlantic Monthly in July 1945, and is now, of course, available on the web. Michael Lesk used Bush's article as the 'birth date' in his article titled The Seven Ages of Information Retrieval, which compares information retrieval to Shakespeare's seven ages of man, although he changes the names of some of the stages. For example, he prefers the term fulfillment rather than the age of a comic character, and retirement rather than senility! (Lesk) Let's hope he's right on that point at least.

Well, I'd like to propose November, 1966 as my 'starting date for technology in libraries.' That's when "The Library of Congress' Project MARC experiment became fully operational" [and] "the participating libraries received the first of the weekly magnetic tapes." "Open distribution of tapes in the MARC-II format [...] is scheduled to begin sometime during the latter part of 1968." (Griffin, 1968) For the non-library folks in the audience, MARC stands for Machine-Readable Cataloging, and the weekly tapes contained Library of Congress cataloging records.

Now, I don't have to point out that 1966 and '68 were a very long time ago, especially when one is talking about computers. We have a series of reference books here in Starr Library titled Day by Day, with a volume for each decade: Day by Day: The Forties, Day by Day: The Fifties, and so on. So I looked in the index of Day by Day: The Sixties (by Thomas Parker and Douglas Nelson) under the word computers, and I found only six entries for the entire decade. One entry, from October 1967, reads, "Reports indicate that the U.S., the Soviet Union, and European countries have agreed to computerize and share their atomic energy research." And in fact, on the first page of the review article which told me about the Library of Congress' Project MARC experiment, there is a note stating that the work of that review article was "supported by the U.S. Atomic Energy Commission."

My point here is that libraries were among the very first groups to adopt computer technology. They used it to automate the obvious routine operations, such as circulation. In 1969, for example, it is reported that "the Lawrence [Radiation Lab] circulation system contains a machine-readable record for each title, [...] on magnetic tape. There are two daily printouts, one arranged by call number and the other by borrower's name. [...] Lawrence programs run on an IBM 1401, with 8K storage, and an IBM 7094 doing some sorting." (Kilgour, 1969) We all know that computers were very large and very expensive in 1969. Just how expensive? Another entry for computers in Day by Day: The Sixties states that "Students at Sir George Williams University in Montreal smash a $1 million computer to protest alleged 'racism'."

But to my mind, far more important than the early circulation systems was the development of the MARC format, because it demonstrates that quite early on, libraries understood that the full and accurate description of bibliographic items was a very complex problem, even for computers. Computer programmers and system designers would ask questions like, "How long does the title field have to be?" or "What's the maximum number of authors for an item?" or "What's the largest possible record in this database?" And they certainly didn't like the answers they would get from librarians: "The title field has to be long enough to accommodate any title," and "There is no maximum number of authors for an item; it's however many authors the item has," and "I don't know what the largest possible record in this database will be; how much space can I get?" Nevertheless, the MARC format was written to accommodate these and other uncertainties. And it would later expand to include other formats, such as music, archives and manuscripts, etc. The fact that from within our online catalog one can click on a link and connect to an electronic journal or book that is actually located somewhere else is due in part to the fact that MARC was modified about 6 or 7 years ago to include a place for the URL.
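
For the technically curious, the MARC solution can be sketched in a few lines of Python. This is a simplification of mine, not the actual standard -- real MARC adds a 24-byte leader, indicators, and subfield delimiters -- but the essential trick is the same: a directory records each field's tag, length, and starting position, so no field ever needs a fixed width and any field may repeat:

    FIELD_TERMINATOR = "\x1e"  # the separator real MARC uses between fields

    def build_record(fields):
        """fields: a list of (tag, value) pairs; tags may repeat freely."""
        directory = []
        data = ""
        for tag, value in fields:
            entry = value + FIELD_TERMINATOR
            # A directory entry: 3-character tag, 4-digit field length,
            # 5-digit starting position -- just as in the MARC directory.
            directory.append("%s%04d%05d" % (tag, len(entry), len(data)))
            data += entry
        return "".join(directory) + FIELD_TERMINATOR + data

    print(build_record([
        ("245", "A title of any length whatsoever"),
        ("700", "First added author"),
        ("700", "Second added author -- no fixed maximum"),
    ]))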

Now, the MARC format is far from perfect. In spite of the changes just mentioned, it is still not very well-suited to describing archival materials, and it's not all that great for visual materials, for example. But what it provided then -- and what it still provides today, 35 years later -- is a standard method of interchanging machine-readable data. And I want to focus for a few minutes on what that has meant for libraries.

"In 1967, the presidents of the colleges and universities in Ohio founded the Ohio College Library Center (OCLC) to develop a computerized system in which the libraries of Ohio's academic institutions could share resources and reduce costs." In 1971, OCLC introduced an online shared cataloging system. Their initial database was built using those same magnetic tapes of MARC records that were being distributed by the Library of Congress. Before long, they allow libraries outside Ohio to become members of the consortium, (1977) and change their name to the Online Computer Library Center, Inc. (1981) (From a History of OCLC. )

By 1980 we read that "To this base of LC cataloging data, the users of OCLC, Inc. (i.e., libraries like Middlebury) have been adding records at a phenomenal rate, to the point where it now consists of over five million records." Also, "The U.S. Government Printing Office (GPO) now has its records in machine-readable form. The cataloging of the National Library of Medicine (NLM) is in machine-readable form." Not only that, but the Library of Congress "has developed an authority record format [and] has determined that there were 757,431 unique headings in 1,930,310 records." (Veneziano, 1980) Again, for any non-library folks in the audience, an example of a unique authority heading might be Mark Twain. His real name was Samuel Langhorne Clemens; most of his books were written as Mark Twain, but he also wrote some things using the name Quintus Curtius Snodgrass. There would be only one authority heading for this one person who used three names.
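
In software terms, an authority file behaves like a simple lookup table. Here is a toy sketch -- my own invented data structure, not NLM or LC software -- in which every variant form of the name points to the single authorized heading:

    # Every variant form maps to the one authorized heading, so a search
    # under any of Twain's three names retrieves the same records.
    authority = {
        "Clemens, Samuel Langhorne":  "Twain, Mark, 1835-1910",
        "Twain, Mark":                "Twain, Mark, 1835-1910",
        "Snodgrass, Quintus Curtius": "Twain, Mark, 1835-1910",
    }

    def authorized_heading(name):
        # Fall back to the name itself when no authority record exists.
        return authority.get(name, name)

    print(authorized_heading("Snodgrass, Quintus Curtius"))
    # -> Twain, Mark, 1835-1910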

At about the same time that Project MARC is underway -- the late 1960s and early 1970s -- the National Library of Medicine is also developing their MEDLARS database, and Dialog Information Services is selling access to their online index and abstract databases. To give some idea of the scale of these efforts, in 1969 the Excerpta Medica Foundation announced that "Over 200,000 citations and 80,000 abstracts are processed and stored annually in the[ir] database." (Keenan, p. 292)

So, by 1980, libraries understand the complexity of bibliographic records, and they have developed the MARC format to accommodate it. They have also begun to realize the importance of scale, since they have very large bibliographic and citation databases, and large numbers of widely dispersed, simultaneous users. They have developed software that will efficiently search and retrieve information from such databases. And they have begun to develop automated authority control over these databases as well.

Now, how are all these changes perceived by librarians? Well, there's a range of reactions not unlike what one might hear today. Some are optimistic about the potential of the technology. In 1968 Hillis Griffin predicts that "Increased disc-file capacities, new mass storage devices, [...] large bulk memories, [and] higher-speed [...] printers [...] promise to fill many of the needs of library system designers." (Griffin, p. 258) Others are more apprehensive: "Systems and networks [have] proliferated, invading libraries and pervading departments that [...] had been immune to their effects." (Veneziano, p. 109) And apparently some librarians are getting a little bit testy about certain predictions they have been hearing. Kenneth Bierman in 1974 is moved to write: "Gone are the proclamations that a revolution in libraries is just around the corner and that if librarians don't jump on the automation bandwagon [...] libraries, and therefore librarians will become extinct!" (Bierman, p. 157) It all sounds vaguely familiar, doesn't it? Plus ça change, plus c'est la même chose: the more things change, the more they stay the same.

Just to check the non-library context again, I looked at Day by Day: The Eighties. Needless to say, there are a lot more than six entries for computers in the index. I particularly liked this one from March of 1980:

"The Associated Press reports that the Pentagon's World Wide Military Command and Control system (Wimex), a computer system designed to warn of an enemy attack or international crisis, is unreliable and unacceptably slow."
Now, I'll admit that the databases were designed for completely different purposes, so perhaps the comparison is unfair. But I find it interesting that in the same year that the Pentagon's computer system is being described as "unreliable and unacceptably slow," OCLC has a database of over five million records. It's available to libraries all across the country, using OCLC's own dedicated communications network and their own software. Excerpta Medica is adding hundreds of thousands of citations to its database every year, and providing similar nationwide access. It's true that these databases are sometimes slow -- I know because in 1979 and 1980 I was searching OCLC, Medline, and Excerpta Medica just about every day -- but for the most part they are pretty reliable.

This pattern of libraries being early adopters of technology continues over the next several decades. As they were at the beginning, government libraries remain in the forefront. In the late 1970s and early 1980s the National Library of Medicine developed the Integrated Library System. This software, which was in the public domain and available for the price of the magnetic tapes, was intended for libraries that wished to automate all aspects of their operations: circulation, public access, cataloging, serials, and acquisitions. At the core of any library's implementation of the ILS was a database consisting of the library's MARC records. I worked for a company that supported ILS installations in the Washington metropolitan area. One of our main customers was the Pentagon Library, but they never did let me have a look at their Wimex system. Throughout the 1980s, various companies developed and began to market their own integrated library systems. Some of these local system vendors, perhaps most notably OCLC, did not fare very well. But others thrived, and as Ron has already noted, Middlebury installed its integrated system in 1986. We are still using that same system today. Although most of the equipment has been upgraded since then, there are a handful of terminals in the library that may be original!

The next big thing was gopher servers. Remember gopher? Developed by the University of Minnesota way back in 1991, gopher was fairly widely adopted by colleges around the country and the world. In many institutions, this marked the beginning of significant cooperation between library and ITS staff. Some of you remember Jim Stuart, who worked at ITS. He once told me that his very first assignment was to build the college a gopher server. He said, "Great! What's a gopher server?" Here was someone who had just completed his master's degree in computer science, and yet had never even heard of gopher. But the college and the library knew about it. Working together with Jeff Rehbach, Jim got Middlebury's gopher up and running. In addition to gopher, some of you may remember Archie, and Veronica, and WAIS, and a lot of discussion about CWIS, or campus-wide information systems.

We don't hear too much about gopher or Archie anymore, because the most dramatic change was yet to come. About the same time that gopher is burrowing its way around the world, Tim Berners-Lee, a computer scientist working at the European Organization for Nuclear Research (CERN), invents the World Wide Web for the high energy physics community. Along with Robert Cailliau, he develops the world's first Web server and browser. By 1993, the National Center for Supercomputing Applications (NCSA) had developed the Mosaic graphical web browser, which led directly to both Netscape and Internet Explorer. By the end of 1994, the Web had 10,000 servers and 10 million users. Traffic was equivalent to shipping the entire collected works of Shakespeare every second. In 2000, it was estimated that there were over 7 million unique web sites. Clearly, the world had become a very different place. (The previous statistics and those below are from CERN and OCLC.)

Still, libraries continue to be early and usually enthusiastic adopters of the technology. And the MARC format continues to play an important role in many instances. The Middlebury library is a good illustration. The reference librarians here had learned how to write HTML code and had put up a series of library web pages well before I arrived in 1996. They have since implemented standard templates for all of the library's web pages. My assistant, Barbara Merz, has written programs that search through the library's MARC records and automatically build web pages containing hot links to electronic books and journals (there's a small sketch of the idea below). Judy Watts has implemented an electronic reserves system. I recently did some consulting with that vendor to help them develop software to export their reserves records in the MARC format. We use FTP to retrieve MARC records every month for all of our government documents, and every week for all of our approval books. The catalog department searches and downloads records from remote copies of the Library of Congress' bibliographic and authority MARC databases every day. Many of our new MARC records are sent out to be enhanced with tables-of-contents information. We are about to begin receiving electronic invoices and sending electronic purchase orders using the Electronic Data Interchange (EDI) standard. Our students and faculty have access to citations and to the full-text of many journal articles in a number of databases located all over the world. And they can access all this information not just from their dorm room or office here on campus, but from anywhere in the world where they can plug a computer into the internet.
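
To give a flavor of what those link-building programs do, here is a short Python sketch. It illustrates the general technique, not Barbara's actual code; it uses the pymarc library (which postdates this talk), and the file name is invented. MARC field 856 holds the electronic location, with the URL in subfield u:

    from pymarc import MARCReader

    # A hypothetical file of MARC records for our electronic journals.
    with open("ejournals.mrc", "rb") as fh:
        for record in MARCReader(fh):
            titles = record.get_fields("245")
            title = titles[0].get_subfields("a")[0] if titles else "(no title)"
            # Field 856 is electronic location/access; subfield u is the URL.
            for field in record.get_fields("856"):
                for url in field.get_subfields("u"):
                    print('<li><a href="%s">%s</a></li>' % (url, title))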

OCLC is still in business. Their WorldCat database now holds over 46 million records created by libraries around the world, with a new record added every 15 seconds. The information in WorldCat spans over 4,000 years of recorded knowledge with 400 languages represented.

So, now it's my turn to be just a little bit testy. To anyone who still thinks that libraries or librarians are about to disappear, I would say, as Barbara is so fond of saying, "Pish!" It's not going to happen! Yes, the internet is huge and unwieldy and constantly changing. But there will be an identifiable subset of the internet that is useful for scholars. And librarians will be able to apply what was learned in the 1960s and '70s, with respect to complexity and scale, to that subset of information. Librarians have been working on ways of cataloging the internet almost since the days of gopher and Mosaic.

Now, I've already been talking about the exciting MARC format, and I fully realize that mentioning metadata at this point might be just a little too much excitement for some of you, but that's a risk I'm willing to take. Metadata is defined as data about data. A MARC record is descriptive metadata. A book contains data, and a MARC record for that book contains data describing the book: the title, the names of the authors, the ISBN, the name and address of the publisher, etc. But creating a MARC record obviously requires a trained cataloger to spend a lot of time. People realize that it's inappropriate or even impossible to apply MARC cataloging rules to the entire internet. So a number of simpler descriptive metadata schemes have been developed. Some examples are the Encoded Archival Description (EAD), the Global Information Locator Service (GILS), and the Dublin Core, which gets its name from Dublin, Ohio where OCLC is located. OCLC has a project called CORC, which stands for Cooperative Online Resource Catalog. It's a metadata creation system for bibliographic records and electronic resources. The CORC database currently contains about 380,000 records.
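
To show how much lighter Dublin Core is than full MARC cataloging, here is a small sketch (with invented values) of a few of its fifteen elements, rendered as the HTML meta tags many sites use to embed them in a web page:

    # A handful of the fifteen Dublin Core elements, emitted as HTML
    # <meta> tags. (The values here are invented for illustration.)
    dublin_core = {
        "DC.Title":    "Technology in Libraries",
        "DC.Creator":  "Lynch, Michael S.",
        "DC.Date":     "2001-04-27",
        "DC.Type":     "Text",
        "DC.Language": "en",
    }

    for element, value in dublin_core.items():
        print('<meta name="%s" content="%s">' % (element, value))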

But in our brave new networked world, it turns out you need more than just descriptive metadata. You might want or need metadata for terms and conditions of use (or other administrative metadata), content ratings, provenance, linkage or relationships to other data, or structural metadata. And so we have organizations like the World Wide Web Consortium (W3C), which has 512 commercial, educational and governmental members. It's a pretty impressive list of members, but they aren't the only group working in this area; you also have the Council on Library and Information Resources (CLIR) and the Coalition for Networked Information (CNI), to name just two. Still, W3C will nicely illustrate my point. They are working on the Resource Description Framework, which is a way of combining all these different kinds of metadata, and they have a vision that they call the 'Semantic Web'. I want to just read to you part of the introduction from their web page:

"[S]tandards, technologies and policies must be designed to enable machines to make more sense of the Web, with the result of making the Web more useful for humans. Facilities and technologies to put machine-understandable data on the Web are rapidly becoming a high priority for many communities. For the Web to scale, programs must be able to share and process data even when these programs have been designed totally independently. The Web can reach its full potential only if it becomes a place where data can be shared and processed by automated tools as well as by people." (See W3C Semantic Web )

Let's see, they're talking about a standard for sharing and interchanging machine-understandable data. Where have we heard that before? You say, "Oh, but this is very complex data, and it's in a wide variety of different formats, and there are already over 7 million unique web sites." I say, "Better call a librarian! They've been there, and they've done that."
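
For the curious, the core idea of the Resource Description Framework is easy to sketch: every statement, whether descriptive, administrative, or structural, is reduced to a subject-predicate-object triple, and all the triples live in one graph. Here is a toy version in Python, with invented data and shorthand vocabulary names; real RDF uses formal vocabularies and an XML syntax:

    journal = "http://example.edu/ejournal/42"   # a hypothetical resource

    # Descriptive, rights, and relationship metadata all share one shape.
    triples = [
        (journal, "dc:title",       "Journal of Examples"),
        (journal, "dc:creator",     "Example Society"),
        (journal, "rights:license", "campus users only"),
        (journal, "rel:isPartOf",   "http://example.edu/catalog"),
    ]

    def about(subject):
        """Everything the graph asserts about one resource."""
        return [(p, o) for (s, p, o) in triples if s == subject]

    for predicate, obj in about(journal):
        print(predicate, "=", obj)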

So, we librarians will continue to perform our traditional mission of selecting, cataloging, and preserving information, regardless of its format! To that holy trinity of library tasks, we have added a fourth: we also guide our users now. We don't just help them locate their information anymore, we also help them evaluate what they find.

While I'm still in this testy mood, I want to address another claim that some people continue to make, namely that the printed book is going to disappear, and that in the future, everything will be digital. Pish, I say, and Pish again!

Here are a couple of quotes about a technology with which we are all very familiar:

"From one [of these devices]... it was said, classrooms of a whole nation could learn physics or chemistry. No longer would a school have to rely on its own pitiful supply of laboratory gadgets."

"In a few short [years it] had grown from a toy, to a popular diversion, to a pipeline to millions."
Today, that first quote might sound plausible, if a bit optimistic, but the second quote sounds pretty accurate. The only problem is that in both cases, the speakers were referring to television. Now, I'd bet that many people in this room have a computer at home. And that almost all of you have a television. Most of you also have a radio, probably in your car as well as in your house. And most of you have probably read a newspaper some time in the last couple of days. So, not one of these new technologies has caused any of the older technologies to disappear. Why? Because the older technologies work. A lot of people use them and are accustomed to using them, they're convenient, and they're cost-effective. We have been reading written texts on some form of paper for thousands of years. We are not going to stop suddenly! Printed books are not going to disappear any time soon. Maybe it would be nice and simple if everything were digital. I don't know about you, but the world I live in has not been getting simpler lately, only more and more complex.

[Note: the following paragraph was not read during my talk, in the interest of saving a little time; I went directly to the bulleted list below. --Mike]

And, except for the computer, all of the technologies I just mentioned are analog, not digital. And guess what? Human beings are analog too. We're messy and we're imprecise. But we're also quite creative, and we're very good at 'filling in the blanks,' based on what we know or think we know. Two questions may help show what I mean; I'll give one to each side of the room. (If you've played this game before, don't let on.) 1) How many of each type of animal did Moses take onto the ark with him? 2) Suppose a plane crashes right on the U.S.-Canadian border and half the passengers are killed. In which country should the survivors be buried? Okay, the first side, what's your answer? Two? No, the correct answer is none. Because it wasn't Moses who built the ark, it was Noah. And this side? Doesn't matter where? Well it does to the extent that we're not going to bury the survivors anywhere, we're only going to bury the dead. So, that was a silly little game to illustrate that computers are really good at some things, but that people are really good at some things as well. And one lesson is that if your real question is how many people did Noah take onto the ark, and you say Moses by mistake, you probably want the answer you get to be 'two'. You don't really want to be lectured about getting your biblical characters straight, nor do you want to have to explain to some stupid computer that what you really meant to say was Noah, and please just answer the damn question! What some folks have proposed and begun to work on is to design computers that are more like people instead of asking people to become more computer-like.

What are a few of the specific trends or concerns that I see facing libraries in the future? In no particular order:

I'll close with these thoughts from Fred Lerner in The Story of Libraries. Lerner notes that the Pharos lighthouse in Alexandria was one of the Seven Wonders of the Ancient World. It was "an immense lighthouse, casting a light visible for thirty miles." But the great Library of Alexandria was not considered one of the 'Wonders'. Lerner says that both buildings "were monuments to man's ability to project human achievement across time and space." Libraries have continued the work of the Alexandrine library; the record of human achievement has been preserved and projected across time. In a library, the past is available to us in the present. But he says that our role in the future "may well be that of the Pharos lighthouse -- to help our users navigate the seas of information that increasingly impact on their lives," and thus to project human achievement across space as well as time. If we achieve that, perhaps we will be considered a 'Wonder' of the new world.



Bibliography

Bierman, Kenneth J. Library Automation, In: Annual Review of Information Science and Technology (ARIST) Vol. 9, 1974, pp. 123-172.

Griffin, Hillis L. Automation of Technical Processes in Libraries, In: ARIST, Vol. 3, 1968, pp. 241-262.

Keenan, Stella. Science Abstracting and Indexing Services, In: ARIST Vol. 4, 1969, pp. 273-303.

Kilgour, Frederick G. Library Automation, In: ARIST Vol. 4, 1969, pp. 305-337.

Lerner, Fred. The Story of Libraries, The Continuum Publishing Company, New York, NY, 1999.

Lesk, Michael. The Seven Ages of Information Retrieval. IFLANET, UDT Occasional Paper #5.

Parker, Ralph H. Library Automation, In: ARIST Vol. 5, 1970, pp. 193-222.

Smith, Elizabeth S. On the Shoulders of Giants: From Boole to Shannon to Taube: The Origins and Development of Computerized Information from the Mid-19th Century to the Present, Information Technology and Libraries, Vol. 12, No. 2, June 1993, pp. 217-226.

Veneziano, Velma. Library Automation, In: ARIST Vol. 15, 1980, pp. 109-145.