Sunday, February 28, 2010

The Myth of Digital Permanence pt. 1

After last week's relative downer, I thought I'd begin this week on a lighter note:



I don't think anyone who works in digital preservation would actually argue that digital documents or their respective media have any more or less permanence than, say, a book or even a glossy magazine.  As it turns out, the true genius of digital documents (and their greatest enemy) is their portability.  A digital file is not dependent upon its medium in the way a paper document is.  The slow (sometimes not-so-slow) but steady degradation of a text's material need not necessarily worry you, since the ease of copying and transferring a digital document means it will persist, so long as people pay careful attention to the maintenance of the documents themselves and don't assume that, once digitized, they are now "safe" from the caprice of a given material medium.


As Digi-man would remind us, it is important, o Brave Reader, to remain ever vigilant in the struggle against Team Chaos, those champions of entropy who would threaten the total degradation of all our precious information!  His first and third suggestions seem pretty obvious: 1) back up your data somewhere safe and 3) be sure to transfer your data consistently from one place to another in order to avoid catastrophic deterioration.  Number 2, though, is a lot trickier than Digi-man makes it seem.  For those of you not familiar with metadata, they are basically standardized codes about information (yes, o Brave Archivists and Librarians, I know you're already aware of this--but bear with me).  So, for example, if a "book" is the piece of information in question, a catalogue entry in a library database concerning that book would constitute metadata.  It is important that metadata be composed of standardized codes, because the whole point is for said data to describe what your information is and how it is to be used.
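For those who'd like to see what "standardized codes" look like in practice, here is a minimal sketch, assuming the Dublin Core element set (a real, widely used metadata standard with fifteen elements) as the standard in question. The catalogue record and the `nonstandard_fields` helper are hypothetical illustrations, not any actual library's catalogue or software.

```python
# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DUBLIN_CORE_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def nonstandard_fields(record: dict) -> list:
    """Return, sorted, any field names in `record` that are not Dublin Core elements.

    Hypothetical helper, for illustration only.
    """
    return sorted(name for name in record if name not in DUBLIN_CORE_ELEMENTS)


# A hypothetical catalogue entry: data *about* the book, not the book itself.
record = {
    "title": "Essays of an Ex-Librarian",
    "creator": "Garnett, Richard",
    "date": "1901",
    "type": "Text",
    "format": "print",
}

print(nonstandard_fields(record))  # → []

# A home-grown field name that no other catalogue would know how to interpret:
print(nonstandard_fields({"title": "Essays of an Ex-Librarian", "shelf_spot": "third floor"}))  # → ['shelf_spot']
```

The point of agreeing on field names in advance is that any other system that knows Dublin Core can read `creator` or `date` without asking what you meant; a field like `shelf_spot` is legible only to whoever invented it.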

That's all well and good, except that digital files aren't exactly the same as physical documents, despite the fact that we use the language of print and physical texts to describe them.  Digital documents are fundamentally bifurcated in a way physical texts by their nature are not.  What do I mean by that?  With a book (or any physical, print document), the "text" (I refrain from saying "information," because I'd rather not reinforce the neo-Platonic notion of a text as something beyond its material manifestation) is coextensive with that which presents it graphically.  In other words, they are the same thing.  With a digital document, the file and the software that "reads" it have no necessary relationship and, as such, exist independently of each other.  That means that, should either half of the bifurcated text be lost, the other half would be insufficient to reproduce the document.  Only recently have information theorists started paying closer attention to the loss of digital documents that occurs not as a result of the loss of the file but of the software, the interpretive codes necessary to represent the file in a form that is meaningful to us as human users.  Only last year did the European consortium KEEP (Keeping Emulation Environments Portable) form to tackle just this issue of obsolete formats.
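Here is a toy illustration of this bifurcation, assuming a plain text encoding as a stand-in for the far more elaborate software real document formats depend on: the same bytes become the intended text, garbage, or nothing at all, depending on which "reader" is applied. The `read_with` helper is hypothetical; `bytes.decode` and the codec names are standard Python.

```python
def read_with(data: bytes, codec: str) -> str:
    """Interpret raw bytes using one particular 'reader' (a text codec).

    Hypothetical helper: returns a placeholder if the codec cannot decode the bytes.
    """
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return f"<unreadable with {codec}>"


# The "file": just bytes. Nothing in them records how they ought to be read.
file_bytes = "café".encode("utf-8")

for codec in ("utf-8", "latin-1", "ascii"):
    print(codec, "->", read_with(file_bytes, codec))

# utf-8 -> café
# latin-1 -> cafÃ©
# ascii -> <unreadable with ascii>
```

Nothing in `file_bytes` says which codec produced it; lose that knowledge (or the software that embodies it) and the bytes alone cannot give you back the document.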

The point that I'd like to come to with all of this is how thoroughly fraught with questions of interpretation all these problems in informatics are.  Archivists, Librarians, Information Theorists of the world, I say this from a place of love, but you deal very facilely with the theoretical concerns that surround interpretation.  Might I suggest you take your local homegrown humanist out for a cup of coffee or something stronger and pick her brain a bit.  After all, these problems are not, in fact, new: it was a classicist and a cryptologist who made Linear B (the script of the Mycenaean language) readable again, and it was a British scientist and a French philologist who, with a little help from a trilingual rock, brought hieroglyphs back into our collective ken.  Some of us really would like to help, because it concerns us too.

As always, should you, o dear reader, have any suggestions for topics or particular objects to examine, feel free to email us at libralthinking@gmail.com.

Sunday, February 21, 2010

You may be a Mac, but...

In her talk at the UM library on Thursday the 18th, Kathleen Fitzpatrick laid out what amounts to an introduction to the more extensive work done in her most recent project, Planned Obsolescence, which tries to explain both the changes that are taking place in the world of scholarly publishing and some that should take place. The online version of Planned Obsolescence is an admirable document, both in the issues it takes on (though I should admit now that I have significant concerns over what she says) and in how it engages with them. I'll have more to say about the document itself at a later date, but I want to get into a concern Fitzpatrick raises over the obsolescence of the readers and formats used to produce some of the earliest hypertext novels, like Stuart Moulthrop's Victory Garden.



Fitzpatrick mentioned that current versions of the Mac OS (Snow Leopard - rawr!) no longer support applications written for the "classic" Mac OS (i.e., from the pre-UNIX days), meaning that hypertext documents written using Storyspace, Eastgate's "hypertext writing environment," can no longer be viewed by Mac users running OS X 10.5 or later, even though they possess a license for said document.  Of course, if you use a PC, none of this silliness really affects you.

In my previous post, I quoted Jerome McGann's statement from The Scholar's Art about the invisibility of material media, but I did not quite go into the ramifications of what it might mean to read the materiality of a text, or the anxiety that might result from constantly keeping materialist concerns in mind when, really, all you want to do is read a damn book (or newspaper or pamphlet or whatever).  Rather than delve into esoterica, as is my wont, I'll explain the anxiety of the materiality of texts by way of Apple's recent PC vs. Mac commercials.  Their brilliance lies in the way they tap into people's anxiety (even so-called power users') about personal computers and the degree of acumen those machines seemed, at least in the early days, to demand of users just to function properly.  I say this having just spent the better part of a Wednesday booting my laptop in Safe Mode and individually deleting registry keys left over from a particularly annoying (but altogether somewhat benign) piece of malware I'd gotten from, well, who knows where.  One commercial in particular emphasizes everything people hate about buying a new PC and how the kind people at Apple have graciously taken the time to instill their machines with an ease of immediate use that is unprecedented in personal computing technology.

The problem with treating anxiety is how you have to go about it: largely by taking control away from the sufferer, inhibiting their conscious mind and its underlying neurochemistry.  Note: my knowledge of psychopharmacology is about a decade old now, but the most common class of anxiolytics (anti-anxiety medications), benzodiazepines (like Xanax and Valium), are sedatives and, at higher doses, hypnotics.  I don't want to press this point too far, but in treating anxiety, be it with drugs or mass media (of course, some would argue "same thing"), you ever so subtly alter your consciousness.  All of the problems of viruses, backwards compatibility, usability, and portability remain--you now simply filter them out, or, rather, they are filtered out for you.

So, you may be a Mac (and I hasten to add that your computer preferences ultimately reveal very little about you), but your documents aren't.  And PC users don't get to act smug here, because what I'm trying to say is that digital documents aren't anything: aren't PC, aren't Mac, aren't Linux, aren't Commodore, etc.  They are markup, and whatever operating system you choose to use has to decode and "read" them.  In the earlier days (though not the earliest) of PC use (back when that acronym meant "personal computer" and not "machine that runs Windows"), when even running Windows meant entering a command at an MS-DOS prompt, a certain amount of knowledge (or at least awareness) of how one interfaces--how one moves from machine language to some "end result"--was part and parcel of using a computer in the first place.  This is, strangely, analogous to the shift in textual consciousness (as I'm calling it now; hopefully I can think of something better) that took place with "printed" texts.

Ancient authors show a marked awareness of how books were produced, as "reading" handwritten documents largely requires an understanding of how they were written.  In the Roman world, lectores ("readers") were either slaves or professional freedmen whose job it was to read texts aloud to their wealthy masters.  What the standardization of print did was to ease the burden of reading text, to remove the anxiety of the technologies of textual production in one of the boldest egalitarian gestures of the modern world.  But in order to relieve that anxiety, print had to render those technologies invisible by sequestering them from the public eye.  This might be acceptable to one such as myself, were it not that the problems of formatting and conversion persist in precisely the same way that the problems of variance in print persisted, and continue to persist.

So, now that I've laid a bit of foundation, next time I will deal more extensively with the bifurcation of digital documents: with the great possibilities such a bifurcation presents in terms of portability, but also with the problems of backwards compatibility and "obsolescence."

As always, should you, o dear reader, have any suggestions for topics or particular objects to examine, feel free to email us at libralthinking@gmail.com.

Sunday, February 14, 2010

Whose Text is it Anyway?

Welcome!  There are two of us creating Libral Thinking, and so, with a second voice, the second post commences. I, Colleen, am currently a graduate student in the School of Information, and I am working "in the field," as it were, both sitting at a reference desk and answering patrons' questions through a chat program on the internet.  I hope to be able to add some insights into the practical and professional side of the messy work of managing collections at this uneasy "transition" time.  (Is it a transition?  Will we keep the books?  --questions for another time.)  For today, I take up the uneasy relationship between the text and the institution, created in the space between the physical book on the shelf in a collection and a digitized version (copy?) on the web, with access provided through Google Books.

To use Google Books, a person must type "Google" into a browser of some sort.  Each page of a book scanned by Google bears their mark.  It is fairly clear that it is a document scanned and provided by Google, presented by Google through their interface (and their copyright settlement).

Yet there is a tricky problem: the same text is marked very clearly with the name of the institution where a physical copy (at one point?) was housed.  In the case of Essays of an Ex-Librarian by Richard Garnett (1901), it bears a large bookplate identifying the University of Michigan.

As a "remote reference librarian," I sit in my living room answering questions that pop up through chat programs from patrons typing on the library website. I have already fielded multiple questions from patrons around the world resembling the following:

Patron:  Page 193 is blurry, can you scan it for me and e-mail it?
Me:  There is a button in Google's interface to report the page unreadable.
Patron: Come on...It's just a page. Can't you go get it?  At least read it to me?
Me:   No.
Patron:  Pretty please?
Me:  No, but you can request the book through Inter-Library Loan.
Patron:  But, it's just a page, and I'm in Poland.  That would take a month and I'd have to pay.
Me:  No.

Okay, okay, some of that is a lie already.  In fact, I have, on my own time, requested that books be brought over from the Buhr storage library on my own account, photographed the requested page with my own digital camera, and e-mailed the photo to the patron.  There is no policy for handling such requests, since they are not yet frequent enough to be a problem.  In general, our affiliated patrons come first, but with any time left I can choose to help.

What should the response be to these types of questions?  Whose text is it when the digital scan belongs to a company providing access, yet never loses its association with its referent, physically housed in another place but containing the same content?  The combination breeds a lot of assumptions.  The person typing usually assumes that I am in the library, that the book still exists in the library, that it is a few feet away from me, and that it will "just take a minute."  They also assume that it is our (my) responsibility to help them, and not Google's.  Maybe that is a good thing?  The library is seen as more available and helpful than Google.  The physical item in this case is more accessible, since a bad scan cannot be re-scanned.

But what do I do when there are three other chat windows open simultaneously and it's the second Google verification request of my shift?  As the face of a "public" research university, who is my patron?  Should I have levels?  Shouldn't the students come first?  How does a local or state institution fund reference services for a public that, through those bookplates and our availability on the web, becomes a global audience?  Does the whole library become a reference for the digital, such that we should plan duplication charges as a closed-stack special library does?  Are physical books now a "Special Collection" in relation to the digital?

These questions cannot just remain fuzzy, since the digitized texts constitute the memory of our culture.  In some sense we all own them, since the memories in the texts lie in the spaces between people, sustained by references to the texts each time we read one, think about it, and add it to our store of knowledge that we share with others as we live and chat over coffee.  For now, (most of) the physical books that Google references remain in the libraries, but many people, unable to see the blurry details behind "the cloud," call for the destruction or selling-off of the books, along with their pricey buildings and air-conditioning, as a practical necessity, not seeing the physical servers, fiber optic cable, and quickly obsolescent software upon which a digitized collection such as Google Books relies.  We must define our relationship, and take ownership and responsibility for it, or, just like the stack of corroded 5¼-inch floppy disks holding the games my father programmed for me, which sit next to a failed Commodore PET disk drive in my closet, the entirety of our cultural memory could become inaccessible while we pay attention to shinier things.

Sunday, February 7, 2010

"I went looking for a book"

 

Yesterday, I went looking for a copy of Mabel Loomis Todd and Thomas Wentworth Higginson's edition of the poems of Emily Dickinson.  It was cold out, as all Michigan dead-of-winters happen to be, and, after finding the catalog entry I was looking for (or rather a close enough approximation thereof), I was loath to brave the wind and the salt-encrusted pavement just to check a reference in a book I would likely not end up using anyway.  But what luck!  The ongoing efforts of the Hathi Trust and Google Books (in alliance with various university libraries, namely yours truly, the University of Michigan) foresaw my laziness and the intellectual inertia it would inspire, so they went ahead and digitized the 1901 edition of Poems by Emily Dickinson (second series) so that I could satisfy my momentary whim from the relative discomfort of my home desktop.  I thus confirmed my suspicion about the marked absence of a particular hyphen (haha!) and, now with plenty of free time on my hands, began wandering through the digital document that had saved me from the compulsions of my own sloth.  This is when I noticed something peculiar.


If you, dear reader, would like to follow along with story time, open the Hathi Trust's digital copy of this edition; that is where we will begin.

According to the Hathi Trust's FAQ, a "missing page" designation can mean any of three things.  1) "Pages were missing from the library's print copy of the book;" this is doubtful, given that the "page" in question is the cover.  2) "One or more pages were not scanned;" well, duh, but that doesn't give any indication as to why.  3) "In some cases, Google will misidentify a page, leading them to believe that a page is missing when it is not;" again, this is highly unlikely here, given that the "misidentified page" would be the cover.  I make no claims about the intellectual acumen of the fine employees at Google, but I'd like to believe they wouldn't misidentify the cover.  Of the three options, I obviously favor "was not scanned," but this statement amounts to a completely unexplanatory *shrug*.  Irritated by this wholly unnecessary stimulation of my pedantic curiosities, I got up from my desk, took a shower, got dressed, and left to tramp the mile or so of salt-rimed pavement between my apartment and the Harlan Hatcher Graduate Library, where the "library's print copy" (why not just say "original"? another curiosity...) is housed.

You won't find 828 D553 Ser. 2 1901 on the shelves of the third floor of the south stacks, and not because it went mysteriously missing.  Leveraging one of the few privileges I still retain as adjunct faculty of our fine university, I took 828 D553 Ser. 2 1901 down to the circulation desk and checked it out.  It's a peculiar object, 828 D553 Ser. 2 1901--you may wonder why I don't simply refer to our poor little book as the 1901 edition of Poems by Emily Dickinson, as I do above.  The simplest and most honest answer is that it is not the 1901 edition of Poems by Emily Dickinson but a facsimile produced by the book preservation and conservation unit of the university library.  How do I know this?  I should begin by pointing out that to anyone holding and looking at the book, this fact is obvious.  It is significantly larger than every other edition on the shelves, its cover is much newer, its paper is much newer, and it bears a bibliographic code on page 3 that makes clear this is a facsimile:

grad
31063846
repla
6/17/98
repl

"grad" for graduate library; "31063846" for... honestly, I don't know (perhaps a work order #?); "repla" for replacement; "6/17/98" for June 17, 1998, when, one supposes, the work was completed; and "repl" again for replacement.  One does wonder why this needed to be said twice.  If you've been following along, dear reader, you may have noticed that in the Hathi Trust digital document, this bibliographic code on page 3 is nearly illegible.  In point of fact, nearly all of the non-"textual" codes indicating that the "original" of the digital document is itself a facsimile have been eviscerated.  If it weren't for my own pedantry and my deep love of old-fashioned print texts, it would have been quite difficult to uncover, in the digital document, those clues to what is patently obvious in the physical book.  You would have to be an inordinately thick moron not to see that the very book I held in my hands (and on that, o dear reader, you will have to take my word!) was not published in 1901 and could not be the historical document all the digital bibliographic codes (including the library's own catalog entry) claim it to be.

Of course, I'm being quite melodramatic; the Google version of this digital document (which, oddly, ought to be the very same digital document) retains the bibliographic code in a completely legible form on its own page 3.  But what this whole experience is meant to unconceal, as Heidegger would say, is all those aspects and conditions of texts we ignore.  As Jerome McGann says in The Scholar's Art (p. 136), "The physical object… is coded and scored with human activity.  An awareness of this is the premise for interpreting material culture, and the awareness is particularly imperative for literary interpretation, where the linguistic 'message' regularly invisibilizes the codependent and equally meaningful 'medium' that codes all messages."

What exactly is a text? What are texts becoming in an age where digital reproduction not only promises to provide access to, and new tools for understanding, documents from around the world, but also threatens to become an excuse to disregard the materials that have heretofore served as our connection to the textual past? We here at Libral Thinking wish to explore the ramifications of this future in digital media, both for the materials to be digitized and for the digital texts themselves, and to emphasize the continuity between physical and digital texts rather than the facile contiguity of the "print is dead" crowd.  Colleen and I (Nicholas) will be bringing you at-least-weekly articles on the future of the book in digital environments, with a particularly philosophical and theoretical bent but always with an eye to the practical ramifications of the theoretical. Welcome!

Should you, o dear reader, have any suggestions for topics or particular objects to examine, feel free to email us at libralthinking@gmail.com.