November 14, 2013

Google wins digitization case

Today, Judge Denny Chin ruled in favor of Google in what may be a landmark case that would enhance Fair Use for digital items. Google argued that scanning books (over 20 million and counting) and publishing 'snippets' of them online was within the realm of Fair Use, an argument the Court accepted. Judge Chin explicitly mentioned the benefit of having the books digitized, stating that "Indeed, all society benefits".

The case, which began in New York in 2004 (found here), has been a veritable rollercoaster. The ruling, which the Authors Guild said it would appeal, is a victory not only for Google but also for the libraries and researchers that would use these scanned books as research aids. Google puts only certain portions of each scanned book online, and has so far scanned over 20 million books. With that number of books already scanned, Google estimated it could have owed the Authors Guild over three billion dollars, at roughly $750 per book, if it had lost.

Judge Chin drew on a previous case that also saw the Authors Guild's claims dismissed. In October 2012, Judge Harold Baer dismissed a case against HathiTrust, a partnership between five research-heavy universities (of which the University of Wisconsin is a member), on very similar Fair Use grounds.

The Authors Guild will appeal the decision in both the HathiTrust and Google cases, arguing that both institutions have violated copyright and far exceeded the bounds of a Fair Use defense by instituting mass scanning. Judge Chin's ruling found that the scanning was not only beneficial to the public as a whole but also transformative, meaning that copyright was not violated, and that it would likely boost sales rather than impede them.

To read more about this decision, check out the write-ups from Reuters, BBC News or the New York Times.

November 13, 2013

The University of Wisconsin Digital Collections preserves slices of history

Want to jump in a time machine? The UW Digital Collections (UWDC) is the place to do it. Over the past twelve years, the UWDC has digitized thousands of images and other media from Wisconsin and around the world. One element of librarianship is preservation, and it is always exciting to see such wonderful and unique images find a home in an increasingly digital world.

Check out the UW Law School Cane Toss from 1955, or perhaps view German propaganda about Nazi ambitions with a 1938 poster about the Anschluss. These are only a few of the images that I found by browsing various collections. Warning: finding out what images are on the next page is highly addictive!

The UWDC is not unlike walking through a gigantic museum or archive. History buffs, either casual or serious, will enjoy spending time in these digital 'halls'. It is a fascinating (and free) way to discover the past. Are there any eras of history or specific events that you feel haven't been preserved as well as they should be?

December 12, 2011

CAPTCHAs Being Used to Help Digitize Books with Poor OCR Accuracy

CAPTCHAs are those distorted letters that you have to enter after some internet transactions to verify that you're actually a human.

I recently learned that some CAPTCHAs are being used to help digitize old printed material by asking users to decipher scanned words from books that computerized optical character recognition failed to recognize. That is very cool.

Science Magazine reports that:

Whereas standard CAPTCHAs display images of random characters rendered by a computer, reCAPTCHA [from Google] displays words taken from scanned texts. The solutions entered by humans are used to improve the digitization process. To increase efficiency and security, only the words that automated OCR programs cannot recognize are sent to humans.

This illustration from the Science article helps demonstrate how it works:

The article explains:

In this example, the word "morning" was unrecognizable by OCR. reCAPTCHA isolated the word, distorted it using random transformations including adding a line through it, and then presented it as a challenge to a user.

Because the original word ("morning") was not recognized by OCR, another word for which the answer was known ("overlooks") was also presented to determine if the user entered the correct answer.
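The control-word check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA's actual implementation: the function names, the vote tally, and the `min_votes` threshold are all assumptions made for this example.

```python
from collections import Counter


def submit_answers(control_truth, control_answer, unknown_answer, votes):
    """Record the user's reading of the unknown word only if they got the
    control word right. Returns True when the user passes the check."""
    passed = control_answer.strip().lower() == control_truth.strip().lower()
    if passed:
        # The user has shown they are reading carefully, so count their guess.
        votes[unknown_answer.strip().lower()] += 1
    return passed


def accepted_reading(votes, min_votes=3):
    """Return the most common guess once it has at least min_votes
    (a hypothetical threshold), otherwise None."""
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


# "overlooks" is the control word; "morning" is the word OCR could not read.
votes = Counter()
submit_answers("overlooks", "overlooks", "morning", votes)
submit_answers("overlooks", "overlook", "mourning", votes)  # fails control, ignored
submit_answers("overlooks", "Overlooks", "morning", votes)
submit_answers("overlooks", "overlooks", "morning", votes)
print(accepted_reading(votes))
```

Only users who type the known word correctly contribute a guess for the unknown word, which is why pairing the two words lets the system both verify the user and collect a usable transcription.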

For more information, see the reCAPTCHA page and the Science Magazine article.

May 19, 2009

WiLS Offers Digitization on Demand of Public Domain Materials

WiLS (Wisconsin Library Services) has recently announced a new Digitization on Demand service. This service will provide complete digital copies of works from UW Madison Memorial Library's Special Collections and the Mills Music Library that are in the public domain.

Works will be scanned in their entirety for a library patron to use at their point of need, but the digital copy of the work will also be moved to the Digital Collection Center. Once the works have gone through processing with the Digital Collection Center, they will be linked to in the local OPAC and be hosted at the level of Google and the Hathi Trust to further future access to the work. The cost of this service will be paid entirely by the requestor.

For more information or to request that an item be digitized, see the WiLS web site.

April 17, 2009

Google Books & its Implications for UW Madison

The Daily Cardinal has a very thorough article on the Google Books initiative and its implications for the UW Madison campus.

The article discusses:

  • staffing concerns at campus libraries
  • copyright issues and the status of the settlement
  • how digitizing materials will increase access to important scholarly and historical works

Look for quotes from Law School professors Shubha Ghosh and Anuj Desai.

Thanks to my colleague, Jenny Zook, for pointing me to the article.

October 28, 2008

UWDCC Real Estate Collection Offers Consulting Reports from 1960s-90s

The Real Estate Collection is a new resource from the UW Digital Collections Center. It contains materials and examples of commercial work in real estate done by celebrated University of Wisconsin professor James A. Graaskamp and others.

James Graaskamp taught real estate at UW-Madison from 1964 to 1988 and was chairman of the Real Estate Department from 1968 until his untimely death in 1988. This digital collection contains over 165 of Landmark Research's consulting reports completed between the late 1960s and the early 1990s. There are appraisals, market and feasibility studies, as well as other types of research and analysis.

Publisher/Author Settlement Agreement with Google Opens Door for Full Online Access to Millions of Books

A settlement has been reached in the class action lawsuit against Google over access to copyrighted material in Google Books.

From the AP:

According to a statement issued Tuesday by the Authors Guild, the Association of American Publishers and Google, the agreement "will expand online access to millions of in-copyright books and other written materials in the U.S. from the collections of a number of major U.S. libraries participating in Google Book Search."

Under the deal, Google will pay $125 million to establish a Book Rights Registry to resolve royalty claims.

Google suggests how this might change things...

Until now, we've only been able to show a few snippets of text for most of the in-copyright books we've scanned through our Library Project. Since the vast majority of these books are out of print, to actually read them you'd have to hunt them down at a library or a used bookstore....

This agreement will create new options for reading entire books (which is, after all, what books are there for).

  • Online access - Once this agreement has been approved, you'll be able to purchase full online access to millions of books. This means you can read an entire book from any Internet-connected computer, simply by logging in to your Book Search account, and it will remain on your electronic bookshelf, so you can come back and access it whenever you want in the future.
  • Library and university access - We'll also be offering libraries, universities and other organizations the ability to purchase institutional subscriptions, which will give users access to the complete text of millions of titles while compensating authors and publishers for the service. Students and researchers will have access to an electronic library that combines the collections from many of the top universities across the country. Public and university libraries in the U.S. will also be able to offer terminals where readers can access the full text of millions of out-of-print books for free.

See the Google Book Press Center for the text of the agreement and other related documents, including the Library Opportunities from Google's agreement with Authors and Publishers.

October 6, 2008

Future of the Legal Course Book

Seattlepi reports on the Workshop on the Future of the Legal Course Book at Seattle University Law School.

Traditional publishers are confused about what professors want and where the industry is going....

Teachers want more flexibility, such as the ability to add their own information to text, insert audio files and provide links. They also want more ways to engage students and sought digital copies of textbooks that can be sorted and searched.

See also coverage from the Chronicle of Higher Education, Legal Times and the National Law Journal.

February 26, 2008

GPO to Digitize All FDLP Legacy Materials

According to a GPO Request for Information:

The United States Government Printing Office (GPO) plans to digitize the entire collection of legacy materials that have been disseminated through the Federal Depository Library Program. The estimated size of the collection is approximately 2.2 million documents, which amounts to about 90 million pages.

Source: GOVDOCS-L list

February 14, 2008

Printing Public Domain Books on Demand

Tom Mighell over at Inter Alia reports on a cool service called whereby you can order a reprint of a book in the public domain. Here's how it works:

1. You request any public domain book from the Internet Archive or Google Books.

2. The book is processed and submitted to Lulu, a print-on-demand company with no upfront fee.

3. You can order the printed book from Lulu at $1 over cost.

Printed reprints are currently priced between $4.99 and $18.99, depending on the number of pages, and come in soft cover with perfect binding. Shipping costs extra.

October 26, 2007

Differing Perspectives on Book Digitization

A couple of interesting stories on book digitization crossed my path this week. The first is an article in EdTech about UW Madison's involvement in the Google Book project.

Google this spring began scanning 500,000 of the University of Wisconsin-Madison's 7.9 million library holdings, including collections on American and Wisconsin history, medicine, engineering and genealogical materials. Once the materials are scanned, people can read the university's public domain books online for free. For copyrighted books, Google will show a few lines of text and provide links to find the material in libraries or for purchase in online stores.

UW-Madison is among 27 university and public libraries, including Harvard, Stanford and the New York Public Library, that allow Google to digitize parts or all of their collections. Other libraries, however, have chosen not to jump on board with Google or Microsoft, which also runs a digitization project.

According to an article in the New York Times:

Several major research libraries have rebuffed offers from Google and Microsoft to scan their books into computer databases, saying they are put off by restrictions these companies want to place on the new digital collections.

The research libraries, including a large consortium in the Boston area, are instead signing on with the Open Content Alliance, a nonprofit effort aimed at making their materials broadly available.

Libraries that agree to work with Google must agree to a set of terms, which include making the material unavailable to other commercial search services. Microsoft places a similar restriction on the books it converts to electronic form. The Open Content Alliance, by contrast, is making the material available to any search service...

"There are two opposed pathways being mapped out," said Paul Duguid, an adjunct professor at the School of Information at the University of California, Berkeley. "One is shaped by commercial concerns, the other by a commitment to openness, and which one will win is not clear."