Free as in Books
While creating an open source, audio-navigated book player for my blind gran, I found a hard-to-browse audiobook project and took matters into my own hands
Picture of open book thanks to Honou@flickr
My gran is losing her sight these days and is unable to read any more. While speculating about a crowdsourced audio version of Project Gutenberg, I was excited to find two incredibly valuable community-driven projects: Librivox (a free audiobooks resource) and Rockbox (a free operating system for MP3 players).
Unifying these two free projects could provide a resource for blind readers which far outstrips my gran's few audiobooks which get mailed each month, and the CDs from the library which are impossible for her to navigate.
Here 'free' is intended to mean 'free as in speech, not as in beer'
However, along the way I had to create my own Librivox catalogue to overcome the shortcomings of their problematic browsing interface. The difficulty of browsing in their interface is especially bad when you consider that blind people with screen readers could be a core audience for Librivox.
Librivox and its shortcomings
Librivox is a volunteer-maintained collection of audiobooks which could be really relevant for my gran. Some of the narrators could win awards for their paceless monotone, but others are bright and listenable. Sadly, even the one-minute-long poems are prefixed with a tedious and mandatory spoken license introduction. Nevertheless, a huge number of out-of-copyright books are already available, and the project invites more contributions.
I'm planning to use Librivox with Rockbox, a project which provides an alternative open source operating system which runs on a huge number of old MP3-players including iPods. These are cheap as chips on eBay.
With audiobook CDs, each time she shuts down the machine, it loses her place. However, you can configure Rockbox to maintain bookmarks (yes, they're actually called that), and it helps you navigate to your chosen bookmark using audio when the player is powered on again.
Fixing the Librivox catalogue
The problem is that the Librivox resource is accessible only through a tricky web-based database engine, which is oriented towards searching rather than browsing. Go try it. It's impossible to see the breadth of the titles they have. If you know what you want, the search interface can find it, but of course you don't know what they have in their database - a catch-22. This problem had to be solved before my gran (or really my mum) would be able to select her chosen titles and put them on the Rockbox player.
XQuery to the Rescue
On further investigation, I discovered that the Librivox files are (very sensibly) hosted on archive.org - a freely-maintained resource of public domain material which has a comprehensive XML database backend.
The task was on to use XQuery to create my own catalog of Librivox, one which is actually navigable. After a bit of thrashing around to find the correct data source and to fix issues with character encoding, I was able to construct a new catalog which anyone can use.
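To give a flavour of what building a browsable catalog from an XML drop involves, here is a minimal sketch in Python rather than XQuery. It is not the code behind the site: the sample data and the element names (books, book, title, author) are assumptions made up for illustration, not archive.org's actual schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample of a normalised XML drop. The element names are
# assumptions for illustration only, not archive.org's real schema.
SAMPLE_XML = """
<books>
  <book><title>Persuasion</title><author>Jane Austen</author></book>
  <book><title>Dracula</title><author>Bram Stoker</author></book>
  <book><title>Emma</title><author>Jane Austen</author></book>
</books>
"""


def index_by_author(xml_text):
    """Group titles under each author, with authors and titles sorted."""
    root = ET.fromstring(xml_text.strip())
    by_author = defaultdict(list)
    for book in root.iter("book"):
        # Fall back to placeholders when a record lacks a field.
        author = book.findtext("author", default="Unknown").strip()
        title = book.findtext("title", default="Untitled").strip()
        by_author[author].append(title)
    return {author: sorted(titles) for author, titles in sorted(by_author.items())}


def render_index(index):
    """Render the index as plain text, one author per block."""
    lines = []
    for author, titles in index.items():
        lines.append(author)
        lines.extend("  " + title for title in titles)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_index(index_by_author(SAMPLE_XML)))
```

The same grouping step is what makes browsing possible: once every title hangs off an author, a reader can scan the whole list instead of guessing at search terms.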
You can navigate and browse the whole resource by author or title. There is also a page with every book listed, so you can easily do text searches within your browser with CTRL-F. This page, for example, permits you to crawl and download the whole resource to your local disk in order of popularity, using a tool like wget. If you're new to wget, be nice to archive.org and limit the bandwidth and frequency of requests using the --wait and --limit-rate options.
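As a rough sketch of what a polite download might look like, the Python helper below assembles a wget command line using the --wait and --limit-rate options mentioned above. The function name, the urls.txt file, the default values and the librivox destination folder are all hypothetical; pick limits that suit your own connection.

```python
import shlex


def build_wget_command(url_file, wait_seconds=5, rate_limit="200k", dest_dir="librivox"):
    """Return a wget argument list that throttles requests and bandwidth.

    All defaults here are illustrative assumptions, not recommended values.
    """
    return [
        "wget",
        f"--wait={wait_seconds}",         # pause between successive requests
        f"--limit-rate={rate_limit}",     # cap the download speed
        "--continue",                     # resume partially downloaded files
        f"--directory-prefix={dest_dir}", # save files under this folder
        f"--input-file={url_file}",       # read the URLs to fetch from a file
    ]


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run
    # if you actually want to start the download.
    print(shlex.join(build_wget_command("urls.txt")))
```

Printing the command first makes it easy to check the limits before anything touches the network.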
Visit it at http://cefn.com/librivox/
For those of you who like code, here's the core implementation which translates from a normalised XML drop from archive.org into a navigable website. The full source, scripts used and supporting libraries can be downloaded from the front page.