Saturday, August 21, 2010

2010/08/18-19 - goodbye cataloging, hello serials & monographs

The staff meeting Wednesday morning went very well. Aaron, Paula, and Helen presented their plans to merge the acquisitions and cataloging groups into a single unit. Those of us in cataloging will move to new cubes in the acquisitions area of the building by the end of the year. The new unit will be managed as two teams - a monographs team will report to Helen, and Paula will lead the serials and electronic resources team. We all expect our monographs work will continue to shrink over time, but combining acquisitions and cataloging is a good step in streamlining library technical services.

Otherwise I've spent most of the last couple of days working on a few patches to the vufind tools. Most of that time went to tracking down why our vufind server was not properly stripping combining diacritics, which is what should let a search for vtoriaa find Russian language records like this. I eventually discovered that vufind's underlying Solr server has a filter system that handles that kind of thing, and that the unicode filter, which indexes versions of unicode tokens without diacritics, was not properly handling combining half mark characters. I built an AuUnicodeFilter copy of the unicode filter that just adds a switch-statement block that checks for the combining half marks; problem solved!
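For the curious, here is a rough Python sketch of the idea behind the fix. It is not the actual AuUnicodeFilter code (that is a Java filter inside Solr); the function names and the sample token are made up for illustration, and I'm assuming the half marks in question live in the Unicode Combining Half Marks block (U+FE20 to U+FE2F).

```python
import unicodedata

# Unicode blocks treated as "diacritics" in this sketch (an assumption,
# not a description of what the real Solr filter covers).
COMBINING_DIACRITICS = range(0x0300, 0x0370)  # Combining Diacritical Marks
COMBINING_HALF_MARKS = range(0xFE20, 0xFE30)  # Combining Half Marks


def strip_combining(token: str) -> str:
    """Return token with combining diacritics and half marks removed."""
    # NFD splits precomposed letters (e.g. "é") into base letter + mark.
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(
        ch
        for ch in decomposed
        if ord(ch) not in COMBINING_DIACRITICS
        and ord(ch) not in COMBINING_HALF_MARKS
    )


def index_forms(token: str) -> list[str]:
    """Forms to index: the original token plus a stripped copy if it differs."""
    stripped = strip_combining(token)
    return [token] if stripped == token else [token, stripped]


# Hypothetical romanized token with a ligature written as two half marks.
sample = "vtorai\ufe20a\ufe21"
print(index_forms(sample))
```

Indexing both forms means a patron who types plain ASCII still matches the record, while the original spelling stays searchable too.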

Tuesday, August 17, 2010

2010/08/16-17 - patching code

I've spent the last several days on a series of small tasks that added up to take all my time. I checked in several patches for small bugs in the AuCataloging tools, worked on the ACES-project metadata import tool, set up a DSpace collection on repo for a professor in computer science to experiment with, installed a valid SSL certificate on repo ... stuff like that.

Tomorrow there's a big meeting for everyone in the systems, acquisitions, and cataloging groups where Aaron will present some vision of the future of technical services. I'm looking forward to it - technical services can use an overhaul. The cataloging group shrank a lot over the last year with Harriet retiring, Henry's retirement and death, Tom's unexpected death, and Lori leaving for a new job. Then a few weeks ago three women in cataloging were told out of the blue that they would begin reporting to the circulation group. The move was perceived by staff as a ham-handed decree from library leadership, and one of the transferred women was angry enough to go ahead and retire rather than accept the change. I'm sure more personnel moves are in store in the near future considering the libraries' budget constraints, but a meeting to present and exchange ideas with staff will help avoid bad feelings.

Tuesday, August 10, 2010

2010/08/09-10 - tweaking vufind

I'm finally back at the library after three weeks away. Three weeks sounds like a long time, but it went fast.

I spent most of today and yesterday experimenting with the Solr index that backs our vufind catalog. Last week the library finally decided to make vufind our default online catalog rather than the older Aubiecat Voyager OPAC.

A few small vufind bugs have popped up that we can deal with, but one big problem we've had is that vufind's Solr server would periodically run out of memory and require a restart. We've tried several things over the past week (I lent some help from home last week), including moving Solr to a 64-bit install, allocating a 5 GB heap, and tweaking the Solr cache configuration, but the memory problem persisted.

Last night Clint noticed that the Solr memory use spiked when he ran a title-sort on a search result, so we've been looking at sorting since then. It turns out that Solr's Lucene index engine uses a "field cache" to implement sorting, and the cache size is proportional to the number of unique entries in the sort field and the size of each entry. Our catalog has over three million unique titles, so it's very expensive to process a title sort. I experimented with a solution that just post-sorts the first part of a relevance ordered search outside Solr, but Clint decided to just disable the title-sort as it's not a critical feature. Problem solved - hopefully!
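The post-sort experiment boils down to something like the sketch below. This is only an illustration in Python, not the code I actually tried: the `hits` list of dicts, the `title` key, and the `limit` default are all assumptions made up for the example.

```python
def title_key(title: str) -> str:
    """Crude sort key: ignore case and surrounding whitespace."""
    return title.strip().casefold()


def post_sort_by_title(hits: list[dict], limit: int = 1000) -> list[dict]:
    """Sort only the first `limit` relevance-ordered hits by title.

    `hits` is assumed to arrive already ordered by relevance, so the
    sort never touches anything past the top of the result set.
    """
    head = hits[:limit]
    return sorted(head, key=lambda doc: title_key(doc.get("title", "")))


# Hypothetical relevance-ordered results.
hits = [
    {"title": "Zebra"},
    {"title": "apple"},
    {"title": "Mango"},
    {"title": "aardvark"},
]
print([doc["title"] for doc in post_sort_by_title(hits, limit=3)])
```

The tradeoff is that the result is a title-ordered view of the most relevant hits rather than a true title sort of the whole result set, but nothing has to be cached for all three million titles.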