Tuesday, August 10, 2010

2010/08/09-10 tweaking vufind

I'm finally back at the library after three weeks away. Three weeks sounds like a long time, but it went fast.

I spent most of today and yesterday experimenting with the Solr index that backs our vufind catalog. Last week the library finally decided to make vufind our default online catalog rather than the older Aubiecat Voyager OPAC.

A few small vufind bugs have popped up that we can deal with, but the one big problem has been that vufind's Solr server periodically runs out of memory and requires a restart. We've tried several things over the past week (I lent some help from home last week), including moving Solr to a 64-bit setup with a 5 GB heap and tweaking the Solr cache configuration, but the memory problem persisted.

Last night Clint noticed that Solr's memory use spiked when he ran a title sort on a search result, so we've been looking at sorting since then. It turns out that Solr's Lucene index engine uses a "field cache" to implement sorting, and the cache size is proportional to the number of unique entries in the sort field and the size of each entry. Our catalog has over three million unique titles, so a title sort is very expensive. I experimented with a solution that post-sorts just the first part of a relevance-ordered result set outside Solr, but Clint decided to simply disable the title sort, as it's not a critical feature. Problem solved - hopefully!
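
For the curious, the post-sort idea looks roughly like the sketch below. It is not the code I actually tried, just a minimal illustration in Python. It assumes the search results have already been fetched from Solr in relevance order as a list of dicts with a "title" key; the function name, the "title" key, and the limit of 200 are all made up for the example.

```python
from typing import Dict, List


def post_sort_by_title(hits: List[Dict], limit: int = 200) -> List[Dict]:
    """Sort only the first `limit` relevance-ranked hits by title.

    `hits` is assumed to be in relevance order already (hypothetical
    input shape: one dict per record, with an optional "title" key).
    Hits past `limit` are left in their original relevance order.
    """
    head = hits[:limit]
    tail = hits[limit:]
    # Case-insensitive sort; records without a title sort first.
    head_sorted = sorted(head, key=lambda h: h.get("title", "").casefold())
    return head_sorted + tail


if __name__ == "__main__":
    sample = [
        {"id": 1, "title": "Zebra"},
        {"id": 2, "title": "apple"},
        {"id": 3, "title": "Mango"},
        {"id": 4, "title": "Aardvark"},
    ]
    for hit in post_sort_by_title(sample, limit=3):
        print(hit["id"], hit["title"])
```

The point is that Solr only ever does a relevance-ordered query, so it never has to build the field cache for the title field. The trade-off is that the result is not a true title sort of the whole result set, only of the top slice.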
