Sunday, September 26, 2010

2010/09/20-23 - ouch!

I sprained my ankle Wednesday morning when I planted my foot on some uneven ground as my three dogs took off after a rabbit, so I spent Thursday at home on the couch. Still, it was a productive week. Monday morning I presented the OAI-to-VuFind tool I've been working on to several people. Marliese will now take care of maintaining our CONTENTdm digital library collections in VuFind. Liza heard about the OAI tool on Tuesday, and we were able to harvest records into VuFind from the NASA server she's interested in. She'll get back to me to let me know whether she wants to set up harvests of government documents into our VuFind install. I'll need to add command-line options to the OAI harvester so we can set up cron jobs to automatically harvest new records every week or so.
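As a rough sketch of what those command-line options might look like, here's a minimal argument parser for an unattended harvest run. The flag names and the script shape are my own placeholders, not the tool's actual interface:

```python
import argparse

def parse_harvest_args(argv):
    """Parse command-line options for an unattended OAI harvest run.

    All flag names here are hypothetical placeholders for illustration,
    not the real OAI-to-VuFind interface.
    """
    parser = argparse.ArgumentParser(description="OAI harvest into VuFind")
    parser.add_argument("--base-url", required=True,
                        help="OAI-PMH base URL of the repository to harvest")
    parser.add_argument("--metadata-prefix", default="oai_dc",
                        help="metadata format to request (default: oai_dc)")
    parser.add_argument("--from-date", default=None,
                        help="only harvest records changed since this date, "
                             "so a weekly cron job picks up new records only")
    return parser.parse_args(argv)
```

With something like this in place, a weekly cron entry could run the harvester non-interactively, passing a from-date computed from the last successful run.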

Otherwise I did various small things this week as usual: patched a bug in VuFind to work around some bad data in Serials Solutions' link resolver, set up a test of EBSCO's article-level search web services for Marliese, worked through some more of the ACES collection spreadsheet, argued with Aaron over Institutional Repository plans, and did a little work on the Minnows morphology project.

Thursday, September 16, 2010

2010/09/13-16 - fish stories

A lot of small things were going on in parallel this week. I spent some time testing the OAI-to-VuFind tool. I exchanged a few e-mails to help a guy at Purdue who's giving OAI2VuFind a try. Hope it works for him!

We met with Jon Armbruster on Wednesday to discuss the Minnows Project. I really like Google Sites - we'll see if that project page helps us stay organized or not.

Finally, I finished my first pass over my chunk of Claudine's ACES-collection metadata spreadsheet. The data is better than I thought it would be. We should be in good shape to import Claudine's collection into DSpace with some Voyager metadata next month.

I also got an e-mail from the Tuskegee archive guys. Their DSpace repository is online (or it was yesterday) - looks good!

Thursday, September 9, 2010

2010/09/07-09 - happy labor day

It was a short week for me at the library, but I managed to get a couple things done. First, we set up a spreadsheet in Google Docs for the ACES collection project where we can manually check that each scan is matched with the right MARC record. I have some code that tries to match each scan to a MARC record by title, but it has a lot of misses that we'll clean up with the spreadsheet.
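A minimal sketch of generating that kind of correction sheet as CSV for upload to Google Docs, with a blank column left for the manual fixes. The column names are illustrative, not the project's actual schema:

```python
import csv
import io

def write_match_sheet(rows, out):
    """Write scan-to-MARC match candidates as CSV for a correction sheet.

    Each row pairs a scan title with the bib ID the matcher guessed;
    an empty 'corrected_bib_id' column is left for manual fixes.
    (Column names are made up for illustration.)
    """
    writer = csv.DictWriter(out, fieldnames=["scan_title", "guessed_bib_id",
                                             "corrected_bib_id"])
    writer.writeheader()
    for scan_title, bib_id in rows:
        writer.writerow({"scan_title": scan_title,
                         "guessed_bib_id": bib_id,
                         "corrected_bib_id": ""})
```

Reading the sheet back after the manual pass is the mirror image: take `corrected_bib_id` when it's non-empty, otherwise trust the guess.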

I was also able to use the VuFind import tool to OAI-harvest records from CONTENTdm into VuFind's Solr index. I wrote up the details here.
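For reference, the protocol side of a harvest like this boils down to issuing OAI-PMH ListRecords requests against the repository's base URL and following resumption tokens. A sketch of building those requests (the base URL in the test is just an example):

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc",
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    Per the OAI-PMH spec, a follow-up request carrying a resumptionToken
    must include only the verb and the token, so the other arguments
    are dropped in that case.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords",
                  "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords",
                  "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)
```

Each response is then parsed for records (and the next resumptionToken, if any) before the records are mapped into Solr documents.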

Sunday, September 5, 2010

2010/08/23-26,08/30-09/02 - chugging away

I was absent-minded the last couple weeks and completely forgot to update my blog. Looking back through my e-mail, it looks like I spent most of my time on a few things. First, I helped Clint implement a few VuFind patches before he kicked off a new full copy from Voyager to VuFind.

On Thursday 08/26, Aaron, Bonnie, Midge, and I met with Drew and Iryana from the assessment office to further discuss their plans to implement Digital Measures to standardize the faculty annual review and assessment process. Auburn University will require its faculty to submit material for annual review to our Digital Measures server, from which department heads, deans, and university management may generate reports and metrics on teaching and research performance, from individual faculty on up to the university level.

At the meeting we discussed how assessment's Digital Measures project might dovetail with the library's nascent plans to implement an institutional repository through which faculty may publish research artifacts for open access by the general public. It turns out that the university's new guidelines for university-funded research request that faculty make their funded research available for open access, either through Auburn's own "scholarly repository" (which the library supposedly is developing) or through some other online repository.

Anyway, the Digital Measures project and the open-access clause in the funding guidelines have combined to make Bonnie and Aaron very eager for the library to quickly establish a repository for faculty research. Some library staff will join Drew and Iryana's group of Digital Measures beta testers, and I'll set up some collections in our DSpace server on repo that will be ready to accept faculty research papers.

Although I spent a lot of words describing the Digital Measures effort above, I actually spent much more time over the last two weeks on the tool to marry Claudine's scans of ACES documents with the appropriate MARC records in Voyager, from which she wants to pull metadata. Each metadata file from the scanner specifies a title that includes series information, like "Bulletin No. 5 of the Alabama Soils Bla bla: Peanut Performance in Alabama," while the matching MARC record stores just the "Peanut Performance in Alabama" part as the title and keeps the series information in other parts of the MARC record.

From the start I should have just used Lucene or tapped into the Solr server that backs our VuFind install at http://catalog.lib.auburn.edu, but I thought I already had the set of 1200 or so MARC records to draw from, so I wound up implementing my own simple title-search algorithm. Everything looked good for my set of 4 test records, but when I ran a sanity check against the full set of over 1000 scans, it turned out that many of the MARC records I had pulled based on an earlier exchange with Claudine were not needed, and many MARC records that were needed were missing. I eventually collected a larger set of MARC records to draw from, but the amount of data I'm working with now is large enough that my simple title searcher takes a long time to run over the full record set.

Next week I'll run some more tests over a subset of the data, then run the full data set into a Google Docs spreadsheet and ask Claudine to go through the title matches and manually correct the MARC bib IDs for the titles the code does not match correctly. Once we have that spreadsheet, I'll feed it to the code I have to generate the Dublin Core for import into DSpace. Anyway - the code is cool and fun, but it's freaking frustrating to trip over so many problems on this stupid little project, which could have been done by hand by now.
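The simple title-search idea can be sketched roughly like this - the normalization and word-overlap scoring are my guess at the general approach, not the actual code:

```python
import re

def normalize(title):
    """Lowercase, strip punctuation, and collapse whitespace so that
    'Peanut Performance in Alabama.' and 'peanut performance in alabama'
    compare equal."""
    title = re.sub(r"[^a-z0-9\s]", " ", title.lower())
    return " ".join(title.split())

def best_match(scan_title, marc_titles):
    """Return the MARC title sharing the most words with the scan title.

    The scan title carries extra series text ('Bulletin No. 5 ...: ...'),
    so exact comparison fails; word overlap is a crude stand-in for the
    fuzzier matching the real tool would need.
    """
    scan_words = set(normalize(scan_title).split())
    def overlap(title):
        return len(scan_words & set(normalize(title).split()))
    return max(marc_titles, key=overlap)
```

Note that this compares every scan against every MARC title, which is exactly the kind of linear scan that bogs down on the full record set; an inverted index like Lucene or the Solr server behind VuFind avoids that.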

Finally, I made some good progress on the minnows morphology database project we're working on with Professor Jon Armbruster in biology. I read over an introduction to "geometric morphometrics," and I think I came up with a good way of tricking our Minnows DSpace server into storing morphometric data in a Dublin Core field attached to an item record for a minnow sample's reference photograph. We'll see what Jon thinks next week.
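One way to picture the trick is serializing the landmark coordinates into a single dcvalue element, of the kind used in DSpace Simple Archive Format dublin_core.xml files. The `morphometrics` qualifier below is a made-up local extension, not a standard Dublin Core term, and this is only a sketch of the idea:

```python
import xml.etree.ElementTree as ET

def landmarks_to_dc(landmarks):
    """Serialize (x, y) landmark coordinates into a single dcvalue
    element, as found in DSpace Simple Archive Format metadata files.

    The 'morphometrics' qualifier is a hypothetical local extension,
    not a standard Dublin Core term.
    """
    value = ";".join("%g,%g" % (x, y) for x, y in landmarks)
    el = ET.Element("dcvalue", element="description",
                    qualifier="morphometrics")
    el.text = value
    return ET.tostring(el, encoding="unicode")
```

The appeal is that the coordinates ride along on the photograph's own item record, so no separate database is needed; the cost is that the data is opaque to normal Dublin Core consumers.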