Wednesday, March 31, 2010

2010/03/29-31 - advocate for library repository

I've made a little progress on my task list over the last few days. First, I set up our devcat VuFind test server to access a Solr server on repo.lib.auburn.edu. I suspect we'll see some improvement in VuFind performance with Solr running on a physical disk rather than on the VM's virtual disk. The repo box also has a ton of memory and runs 64-bit Solaris, so we can configure Solr to use a 3 GB Java heap rather than the 2 GB maximum on our 32-bit Linux install.
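As a quick sanity check that the 64-bit JVM really grants the larger heap, a tiny throwaway program can print the limit the JVM will grow to. This is just an illustration, not part of our deploy - the class name and usage are hypothetical:

```java
// HeapCheck.java - throwaway check of the JVM heap ceiling.
// Run as `java -Xmx3g HeapCheck` on the 64-bit repo box; a 32-bit
// JVM will typically refuse or cap -Xmx somewhere below 3 GB.
public class HeapCheck {
    public static void main(String[] args) {
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("max heap MB: " + (maxBytes / (1024 * 1024)));
    }
}
```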

I also dusted off the repository server we set up for Claudine's ag-paper project. I sent an e-mail to Bonnie asking her again to check whether the Honors College and School of Architecture are interested in setting up online collections for honors theses and architecture senior projects. Bonnie hasn't replied to me yet - ugh!

Finally, I began to add task-tracker infrastructure to the littleware API. I would like to use littleware to back the reference-question statistics system that I want to work on with Marcia. We will also be able to use the task-tracker to manage data for the click-counter, cat-request, and user-feedback systems.

Friday, March 26, 2010

2010/03/24-25 Reuben in swap

I'm sort of swapping between a few projects now. First, I need to spend some time on the Institutional Repository server to get it ready for the Ag-paper collection that is now being scanned.

I want to set up a Nagios server to monitor the health of our various library services. I was reminded of the need for this kind of thing on Thursday, when a mod_jk log file that maxed out at 2 GB brought http://lib.auburn.edu down. Anyway - I brought up the OpenSolaris virtual machine I was playing with a while ago, and successfully booted a sparse zone with a MySQL server as a proof of concept. I'll set up a zone for a Nagios demo server. The systems group probably won't want to deploy OpenSolaris in production, since they're more comfortable on Linux, but it will be easy to migrate the Nagios config files to a Linux box once the prototype is set up the way we want it.
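The immediate fix for the mod_jk outage is plain log rotation; a sketch of a logrotate stanza, where the log path and the Apache reload command are guesses rather than our actual install:

```
# /etc/logrotate.d/mod_jk -- hypothetical; the log path and the
# Apache reload command depend on the actual box.
/var/log/httpd/mod_jk.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    postrotate
        /usr/sbin/apachectl graceful > /dev/null 2>&1 || true
    endscript
}
```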

I'm waiting to hear back from Tuskegee's library on when they want me to visit to help set up a DSpace repository server. We finally decided to just install DSpace directly onto their Windows Server 2008 box. Microsoft's licensing would tie us in knots if we tried to build a Windows-based VM at Auburn and then deliver it to Tuskegee. Too bad - that would have been cool.

The Extensible Catalog guy wrote me back today. It looks like XC is focused on its middleware development plan. I don't think Auburn should invest in XC at this point - it's not clear to me where we would use it unless we enhance our VuFind backend, and the XC guy didn't seem interested in that idea at all.

Speaking of vufind - I'm copying the devcat Solr index to repo to run a test where devcat's Vufind install uses a repo-hosted Solr backend. The repo server has a ton of memory, and Solr will run directly on physical disk rather than virtual-machine emulated disk. I'm curious if we'll see any performance boost.
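Pointing devcat's VuFind at the repo box should amount to a one-line change in VuFind's config.ini; a sketch, where the port and values are from memory and may not match our install exactly:

```
; web/conf/config.ini on devcat -- hypothetical values
[Index]
engine = Solr
url    = http://repo.lib.auburn.edu:8080/solr
```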

Finally, I have a backlog of updates to AuCataloging code. I want to update the UI for etd2marc and etd2proquest, bump the AuCataloging webapp up to Java EE 6 JSF with Facelets on glassfish v3 (we currently run glassfish v2), migrate the click-counter and cat-request web services to use a littleware node-database backend, and extend the Voyager-to-Vufind tool to become a general Import-to-Vufind tool with support for OAI-harvest and XSL transform to Solr.
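The XSL-transform leg of a general Import-to-Vufind tool is mostly JDK plumbing; a minimal sketch, assuming harvested records arrive as XML strings and a crosswalk stylesheet maps them to Solr add/doc markup. The class and method names here are hypothetical, not the actual tool:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch of the transform step for a generic Import-to-Vufind tool:
// apply a crosswalk XSL to one harvested record, producing a Solr
// <add><doc> document ready to POST to the Solr update handler.
public class RecordCrosswalk {
    private final Transformer transformer;

    public RecordCrosswalk(String xsl) throws Exception {
        this.transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
    }

    public String toSolrDoc(String recordXml) throws Exception {
        StringWriter out = new StringWriter();
        transformer.transform(
                new StreamSource(new StringReader(recordXml)),
                new StreamResult(out));
        return out.toString();
    }
}
```

The same class would serve both the OAI-harvest path and the ILS-export path, since each just hands a record and a stylesheet to the transformer.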

Tuesday, March 23, 2010

2010/03/22-23 lots of e-mail

I sent several long e-mails over the last couple days. I'm sure the recipients find them annoying, because looking at them now annoys me. The take-home messages are that Microsoft hyper-v licensing sucks, LC does a poor job publishing authority records, and I'm dubious about the value of the Extensible Catalog.

I also published updates to the ebscoX and vygr2vfnd tools to Google Code.


--- Reuben Pasquini 3/22/2010 1:09 PM ---
Hi Hellen!

I downloaded the authorities data available from
     http://id.loc.gov 
, and it looks like the download only includes "subject"
authorities.
That's also what LC indicates on their web site:
    http://id.loc.gov/authorities/about.html
I extracted a summary of the data here:
    http://devcat.lib.auburn.edu/tests/summary.txt 

We could write a tool to harvest authority records from LC's
Voyager backed authority web site:
    http://authorities.loc.gov/ 
, but it's not an elegant solution.

I sent the following e-mail to LC.  I'll let you know if I get
a reply.

Cheers,
Reuben


---------------------

Thank you for posting the LCSH authority data at
     http://id.loc.gov/authorities/ 

I notice that the id.loc.gov data set only includes subject data.  
Does LC offer bulk access to its name, title, and keyword authority databases, 
or is that data only available one record at a time from http://authorities.loc.gov/ ?

Also, does LC publish updates to its authority database via RSS?

We would like to set up a mechanism to automatically 
synchronize our catalog's authority database with LC's database.  The combination of
bulk download plus RSS updates would give us everything we need.

Thanks again,
Reuben




--- Reuben Pasquini 3/23/2010 10:24 AM ---
Hi Rod!

I just got some more information from Jon about hyper-v, and it looks like it's more trouble than it's worth.
Sorry to put you through all the work and e-mail.
I'm now inclined to just install d-space directly onto your Windows 2008 server.
I can drive out there some morning, and we can just install everything together.
That will probably take a couple hours to get a basic server up, and we can spend another
couple hours to setup a Tuskegee look and feel.  I can come out again a week after that
to try to answer any questions that come up after you play around with the server a bit.
What do you think?

Cheers,
Reuben


--- Jon Bell 3/22/2010 4:01 PM ---
...
 
The licensing for Windows Server 2008 R2 Hyper-V, as far as I can tell, requires that you do not use the server for any other role (web server, file server, shares, etc.) while you are using the 4 free licenses.   Keeping a test VM here for a long time would impede us from using the server as a Ghosting server, for which it was originally purchased.  I was under the assumption you'd build it here and hand it off to them, then we could start to use the server for Ghosting.
 
It looks to me that with all these licensing troubles, the simpler solution would be Linux and VMware Server 2.0, both free.
 
Jon
 


--- Reuben Pasquini 3/22/2010 1:32 PM ---
Hi Rod,

Hope you're having a good Monday.
I just wanted to give you a quick update on 
the Tuskegee d-space project.
At this point we believe we'll require a license
to run Windows in a virtual machine.
Jon, one of our IT experts, is trying to verify that,
but that's our impression.

Assuming Microsoft requires a license to run Windows in a VM,
then we'll need a license from Tuskegee to setup the
VM here at Auburn.  If we setup the VM with an Auburn
license, then when we transfer the VM to your server
Tuskegee will be running an Auburn-licensed server,
which probably violates Auburn's contract with Microsoft.

This license thing looks like a mess.
If you have easy access to two Windows licenses
(doesn't matter which flavor of Windows) that
you can send us (one for the production VM, another
for the test VM), then
we can continue down the road of running Windows
in a VM.  

If you do not have easy access to Windows licenses for the virtual machines,
then we can change our plan to
either run d-space in a Linux VM (Linux is free - no licensing), 
or install d-space directly
onto your Windows Server 2008 box without a VM.
The linux option is my preference, but we can 
make either way work.

Anyway - let me know whether you can supply a couple
Windows license keys or install-media to us if we need it,
or which of the other options you prefer if Windows in a VM
won't work out for us.

Cheers,
Reuben


--- Reuben Pasquini 3/23/2010 11:59 AM ---
Hi Dave,

I'm Reuben Pasquini - one of Auburn's software people.
I'll try to lay out my impression of what XC is, and
what I think we at Auburn would hope to get out of it
if we invest in it.
I'll be grateful if you have time to read over this - let us know
what you think.
I don't speak for anyone else at the Auburn library.

I'm excited about the potential for us to join a team of
developers to work on software tools for use at multiple
libraries.  It seems a shame for small software teams at
different libraries to code different solutions to the same problems,
rather than somehow combine our efforts and attack
more and bigger challenges.
On the other hand, I'm not convinced that XC in itself offers
something useful to us at Auburn.

It's not my decision, but I suspect we will only join XC if
we see a way we can directly use the XC software in our environment.
We would also not be happy to merely accept coding assignments from
the core XC team; we would require input into the design, development process,
and priorities of the project.

I'm involved with several initiatives at Auburn.
We have deployed a Vufind discovery service
    http://catalog.lib.auburn.edu/code/ 
, and a d-space based ETD server
    http://etd.auburn.edu/ 
We hope to make progress on an institutional repository 
this year
    http://repo.lib.auburn.edu/ 
We also want to explore ERM solutions, 
open-source ILS, and we're beta testing a commercial
discovery service that integrates article-level search across many
of the e-journals and databases we subscribe to.

Here's my general impression of what XC is, based on watching
the screencast videos (
     http://www.screencast.com/users/eXtensibleCatalog
) and a quick browse of some of the code on Google.

  *. A SOLR-based metadata database

  *. Java tools for OAI and ILS metadata harvesting into SOLR 
          using XSL for format conversion

  *. PHP-in-drupal web tools for interacting with SOLR and
           the importers

XC is not an ILS, an ERM, an IR, or a federated
search engine.  XC could support article-level search only if we
can get the article contents into SOLR.
Drupal brings along a platform of services,
but Auburn has access to a commercial CMS via the university,
and we have a lot of in-house experience with MediaWiki and WordPress,
so we're not particularly excited about Drupal.

Does that sound right?
If so, then integrating XC with Vufind is
one possible area of collaboration.
XC seems very similar to Vufind.  Vufind stores 
metadata in a SOLR server, has a PHP and javascript web frontend
that accesses that data, and java-based tools for harvesting
metadata into SOLR.  We customized our Auburn Vufind instance
       http://catalog.lib.auburn.edu/ 
to remove its dependency on MARC.  
We have an XSL-based crosswalk in place to harvest
data from our Content-DM digital library:
        http://diglib.auburn.edu/ 
, ex:
        http://catalog.lib.auburn.edu/vufind/Search/Home?lookfor=Caroline+Dean&type=all&submit=Find
There's slightly dated information on some of our code here:
      http://catalog.lib.auburn.edu/code/ 
      http://lib.auburn.edu/auburndevcat/ 

The core http://vufind.org project has several weaknesses, including variable code quality,
lack of regression tests, and the necessity for implementors to customize
the code base.  The vufind code Auburn runs is different from the code 
Michigan or Illinois run.  
We have posted our code online, but most other sites do not.
The choice of PHP as an implementation language and the lack
of discipline in locking APIs means that merging updates from the core project into
our code base is often a chore, so it's easier to just fork and grab random
patches that interest us.

I could imagine a 12-month XC-Vufind collaboration something like this:

   *. Phase 1 - integrate Auburn Vufind with XC
        x. Merge Solr schemas
        x. Merge import tools.  Leverage the XC import manager,
               if such a beast exists, for ILS import and OAI
               harvest into SOLR
        x. End of phase 1 - XC participating libraries all
               run XC-Vufind discovery instances

   *. Phase 2 - cloud support
        x. Extend XC-Vufind to support multiple SOLR backends
        x. Host a shared SOLR server that holds the common open-access
               records that all the XC-Vufind libraries search against
        x. Integrate easy cross-institution ILL in the discovery layer

Anyway - that's one way I could imagine using XC at Auburn.
I imagine that you and the XC team have a different vision for the project.
I'm interested to hear what you have in mind via e-mail before we
commit to attending your meeting.

Cheers,
Reuben

Thursday, March 18, 2010

2010/03/17-18 - Extensible Toolkit, MS Licensing

I have a major improvement to the ebscoX and vygr2vfnd tools nearly ready to release. I've set up a littleware.swingbase module that makes it easy to build simple Swing UIs with persistent properties, UI feedback, and a tools menu. I hope to release an update next week.
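The persistent-properties piece is roughly what java.util.prefs already gives you for free; a sketch of the idea, where the class name, node path, and keys are illustrative rather than swingbase's actual API:

```java
import java.util.prefs.Preferences;

// Sketch of the persistent-property idea behind littleware.swingbase:
// a tool window remembers its last settings across launches by
// stashing them in the JDK Preferences store.
public class ToolSettings {
    private final Preferences prefs =
            Preferences.userRoot().node("littleware/swingbase/demo");

    public String getLastExportDir() {
        return prefs.get("lastExportDir", System.getProperty("user.home"));
    }

    public void setLastExportDir(String dir) {
        prefs.put("lastExportDir", dir);
    }
}
```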

Adam and Jon are looking into Windows VM licensing for the Tuskegee DSpace project. We think it makes sense to deploy the DSpace server to a virtual machine, but the Tuskegee library staff doesn't have Linux experience. Unfortunately, it looks like Windows has a greedy VM-licensing model, so we might have to either deploy DSpace directly onto Tuskegee's host Windows server, or just go ahead with an Ubuntu VM and give the Tuskegee guys some basic training.

Finally, Aaron is considering whether the library ought to invest some time and effort into the Extensible Catalog project. I sent the following e-mail with my impressions.

Hi Aaron,

I watched the XC webcasts a while ago:
    http://www.screencast.com/users/eXtensibleCatalog 

I remember at the time I wasn't that impressed.
They have basically re-implemented
VuFind with a few more features like integrated OAI harvesting,
decoupling from MARC (what we did), etc.

    *. PHP front-end - VuFind is PHP-Smarty based,
           XC integrates with the Drupal CMS
                http://drupal.org/

    *. XC integrates with the ILS via some NCIP thing -
           VuFind has an ILS-plugin system

    *. XC dumps all their data into the "Metadata Service Toolkit" -
           which is a SOLR server - same as VuFind.

XC is yet another project that dumps metadata into SOLR and
slaps a PHP web UI in front of it.

I am very interested in having a conversation about plans for
ILS and discovery software, and strategic plans for the library
in general.  My first impression is that XC is not a good place
to invest our time and effort.

Cheers,
Reuben

Tuesday, March 16, 2010

2010/03/15-16 Voyager upgrade

Monday and Tuesday were quiet days in cataloging this week, as Auburn's Voyager system was offline for an upgrade to version 7.2.

I finally posted a web-start version of the ebscoX tool on Google Code. I sent the e-mail below to Marliese and the libdev group with the details. I continue to work on the code underlying ebscoX and vygr2vfnd to make the tools easier to use and maintain. I'm also patching vygr2vfnd to directly use the ebscoX MARC-filter code. I hope to release an update to both tools next week.

I've also exchanged a few more e-mails with Rod and Dana at Tuskegee. The basic plan is to set up Tuskegee's Windows 2008 R2 Enterprise server to host a Hyper-V Windows virtual-machine guest that runs the DSpace repository software. Rod is going to verify that their server has the hardware virtualization support that Hyper-V requires. Adam and Jon are already helping work things out.


--- Reuben Pasquini 03/15/10 11:29 AM ---
Hi Marliese,

If you still want to point EBSCO at our Voyager-to-Ebsco export tool,
then I finally posted the latest version of the EbscoX export 
tool to google code:
        http://code.google.com/p/littleware/

Just click on the "EbscoX" link (
      http://ivy2maven2.littleware.googlecode.com/hg/webstart/ebscoX.jnlp 
) to launch the application.

I'll try to improve the user interface over the next month or two.
The interface is pretty clunky, and the user needs to set up
a special properties file with the connection information for his/her
Voyager Oracle database.  I can help you do that for our database if
you want to give it a try.

There are brief instructions on the site if EBSCO's developers want to download the code.
I'll write up something with more details one of these days.

Cheers,
Reuben

Thursday, March 11, 2010

2010/03/10-11 coding quietly

It has been a quiet couple days at the library. Yesterday I attended Beth's presentation on the Content-DM to VuFind export/import process. The process Beth and I worked out involves several steps and running some command-line tools from a Linux ssh session. That was fine for Beth and Clint, but it's going to be annoying for Midge or Marliese to have to deal with that. We'll have to write some little GUI tool to manage the process after the dm-data.xml and dm-data-convert.xsl files are ready.

Aaron and I have exchanged a few e-mails with Rod and Dana at Tuskegee. Tuskegee plans to experiment with a DSpace repository, and I'll probably lend a hand with the software install and setup.

Otherwise I've been plugging away at the AuCataloging refactor. I just checked in a patch, so that all our regression tests now pass, and the AuCataloging webapp has an IVY build process. I still need to finish up the build for the webapp, but by the end of next week I hope to point EBSCO and the VuFind e-mail list at the AuCataloging code repository, web-start apps, and a download zipfile with ready-to-run binary apps.

Tuesday, March 9, 2010

2010/03/08-09 Tool Exchange

I'm still polishing up the AuCataloging code, but the code is back in a running state with updates for Scala 2.8, Java EE 6, and a new IVY build process. I've set up separate build projects for voyager-to-vufind (v2v), ebsco-export (ebscoX), and the shared auLibrary, and published the code to a Google Code repository. I'll try to move the click-counter and ETD tools currently bundled with auLibrary out into their own build projects over the next week too.

I helped Clint set up a v2v build and test environment yesterday, so he can work out a bug where the VuFind import does not properly label our e-journals with a "journal" format facet - the e-journals only show up under the "electronic" facet now. I'm pretty happy Clint is looking at the v2v code - I was a little worried that I was the only one who knew how to work with it.
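The fix Clint is after is roughly a matter of emitting more than one facet value per record; a sketch of the idea, where the class name and the boolean record tests are simplified stand-ins for whatever the real AuburnIndexer.getFormat() inspects:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the multi-valued format-facet idea behind the e-journal
// bug: an electronic serial should land under BOTH the "Journal" and
// "Electronic" facets, not just "Electronic".
public class FormatFacets {
    public static List<String> getFormats(boolean isSerial, boolean isElectronic) {
        List<String> formats = new ArrayList<String>();
        if (isSerial) {
            formats.add("Journal");
        }
        if (isElectronic) {
            formats.add("Electronic");
        }
        return formats;
    }
}
```

The Solr schema's format field has to be multi-valued for this to work, which is worth double-checking before patching the indexer.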

I still need to update the AuCataloging webapp to use an IVY build process, and I want to set up build rules that bundle web-start and downloadable versions of the v2v and ebscoX tools, so we can easily publish those apps online for other libraries and EBSCO to use. I'll work on that tomorrow.

I would also like to port the AuCataloging webapp JSF code to use Java EE 6 Facelets and annotations, then retire the current EE 5 JSP code. We'll have to update our app server to the new Glassfish v3 server to use the EE 6 APIs. I moved a non-library JSF project to EE 6 on Glassfish v3 last week, and I like it a lot.

Friday, March 5, 2010

2010/03/04-05 Still Refactoring

We had a fun libdev meeting this morning. Clint gave us a review of the code4lib 2010 conference he attended last week.

I checked in more AuCataloging patches today that break out the Voyager-2-VuFind (V2V) tool to its own build project. The new V2V project uses an IVY dependency on the AuLibrary 1.0 artifact that I added to littleware's online IVY/Maven repository today.

I also wrote an AuburnIndexerTester regression test for the V2V test suite. On Monday I'll sit down with Clint and set up a V2V build environment, so he can work on the AuburnIndexer's getFormat() method.

Next week I'll break out the EbscoX tool to its own build project, so I can point the EBSCO developers at that code. I'll also republish web-start versions of the Voyager-2-Vufind and EbscoX tools to the project site on Google code. Once the web-start apps are online, anyone will be able to easily run those tools.

Monday, March 1, 2010

2010/03/01 - IVY Build

I nearly have the AuCataloging code building with IVY. I forgot that the Voyager2Vufind code sucks in a load of dependencies via Solr. I tried just enabling pom support, but that pulls in more than I need. Ugh. Anyway - hopefully just one more day of goofing around with the build.
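One way to tame the transitive pull is to mark the Solr dependency non-transitive and list just the jars the import code actually touches; a hypothetical ivy.xml fragment, where the module names and revisions are guesses rather than the real build:

```xml
<!-- Hypothetical ivy.xml fragment: pull solr-core without its
     full transitive closure, then add back the handful of jars
     the import code actually needs. -->
<dependencies>
    <dependency org="org.apache.solr" name="solr-core"
                rev="1.4.0" transitive="false"/>
    <dependency org="org.apache.lucene" name="lucene-core"
                rev="2.9.1" transitive="false"/>
    <dependency org="auburn" name="auLibrary" rev="1.0"/>
</dependencies>
```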

I'll be out the next couple days - my parents are visiting, but I'll be back at my desk Thursday and Friday.