Tuesday, March 23, 2010

2010/03/22-23 lots of e-mail

I sent several long e-mails over the last couple of days. I'm sure the recipients find them annoying, because looking at them now annoys me. The take-home messages are that Microsoft Hyper-V licensing sucks, LC does a poor job publishing authority records, and I'm dubious about the value of the eXtensible Catalog.

I also published updates to the ebscoX and vygr2vfnd tools to Google Code.


--- Reuben Pasquini 3/22/2010 1:09 PM ---
Hi Hellen!

I downloaded the authorities data available from
     http://id.loc.gov
and it looks like the download only includes "subject"
authorities.
That's also what LC indicates on their web site:
    http://id.loc.gov/authorities/about.html
I extracted a summary of the data here:
    http://devcat.lib.auburn.edu/tests/summary.txt 

We could write a tool to harvest authority records from LC's
Voyager-backed authority web site:
    http://authorities.loc.gov/
but it's not an elegant solution.
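
If we do go down that road, the harvester itself would be simple.
Here's a rough Java sketch of what I have in mind - note the record
URL pattern is purely hypothetical; we'd have to reverse-engineer
how the Voyager web interface actually exposes records:

    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Record-at-a-time harvester sketch.  The URL pattern below is
    // made up - it stands in for whatever authorities.loc.gov really uses.
    public class AuthorityHarvester {
        public static void main(String[] args) throws Exception {
            for (int id = 1; id <= 100; ++id) {
                // hypothetical record URL - not a real LC endpoint
                URL url = new URL("http://authorities.loc.gov/record?id=" + id);
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                if (conn.getResponseCode() == 200) {
                    InputStream in = conn.getInputStream();
                    OutputStream out = new FileOutputStream("auth-" + id + ".html");
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = in.read(buf)) > 0) {
                        out.write(buf, 0, n);
                    }
                    out.close();
                    in.close();
                }
                conn.disconnect();
                Thread.sleep(1000); // throttle so we don't hammer LC's server
            }
        }
    }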

I sent the following e-mail to LC.  I'll let you know if I get
a reply.

Cheers,
Reuben


---------------------

Thank you for posting the LCSH authority data at
     http://id.loc.gov/authorities/ 

I notice that the id.loc.gov data set only includes subject data.
Does LC offer bulk access to its name, title, and keyword authority databases,
or is that data only available one record at a time from http://authorities.loc.gov/ ?

Also, does LC publish updates to its authority database via RSS?

We would like to set up a mechanism that automatically
synchronizes our catalog's authority database with LC's.  The combination of
bulk download plus RSS updates would give us everything we need.

Thanks again,
Reuben




--- Reuben Pasquini 3/23/2010 10:24 AM ---
Hi Rod!

I just got some more information from Jon about Hyper-V, and it looks like it's more trouble than it's worth.
Sorry to put you through all the work and e-mail.
I'm now inclined to just install DSpace directly onto your Windows 2008 server.
I can drive out there some morning, and we can install everything together.
That will probably take a couple of hours to get a basic server up, and we can spend another
couple of hours setting up a Tuskegee look and feel.  I can come out again a week after that
to answer any questions that come up once you've played around with the server a bit.
What do you think?

Cheers,
Reuben


--- Jon Bell 3/22/2010 4:01 PM ---
...
 
The licensing for Windows Server 2008 R2 Hyper-V, as far as I can tell, requires that you not use the server for any other role (web server, file server, shares, etc.) while you are using the four free licenses.  Keeping a test VM here for a long time would prevent us from using the server as a Ghosting server, which is what it was originally purchased for.  I was under the impression you'd build it here and hand it off to them, and then we could start using the server for Ghosting.
 
It looks to me like, with all these licensing troubles, the simpler solution would be Linux and VMware Server 2.0, both free.
 
Jon
 


--- Reuben Pasquini 3/22/2010 1:32 PM ---
Hi Rod,

Hope you're having a good Monday.
I just wanted to give you a quick update on 
the Tuskegee DSpace project.
At this point we believe we'll require a license
to run Windows in a virtual machine.
Jon, one of our IT experts, is trying to verify that,
but that's our impression.

Assuming Microsoft requires a license to run Windows in a VM,
we'll need a license from Tuskegee to set up the
VM here at Auburn.  If we set up the VM with an Auburn
license, then when we transfer the VM to your server,
Tuskegee will be running an Auburn-licensed server,
which probably violates Auburn's contract with Microsoft.

This license thing looks like a mess.
If you have easy access to two Windows licenses
(doesn't matter which flavor of Windows) that
you can send us (one for the production VM, another
for the test VM), then
we can continue down the road of running Windows
in a VM.  

If you do not have easy access to Windows licenses for the virtual machines,
then we can change our plan to
either run DSpace in a Linux VM (Linux is free - no licensing),
or install DSpace directly
onto your Windows Server 2008 box without a VM.
The Linux option is my preference, but we can
make either way work.

Anyway - let me know whether you can supply a couple of
Windows license keys or install media if we need them,
or which of the other options you prefer if Windows in a VM
won't work out for us.

Cheers,
Reuben


--- Reuben Pasquini 3/23/2010 11:59 AM ---
Hi Dave,

I'm Reuben Pasquini - one of Auburn's software people.
I'll try to lay out my impression of what XC is, and
what I think we at Auburn would hope to get out of it
if we invest in it.
I'd be grateful if you have time to read over this - let us know
what you think.
I don't speak for anyone else at the Auburn library.

I'm excited about the potential for us to join a team of
developers to work on software tools for use at multiple
libraries.  It seems a shame for small software teams at
different libraries to code different solutions to the same problems,
rather than somehow combining our efforts and attacking
more and bigger challenges.
On the other hand, I'm not convinced that XC in itself offers
something useful to us at Auburn.

It's not my decision, but I suspect we will only join XC if
we see a way we can directly use the XC software in our environment.
We would also not be happy to merely accept coding assignments from
the core XC team; we would require input into the design, development process,
and priorities of the project.

I'm involved with several initiatives at Auburn.
We have deployed a VuFind discovery service
    http://catalog.lib.auburn.edu/code/
and a DSpace-based ETD server
    http://etd.auburn.edu/
We hope to make progress on an institutional repository
this year:
    http://repo.lib.auburn.edu/
We also want to explore ERM solutions and
open-source ILS options, and we're beta-testing a commercial
discovery service that integrates article-level search across many
of the e-journals and databases we subscribe to.

Here's my general impression of what XC is, based on watching
the screencast videos (
     http://www.screencast.com/users/eXtensibleCatalog
) and a quick browse of some of the code on Google Code.

  *. A Solr-based metadata database

  *. Java tools for OAI and ILS metadata harvesting into Solr,
          using XSL for format conversion

  *. PHP-in-Drupal web tools for interacting with Solr and
           the importers

XC is not an ILS, an ERM, an IR, or a federated
search engine.  XC could support article-level search only if we
can get the article contents into Solr.
Drupal brings along a platform of services,
but Auburn has access to a commercial CMS via the university,
and we have a lot of in-house experience with MediaWiki and WordPress,
so we're not particularly excited about Drupal.
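
To make the article-level point concrete: mechanically, pushing
content into Solr is just a POST against its XML update handler -
the hard part is getting rights to the article data at all.
A minimal Java sketch (the field names are made up; they would
have to match whatever schema the index actually defines):

    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Push one article record into Solr via the XML update handler,
    // then commit.  Field names are hypothetical placeholders.
    public class SolrArticlePost {
        public static void main(String[] args) throws Exception {
            String doc = "<add><doc>"
                + "<field name=\"id\">article-1</field>"
                + "<field name=\"title\">Example article title</field>"
                + "<field name=\"fulltext\">... article text ...</field>"
                + "</doc></add>";
            post("http://localhost:8983/solr/update", doc);
            post("http://localhost:8983/solr/update", "<commit/>");
        }

        static void post(String url, String body) throws Exception {
            HttpURLConnection conn =
                (HttpURLConnection) new URL(url).openConnection();
            conn.setDoOutput(true);
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
            Writer w = new OutputStreamWriter(conn.getOutputStream(), "UTF-8");
            w.write(body);
            w.close();
            System.out.println("Solr responded: " + conn.getResponseCode());
        }
    }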

Does that sound right?
If so, then integrating XC with VuFind is
one possible area of collaboration.
XC seems very similar to VuFind.  VuFind stores
metadata in a Solr server, has a PHP and JavaScript web frontend
that accesses that data, and Java-based tools for harvesting
metadata into Solr.  We customized our Auburn VuFind instance
       http://catalog.lib.auburn.edu/
to remove its dependency on MARC.
We have an XSL-based crosswalk in place to harvest
data from our CONTENTdm digital library:
        http://diglib.auburn.edu/
For example:
        http://catalog.lib.auburn.edu/vufind/Search/Home?lookfor=Caroline+Dean&type=all&submit=Find
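
The crosswalk machinery is nothing exotic, by the way - the heart
of it is a few lines of javax.xml.transform, roughly like this
(the file names here are placeholders, not our actual stylesheet):

    import java.io.File;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    // Apply an XSL crosswalk to one harvested record, producing
    // a Solr <add> document ready to POST to the update handler.
    public class CrosswalkRecord {
        public static void main(String[] args) throws Exception {
            Transformer xform = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File("oai2solr.xsl")));
            xform.transform(new StreamSource(new File("record.xml")),
                            new StreamResult(new File("solr-add.xml")));
        }
    }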
There's slightly dated information on some of our code here:
      http://catalog.lib.auburn.edu/code/ 
      http://lib.auburn.edu/auburndevcat/ 

The core http://vufind.org project has several weaknesses, including variable code quality,
a lack of regression tests, and the necessity for implementers to customize
the code base.  The VuFind code Auburn runs is different from the code
Michigan or Illinois run.
We have posted our code online, but most other sites do not.
The choice of PHP as an implementation language and the lack
of discipline in locking down APIs mean that merging updates from the core project into
our code base is often a chore, so it's easier to just fork and grab the
random patches that interest us.

I could imagine a 12-month XC-VuFind collaboration something like this:

   *. Phase 1 - integrate Auburn VuFind with XC
            x. Merge Solr schemas
            x. Merge import tools.  Leverage the XC import manager,
                    if such a beast exists, for ILS import and OAI
                    harvest into Solr
            x. End of phase 1 - XC participating libraries all
                   run XC-VuFind discovery instances

   *. Phase 2 - cloud support
            x. Extend XC-VuFind to support multiple Solr backends
                    (see the sketch after this list)
            x. Host a shared Solr server that holds the common open-access
                    records that all the XC-VuFind libraries search against
            x. Integrate easy cross-institution ILL in the discovery layer
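
Here's the naive fan-out I'm picturing for the multiple-backend
item - query each backend's select handler and concatenate the
responses (the URLs are invented, and merging results by relevance
across separate indexes is the hard part this glosses over):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLEncoder;

    // Naive fan-out search across several Solr backends.  A real
    // implementation would merge hits by relevance score; this just
    // concatenates the raw responses.  Backend URLs are invented.
    public class FanOutSearch {
        public static void main(String[] args) throws Exception {
            String[] backends = {
                "http://solr.local.example/solr/select",   // local index
                "http://solr.shared.example/solr/select"   // shared open-access index
            };
            String q = URLEncoder.encode("civil rights", "UTF-8");
            for (int i = 0; i < backends.length; ++i) {
                URL url = new URL(backends[i] + "?q=" + q + "&rows=10");
                BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), "UTF-8"));
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
                in.close();
            }
        }
    }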

Anyway - that's one way I could imagine using XC at Auburn.
I imagine that you and the XC team have a different vision for the project.
I'm interested to hear, via e-mail, what you have in mind before we
commit to attending your meeting.

Cheers,
Reuben
