California Digital Library – YAMZ
We have been duplicating our local YAMZ setup on an Amazon AWS server. The process is broadly similar, but we’ve come across and worked through some major glitches along the way.
One challenge has been setting up the database. First we had to figure out where PostgreSQL was installed: the address is specified in the code, but the database sits in a different location on the new server. The code goes through several steps to determine which database to use (local or remote), and the rules have changed on the new system. Because of that, we have had to figure out our new environments and our permissions, documenting the process as we go along. We’ve set up a markdown file in GitHub that will be the final home for our process documentation; in the meantime, we have been keeping notes in a Google Docs file as we work through the AWS installation.
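To make the local-versus-remote decision concrete, here is a minimal sketch of how such a check could look. It is not the actual YAMZ code: the `DATABASE_URL` environment variable, the `LOCAL_DEFAULT` address, and the function name are all assumptions chosen for illustration.

```python
import os

# Hypothetical fallback: a PostgreSQL server running on the same machine.
LOCAL_DEFAULT = "postgresql://localhost:5432/yamz"


def resolve_database_url(env=None):
    """Return (url, source), preferring a remote address when one is configured.

    `env` is any dict-like mapping; it defaults to the process environment.
    The variable name DATABASE_URL is an assumption, not taken from YAMZ.
    """
    env = os.environ if env is None else env
    remote = env.get("DATABASE_URL", "").strip()
    if remote:
        return remote, "remote"
    return LOCAL_DEFAULT, "local"


if __name__ == "__main__":
    url, source = resolve_database_url()
    print(f"Using the {source} database")
```

Writing the rule down in one small function like this would also make it easier to document how the rules differ between the old and new servers.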
Finally, we used pg_dump/pg_restore to move the data from the old PostgreSQL database to the new one, so now we have over 2,500 records and a functioning website on Amazon AWS! This has been a long time coming, but it has helped me see the purpose of the whole project, which is to allow people to enter terms and then collaborate to determine which of those terms will become standard in different environments. For this to happen, the system will have to be used frequently and consistently over time.
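For the documentation, the pg_dump/pg_restore step mentioned above could be recorded as a small script. This is only a sketch under assumptions: the host names, user name, database name, and dump file name are placeholders rather than our real settings, and the exact options we used may have differed.

```python
import subprocess


def dump_command(dbname, outfile, host="localhost", user="postgres"):
    """Build a pg_dump call that writes a custom-format archive."""
    return ["pg_dump", "--format=custom", "--host", host,
            "--username", user, "--file", outfile, dbname]


def restore_command(dbname, infile, host="localhost", user="postgres"):
    """Build a pg_restore call that loads the archive into an existing database."""
    return ["pg_restore", "--no-owner", "--host", host,
            "--username", user, "--dbname", dbname, infile]


def migrate(dbname, old_host, new_host, dumpfile="yamz.dump"):
    """Dump from the old server, then restore on the new one.

    All arguments are placeholders; check=True stops at the first failure.
    """
    subprocess.run(dump_command(dbname, dumpfile, host=old_host), check=True)
    subprocess.run(restore_command(dbname, dumpfile, host=new_host), check=True)
```

Keeping the two commands as plain lists makes them easy to print and paste into the process documentation before anything is actually run.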
I still have some concerns. Did we document the process correctly? It does not seem feasible to wipe everything out and reinstall it to make sure. Also, we still haven’t worked out the process that should be used for checking out code to make changes.
It’s been a productive summer and we’ve learned a lot, but I feel we are running out of time before completing our mission. Starting and stopping, summer to summer, without continuous focus can be detrimental to projects. This is not the first time I’ve encountered this as it seems to be prevalent in academic life.
So, in summary, I see two challenges to library/data science projects:
- Bridging the gap between librarians’ knowledge and computer science knowledge
- Maintaining the continuity of ongoing projects