One of my goals after working with this new tool is to obtain the entire LCSH dataset and try to do matching on the local machine. In part because the previous method, while effective may not scale well, so we wish to test out whether we can get better results from downloading and checking it ourselves.
Since the data formats in which they make the data available are not ideal for manipulation my next task will be to try to convert the data from nt format to csv using Apache Jena. The previous fellow had made some notes about trying this method so I will be reading his instructions and seeing if I can replicate this having never used Jena before.
Once I obtain the data in a format that I can use, I will add it to my current workflow and see what the results are looking like. Hopefully the results will be useful and something that can be scaled up.
— Julaine Clunis