For the project, we are also interested in matching against AAT. We have written a SPARQL query to get subject terms from the Getty AAT which was downloaded in json format.
Having data in these various formats I needed to find a way to work with both and evaluate data in one type of file against the other. In the search for answers I came across a tool for data analytics. It can be used for data preparation, blending, advanced and predictive analytics and other data science tasks and so is able to take inputs from various file formats and work with them outright.
A unique feature of the tool is the ability to build a workflow which can be exported and shared and which other members of a team can use as is or can probably easily turn into code if need be.
I’ve managed to join the json and csv file and check for matches and was able to identify exact matches after performing some transformations. This tool has a fuzzy match function which I am still trying to figure out and get working in an effective workflow that can be reproduced. I suspect that will be taking up quite a bit of my time.