Over the past two weeks I have been implementing the experiments we proposed: pairwise alignments of the ‘historical sovereignty’ of Taiwan.
Beyond the Darwin Core-based occurrence dataset, we believe that adding an extra field called ‘historical sovereignty’ will help scientists study the historical distribution of certain species. For the land snail Pupinella swinhoei, we found that most of the occurrences are located in Taiwan. As the last blog post said, the years in which this species occurs span a broad range: from 1700 to now.
However, I ran into a few blockers while looking through the actual dataset:
1. Country Code: The dataset sometimes gives the country code as TW (Taiwan) and sometimes as JP (Japan). Did the recorders really mean that the species occurred in those locations? When we cross-referenced the ‘country code’ field with the ‘locality’ field, we also found discrepancies, such as a ‘country code’ of Japan paired with a locality of Formosa (an alias for Taiwan). Stranger still, the year given for these records is 1700, and at that time Taiwan was not part of Japan. The country code, locality, and year fields are problematic in this sense.
2. Year: We have 50 records in total for Pupinella swinhoei. Almost all of the records have country codes, but more than two thirds of them are missing the year information. Knowing the year in which the species appeared or was collected is crucial, because it is one of the factors we use to determine the historical sovereignty of Taiwan.
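To make the two blockers above concrete, here is a minimal Python sketch of the kind of check I have in mind. It is not our actual pipeline: the sample records are made up, the function names are hypothetical, and the timeline is a deliberately simplified assumption (Qing rule 1683 to 1895, Japanese rule 1895 to 1945, ROC from 1945) that ignores transition years and earlier periods. The field names follow the Darwin Core terms `countryCode`, `locality`, and `year`.

```python
from typing import Optional

# Assumed, simplified timeline as half-open intervals [start, end).
# Real sovereignty history is more nuanced than these three periods.
SOVEREIGNTY_TIMELINE = [
    (1683, 1895, "Qing"),
    (1895, 1945, "Japan"),
    (1945, None, "ROC"),
]

# Locality strings that we treat as referring to Taiwan (illustrative only).
TAIWAN_ALIASES = ("taiwan", "formosa")


def sovereignty_for_year(year: Optional[int]) -> Optional[str]:
    """Return the assumed sovereignty label for a year, or None if unknown."""
    if year is None:
        return None
    for start, end, label in SOVEREIGNTY_TIMELINE:
        if year >= start and (end is None or year < end):
            return label
    return None


def flag_record(record: dict) -> list:
    """Return a list of human-readable issues found in one occurrence record."""
    year = record.get("year")
    if year is None:
        # Without a year we cannot assign a historical sovereignty at all.
        return ["missing year"]
    issues = []
    locality = (record.get("locality") or "").lower()
    in_taiwan = any(alias in locality for alias in TAIWAN_ALIASES)
    if (
        in_taiwan
        and record.get("countryCode") == "JP"
        and sovereignty_for_year(year) != "Japan"
    ):
        issues.append("countryCode JP outside the period of Japanese rule")
    return issues


# Made-up records that mimic the problems described above.
records = [
    {"countryCode": "JP", "locality": "Formosa", "year": 1700},
    {"countryCode": "TW", "locality": "Taiwan", "year": None},
    {"countryCode": "JP", "locality": "Formosa", "year": 1920},
]
flags = [flag_record(r) for r in records]
missing_year_count = sum("missing year" in f for f in flags)
```

A check like this only surfaces suspicious records; deciding whether the country code, the locality, or the year is the field in error still needs a human look.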
I suppose we could approach this from another direction and derive Taiwan’s historical sovereignty from Taiwan’s own timeline. But if we disregard the ‘year’ values in the occurrence data and rely solely on outside information, we would lose our original goal of proposing a ‘more precise’ way of merging taxonomically organized datasets. We also probably could not call the result a data-driven knowledge graph (our endgame).
Another workaround is to add dummy records alongside the real records and fill in the years that we want to examine.
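As a rough sketch of the dummy-record idea, assuming records are plain dictionaries with Darwin Core-style keys: the `isDummy` flag and the `make_dummy_records` helper are my own hypothetical additions (not Darwin Core terms), and the years are only examples.

```python
def make_dummy_records(years, country_code="TW", locality="Taiwan"):
    """Build placeholder records for the years we want to examine.

    Each record carries isDummy=True so it can never be mistaken for
    a real occurrence later in the analysis.
    """
    return [
        {
            "countryCode": country_code,
            "locality": locality,
            "year": year,
            "isDummy": True,
        }
        for year in years
    ]


# One made-up real record with a missing year, plus three dummy years.
real_records = [
    {"countryCode": "TW", "locality": "Taiwan", "year": None, "isDummy": False},
]
combined = real_records + make_dummy_records([1700, 1900, 1950])
```

Keeping an explicit flag on every record means the dummy rows can be filtered out again before any counts or distributions are reported.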
More to be discussed. Until next week!
Yi-Yun Cheng
PhD student, Research Assistant
School of Information Sciences, University of Illinois at Urbana-Champaign
Email: yiyunyc2@illinois.edu
Twitter: @yiyunjessica
Hi Jessica,
I’m curious: do you know if there are other instances in this data where the location data would benefit from the addition of this historical sovereignty field? I think it’s GREAT that you’ve discovered this need regarding Taiwan and Japan, and I’m wondering what the broader need for this sovereignty field is within the data.
Making sense out of messy data – what a challenge!