Month: August 2019
Rongqian Ma; Week 8-10
Week 8-10: Re-organizing place and date information. Based on the problems that appeared in the current version of the visualizations, I performed another round of data cleaning and modification, especially for the date and geography information. To reduce the number of categories in each visualization, I merged more of the data into broader categories. For example, all city information was merged into countries, single dates were merged into the corresponding time period (e.g., the year 1470 was merged into the 1450-1475 period), and remaining inconsistencies across the time and geography categories were reconciled. As demonstrated in the following example, the new version of the visualizations is cleaner in terms of the number of categories and more readable. Over the last couple of weeks, I have also discussed the visualizations and the problems I encountered with my mentor, and we have worked together on the data merge. I’m also working on a potential poster submission to iConference 2020.
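The merging rules described above (cities into countries, single years into time periods) could be sketched in Python roughly as follows. This is only an illustration: the lookup table, the list of periods, the field names `place` and `date`, and the choice to put a boundary year such as 1450 into the later period are all assumptions, not the actual project data or rules.

```python
# Hypothetical lookup tables; the real project data is not shown in the post.
CITY_TO_COUNTRY = {"Paris": "France", "Rouen": "France", "Florence": "Italy"}
PERIODS = [(1400, 1425), (1425, 1450), (1450, 1475), (1475, 1500)]


def merge_place(place):
    """Map a city to its country; leave countries and unknown places as-is."""
    return CITY_TO_COUNTRY.get(place, place)


def merge_date(date):
    """Map a single year such as '1470' to a period label such as '1450-1475'."""
    text = str(date).strip()
    if not text.isdigit():
        return text  # already a period label, or some other form we do not touch
    year = int(text)
    for start, end in PERIODS:
        # Assumption: a boundary year belongs to the period it starts.
        if start <= year < end:
            return f"{start}-{end}"
    return text  # a year outside the known periods is left unchanged


# Made-up sample records, for illustration only.
records = [
    {"place": "Paris", "date": "1470"},
    {"place": "France", "date": "1450-1475"},
    {"place": "Florence", "date": "1425"},
]

cleaned = [
    {"place": merge_place(r["place"]), "date": merge_date(r["date"])}
    for r in records
]
```

After this step the first two sample records fall into the same place and date categories, which is the kind of reduction in categories the cleaning aimed for.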
Example:
Rongqian Ma; Week 6-7: Exploring Timeline JS for the Stories of Book of Hours
Week 7-8 – Sonia Pascua, The SKOS of 1910 LCSH in RDF/XML format
- Digitization – The TEI version of the 1910 LCSH turned out to be incomplete, so we need to go back to the digitization of the print copies and redo the OCR process.
- Encoding – Parsing, one of the activities in this project, encountered not only syntactic and basic semantic structure errors but also errors of logic and of syntax/semantics interaction.
- Programming
- Characterizing the states, if possible, and enumerating all of them so that a conditional statement can be composed.
- The data is so unclean that patterns can hardly be identified for logic formulation.
- Digitalization – MultiTes or Python Program
- MultiTes usage, which is a manual process but yields 98% accuracy in terms of representation
- Building a Python program to automate the SKOS creation from TEI format to RDF/XML format encountered pattern recognition challenges with regular expressions, caused by the OCR process. This yielded a higher percentage of errors, which were identified from the 47 inconsistencies found in the evaluation conducted when the control structures of the program were constructed. Further investigation could verify the percent error once the output is compared to the MultiTes version of the SKOS RDF/XML.
- Metadata – SKOS elements are limited to Concept, PrefLabel, Related and Notes. AltLabel, USE, USE FOR, BT and NT are not represented because the HIVE database has no provision for them.
The SKOS-ification of the 1910 LCSH brought a lot of challenges that we documented to contribute to the case studies in digitization, encoding, programming, digitalization and metadata practices.
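As a rough illustration of the metadata point above, the following Python sketch writes SKOS RDF/XML restricted to the four elements mentioned (Concept, prefLabel, related and note). It is not the project's program: the base URI, the input dictionary layout, the helper names and the sample heading are hypothetical, and it assumes the headings have already been parsed cleanly out of the TEI, which is exactly the step the post reports as difficult.

```python
import xml.etree.ElementTree as ET

# Standard RDF and SKOS namespace URIs.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)

BASE = "http://example.org/lcsh1910/"  # hypothetical base URI for concepts


def concept_uri(label):
    """Build a placeholder URI from a heading label (assumed scheme)."""
    return BASE + label.strip().replace(" ", "_")


def build_skos(entries):
    """Serialize parsed headings as SKOS RDF/XML and return it as a string."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for entry in entries:
        concept = ET.SubElement(
            root,
            f"{{{SKOS}}}Concept",
            {f"{{{RDF}}}about": concept_uri(entry["label"])},
        )
        pref = ET.SubElement(concept, f"{{{SKOS}}}prefLabel")
        pref.text = entry["label"]
        for related in entry.get("related", []):
            ET.SubElement(
                concept,
                f"{{{SKOS}}}related",
                {f"{{{RDF}}}resource": concept_uri(related)},
            )
        if entry.get("note"):
            note = ET.SubElement(concept, f"{{{SKOS}}}note")
            note.text = entry["note"]
    return ET.tostring(root, encoding="unicode")


# Made-up entry for illustration; not taken from the 1910 LCSH.
sample = [{"label": "Abbeys", "related": ["Monasteries"], "note": "Sample note."}]
skos_xml = build_skos(sample)
```

Comparing output of this kind against the MultiTes export would be one way to carry out the further investigation of the error rate mentioned above.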
Alyson Gamble, Week 5: Historical Society of Pennsylvania
—
Bridget Disney, California Digital Library – YAMZ
- Bridging the gap between librarians and computer science knowledge
- Maintaining the continuity of ongoing projects
Jamillah Gabriel: Python Functions for Merging and Visualizing
This past week, I’ve been working on a function in Python that merges the two different datasets (WRA and FAR) so as to simplify the process of querying the data.
The reason for merging the data was to find a simpler alternative to the previous search function developed by Densho, which used if/else statements and for loops to pull data from each dataset.
Now, one can search the data for a particular person and recover all of the available information about that person in a simple query. After the merge, the data output looks something like this when formulated as a list:
In addition to this, I’ve also played with some basic visualizations using Python to display some of the data in pie charts. I’m hoping to wrap up the last week working on more visualizations and functions for querying data.
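A merge-then-query function of the kind described in this post might look something like the sketch below. It is a minimal illustration using only the standard library, not the function actually written for the project: the field names, the sample rows, the helper names, and the idea of matching records on an exact `name` value are all assumptions.

```python
from collections import defaultdict

# Made-up placeholder rows; the real WRA and FAR fields are not shown in the post.
wra_rows = [
    {"name": "DOE, JANE", "birth_year": "1920", "facility": "Example Center"},
]
far_rows = [
    {"name": "DOE, JANE", "departure_date": "1945-03-01"},
    {"name": "ROE, JOHN", "departure_date": "1944-11-15"},
]


def merge_datasets(wra, far, key="name"):
    """Combine both datasets into one dict of people, keyed on `key`.

    Fields are prefixed with their source so that nothing is overwritten.
    Assumption: an exact match on `key` identifies the same person.
    """
    merged = defaultdict(dict)
    for source, rows in (("wra", wra), ("far", far)):
        for row in rows:
            person = merged[row[key]]
            person[key] = row[key]
            for field, value in row.items():
                if field != key:
                    person[f"{source}_{field}"] = value
    return dict(merged)


def find_person(merged, name):
    """Return everything known about one person, or None if absent."""
    return merged.get(name)


merged = merge_datasets(wra_rows, far_rows)
jane = find_person(merged, "DOE, JANE")
```

With the data merged once up front, a lookup becomes a single dictionary access rather than a separate search through each dataset.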
Working with LCSH
New Data Science Tool
—
Clustering
—