Week 2-3: Sonia Pascua, I am one of the “Mixes” in the Metadata Mixer

LEADS site: Digital Scholarship Center
Project title: SKOS of the 1910 Library of Congress Subject Headings


On June 13, 2019, I presented our LEADS-4-NDP project at the Metadata Mixer.

I started my lightning talk by discussing the bigger picture of our project.

The Digital Scholarship Center has an ongoing project, the Nineteenth-Century Knowledge Project, which is building the most extensive open digital collections available today for studying the structure and transformation of 19th-century knowledge using historic editions of the Encyclopedia Britannica. The project is making significant progress toward establishing controlled vocabulary terms for metadata consistency and interoperability, and it uses the vocabularies in HIVE, especially LCSH (Library of Congress Subject Headings).

Our project centers on the SKOS-ination of the 1910 LCSH.

The hypothesis we would like to explore is that there may be a gap, which we call the “vocabulary divide,” between the vocabularies of the past and the present. HIVE currently holds the 2016 version of LCSH; we aim to add the 1910 version to serve research that uses resources from the past, especially 19th-century knowledge.

Above is our conceptual model. As shown, the 1910 LCSH is first digitized into text so that the words are easy to manipulate. From that text, whether in CSV, XLS, or DOCX format, the RDF/XML representation is constructed for HIVE integration. Once the 1910 LCSH is in HIVE, it can be used as a tool for automatic indexing.
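
To make the text-to-RDF/XML step more concrete, here is a minimal sketch in Python of how headings stored in a CSV file could be turned into SKOS concepts. It is not the project's actual conversion code. The column names (heading, see_also), the base URI, and the sample rows are all assumptions made up for illustration; they are not taken from the 1910 LCSH itself.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
XML_NS = "http://www.w3.org/XML/1998/namespace"
BASE_URI = "http://example.org/lcsh1910/"  # hypothetical base URI

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("skos", SKOS_NS)


def slug(label):
    """Turn a heading into a URI-friendly identifier."""
    return "-".join(label.lower().split())


def csv_to_skos(csv_text):
    """Build an RDF/XML string with one skos:Concept per CSV row."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for row in csv.DictReader(io.StringIO(csv_text)):
        concept = ET.SubElement(
            root,
            f"{{{SKOS_NS}}}Concept",
            {f"{{{RDF_NS}}}about": BASE_URI + slug(row["heading"])},
        )
        label = ET.SubElement(
            concept, f"{{{SKOS_NS}}}prefLabel", {f"{{{XML_NS}}}lang": "en"}
        )
        label.text = row["heading"]
        # Optional "see also" reference, mapped here to skos:related.
        if row.get("see_also"):
            ET.SubElement(
                concept,
                f"{{{SKOS_NS}}}related",
                {f"{{{RDF_NS}}}resource": BASE_URI + slug(row["see_also"])},
            )
    return ET.tostring(root, encoding="unicode")


# Invented sample rows, for illustration only.
sample_csv = "heading,see_also\nAbsorption,\nSun,Absorption\n"
skos_xml = csv_to_skos(sample_csv)
print(skos_xml)
```

Mapping a "see also" reference to skos:related is only one possible choice; a real conversion would need to decide how each kind of cross-reference in the printed headings corresponds to SKOS properties.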
In the 5-minute talk, I was able to present the proof of concept.
We formulated use cases based on the two data sets, the 1910 LCSH and the 2016 LCSH, and devised four scenarios for data analysis. These use cases verify and validate the gap, or “vocabulary divide.”
A simulation was conducted for one word, “Absorption.” The article about the sun was taken from the 1911 Encyclopedia Britannica and subjected to text analysis using TagCrowd, which extracted the frequencies of the words in the article. For subject cataloging, which was done manually, descriptors were selected to represent the ABOUTNESS of the article. The 1910 LCSH was used for indexing, and a vocabulary was generated. The same process was then repeated using the 2016 LCSH in HIVE for automatic indexing. The case study fell under scenario 2: the word “Absorption” appeared in both data sets, meaning the term has existed from 1910 through 2016.
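
The two mechanical parts of this simulation, counting word frequencies and checking whether a term appears in each vocabulary, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample sentence, the stopword list, and the one-term vocabularies are invented stand-ins, and the counting is a rough approximation of what a tool like TagCrowd does, not its actual algorithm.

```python
import re
from collections import Counter


def word_frequencies(text, stopwords=frozenset()):
    """Count lowercase alphabetic words, skipping any stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stopwords)


def vocabulary_membership(term, vocab_1910, vocab_2016):
    """Return (in_1910, in_2016) for a candidate descriptor."""
    key = term.lower()
    return (key in vocab_1910, key in vocab_2016)


# Invented stand-in text; the real input was the 1911 Britannica article on the sun.
article = (
    "The absorption of light in the sun's atmosphere. "
    "Absorption lines appear in the spectrum."
)
stopwords = frozenset({"the", "of", "in", "s"})  # assumed stopword list

frequencies = word_frequencies(article, stopwords)

# Placeholder vocabularies holding only the term discussed in this post.
vocab_1910 = {"absorption"}
vocab_2016 = {"absorption"}

in_1910, in_2016 = vocabulary_membership("Absorption", vocab_1910, vocab_2016)
print(frequencies.most_common(3))
print(in_1910, in_2016)
```

A term found in both vocabularies, as “Absorption” is here, corresponds to what this post calls scenario 2.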
