Week 6 – Sonia Pascua – Parser, Python & Mapping


LEADS site: Digital Scholarship Center
Project title: SKOS of the 1910 Library of Congress Subject Heading


I finally met my mentor, Peter Logan, last Monday, and it was great to see him in person. In this meeting I presented the progress of the project, and we figured out that TEI would perhaps be a good data format for me to move forward with. As a pending action item, Peter will generate and provide the TEI-format file.

Here are some of the matters to ponder in this project.
  • I was able to write a parser in Python to extract the elements from the SKOS RDF/XML format of the 1910 LCSH.
  • A joint assessment by Jane, Peter, and me resulted in the following:
The sample entry from LCSH

SKOS RDF version

Concept : Abandoned children 
PrefLabel: first SEE instance 
USE: succeeding SEE instances – Foundlings & Orphans and orphan-asylums
    • There is an entry in LCSH that has multiple SEE terms. When it is converted to SKOS RDF/XML using MultiTes, only the first term is treated as the PrefLabel and the rest fall into AltLabel. How SEE should be represented is seen as a challenge. According to LCSH, a concept with a SEE tag should use the SEE term as the subject heading. That is what happens with the first term in the SEE tag: it became the PrefLabel. However, AltLabel is used as the tag for the succeeding SEE terms, and this is seen as an incorrect representation. Multiple PrefLabels are going to be explored. Can it be done? Wouldn’t it violate the LCSH or SKOS rules? I need to investigate this further.
    • It is decided for now that USE will be transferred to AltLabel. We will set up a meeting with Joan, the developer of HIVE, to discuss how USE and Use for will be represented in HIVE.
    • I raised the question of some alphanumeric strings in the 1910 LCSH that are recognized Library of Congress Classification numbers. Do they still need to be represented? As per Jane, they can be kept as Notes.
    • I also need to investigate how BT and NT are going to be represented in both SKOS and the HIVE DB.
    • The current SKOS RDF/XML at hand shows several SKOS elements, some of which have no representation in HIVE. To address this, we will bring this concern to Joan and consult with her on how they can be added or mapped to the existing HIVE DB fields.
    • Since a text file is the input to the parser script I wrote, it is recommended to work on a text file of the 1910 LCSH. Peter will provide the TEI format.
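
To make the parser and PrefLabel questions above more concrete, here is a minimal sketch of how SKOS elements could be pulled out of an RDF/XML file with Python's standard library. It is not the script used in the project. The sample XML, the example.org URIs, and the function names parse_concepts and pref_label_conflicts are assumptions made up for illustration, and the real MultiTes output may be shaped differently. The second function checks the SKOS Reference integrity condition that a concept should have no more than one skos:prefLabel per language tag, which is why several SEE terms cannot all become PrefLabels in the same language.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace URIs defined by the RDF and SKOS specifications.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Hypothetical sample record; the URIs and layout are assumptions,
# not the actual MultiTes export.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/lcsh1910/abandoned-children">
    <skos:prefLabel xml:lang="en">Foundlings</skos:prefLabel>
    <skos:altLabel xml:lang="en">Orphans and orphan-asylums</skos:altLabel>
    <skos:broader rdf:resource="http://example.org/lcsh1910/children"/>
  </skos:Concept>
</rdf:RDF>
"""


def parse_concepts(xml_text):
    """Return one dict per skos:Concept with its labels, notes and links."""
    root = ET.fromstring(xml_text)
    concepts = []
    for node in root.findall("skos:Concept", NS):
        concept = {"uri": node.get(RDF_ABOUT)}
        # Literal-valued elements: keep (text, language tag) pairs.
        for field in ("prefLabel", "altLabel", "note"):
            concept[field] = [
                ((el.text or "").strip(), el.get(XML_LANG))
                for el in node.findall("skos:" + field, NS)
            ]
        # BT / NT style links: keep the target URI from rdf:resource.
        for field in ("broader", "narrower"):
            concept[field] = [
                el.get(RDF_RESOURCE)
                for el in node.findall("skos:" + field, NS)
            ]
        concepts.append(concept)
    return concepts


def pref_label_conflicts(concept):
    """List language tags that carry more than one prefLabel."""
    counts = Counter(lang for _, lang in concept["prefLabel"])
    return [lang for lang, n in counts.items() if n > 1]


if __name__ == "__main__":
    for c in parse_concepts(SAMPLE):
        print(c["uri"], c["prefLabel"], c["altLabel"], c["broader"])
        print("prefLabel conflicts:", pref_label_conflicts(c))
```

Running the conflict check over every concept before loading into HIVE would flag any record where more than one SEE term was promoted to PrefLabel in the same language.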

Additionally, earlier today, the LEADS-4-NDP 1-minute madness was held. I presented the progress of the project to my co-fellows and the LEADS-4-NDP advisory board.


1 thought on “Week 6 – Sonia Pascua – Parser, Python & Mapping”

  1. Sonia, I really like how you showed the process – the entry, the RDF, and then more information. This helped me understand how HIVE works and now I’ll be better able to explain it to my co-writers for our paper.
