LEADS Blog – Page 6 – Metadata Research Center

I like to think that I’ve had quite a “weird” career path. After getting an undergraduate degree in history, I became a library cataloger in a public library in China. Then, because of my love for librarianship, I came to the US to get a Master’s degree in Library and Information Science, followed by this PhD in Information Science. Over the course of the PhD, I gradually came to see a dichotomy between being a professional librarian and being a researcher. I think a major difference is one’s epistemological stance: being a PhD means that you should be critical of all ideologies, including those embedded in your own profession.

Long story short, all these seemingly not-so-related experiences converged in my LEADS project, “Automatic Identification of Publisher Entities to Support Discovery and Navigation,” which is sponsored by OCLC and uses data science methods to disambiguate the publisher entities recorded in the publication statements of library bibliographic metadata.

Interestingly enough, this project is not a totally new idea for me either. In 2014, when I was still working at Ingram Content Group (also as a cataloger) and was about to start my PhD program, Mrs. Cecilia Preston talked to me about this idea. At that time, VIAF.org and ISNI were still relatively new projects, and “entitization” (or name disambiguation) was a major interest in the library cataloging communities. In general terms, publisher names have been a problem for library cataloging for many years: they are simply transcribed as unstandardized text strings, which prevents the library data from being used in other meaningful ways. This argument, of course, was made in Mr. Roy Tennant’s very famous article, “MARC Must Die.”

I am very glad to be getting updated on this movement by Dr. Jean Godby, my supervisor in this summer project. The entitization of publishers is still a major task for library cataloging communities because the BIBFRAME (Bibliographic Framework) model, which is meant to replace the MARC format, treats the publisher as an entity. To become entities, publishers must be freed from the text strings, disambiguated, and assigned their own identifiers.

[The BIBFRAME Model: https://www.loc.gov/bibframe/docs/bibframe2-model.html]
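
As a rough illustration of what “freeing publishers from the text strings” can involve, here is a minimal Python sketch. It is not the method used in this project: the stopword list, the `normalize` and `assign_ids` helpers, the `pub0001`-style identifiers, and the sample publisher strings are all assumptions made up for the example.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical list of generic words to ignore when comparing publisher names.
STOPWORDS = {"inc", "ltd", "co", "company", "the", "and", "publishers", "publishing"}


def normalize(name: str) -> str:
    """Reduce a transcribed publisher string to a crude matching key."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and replace anything that is not a letter, digit, or space.
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)


def assign_ids(names):
    """Group strings that share a key under a made-up local identifier."""
    ids = {}
    clusters = defaultdict(list)
    for name in names:
        key = normalize(name)
        if key not in ids:
            ids[key] = f"pub{len(ids) + 1:04d}"
        clusters[ids[key]].append(name)
    return dict(clusters)


# Invented examples of how one publisher might be transcribed in different records.
statements = [
    "Harper & Row, Publishers",
    "Harper and Row",
    "HARPER & ROW, INC.",
    "Éditions Gallimard",
    "Editions Gallimard",
]

for pub_id, variants in assign_ids(statements).items():
    print(pub_id, variants)
```

Exact matching on a normalized key like this only catches trivial spelling variants. Imprints, mergers, and name changes would need much more than string cleanup.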

So this is why I am here. I was super excited when I read the project’s description and decided to apply for the LEADS grant. And I am still super excited to spend the summer immersing myself in library bibliographic data and figuring out how to extract and disambiguate publishers in the most effective way. This, I hope, will play a small role in making library data more useful to all its “users.”

Dear Fellows,

This is where your e-mail blog updates will appear. You can use images and HTML in your e-mails to customize your post.

Please include your blog entry title (e.g., “Week 2 Jane Doe Update”) as your e-mail subject, and end each post with your name. It would also be useful to viewers if you included the following information in each of your posts:

LEADS site: e.g., California Digital Library

Project title: e.g., “Making a Metadata Meritocracy”

You won’t be able to edit your post after sending it, so make sure you check for spelling errors, etc. LEADS PIs and Advisory Board members may comment with their feedback on individual blog entries. You may reply to these comments directly on the site as an external user, using your name and e-mail address.

Sam Grabus

Category: LEADS Blog

Week 1: Kai Li: How did I get here?

Test Blog Post