LEADS site: Repository Analytics & Metrics Portal
Drexel CCI participated in the 7th IEEE International Conference on Healthcare Informatics (ICHI 2019) in Xi’an, China, from June 10 to 13. CCI professor Chris Yang served as the general co-chair and as a panelist for the conference.
PhD students Ou Stella Liang and Michal Monselise presented their full paper, “Identifying Important Risk Factors Associated with Vehicle Injuries using Driving Behavior Data and Predictive Analytics.” The paper was co-authored with Chris Yang. Ou Stella also presented a data analytics challenges paper co-authored with Ali Jazayeri and Chris Yang, entitled “Interpatient Similarity-based Imputation of Missing Data in Electronic Health Records.”
Ou Stella participated in the doctoral consortium with her presentation, “Determining Safe Prescription Practices for Pregnant Women.”
I am privileged to be one of the LEADS-4-NDP fellows under this year’s grant. My placement is with the Digital Scholarship Center of Temple University, and my mentor is Peter Logan. Currently, we are at the project proposal stage and establishing a proof of concept. We’re also looking at a paper as one of our outputs, which we aim to submit to a conference like NKOS or Dublin Core.
As a fellow, I took part in the recent 3-day data science boot camp held at our university, Drexel University. As I posted on LinkedIn, I was really excited to learn and to meet my co-fellows at this boot camp. The days went by so quickly for such a great endeavor. Nonetheless, I have a good account of my experience to share.
Day 1 was packed with lectures and with getting to know the co-fellows and our respective projects. Our ice breaker was fantastic. It gave us the opportunity to get to know the participants in a more fun way: each of us asked a partner a couple of questions and then presented to everyone in the room what we had found. It revealed exciting facts about the co-fellows and broke the stiffness among us. From that moment on I felt comfortable with everyone.
Lectures on Intro to Data Science by Prof. Erjia and Big Data Management by Prof. Il-Yeong, both from CCI, were inspiring, especially when they shared their own understanding of the concepts. I liked how Prof. Erjia started with “A hundred people will have a hundred definitions of Data science (DS)…”, which explained why the DS field is treated so differently by different people. I also liked how he drilled into the multidisciplinary skills needed by a modern data scientist and coached us to pick just one skill and be good at it; it would be hard to work on all four skillsets (Mathematics and Statistics, Programming and Databases, Domain Knowledge and Soft Skills, Communications and Visualizations) and be a jack of all trades. That may leave you a master of none, which is not fruitful for a career. For an academic researcher, it’s advisable to excel at one skill and be a good member of a team in a DS endeavor. I appreciated Prof. Erjia’s list of biases, which I believe, if understood, could be key to overcoming the challenges encountered in DS.
On the other hand, Prof. Il-Yeong gave a comprehensive account of what has happened over time in the database field. His story of “Old SQL to NO SQL to New SQL” was awesome. It provided an understanding of how we got to what we have now. It was also great to have what I have been teaching validated by hearing about databases from an “antiqua” person. Don’t get me wrong: for me, the term “antiqua” is full of respect and admiration. In my 10 years of teaching databases, there are only a handful of people whom I regard as knowing the heart and soul of databases, and Il-Yeong is one of them.
The data science talk by one of the mentors, Dr. Jean Godby, a senior research scientist at OCLC, was precious. She offered a good perspective for understanding the challenges and promises of data science.
That day ended with our group dinner at Han Dynasty. We were joined by the Department Head of CCI Drexel University, Dr. Xia, Dr. Michelle Rogers, and Dr. Peter Logan, one of the mentors of the LEADS-4-NDP project and the director of the Digital Scholarship Center, which is my placement.
Days 2 and 3 were further stretches of lectures, together with a workshop in R. We got our hands dirty with coding and built our technical skills in the basics of R. Topics ranged from data pre-processing, data visualization and visual analytics, and data mining and machine learning II to text processing and a mini-workshop on BigML, a code-free tool for automated data analytics. Dr. Richard Marciano gave a short data science talk and presented the projects he and the Digital Curation Innovation Center (DCIC) were working on. Additionally, Dr. Jane Greenberg delivered her presentation on metadata, data quality, and metadata integration.
I will miss the fellows. We did not get much time to really get to know each other, but at heart they are colleagues and a cohort I can work with on this research journey of my life. I wish all of us success in our projects. I’m looking forward to our virtual meeting, because we’re all working over the summer but from different states. How I wish we had more time for bonding and trips.
The Metadata Research Center hosted the North American Symposium on Knowledge Organization (NASKO 2019) from June 13-14.
MRC PhD student Sam Grabus presented her paper, “Representing Aboutness: Automatically Indexing 19th-Century Encyclopedia Britannica Entries.” The presentation discussed topic relevance revaluation for automatic indexing results, evaluating which of three keyword extraction algorithms produces the most relevant results for the digital collection.
I like to think that I’ve had quite a “weird” career path. After getting an undergraduate degree in history, I became a library cataloger in a public library in China. Then, because of my love for librarianship, I came to the US to get a Master’s degree in Library and Information Science, followed by this PhD degree in Information Science. After starting the PhD, I gradually came to feel the dichotomy between being a professional librarian and being a researcher. I think a major difference is one’s epistemological stance: being a PhD means that you should be critical of all ideologies, including those embedded in your own business.
Long story short, all these seemingly not-so-related experiences converged in my LEADS-4-NDP project, “Automatic Identification of Publisher Entities to Support Discovery and Navigation,” which is sponsored by OCLC and uses data science methods to disambiguate publisher entities recorded in the publication statements in library bibliographic metadata.
Interestingly enough, this project is not a totally new idea for me either. When I was still working at Ingram Content Group in 2014 (also as a cataloger) and was about to start my PhD program, Mrs. Cecilia Preston talked to me about this idea. That was a time when VIAF.org and ISNI were still relatively new projects and “entitization” (or name disambiguation) was a major interest in the library cataloging communities. In general terms, this has been a problem for library cataloging for many years because publisher names are only transcribed into unstandardized text strings, thus preventing the library data from being used in other meaningful ways. This argument, of course, was made in Mr. Roy Tennant’s very famous article, “MARC Must Die.”
I am very glad to get some updated knowledge about this movement from Dr. Jean Godby, my supervisor in this summer project. The entitization of publishers is still a major task faced by library cataloging communities because in the BIBFRAME (Bibliographic Framework) model (one that is to replace the MARC format), the publisher is treated as an entity. To be an entity, all publishers must be freed from the text strings, disambiguated, and assigned their own identifiers.
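To make the idea of freeing publishers from text strings a little more concrete, here is a minimal Python sketch. It is only an illustration under simple assumptions, not the method used in the OCLC project: the sample publisher strings, the `STOPWORDS` list, the function names, and the `pub-0001` style identifiers are all hypothetical, and real entitization would need far more than string normalization (name changes, imprints, places, dates, and links to authority files).

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical list of words ignored when comparing publisher names.
STOPWORDS = {"inc", "ltd", "co", "company", "the", "and", "publishers"}


def normalize(name: str) -> str:
    """Reduce a transcribed publisher string to a rough comparison key."""
    # Strip diacritics so that accented and unaccented spellings match.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase, spell out ampersands, and replace punctuation with spaces.
    text = text.lower().replace("&", " and ")
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)


def assign_identifiers(statements):
    """Group strings that share a key and give each group a made-up identifier."""
    groups = defaultdict(list)
    for statement in statements:
        groups[normalize(statement)].append(statement)
    ids = {}
    for n, key in enumerate(sorted(groups), start=1):
        for statement in groups[key]:
            ids[statement] = f"pub-{n:04d}"  # illustrative only, not VIAF or ISNI
    return ids


if __name__ == "__main__":
    # Hypothetical sample of transcribed publisher strings.
    sample = [
        "Harper & Row, Publishers",
        "Harper and Row",
        "HARPER & ROW.",
        "Penguin Books Ltd.",
    ]
    for statement, identifier in assign_identifiers(sample).items():
        print(identifier, statement)
```

In this sketch the three Harper variants end up sharing one identifier while the Penguin string gets its own. Exact matching on a normalized key is deliberately naive; it shows the shape of the problem rather than a solution to it.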
So this is why I am here. I was super excited to read the project’s description when I decided to apply for the LEADS grant. And I am still super excited to spend the summer immersing myself in library bibliographic data to figure out how to extract and disambiguate publishers in the most effective way. This, I hope, will play a small role in making library data more useful to all its “users.”
The Metadata Research Center is co-hosting the North American Symposium on Knowledge Organization (NASKO 2019) from June 13-14, at the College of Computing and Informatics.
Howard White: “On Patrick Wilson”
TOPIC: Metadata Madness – accomplishments for the year, and/or goals for the summer.
Presenters: CCI PhD students, Cecilia Preston
Date: Wednesday, June 12th
Time: 12:30-1:30 PM
Location: 3675 Market Street,
University City Science Center,
CCI’s new location
Room: Dean’s conference room, #1039 (10th floor)
ADDED FUN: A visit to the Metadata Research Center, now residing on the 11th floor of 3675 Market Street, joining AI (artificial intelligence) and data science [This is for guests outside CCI who may attend].
The LEADS-4-NDP 2019 fellowship program kicked off this week with a 3-day data science boot camp at Drexel University’s College of Computing and Informatics. Eleven fellows from iSchools across the U.S. are paired with nine National Digital Platform partner sites for 10-week remote internships to address data science challenges.
Boot camp sessions included big data management; metadata; data pre-processing; data visualization; data mining and machine learning; large-scale and parallel computing; and automated data analytics tools. As part of the boot camp, LEADS mentors Jean Godby of OCLC and Richard Marciano of DCIC spoke about data science opportunities at their institutions, and LEADS mentors Steven Dilliplane, Academy of Natural Sciences, and Peter Logan, Temple University’s Digital Scholarship Center, participated in boot camp activities.