News & Events – Metadata Research Center

The ID4 All Hands meeting was held at the Colorado School of Mines on October 13-14, 2025. The MRC was represented by Kio Polson, Scott McClellan, Xintong Zhao, and Dave Breen.

Scott McClellan and Colton Gerber (Toberer Group) presented on MatSci-YAMZ, an LLM-augmented version of Yet Another Metadata Zoo (YAMZ). MatSci-YAMZ is dedicated to studying vocabulary development in the materials science community and investigating the role large language models might play. During their presentation, they invited audience members to test the new system by entering terms and definitions and by commenting and voting on definitions produced by the LLM. Slides from their presentation can be seen here.

Prof. David Breen gave the presentation “AI-Ready Data: Knowledge Extraction from Chemistry Lab Notebooks.” The talk summarized the MRC’s research on converting handwritten chemistry lab notebooks into a structured digital form, making their data AI-ready, i.e., amenable to downstream analysis and model training. The three steps in the conversion process are: 1) automatic segmentation of the notebook pages’ components, 2) extraction of structured data from the components, and 3) error analysis and correction of the data. Slides from Dr. Breen’s presentation can be found here.
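As a rough illustration of how the three steps fit together, here is a minimal Python sketch. It is not the MRC's implementation: it assumes a page that has already been transcribed to plain text, and every name in it (`Record`, `segment`, `extract`, `check`, `KNOWN_UNITS`) and the sample page are hypothetical. Blank-line splitting stands in for page segmentation, a regular expression stands in for structured extraction, and a unit check stands in for error analysis.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Record:
    label: str
    value: float
    unit: str
    issues: list[str] = field(default_factory=list)


def segment(page_text: str) -> list[str]:
    """Step 1 (toy stand-in): split a page into components at blank lines."""
    return [b.strip() for b in re.split(r"\n\s*\n", page_text) if b.strip()]


# Hypothetical entry format: "label: value unit", e.g. "linker: 0.25 g"
_ENTRY = re.compile(
    r"^(?P<label>[A-Za-z][\w ]*?):\s*(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z%]+)$"
)


def extract(component: str) -> list[Record]:
    """Step 2 (toy stand-in): pull 'label: value unit' entries from a component."""
    records = []
    for line in component.splitlines():
        m = _ENTRY.match(line.strip())
        if m:
            records.append(Record(m["label"], float(m["value"]), m["unit"]))
    return records


# Assumed whitelist of units; a real notebook would need a far richer check.
KNOWN_UNITS = {"g", "mg", "mL", "mmol", "h", "C"}


def check(record: Record) -> Record:
    """Step 3 (toy stand-in): flag suspicious entries for human review."""
    if record.unit not in KNOWN_UNITS:
        record.issues.append(f"unknown unit {record.unit!r}")
    if record.value < 0 and record.unit != "C":
        record.issues.append("negative quantity")
    return record


def process(page_text: str) -> list[Record]:
    """Run segmentation, extraction, and error checking in order."""
    return [check(r) for c in segment(page_text) for r in extract(c)]


if __name__ == "__main__":
    page = "Reagents\nlinker: 0.25 g\nsolvent: 10 mI\n\nConditions\ntemp: 85 C\n"
    for rec in process(page):
        print(rec)
```

In the sample page, the misread unit `mI` (a plausible transcription error for `mL`) is flagged for review rather than silently corrected, which keeps a human in the loop for the correction step.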

News & Events, Research

Summer REUs Present Posters at ID4 Gathering at Northwestern University

August 27, 2025 | Scott McClellan

The MRC would like to recognize our summer Research Experience for Undergraduates (REU) students, Addy Ireland and Zach Siapno, who presented posters on their research at the NSF HDR ID4 REU conference held at Northwestern University on August 14, 2025. Congratulations to Zach and Addy!

Addy’s poster, titled “Matsci YAMZ: Integrating AI into Metadata Dictionaries,” focused on integrating LLM capabilities into Yet Another Metadata Zoo (YAMZ), a collaborative tool initially developed by John Kunze. Addy’s research is designed to assist materials researchers by providing alternative LLM-generated definitions for materials research terminology. Addy worked with Scott McClellan.

Zach’s poster, “AI-Ready Data: Structured Processing of Laboratory Notebook Tables,” reported on efforts to improve information extraction from handwritten lab notebooks that have undergone optical character recognition. This work seeks to decrease errors in extracted data and assist materials researchers in transitioning information from paper-based to electronic lab notebooks. Zach worked with Joel Pepper.

News & Events

John Kunze presents at DCMI 2024 on ARK persistent identifiers and the YAMZ metadictionary

October 22, 2024 | January 28, 2025 | John Kunze

At the annual DCMI (Dublin Core Metadata Initiative) 2024 conference this month, Senior Research Associate John Kunze gave a tutorial on ARKs (Archival Resource Keys). It included an in-depth case study of YAMZ (Yet Another Metadata Zoo), which helps build consensus on terminology using ARKs as persistent identifiers for vocabulary terms and Linked Data concepts.
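For readers unfamiliar with ARKs, the short Python sketch below shows the general anatomy of an ARK: a Name Assigning Authority Number (NAAN), a name assigned by that authority, and an optional qualifier. This is an illustration only, not code from the tutorial or from YAMZ; the `Ark` type, the `parse_ark` function, and the sample identifier are made up, and the pattern is a simplification that does not enforce the full ARK specification.

```python
import re
from typing import NamedTuple, Optional


class Ark(NamedTuple):
    naan: str       # Name Assigning Authority Number
    name: str       # name assigned by that authority
    qualifier: str  # optional sub-part or variant suffix ("" if absent)


# Accepts both the classic "ark:/NAAN/name" form and the newer "ark:NAAN/name".
# Simplified pattern for illustration; it is not a full validator.
_ARK_RE = re.compile(
    r"ark:/?(?P<naan>[0-9a-z]+)/(?P<name>[^/.?#\s]+)(?P<qualifier>[/.][^?#\s]*)?",
    re.IGNORECASE,
)


def parse_ark(text: str) -> Optional[Ark]:
    """Find the first ARK in a string (bare or inside a resolver URL)."""
    m = _ARK_RE.search(text)
    if m is None:
        return None
    return Ark(m.group("naan"), m.group("name"), m.group("qualifier") or "")


if __name__ == "__main__":
    # Made-up identifier; 12345 is used here purely as an example NAAN.
    print(parse_ark("https://n2t.net/ark:/12345/x6np1wh8k/c3.xml"))
```

Because the ARK itself is independent of the hostname in front of it, the same identifier can be extracted whether it appears on its own or embedded in a resolver URL.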

John Kunze presenting the ARK (Archival Resource Key) Tutorial at the DCMI 2024 Conference in Toronto

The presentation slides are available below.

DCMI ARK Tutorial 2024.10.20, slides and notes, 120 mins.pdf from John Kunze

News & Events

Kio Polson Presents on HIVE4MAT at SeMatS workshop during SEMANTICS 2024 conference in Amsterdam

September 24, 2024 | December 1, 2024 | Kio Polson

Kio Polson, an Information Science (IS) PhD student and member of the Metadata Research Center, Drexel University, recently presented a collaborative research paper, “Enhancing Semantic Interoperability Across Materials Science With HIVE4MAT,” at the Semantic Material Science (SeMatS) 2024 workshop in Amsterdam, Netherlands. The workshop was part of the SEMANTiCS 2024 conference, where academia and European companies come together to talk about advances in technology that relate to the “Semantic Web” and semantic systems more broadly. The conference was lively and included innovative ways to engage the community, including a “fish bowl” event where participants could self-select and self-excuse themselves from the center of a circle to talk about the “utopia or dystopia of AI.”

Kio Polson standing in front of the “Meervaart” in Amsterdam, the venue for the SEMANTICS 2024 conference

HIVE4MAT is software that allows materials scientists to explore and use ontologies and other controlled vocabularies to organize and manage their data and information resources. HIVE4MAT supports three key functions: browse, search, and automatic indexing, which are reviewed in the recent SeMatS paper. HIVE4MAT is based on the original HIVE (Helping Interdisciplinary Vocabulary Engineering) software and targets materials science. Kio has been refactoring, improving, and adding features to the HIVE4MAT edition of the software.


To learn more about HIVE4MAT, you may also view Kio’s presentations at the Vocabulary Symposium in Canberra, Australia, and the Code4Lib 2024 conference in Ann Arbor, Michigan. You may also explore the HIVE4MAT demonstration tool at hive4mat.cci.drexel.edu. The SeMatS publication is forthcoming in the SEMANTiCS 2024 Proceedings.

News & Events

CCI to Host Document Academy 2024 Meeting

September 9, 2024 | Scott McClellan

MRC member Tim Gorichanaz is organizing this year’s annual meeting of the Document Academy (Docam ’24) taking place from September 18-20. This year’s theme is “Documents from the Future.”

Documents involve technology, meaning they change with the times. The ancient Latin root of “document” referred to an oral teaching, and centuries later the most common form of document was a piece of paper. Now, with computing widespread, perhaps most documents are digital. What will tomorrow’s documents bring, particularly in light of generative AI? What might change, and what might stay the same?
http://documentacademy.org/?2024

Docam ’24 will feature an array of presentations exploring the social, cultural, and technological conceptualizations and effects of documents and their possible future iterations. Among the presenters at this year’s meeting will be MRC members Chris Rauch, Mat Kelly, and Hyung Wook Choi. The conference agenda, including a full list of presenters and presentations, can be found here.

The Document Academy was founded in 2001 by Maribeth Back and Niels Windfeld Lund. The organization is dedicated to exploring documents and documentation through a variety of media and means. For more information about the Document Academy, please visit their website: https://documentacademy.org/

News & Events

Congratulations to our three NSF Institute for Data Driven Dynamical Design (ID4) REUs

August 15, 2024 | September 10, 2024 | Scott McClellan

This summer, the MRC hosted three NSF REUs.

Elizabeth Jones and Robert Sammarco present at ID4 REU meeting at Northwestern University

Elizabeth (Lizzie) Jones, Northeastern University (Project: AI-ready data: Knowledge Extraction from Laboratory Notebooks). Lizzie pursued document segmentation, optical character recognition, and text tokenization to extract research protocols and results from digitized lab notebooks produced by members of the Reticular Synthesis Laboratory led by Fernando Uribe-Romo, University of Central Florida. Project aim: To make archival laboratory notebook data AI-ready. (Mentors: Joel Pepper, David Breen, and Jane Greenberg)

Robert Sammarco, Drexel University (Project: Developing YAMZ (Yet Another Metadata Zoo) for Materials Science Terminology). Robert extended the YAMZ foundation and developed a materials science terminology portal. Project aim: To achieve better data interoperability, support the FAIR data principles, and help materials scientists better communicate within and across subdomains. (Mentors: Christopher Rauch, John Kunze, and Mat Kelly)

Rob Fleur, University of Michigan (Project: Knowledge Graph Implementation for Materials Science). Rob worked on automatic knowledge graph generation, drawing from an extensive collection of materials science research literature. Project aim: To help researchers more expediently extract knowledge from research literature. Rob will continue his work over September 2024. (Mentor: Alex Kalinowski)

Lizzie and Robert S. also participated in the ID4 REU end-of-summer event at Northwestern University in August 2024, and they each presented posters on their work. Kudos to all our REUs and their awesome accomplishments!

*all ID4 mentors are affiliated with Drexel’s MRC

News & Events

Dave Breen Presents at iDigBio

June 11, 2024 | June 20, 2024 | Scott McClellan

On June 11, 2024, Dave Breen gave a presentation titled “Image Informatics for Metadata Extraction and Verification of Museum Specimen Images” at the Advances in Digital Media Workshop Series at the Yale Peabody Museum. The series is part of the Integrated Digital Biocollections (iDigBio) project and sought to answer the question, “How can we use media technologies to position biodiversity collections for even greater relevance to science, society, and Earth’s biota in the future?”

Dr. Breen’s slides can be found here.

News & Events

Summer 2024 NSF Research Experiences for Undergraduates (REU) Opportunities at the MRC

May 10, 2024 | May 17, 2024 | John Kunze

Virtual (or in-person) National Science Foundation Research Experience for Undergraduates (REU) opportunity at the Metadata Research Center, Drexel University, as part of the Harnessing the Data Revolution (HDR) Institute for Data Driven Dynamical Design (ID4)

Dates: Mid-June through Mid-September
(Flexibility with start date, and opportunity to continue work over Fall ‘24 term.)

REU stipend: $5,500

Deadline: Rolling basis (Friday, June 1st for first consideration)

Contacts: Interested applicants, please send a resume and brief statement of interest (1 paragraph) indicating why you would like to participate in the REU program. Please send your application to:

Senior Research Associate John Kunze: jakkbl@gmail.com
Professor Mat Kelly: mrk335@drexel.edu
Professor Jane Greenberg: jg3243@drexel.edu

REU Project title: Materials Science Vocabulary Building: Establishing a YAMZ Portal

Project overview and description: Agreement on terminology is critical for human and machine communication supporting scientific research. Additionally, shared vocabulary provides a necessary foundation of data and metadata standards, as well as the basis for labels in machine learning pipelines. This REU project will develop and enhance YAMZ.net by creating a domain-specific portal for materials science and exploring AI integration. YAMZ is a general purpose crowdsourced, online dictionary using reputation-based voting to support community discussion and consensus. Project REUs will:

Develop and test the domain-specific portal in the materials science subdomain
Explore and pilot integrating ChatGPT for drawing in definitions
Document project procedures to enable a generalizable model that can, on demand, present users with a constrained view (or portal) restricted just to terms from the materials science subdomain
Collaborate with project mentors and project staff on a scholarly output (e.g., conference poster, presentation, research paper)

REU applicants for this project should have

Exposure to and instruction in at least one of the following disciplines: computer science, data science, chemistry, engineering, physics, and/or materials science
Interest in semantic systems (terminology/vocabulary) and their value for representation, machine learning, and AI
Knowledge of the value of data standards for communicating human to human, human to machine, and machine to machine
Knowledge of database and data science software (SQL, Tableau, Orange, etc.)
Python, Flask or similar web framework, or other coding experience

Applicant restrictions

Must be a non-Drexel undergraduate (not graduated)
May work remotely or onsite
Must be a U.S. citizen or permanent resident of the United States or its possessions

Research Goals

Advance YAMZ.net features supporting domain-specific portals (e.g., tagging, group ownership of terms and portals).
Explore and pilot AI integration into YAMZ.net.
Develop ways for domain-specific communities to be mostly self-sufficient in creating and managing portals.

Learning Goals

Gain R&D experience with a working online dictionary, and understand tradeoffs between domain-agnostic and domain-specific portals
Advance semantic research and data science/computer science skills
Obtain a better understanding of the complexity of questions surrounding terminology agreement and its importance for scientific communication and research

News & Events

April 17, 2024, MRC Presents “Accelerating AI for Data-Driven Discovery,” a Talk by Shih-Chieh Hsu, PhD, University of Washington

April 13, 2024 | Scott McClellan

Please join the MRC on April 17 from 11 AM to 12 PM in Room 928 of the College of Computing and Informatics for “Accelerating Artificial Intelligence for Data-Driven Discovery,” a talk delivered by Shih-Chieh Hsu, PhD, University of Washington.

Abstract

As scientific datasets become progressively larger, algorithms to process this data quickly become more complex. In response, Artificial Intelligence (AI) has emerged as a solution to efficiently analyze these massive datasets. Emerging processor technologies such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) allow AI algorithms to be greatly accelerated. The Accelerated AI Algorithms for Data-Driven Discovery (A3D3) Institute, sponsored by the National Science Foundation under the Harnessing the Data Revolution program, is established to enable real-time AI at scale for broad applications. In this talk, Hsu will give an overview of the challenges of high energy physics, multi-messenger astrophysics, and neuroscience regarding AI across latency and throughput regimes. He will introduce state-of-the-art techniques for model compression, such as pruning and quantization, for edge computing. He will demonstrate that acceleration of AI inference as a web service represents a heterogeneous computing solution. Finally, Hsu will discuss how A3D3 can bring together disparate communities that are threaded by common data-intensive grand challenges to accelerate discovery in science and engineering.

Biography

Shih-Chieh Hsu, PhD, is a professor in physics and adjunct professor in electrical and computer engineering at the University of Washington (UW), and director of the NSF HDR Institute: Accelerated Artificial Intelligence Algorithms for Data-Driven Discovery. He earned the BS/MS in physics from National Taiwan University and the PhD in physics from the University of California San Diego. He is working on experimental particle physics using proton-proton collision data from the Large Hadron Collider. His research interests range from dark matter searches with the ATLAS experiment and neutrino cross-section measurements with the FASER experiment to innovative artificial intelligence algorithms for data-intensive discovery and accelerated machine learning with heterogeneous computing.

News & Events

AI-Ready Data: Navigating the Dynamic Frontier of Metadata and Ontologies

February 23, 2024 | March 23, 2024 | Jane Greenberg

ID4: Institute for Data Driven Dynamical Design

Hosted by the Metadata Research Center, College of Computing & Informatics, Drexel University

DATE: April 15-16, 2024
LOCATION: Quorum – University City Science Center, 3675 Market Street, Philadelphia PA, 19104

AI-ready data refers to high-quality, well-prepared data that is optimized for use in artificial intelligence (AI) applications. AI-ready data increasingly encompasses the inclusion of metadata and ontologies to enhance the value and usability of data. Metadata provides essential context and information about the data, and ontologies offer structured semantic representation of a particular domain. These additional layers of information help data scientists, researchers, and AI systems understand, interpret, and apply appropriate algorithms and models for analysis. Metadata and ontologies enable consistent data integration, interoperability, and knowledge sharing across systems, while facilitating more knowledgeable AI applications. Additionally, these systems are proving vital for supporting the FAIR (Findable, Accessible, Interoperable, and Reusable) principles and reproducible computational research (RCR).

Despite these capacities, approaches for developing, implementing, and sustaining metadata and ontologies within AI-ready data pipelines remain inconsistent and cumbersome, and they lack sufficient support. Challenges span the full data lifecycle, from data creation, collection, and research to the longer-term aims of data preservation, archiving, reuse, and support for research reproducibility. Collective, community-driven efforts are needed to address current obstacles and maximize the value and reliability of data. The AI-Ready Data: Navigating the Dynamic Frontier of Metadata and Ontologies workshop is a step toward addressing this challenge. This workshop will bring together a community of individuals with expertise across the data lifecycle to discuss issues, share solutions, and chart a path forward for addressing key challenges in preparing AI-ready data for scientific research.

Specific workshop goals are to:

Collectively define the state of AI-ready data challenges in the metadata and ontology space
Share current successes and solutions leveraging metadata standards and ontologies
Contribute to a road map to accelerate the preparation of data for artificial intelligence (AI) applications

Current topics

What is AI-ready data?
Research Bottlenecks: Data Life Cycle Challenges and Solutions with Scientific Data
Metadata and Ontologies: Human in the Loop in the Era of LLMs
Annotation: Large-scale Data and Balancing Human and Machine Driven Approaches
Standards Development, Adoption, and Implementation: Realities and Fictions
Knowledge Graphs
Ontology Guided Knowledge Extraction: Leveraging Scholarly Big Data for Scientific Discovery
Future Directions with Metadata and Knowledge Organization Systems

Agenda

Venue

Posters
