Scott McClellan

The MRC would like to recognize our summer Research Experience for Undergraduates (REU) students, Addy Ireland and Zach Siapno, who presented posters on their research at the NSF HDR ID4 REU conference held at Northwestern University on August 14, 2025. Congratulations to Zach and Addy!

Addy’s poster, titled “Matsci YAMZ: Integrating AI into Metadata Dictionaries,” focused on integrating LLM capabilities into Yet Another Metadata Zoo (YAMZ), a collaborative tool initially developed by John Kunze. Addy’s research is designed to assist materials researchers by providing alternative LLM-generated definitions for materials research terminology. Addy worked with Scott McClellan.

Zach’s poster, “AI-Ready Data: Structured Processing of Laboratory Notebook Tables,” reported on efforts to improve information extraction from handwritten lab notebooks that have undergone optical character recognition. This work seeks to decrease errors in extracted data and assist materials researchers in transitioning information from paper-based to electronic lab notebooks. Zach worked with Joel Pepper.

News & Events

CCI to Host Document Academy 2024 Meeting

September 9, 2024 Scott McClellan

MRC member Tim Gorichanaz is organizing this year’s annual meeting of the Document Academy (Docam ’24), taking place September 18-20. This year’s theme is “Documents from the Future.”

Documents involve technology, meaning they change with the times. The ancient Latin root of “document” referred to an oral teaching, and centuries later the most common form of document was a piece of paper. Now, with computing widespread, perhaps most documents are digital. What will tomorrow’s documents bring, particularly in light of generative AI? What might change, and what might stay the same?
http://documentacademy.org/?2024

Docam ’24 will feature an array of presentations exploring the social, cultural, and technological conceptualizations and effects of documents and their possible future iterations. Among the presenters at this year’s meeting will be MRC members Chris Rauch, Mat Kelly, and Hyung Wook Choi. The conference agenda, including a full list of presenters and presentations, can be found here.

The Document Academy was founded in 2001 by Maribeth Back and Niels Windfeld Lund. The organization is dedicated to exploring documents and documentation through a variety of media and means. For more information about the Document Academy, please visit their website: https://documentacademy.org/


Congratulations to our three NSF Institute for Data Driven Dynamical Design (ID4) REUs

August 15, 2024 (updated September 10, 2024) Scott McClellan

This summer, the MRC hosted three NSF REUs.

Elizabeth Jones and Robert Sammarco present at ID4 REU meeting at Northwestern University

Elizabeth (Lizzie) Jones, Northeastern University (Project: AI-ready data: Knowledge Extraction from Laboratory Notebooks). Lizzie pursued document segmentation, optical character recognition, and text tokenization to extract research protocols and results from digitized lab notebooks produced by members of the Reticular Synthesis Laboratory led by Fernando Uribe-Romo, University of Central Florida. Project aim: To make archival laboratory notebook data AI-ready. (Mentors include Drexel’s Joel Pepper, David Breen, and Jane Greenberg.)

Robert Sammarco, Drexel University (Project: Developing YAMZ (Yet Another Metadata Zoo) for Materials Science Terminology). Robert extended the YAMZ foundation and developed a materials science terminology portal. Project aim: To achieve better data interoperability, support the FAIR data principles, and help materials scientists better communicate within and across subdomains. (Mentors: Christopher Rauch, John Kunze, and Mat Kelly)

Rob Fleur, University of Michigan (Project: Knowledge Graph Implementation for Materials Science). Rob worked on automatic knowledge graph generation, drawing from an extensive collection of materials science research literature. Project aim: To help researchers more expediently extract knowledge from research literature. Rob will continue his work over September 2024. (Mentor: Alex Kalinowski)

Lizzie and Robert S. also participated in the ID4 REU end-of-summer event at Northwestern University in August 2024, and they each presented posters on their work. Kudos to all our REUs and their awesome accomplishments!

*All ID4 mentors are affiliated with Drexel’s MRC.


Dave Breen Presents at iDigBio

June 11, 2024 (updated June 20, 2024) Scott McClellan

On June 11, 2024, Dave Breen gave a presentation titled “Image Informatics for Metadata Extraction and Verification of Museum Specimen Images” at the Advances in Digital Media Workshop Series at the Yale Peabody Museum. The series is part of the Integrated Digital Biocollections (iDigBio) project and sought to answer the question, “How can we use media technologies to position biodiversity collections for even greater relevance to science, society, and Earth’s biota in the future?”

Dr. Breen’s slides can be found here.


April 17, 2024, MRC Presents “Accelerating AI for Data-Driven Discovery” a Talk by Shih-Chieh Hsu, PhD, University of Washington

April 13, 2024 Scott McClellan

Please join the MRC on April 17 from 11 AM to 12 PM in Room 928 of the College of Computing and Informatics for “Accelerating Artificial Intelligence for Data-Driven Discovery,” a talk delivered by Shih-Chieh Hsu, PhD, University of Washington.

Abstract

As scientific datasets become progressively larger, algorithms to process this data quickly become more complex. In response, Artificial Intelligence (AI) has emerged as a solution to efficiently analyze these massive datasets. Emerging processor technologies such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) allow AI algorithms to be greatly accelerated. The Accelerated AI Algorithms for Data-Driven Discovery (A3D3) Institute, sponsored by the National Science Foundation under the Harnessing the Data Revolution program, was established to enable real-time AI at scale for broad applications. In this talk, Hsu will give an overview of the challenges of high energy physics, multi-messenger astrophysics, and neuroscience regarding AI across latency and throughput regimes. He will introduce state-of-the-art model compression techniques, such as pruning and quantization, for edge computing. He will demonstrate that acceleration of AI inference as a web service represents a heterogeneous computing solution. Finally, Hsu will discuss how A3D3 can bring together disparate communities that are threaded by common data-intensive grand challenges to accelerate discovery in science and engineering.

Biography

Shih-Chieh Hsu, PhD, is a professor of physics and adjunct professor of electrical and computer engineering at the University of Washington (UW), and director of the NSF HDR Institute: Accelerated Artificial Intelligence Algorithms for Data-Driven Discovery. He earned his BS/MS in physics from National Taiwan University and his PhD in physics from the University of California San Diego. He works on experimental particle physics using proton-proton collision data from the Large Hadron Collider. His research interests range from dark matter searches with the ATLAS experiment and neutrino cross-section measurements with the FASER experiment to innovative artificial intelligence algorithms for data-intensive discovery and accelerated machine learning with heterogeneous computing.


Ajani Levere Presents STAR Scholar Project

August 31, 2023 (updated October 21, 2023) Scott McClellan

Ajani Levere, a Drexel University STAR Scholar working with Drs. Jane Greenberg and David Breen, presented their imageomics research at the STAR Scholars showcase on August 31, 2023. Their presentation was titled “Computational Fish Specimen Classification: Advancing Machine Learning Model Accuracy” and was part of the NSF-HDR project “Biology-guided Neural Networks for Discovering Phenotypic Traits.” Ajani’s research is continuing under the guidance of Dr. Greenberg. They describe their project as follows:

Digital specimen metadata is valuable for scientific research and discovery, yet sparse specimen metadata availability restricts its potential. In addition to computational efforts made to remedy this issue, Machine Learning (ML) classification was performed on a computed metadata component, the outline extracted from fish specimen images. An ML model (MLM) approach provided a computational genus classification for a given fish outline. This research improves the MLM’s ability to accurately classify fish from their 2D outlines and demonstrates the expressiveness of this computed metadata item.

In our analysis, we inspected the outlines of the error cases, followed by a statistical review of their numerical data. We discovered our dataset limited higher MLM accuracy potential. Refactoring the dataset with a reduced feature length thus enhanced our dataset for MLM interpretability. Experimental results indicate a 96% accuracy, a 5% improvement over previous results. These results confirm the outline as a unique and highly distinguishable metadata component. Computing metadata components of this nature aids the development of a more robust metadata catalog for ML researchers.


John Kunze’s JCDL 2023 Presentation Highlights ARKs and YAMZ

July 24, 2023 (updated October 19, 2024) Scott McClellan

John Kunze’s tutorial on ARKs (Archival Resource Keys), delivered at JCDL, and his work on YAMZ were mentioned prominently in Lesley Frew’s blog post for the Web Science and Digital Libraries Group at Old Dominion University, “‘Up and running with ARK persistable identifiers’ JCDL Tutorial Trip Report.” ARK persistent identifiers form one of the primary features of the YAMZ application and help ensure continuity of terminology.


Jane Greenberg Receives ASIS&T Research in Information Science Award

July 6, 2023 Scott McClellan

The Metadata Research Center congratulates Jane Greenberg, its Director and Founder, for receiving the Association for Information Science & Technology’s (ASIS&T) 2023 Research in Information Science Award. The award “recognizes an individual or team who has made an outstanding contribution to information science research. The award is for a systematic ‘program of research’ in a single area at a level beyond the single study.” ASIS&T recognized Dr. Greenberg’s wide-ranging contributions, including her current positions as principal investigator on the Metadata Capital Initiative (MetaDataCAPT’L), the NSF-funded Institute for Data Driven Dynamical Design (ID4), and the IMLS-funded project LEADING (LIS Education and Data Science Integrated Network Group). The award committee also singled out her work with the Biology-guided Neural Network (BGNN) project and the Helping Interdisciplinary Vocabulary Engineering (HIVE) tool. Please click here for the full press release from ASIS&T, “Jane Greenberg Receives Association for Information Science and Technology (ASIS&T) Research in Information Science Award.”


Summer 2023 NSF Research Experiences for Undergraduates (REU) Opportunities at the MRC

June 30, 2023 (updated July 3, 2023) Scott McClellan

Two (2) virtual National Science Foundation Research Experiences for Undergraduates (REU) opportunities are available at the Metadata Research Center, Drexel University, as part of the Harnessing the Data Revolution (HDR) Institute for Data Driven Dynamical Design (ID4).

Dates: Mid-July through Mid-September

REU stipend: $5,500

Deadline: Rolling basis (Friday, July 7, for first consideration)

Contacts:

Interested applicants, please send a resume and a brief statement of interest (1 paragraph) indicating: 1) which REU option you would like to apply for, and 2) why you would like to participate in the REU program.

Please send your application to:

Xintong Zhao: xz485@drexel.edu
Scott McClellan: sm4522@drexel.edu
Jane Greenberg: jg3243@drexel.edu

REU Option 1: Materials Science Repository Semantics

Standards are an integral component of data repository infrastructure and support the FAIR (findable, accessible, interoperable, and reusable) data principles. Terminology, specifically the language (vocabulary) used to represent data, is standardized through metadata and semantic ontologies. The focus of this REU will be on investigating metadata infrastructures across a subset of materials science repositories, looking specifically at the terminological representation used and its alignment with semantic ontologies.

REU applicants for this project should have:

Some disciplinary exposure to chemistry, engineering, physics, and/or materials science.
Interest in semantic systems (terminology/vocabulary) and their value for representation, machine learning, and AI
Appreciation of standards for communication: human to human, human to machine, and machine to machine
Knowledge of Excel, Tableau, Orange, or other data science software that allows analysis and visualization, or interest in learning
Python, R, or other coding experience helpful, but not necessary

Research Goals

Explore similarities and differences of standards and data representation practices across a subset of materials science data representations.
Analyze and visualize data representation, specifically metadata and semantic systems.
Assess the effectiveness of standards and identify areas needing more attention.

Learning Goals

Gain knowledge of how metadata standards and semantic ontologies are key to the FAIR data principles
Advance analytical and visualization research skills
Obtain better understanding of the relationship of standards to ML/AI

REU Option 2: Metal-Organic Frameworks (MOFs) Synthesis Extraction from Scholarly Big Data

Metal-Organic Frameworks (MOFs) are a class of crystals (natural or synthetic) that have advanced the field of materials and solid-state sciences over the last quarter century. The synthesis procedures reported in the literature can play a critical role in data-driven discovery of MOF materials. Unfortunately, this valuable knowledge is significantly underutilized as it remains buried in text, which is unstructured and not machine understandable. This challenge is exacerbated because it is simply not feasible for human researchers to read every single article in their fields, given that there are thousands of publications and the number is still growing exponentially. In this project, students will work with researchers at Drexel University’s Metadata Research Center, the University of Central Florida, and the Colorado School of Mines, connected with the NSF/ID4 (Institute for Data Driven Dynamical Design) project. The focus will be on investigating the use of natural language processing techniques to extract key synthesis knowledge from unstructured text data. We seek to develop robust deep learning models that enable automatic knowledge extraction and ultimately construct knowledge graphs from a scholarly corpus. REU summer students will gain a deeper understanding of natural language processing and the use of large pre-trained language models through the text annotation process.

Research Goals

Pre-train language models for downstream NLP tasks in materials science
Develop different deep learning models to improve extraction performance
Construct solid external knowledge sources (e.g., taxonomy, ontology) for future research

Learning Goals

Gain knowledge of deep learning frameworks such as PyTorch
Learn how to generate language representations as features for deep learning models
Obtain better understanding of the complete workflow of information extraction (named entity recognition/relation extraction)


Scott McClellan presents at 20th RDA Plenary’s Session on Materials Science Ontologies

March 21, 2023 (updated July 9, 2023) Scott McClellan

Scott McClellan, a second-year doctoral student, presented research results to the “Data representation in materials and chemicals based on harmonised domain ontologies” Birds of a Feather group at the Research Data Alliance’s 20th Plenary meeting in Gothenburg, Sweden, held March 21-23, 2023. His presentation, titled “Along the Border: Term Overlap Among 5 Matportal Ontologies,” focused on term overlap among a subset of ontologies maintained at the Matportal repository. It looked at how term matching algorithms for materials science semantic artifacts differed when locating terminological or URI results. His presentation stemmed from prior research done with Drs. Yuan An and Jane Greenberg and fellow graduate student Xintong Zhao. [Slides]
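
For readers unfamiliar with the distinction between terminological and URI matching, the following is a minimal illustrative sketch and not the method used in the presentation. It assumes each ontology can be reduced to a mapping from term URI to label; the URIs, labels, and function names are hypothetical examples invented for illustration.

```python
# Illustrative sketch only: compares two toy "ontologies" (hypothetical data)
# by exact URI match and by normalized label match.

def normalize(label: str) -> str:
    """Lowercase a label and treat underscores and hyphens as spaces."""
    cleaned = label.lower().replace("_", " ").replace("-", " ")
    return " ".join(cleaned.split())

def overlap_by_uri(onto_a: dict, onto_b: dict) -> set:
    """Terms whose URIs are identical in both ontologies."""
    return set(onto_a) & set(onto_b)

def overlap_by_label(onto_a: dict, onto_b: dict) -> set:
    """Terms whose normalized labels appear in both ontologies."""
    labels_a = {normalize(label) for label in onto_a.values()}
    labels_b = {normalize(label) for label in onto_b.values()}
    return labels_a & labels_b

# Hypothetical ontologies: URI -> label
onto_a = {
    "http://example.org/a#Material": "Material",
    "http://example.org/shared#Crystal": "Crystal",
    "http://example.org/a#GrainBoundary": "Grain Boundary",
}
onto_b = {
    "http://example.org/b#Material": "material",
    "http://example.org/shared#Crystal": "Crystal",
    "http://example.org/b#grain_boundary": "grain_boundary",
}

uri_matches = overlap_by_uri(onto_a, onto_b)
label_matches = overlap_by_label(onto_a, onto_b)

print(len(uri_matches), "URI match(es);", len(label_matches), "label match(es)")
```

In this toy data, only one term shares a URI across the two ontologies, while three terms share a label once case and separators are normalized, which is the kind of difference between the two matching strategies the presentation examined.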

Author: Scott McClellan

Summer REUs Present Posters at ID4 Gathering at Northwestern University
