News & Events

Summer 2024 NSF Research Experiences for Undergraduates (REU) Opportunities at the MRC

Virtual (or in-person) National Science Foundation Research Experiences for Undergraduates (REU) opportunity at the Metadata Research Center, Drexel University, as part of the Harnessing the Data Revolution (HDR) Institute for Data Driven Dynamical Design (ID4)

Dates: Mid-June through Mid-September 
(The start date is flexible, and there is an opportunity to continue work over the Fall ’24 term.)

REU stipend: $5,500

Deadline: Rolling basis (Friday, June 1st for first consideration)

Contacts: Interested applicants, please send a resume and brief statement of interest (1 paragraph) indicating why you would like to participate in the REU program. Please send your application to:

REU Project title: Materials Science Vocabulary Building: Establishing a YAMZ Portal

Project overview and description: Agreement on terminology is critical for human and machine communication supporting scientific research. Additionally, shared vocabulary provides a necessary foundation for data and metadata standards, as well as the basis for labels in machine learning pipelines. This REU project will develop and enhance YAMZ.net by creating a domain-specific portal for materials science and exploring AI integration. YAMZ is a general-purpose, crowdsourced online dictionary that uses reputation-based voting to support community discussion and consensus. REU participants will:

  • Develop and test the domain-specific portal in the materials science subdomain
  • Explore and pilot integrating ChatGPT for drawing in definitions
  • Document project procedures to enable a generalizable model that can, on demand, present users with a constrained view (or portal) restricted just to terms from the materials science subdomain
  • Collaborate with project mentors and project staff on a scholarly output (e.g., conference poster, presentation, research paper)
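
The idea of a constrained view can be illustrated with a minimal Python sketch. It assumes that each term carries a set of subject tags and that a portal is simply a filter on one tag. The `Term` class, the `portal_view` function, the tag names, and the sample entries are all hypothetical and are not taken from the YAMZ codebase.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """A dictionary entry; field names are illustrative, not YAMZ's schema."""
    name: str
    definition: str
    tags: frozenset = field(default_factory=frozenset)


def portal_view(terms, portal_tag):
    """Return only the terms carrying portal_tag, sorted by name."""
    return sorted(
        (t for t in terms if portal_tag in t.tags),
        key=lambda t: t.name.lower(),
    )


# Invented sample entries for illustration only.
terms = [
    Term("anneal", "Heat and slowly cool a material to relieve stress.",
         frozenset({"materials-science"})),
    Term("metadata", "Data that describes other data.",
         frozenset({"information-science"})),
    Term("lattice", "A regular, repeating arrangement of points in space.",
         frozenset({"materials-science", "mathematics"})),
]

materials_portal = portal_view(terms, "materials-science")
print([t.name for t in materials_portal])
```

Because a term may carry several tags in this sketch, the same entry can appear in more than one portal while remaining a single record in the general dictionary.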

REU applicants for this project should have:

  • Exposure to and instruction in at least one of the following disciplines: computer science, data science, chemistry, engineering, physics, and/or materials science
  • Interest in semantic systems (terminology/vocabulary) and their value for representation, machine learning, and AI
  • Knowledge of the value of data standards for communicating human to human, human to machine, and machine to machine 
  • Knowledge of database and data science software (SQL, Tableau, Orange, etc.)
  • Python, Flask or similar web framework, or other coding experience

Applicant restrictions

  • Must be a non-Drexel undergraduate (not graduated)
  • May work remotely or onsite
  • Must be a U.S. citizen or permanent resident of the United States or its possessions

Research Goals

  • Advance YAMZ.net features supporting domain-specific portals (e.g., tagging, group ownership of terms and portals).
  • Explore and pilot AI integration into YAMZ.net.
  • Develop ways for domain-specific communities to be mostly self-sufficient in creating and managing portals.

Learning Goals

  • Gain R&D experience with a working online dictionary, and understand tradeoffs between domain-agnostic and domain-specific portals
  • Advance semantic research and data science/computer science skills
  • Obtain a better understanding of the complexity of questions surrounding terminology agreement and its importance for scientific communication and research

Summer 2023 NSF Research Experiences for Undergraduates (REU) Opportunities at the MRC

Two (2) virtual National Science Foundation Research Experiences for Undergraduates (REU) opportunities at the Metadata Research Center, Drexel University, as part of the Harnessing the Data Revolution (HDR) Institute for Data Driven Dynamical Design (ID4)

Dates: Mid-July through Mid-September

REU stipend: $5,500

Deadline: Rolling basis (Friday, July 7th for first consideration)

Contacts:

Interested applicants, please send a resume and a brief statement of interest (1 paragraph) indicating: 1) which REU option you would like to apply for, and 2) why you would like to participate in the REU program.

Please send your application to:

REU Option 1: Materials Science Repository Semantics

Standards are an integral component of data repository infrastructure and support FAIR (findable, accessible, interoperable, and reusable) data. Terminology, specifically the language (vocabulary) used to represent data, is standardized through metadata and semantic ontologies. The focus of this REU will be on investigating metadata infrastructures across a subset of materials science repositories, looking specifically at the terminological representation used and its alignment with semantic ontologies.

REU applicants for this project should have:

  • Some disciplinary exposure to chemistry, engineering, physics, and/or materials science.
  • Interest in semantic systems (terminology/vocabulary) and their value for representation, machine learning, and AI
  • Appreciation of standards for communication: human to human, human to machine, and machine to machine
  • Knowledge of Excel, Tableau, Orange, or other data science software that allows analysis and visualization, or interest in learning
  • Python, R, or other coding experience helpful, but not necessary

Research Goals

  • Explore similarities and differences of standards and data representation practices across a subset of materials science data representations.
  • Analyze and visualize data representation, specifically metadata and semantic systems.
  • Assess the effectiveness of standards and identify areas needing more attention.

Learning Goals

  • Gain knowledge of how metadata standards and semantic ontologies are key to the FAIR data principles
  • Advance analytical and visualization research skills
  • Obtain a better understanding of the relationship of standards to ML/AI

REU Option 2: Metal-Organic Frameworks (MOFs) Synthesis Extraction from Scholarly Big Data

Metal-Organic Frameworks (MOFs) are a kind of crystal (natural or synthetic) that has advanced the field of materials and solid-state sciences over the last quarter century. The synthesis procedures reported in the literature can play a critical role in data-driven discovery of MOF materials. Unfortunately, this valuable knowledge is significantly underutilized because it remains buried in text, which is unstructured and not machine understandable. This challenge is exacerbated because it is simply not feasible for human researchers to read every article in their fields, given that there are thousands of publications and the number is still growing exponentially. In this project, students will work with researchers at Drexel University’s Metadata Research Center, the University of Central Florida, and the Colorado School of Mines, connected with the NSF/ID4 (Institute for Data Driven Dynamical Design) project. The focus will be on investigating the use of natural language processing techniques to extract key synthesis knowledge from unstructured text data. We seek to develop robust deep learning models that enable automatic knowledge extraction and ultimately construct knowledge graphs from the scholarly corpus. REU summer students will gain a deeper understanding of natural language processing and the use of large pre-trained language models through the text annotation process.

Research Goals

  • Pre-train language models for downstream NLP tasks in materials science
  • Develop different deep learning models to improve extraction performance
  • Construct solid external knowledge sources (e.g., taxonomy, ontology) for future research

Learning Goals

  • Gain knowledge of deep learning frameworks such as PyTorch
  • Learn how to generate language representations as features for deep learning models
  • Obtain a better understanding of the complete workflow of information extraction (named entity recognition/relation extraction)
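
For applicants new to information extraction, the entity recognition step can be pictured with a small rule-based Python sketch. This is a deliberately simple stand-in that uses regular expressions, not the deep learning models the project aims to develop. The labels `TEMPERATURE` and `DURATION`, the patterns, and the example sentence are all invented for illustration.

```python
import re

# Hypothetical entity labels with simple patterns; a trained model would
# learn such distinctions from annotated text instead of hand-written rules.
PATTERNS = {
    "TEMPERATURE": re.compile(r"\b\d+(?:\.\d+)?\s?°C"),
    "DURATION": re.compile(r"\b\d+(?:\.\d+)?\s?(?:h|hours?|min|minutes?|days?)\b"),
}


def extract_entities(text):
    """Return (start, end, label, surface_text) tuples ordered by position."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append((match.start(), match.end(), label, match.group()))
    return sorted(entities)


# Invented sentence in the style of a synthesis paragraph.
sentence = "The mixture was heated at 120 °C for 24 h, then cooled to 25 °C."

entities = extract_entities(sentence)
for start, end, label, surface in entities:
    print(f"{label:12s} {surface!r} at [{start}, {end})")
```

Relation extraction would then link such entities, for example pairing a heating temperature with its duration. Hand-written rules like these break down quickly on real text, which is one motivation for the learned models described above.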