Two (2) virtual National Science Foundation Research Experiences for Undergraduates (REU) opportunities at the Metadata Research Center, Drexel University, as part of the Harnessing the Data Revolution (HDR) Institute for Data Driven Dynamical Design (ID4)
Dates: Mid-July through Mid-September
REU stipend: $5,500
Deadline: Rolling basis (Friday, July 7, for first consideration)
Contacts:
Interested applicants, please send a resume and a brief statement of interest (1 paragraph) indicating: 1) which REU option you would like to apply for, and 2) why you would like to participate in the REU program.
Please send your application to:
REU Option 1: Materials Science Repository Semantics
Standards are an integral component of data repository infrastructure and support FAIR (findable, accessible, interoperable, and reusable) data. Terminology, specifically the language (vocabulary) used to represent data, is standardized through metadata and semantic ontologies. This REU will investigate metadata infrastructures across a subset of materials science repositories, looking specifically at the terminological representation used and its alignment with semantic ontologies.
REU applicants for this project should have:
- Some disciplinary exposure to chemistry, engineering, physics, and/or materials science.
- Interest in semantic systems (terminology/vocabulary) and their value for representation, machine learning, and AI
- Appreciation of standards for communication: human to human, human to machine, and machine to machine
- Knowledge of Excel, Tableau, Orange, or other data science software that allows analysis and visualization, or interest in learning
- Python, R, or other coding experience helpful, but not necessary
Research Goals
- Explore similarities and differences of standards and data representation practices across a subset of materials science data repositories.
- Analyze and visualize data representation, specifically metadata and semantic systems.
- Assess the effectiveness of standards and identify areas needing more attention.
Learning Goals
- Gain knowledge of how metadata standards and semantic ontologies are key to the FAIR data principles.
- Advance analytical and visualization research skills
- Obtain a better understanding of the relationship of standards to ML/AI
REU Option 2: Metal-Organic Frameworks (MOFs) Synthesis Extraction from Scholarly Big Data
Metal-Organic Frameworks (MOFs) are a class of crystalline materials (natural or synthetic) that have advanced the field of materials and solid-state sciences over the last quarter century. The synthesis procedures reported in the literature can play a critical role in data-driven discovery of MOF materials. Unfortunately, this valuable knowledge is significantly underutilized because it remains buried in text, which is unstructured and not machine-understandable. The challenge is exacerbated because it is simply not feasible for human researchers to read every article in their fields, given that there are thousands of publications and the number is still growing exponentially. In this project, students will work with researchers at Drexel University’s Metadata Research Center, the University of Central Florida, and the Colorado School of Mines, connected with the NSF/ID4 (Institute for Data Driven Dynamical Design) project. The focus will be on investigating the use of natural language processing techniques to extract key synthesis knowledge from unstructured text data. We seek to develop robust deep learning models that enable automatic knowledge extraction and ultimately construct knowledge graphs from the scholarly corpus. REU summer students will gain a deeper understanding of natural language processing and the use of large pre-trained language models through the text annotation process.
Research Goals
- Pre-train language models for downstream NLP tasks in materials science
- Develop different deep learning models to improve extraction performance
- Construct solid external knowledge sources (e.g., taxonomy, ontology) for future research
Learning Goals
- Gain knowledge of deep learning frameworks such as PyTorch
- Learn how to generate language representations as features for deep learning models
- Obtain a better understanding of the complete workflow of information extraction (named entity recognition/relation extraction)