AI-Ready Data: Navigating the Dynamic Frontier of Metadata and Ontologies / Agenda

ID4: Institute of Data Driven Dynamical Design

April 15, 2024

8:15 AM-9:00 AM Coffee/light breakfast
9:00 AM-9:15 AM Session 1: Opening Session
• Welcome (Professors Jane Greenberg and Yuan An, CCI/Drexel, ID4)
• Workshop goals and ground rules (Jane Greenberg)
9:15 AM-9:30 AM Session 2: Ice breaker and breakout group activity (all workshop attendees)
• What is AI-ready data? Group definitions
• Why/when is it important to consider (or not consider) metadata and semantically-oriented ontological systems?
9:30 AM-9:55 AM Session 3: Keynote
• FAIR AI-Ready Data and AI Models in Particle Physics (Mark Neubauer, Professor, University of Illinois at Urbana-Champaign, A3D3)
(Session moderator: Christine Kirkpatrick, UC San Diego Supercomputer Center)
9:55 AM-10:35 AM Session 4: Research Bottlenecks: The BGNN to Imageomics Case Study
• AI-Readiness BGNN to Imageomics: An HDR Story (Joel Pepper, et al., Drexel University)
• Our Journey on AIR: Problems and Solutions (Yasin Bakis, Tulane University Biodiversity Research Institute (TUBRI), BGNN/Imageomics)
• FAIR, Modular and Reproducible ML Workflows for Domain Scientists: An Imageomics Case Study (Hilmar Lapp, Duke University)
(Session moderator: David Breen, Drexel, BGNN/Imageomics)
10:35 AM-10:50 AM Morning coffee break
10:50 AM-11:40 AM Session 5A: Human in the Loop: Curation, Data Annotation, and Metadata Generation for ML/AI
• Curating Human-Robotics Training Datasets for Machine Learning (Maria Esteva, Texas Advanced Computing Center/HDR iHARP)
• LLM for MOF Synthesis Labeling (Xintong Zhao, et al., Drexel University, and Univ. Central Florida collaborators, ID4)
• Ground Truth: Metadata Accuracy Dilemmas in Training AI/ML Models (Bahareh Shakibajahromi, ZF Passive Safety Systems)
(Session moderator: Richard Marciano, AI-Collaboratory, University of Maryland, College Park)
11:40 AM-12:15 PM Session 5B: Human in the Loop: Curation, Data Annotation, and Metadata Generation for ML/AI
• Harnessing Generative AI to Support Exploration and Discovery in Library and Archival Collections (Lori Perine, Rajesh Kumar Gnanasekaran, & Richard Marciano, AI-Collaboratory, University of Maryland, College Park)
• Image Informatics: Automatic Metadata Extraction for ML Applications (Andrew Senin, Susquehanna Int’l Group, BGNN/Imageomics)
(Session moderators and group discussion/identifying themes: Jianwu Wang, UMBC, iHARP, and Mark Underwood, Information Security Strategic Initiatives Advisor)
12:15 PM-1:15 PM Lunch
1:15 PM-1:45 PM Session 6: Knowledge Extraction, Ontologies and Semantic Systems for AI
• Semantic Technology and Artificial Intelligence Applications in Earth and Environmental Science (Anne Thessen, Anschutz School of Medicine, Univ. of Colorado)
• HIVE-4-MAT: Infrastructure for Leveraging Materials Science Ontologies for AI (Kio Polson, et al., Drexel University, ID4)
(Session moderator: Mark Neubauer, UIUC, A3D3)
1:45 PM-2:45 PM Session 7: AI Empowered Knowledge Graphs
• Open Knowledge Networks, Knowledge Graphs and AI-ready Data (Florence Hudson, Northeast Big Data Hub)
• LLMs and Knowledge Graphs (Bowen Jin, UIUC, I-GUIDE)
• Knowledge Graph Question Answering in Materials Science (KGQA4MAT) (Yuan An, et al., Drexel University, ID4)
• Graphical Materials Histories: Making the Invisible Visible (David Elbert, Johns Hopkins University)
(Session moderator: Alex Kalinowski, Drexel University)
2:45 PM-3:00 PM Afternoon coffee break/snacks
3:00 PM-3:15 PM Session 8: Another View of AI-Ready Data
• AI-ready Geospatial Data: What is AI-ready data? (Wei Hu, UIUC, I-GUIDE)
(Session moderator: Fernando Uribe-Romo, University of Central Florida, ID4)
3:15 PM-3:45 PM Session 9: Group activity: Revising AI-ready definition/s and discussion topics
Brief activity
• AI-ready definition/s modifications
• New ideas/modifications on why/when it’s important to consider metadata and semantically-oriented ontological systems
New focus
• Identify educational challenges and opportunities specific to metadata and semantic ontologies for AI-ready data
• What topic/s are missing from today’s discussion
• Prep for brief around-the-room group report-outs
3:45 PM-4:00 PM Session 10: Day 1 closing session
• Around-the-room group report-outs
• Day 2 plans and preparations
• Poster shout-it-outs
(Session moderators: Jane Greenberg and Yuan An, Drexel, ID4)
4:00 PM-5:30 PM Poster session (College of Computing & Informatics, 10th-floor lobby of 3675 Market Street, Philadelphia, PA; same building as the Quorum)

April 16, 2024

8:15 AM-9:00 AM Coffee/light breakfast
9:00 AM-9:15 AM Session 1: Opening Session
• Welcome and Day 2 workshop goals (Jane Greenberg, Drexel, ID4)
• Reflections on Day 1 topics (moderator/s t.b.c.)
9:15 AM-10:00 AM Session 2A: Data Management, FAIR practices, and Prepping for AI-Ready Pipelines
• Dryad and re-curation (Ryan Scherle, Dryad Repository and CIC/Northeast Big Data Hub)
• DataFed: Making Science Repeatable with ML Pipelines (Joshua Brown, Oak Ridge National Laboratory)
• Showcase: Using FishAIR within a Data Production Pipeline (Xiaojun Wang, Bahadir Altintas, Tulane University Biodiversity Research Institute)
(Session moderator: Megan Force, Clarivate)
10:00 AM-10:45 AM Session 2B: Data Management, FAIR practices, and Prepping for AI-Ready Pipelines
• FAIR Re-use: Implications for AI-Readiness (Lydia Fletcher, Texas Advanced Computing Center, iHARP)
• FAIR and ML, AI Readiness and AI Reproducibility (FARR) (Christine Kirkpatrick, UC San Diego Supercomputer Center)
• AI-ready data and distinctions from FAIR data (Zachary Trautt, National Institute of Standards and Technology)
(Session moderator: Juliane Schneider, Pacific Northwest National Laboratory (PNNL))
10:45 AM-11:00 AM Morning coffee break
11:00 AM-12:00 PM Session 3: Standards Development, Adoption, and Implementation: Realities and Fictions
• NISO Standards Processes (Nettie Lagace, NISO) 
• Research Data Alliance (RDA): A Global Standards Making Community (Robert Quick, Indiana University)
• Innovating the Standards Process with YAMZ: Yet Another Metadata Zoo and AI Implications (Isabel Moreira de Oliveira, et al., Princeton University, and Scott McClellan, Drexel University, et al., ID4)
• MaRDA/MaRCN: AI Efforts, Working Groups & Best Practices (Laura Bartolo, MaRCN) 
(Session moderator: David Elbert, Johns Hopkins University)
12:00 PM-1:15 PM Lunch and optional tour/s
• Drexel’s historical building and Philadelphia skyscape view
• On your own, go see ENIAC
• Rest/re-set for final session
1:15 PM-2:15 PM Session 4: Industry/Government panel
• Semion Saikin, Kebotix, ID4
• Mark Underwood, Co-founder, Information Security Strategic Initiatives Advisor
• Juliane Schneider, Pacific Northwest National Laboratory (PNNL)
(Session moderator: Rachel Frick, OCLC)
2:15 PM-3:00 PM Session 5: AI Research Reproducibility, Validity, and Sharing Models
• A Field Polarized by AI: How to Navigate the Conclusions and Delusions (Josh Agar, Drexel University, collaborator with A3D3 researchers)
• Metadata for Cloud-based Reproducibility (Jianwu Wang, UMBC, iHARP)
• AI Data Readiness and Model Sharing in Computational Health and Climate Sciences (Sanjay Purushotham, UMBC, iHARP)
(Session moderator: Shih-Chieh Hsu, University of Washington, A3D3)
3:00 PM-3:15 PM Afternoon coffee break
3:15 PM-3:40 PM Session 6: Final Group activity/discussion
• Finalize group definition/s on AI-ready data
• Final statements on the importance of metadata and semantically-oriented ontological systems
• Concrete steps and dream ideas for advancing AI-Ready data approaches with metadata and semantic ontologies
3:40 PM-4:00 PM Session 7: Final reporting of groups/around the room
4:00 PM-4:30 PM Session 8: Workshop wrap-up
• Collective white paper logistics
• Workshop thank yous/closure

NOTE: Agenda is fairly well set, but subject to change.