News & Events

AI-Ready Data: Navigating the Dynamic Frontier of Metadata and Ontologies

ID4: Institute of Data Driven Dynamical Design

Hosted by the Metadata Research Center, College of Computing & Informatics, Drexel University

AI-ready data refers to high-quality, well-prepared data that is optimized for use in artificial intelligence (AI) applications. AI-ready data increasingly includes metadata and ontologies to enhance the value and usability of data. Metadata provides essential context and information about the data, and ontologies offer a structured semantic representation of a particular domain. These additional layers of information help data scientists, researchers, and AI systems understand and interpret the data and apply appropriate algorithms and models for analysis. Metadata and ontologies enable consistent data integration, interoperability, and knowledge sharing across systems, while facilitating more knowledgeable AI applications. Additionally, metadata and ontologies are proving vital for supporting the FAIR (Findable, Accessible, Interoperable, and Reusable) principles and reproducible computational research (RCR).

Despite these capacities, approaches for developing, implementing, and sustaining metadata and ontologies within AI-ready data pipelines remain inconsistent, cumbersome, and insufficiently supported. Challenges span the full data lifecycle, from data creation, collection, and research to the longer-term aims of data preservation, archiving, reuse, and support for research reproducibility. Collective, community-driven efforts are needed to address current obstacles and maximize the value and reliability of data. The AI-Ready Data: Navigating the Dynamic Frontier of Metadata and Ontologies workshop is a step toward addressing this challenge. This workshop will bring together a community of individuals with expertise across the data lifecycle to discuss issues, share solutions, and chart a path forward for addressing key challenges in preparing AI-ready data for scientific research.

Specific workshop goals are to:

  1. Collectively define the state of AI-ready data challenges in the metadata and ontology space.
  2. Share current successes and solutions leveraging metadata standards and ontologies.
  3. Contribute to a road map to accelerate the preparation of data for artificial intelligence (AI) applications.

Current topics

  • What is AI-ready data?
  • Research Bottlenecks: Data Life Cycle Challenges and Solutions with Scientific Data
  • Metadata and Ontologies: Human in the Loop in the Era of LLMs
  • Annotation: Large-scale Data and Balancing Human and Machine Driven Approaches 
  • Standards Development, Adoption, and Implementation: Realities and Fictions
  • Knowledge Graphs
  • Ontology Guided Knowledge Extraction: Leveraging Scholarly Big Data for Scientific Discovery
  • Future Directions with Metadata and Knowledge Organization Systems

2024 Alice B. Kroeger Talk sponsored by the Metadata Research Center, College of Computing & Informatics, Drexel University

Navigating the Data Deluge: AI, Infrastructure, and Decision-Making in the Era of Big Data

Joshua C. Agar, Assistant Professor, Department of Mechanical Engineering and Mechanics, Drexel University

  • Date/time: Wednesday, February 28, 2024, at 12:00 PM ET
  • Location/in person: Room 912 (9th floor), College of Computing & Informatics (CCI), Drexel University, 3675 Market Street (for CCI access, please send your name to mrc.metadata@drexel.edu).
  • Virtual attendees: email mrc.metadata@drexel.edu for the Zoom link.

Science has traditionally harnessed data to inform decisions. Historically, data was sufficiently low-dimensional and manageable for human processing. However, the rapid expansion of sensing technologies across disciplines has overwhelmed traditional human-centric methods with vast, high-velocity data streams from diverse and often unreliable sources. Despite the remarkable advances in computers and large language models like ChatGPT, their capabilities remain limited. Current AI algorithms predominantly excel in interpolation, not extrapolation, leading to unrealistic and nonsensical outputs when stretched beyond their training data.

This talk explores the intersection of massive data influx and AI, focusing on their limitations and potential in enhancing decision-making, particularly in data-driven infrastructure. We propose a “humanistic carrot” rather than a “stick” approach to address pressing challenges in scientific data management, spotlighting DataFed, a comprehensive data management system. This platform facilitates autonomous pipelines for the curation, sharing, searching, and fine-grained access control of data and metadata. We demonstrate how DataFed can streamline data management for experimentalists, enhancing data stewardship while reducing their workload.

We also delve into the intricacies of handling high-velocity data streams, where gigabits per second of data necessitate immediate processing for critical decision-making or autonomous control. This section covers deploying high-availability inference servers for on-demand data analysis and reduction. Additionally, we explore the concept of AI co-design, where algorithms are optimized to fit on programmable logic, enabling rapid, intelligent analysis, decision-making, and control on ultra-low-cost, low-power devices at unprecedented speeds. Finally, we discuss the broad applicability of these methodologies across various fields, from particle physics to astronomy, highlighting their potential to revolutionize our approach to data and AI integration.

Dr. Joshua C. Agar is an Assistant Professor in the Department of Mechanical Engineering and Mechanics at Drexel University. With a foundational background in experimental materials science, Dr. Agar is predominantly renowned for his pioneering contributions to AI algorithms, computing infrastructure, and the development of cyber-physical systems in the fields of materials synthesis and microscopy. His expertise has been applied across a wide array of disciplines, including particle and plasma physics, materials science, and fluid dynamics. An active member of various AI communities, particularly the FastML community, which emphasizes ultra-low latency ML co-design, Dr. Agar has earned recognition as a leader in AI innovation. His work has garnered attention from prestigious institutions such as the National Academy of Engineering and the National Science Foundation.