**Applications for LEADING 2023 are now closed**
**LEADING is a virtual fellowship program. Fellows located near their selected site may have more on-site interaction.**
The LEADING fellowship is for early-to-mid career LIS professionals or doctoral students enrolled in iSchool or ALA-accredited LIS programs.
LEADING Fellows will complete the following:
- Online preparatory curriculum (10-15 hours, May 2023)
- Online interactive boot camp led by data science faculty. The boot camp will run the first 3 weeks of June (2 days a week, approximately 4 hours each day) [Example 2022 Schedule]
- Development of a communication plan to connect with mentors on a regular basis
- Six-month virtual data science internship coordinated with one of our LEADING Mentors. See member node project descriptions below.
- Development of research output in the form of papers, posters, and presentations
- Completion of final report documenting methods, outputs, and recommendations for project mentors (template to be provided)
For inquiries about this educational program, please contact the Metadata Research Center (email@example.com) or Jane Greenberg directly.
- Fellowship stipend: up to $6,000 (~15 hours per week, July-December 2023). Work is virtual and asynchronous; you will have several Zoom check-ins with your mentor(s) and project faculty/PIs.
- Barring any COVID-19 restrictions, additional financial support will be provided for conference travel during the 2023/2024 academic year to share project outcomes
- Early and mid-career applicants: must hold a master’s degree earned by or before May 2023 (LIS or other disciplines acceptable)
- Doctoral students: need to be enrolled in iSchool or ALA-accredited LIS programs or a connected program.
- International applicants must reside in the U.S., have access to CPT support through their U.S. home institution, and be eligible to work in the U.S.
- Complete the application form and upload the requested application materials
- Rank your top three choices for your data science fellowship placement
Criteria for Selection
- Clear interest in data science applications in the LIS domain
- Relevant connection to the selected fellowship sites
- Strong letter of support from advisor, mentor, or current/past supervisor
Commitment to Diversity, Equity, and Inclusivity (DE&I)
The LEADING program is committed to attracting and recruiting a diverse cohort of fellows. The LEADING team will strive to provide learning environments that are welcoming, inclusive, and equitable. The College demonstrates its commitment and support through the CCI Diversity, Equity & Inclusion Council with representation of current students (undergraduate and graduate), faculty and professional staff. The Council works to support CCI and University-led DE&I initiatives.
- LEADING Application Deadline: Monday, March 6th, 2023, 5:00pm EST
- Notification of Acceptance: by early April, 2023
- May-June: Online preparatory curriculum and boot camp
- July-December: Fellowship period
Application Materials Requested
- One-page statement sharing your interest in the LEADING program and the selected fellowship site. Your one-page statement must address why you seek to learn about the intersection of library science and data science, and your career goals related to becoming an educator and researcher, or furthering your library career
- Brief statement on use or training with any of the following statistical packages: Excel, SPSS, R, MATLAB, or SAS (or other package)
- A resume (1-2 pages)
- For doctoral students: A letter of reference from your advisor or a mentor
- For early- to mid-career professionals: A letter of reference from a current or previous supervisor
- Submit all materials through the application form link below
LEADING Fellowship Projects at Member Node Sites
|Project Title (Link to full project description)
|AI-Collaboratory, University of Maryland iSchool (AIC–MD)
|Impact of Japanese American Segregation during WWII on Sacramento
|Automatically extract demographics from census population schedules for the purpose of identifying Japanese households from 1940 and 1950 census records. Combine data with WRA records. Analyze and visualize data.
|Description: 1940 and 1950 census images for Sacramento
Data Type: TIFF Image Files
Data Size: Several thousand high-resolution document images
Description: War Relocation Authority Tabular Datasets
Data Type: Comma-separated Text Files
Data Size: Hundreds of thousands of records
|Haverford College, Digital Humanities Department
|Bridge LLOD: Integrating an OER Language Learning Application with Linked Linguistic Open Data Repositories
|Experiment with transforming Bridge data (textual corpora and lexica in Ancient Greek and Latin) to promote interoperability with Linked Linguistic Open Data repositories (e.g., LiLa, LASLA, Logeion).
|The fellow will have access to Bridge Dictionary (Latin) and Bridge Lexicon (Ancient Greek), as well as the Bridge Corpus database of texts. Other open-access linguistic corpora will also be accessed.
Description: Locally developed Latin and Greek lexica, with morphological metadata, that support the suite of Bridge learning apps.
Data type: CSV file
Data size: Bridge Dictionary has ~30,000 entries, each with 16 metadata elements; Bridge Lexicon has nearly 100,000 entries. The Bridge Corpus of ancient texts comprises nearly 2.5 million words.
Description: LLOD (Linked Linguistic Open Data) repositories (LiLa and Logeion).
Data type: SQL and SPARQL databases.
Data size: varies by repository.
|Kislak Center for Special Collections, Univ. of Pennsylvania Libraries (UPenn)
|Digging into Digital Scriptorium 2.0: Data Analysis in an LOD Environment
|Investigate how Wikibase data can be used in conjunction with other Linked Open Data datasets and sources like Wikidata, the Getty Vocabularies, and the Virtual International Authority File, among others.
|The fellow will have access to the DS Wikibase and SPARQL query service upon acceptance.
Description: Metadata related to the production and transmission of global premodern manuscripts held in US collections
Data Type: Wikibase items, entities, and properties
Data Size: ca. 5,000+ records to begin with
|Loretta C. Duckworth Scholars Studio, Temple University Libraries (LCDSS)
|Enhancing and Visualizing Philadelphia Black Art, Artists, and Representations
|Build upon Wikidata, 3D models, maps, and other data produced by LEADING 2022 fellows to produce additional data visualizations, develop an Android AR experience, create additional GIS maps, and/or generate models of public art to reflect how Wikidata represents the historical diversity of artists in Philadelphia.
|Description: Data includes a range of media formats, including SPARQL and Python scripts, 3D models of public art, and GIS maps
Data Type: .fbx 3D model files, .gpx GIS shape files, SPARQL and Python scripts
Data Size: 10-20 GB
|Montana State University Library
|Recognizing Bias in Collection Metrics: Analyzing Institutional Scholarship and Grant Networks to Inform Collection Development Decisions
|Analyze Montana State University (MSU) scholarship published from 1960 to the present, using a corpus of research generated via OpenAlex. Create data visualizations, such as collection profiles or researcher resource recommendations, to tell a compelling story about how resources might be used over time at the MSU Library.
|Description: Montana State University grant award data provided by local offices or national aggregators, OpenAlex research corpus with MSU authors, and MSU eResources subscriptions list.
MSU Awarded Grant data (1982-2021)
MSU Scholarship (OpenAlex) – showing JSON from API; have data as SQL as well.
● MSU: [LINK]
● MSU total works: [LINK]
● MSU total works 2022: [LINK]
MSU List of Journals (subscriptions)
Data Type: JSON, CSV, SQL
Data Size: MBs
|Movement Alliance Project—People’s Media Record
|People’s Media Record: Activist Media Metadata
|Develop and improve processes and workflows for generating quality and responsible/ethical descriptive metadata records for audio/video media files and associated production materials.
|Description: Metadata records for video files, video files, associated production materials (photos, transcripts, etc.), and associated websites and social media accounts related to productions
Data Type: metadata records (XML, CSV), media files (varied), associated digital material (photos, transcripts, notes, etc.)
Data Size: 60 TB semistructured raw media, ~23,000 records
|National Science Foundation—Public Access Repository (NSF-PAR)
|OCLC—Research and Development (OCLC-R)
|Mapping DDC to FAST
|Clustering and fuzzy matching to generate Dewey Decimal numbers for each FAST Topical heading. Identify how FAST Topical terms can be mapped, the degree of strength, and value.
|Description: The provided data will be Dewey numbers and FAST topical headings (tag 650) that co-occur in WorldCat bibliographic data, with an occurrence count for each pair of values.
Data Type: CSV
Data Size: Millions of rows
|Research Organization Registry (ROR)
|Research Organization Registry: Curating a Global Registry of Affiliation Metadata
|Review metadata across ROR corpus to perform bulk data analysis and metadata quality assurance; develop, implement, and improve automation of metadata analysis and production; develop and improve automated and manual processes for curation deployment process; explore and analyze the relationship of ROR to sibling registries, such as Crossref.
|Description: ROR Registry records
Data Type: JSON, tabular
Data Size: metadata for 100,000+ research organizations
|Smithsonian Libraries (SL)
|Enhancing the Museum Data Ecosystem through Linking Research Publications to Museum Systems
|Build upon the 2022 fellowship to explore and refine methods to extract textual entities from Smithsonian publication PDF files and identify patterns among specimens, taxonomic names, localities, and other entities that can enable linking to museum systems.
|Description: The data consist of journal article PDFs covering calendar year 2022, authored by researchers at the Smithsonian Institution. This includes articles both with and without specimens cited.
Data Type: PDFs
Data Size: under 10GB
Further data for analysis may be identified and gathered during this fellowship.
|UC San Diego Library: Project 1
|Applying data science to user experience and inclusive design for library websites
|Examine the application of data science to digital user experience and inclusive design. Use data science and/or machine learning tools to analyze UC San Diego’s web presence, applying an inclusive design lens to consider user expectations, behaviors, and experiences. Develop a data-informed approach to identifying improvements to both structure and content (e.g. accessibility, usage, impact, content effectiveness) as part of an overall analysis of digital user experiences for diverse communities and audiences.
|Description: Website usage and user behavior data as tracked through Siteimprove and Hotjar; website content (corpora)
Data Type: Structured, quantitative data; data visualizations; user behavior recordings
|UC San Diego Library: Project 2
|Creating a community of practice in the LEADING fellowship program
|Examine the perceived value of and engagement in the emerging community of practice (CoP) within the LEADING fellowship program. Work with the team to analyze previously gathered data using mixed method and data science-informed techniques, identify interventions intended to increase CoP member engagement, recommend interventions for the LEADING program to implement, and use mixed method assessments (e.g., semi-structured interviews, questionnaires, etc.) to evaluate their effectiveness.
|Description: Mixed method survey data and transcripts from semi-structured interviews
Data Type: CSV, Text
Data Size: less than 1TB
|University of New Mexico College of University Libraries and Learning Sciences
|Search Stories: Developing methodologies to interpret search behavior of users of institutional repository content.
|Develop scalable and reproducible methods for web scraping, text extraction, and computation to maximize the research and analytic potential of combining institutional repository data (from RAMP) with data from openly available data sources including the World Bank, IPEDS, Crossref, and others.
|Description: Search engine performance data harvested from Google Search Console daily via API. RAMP data extend from January 2017 to the present. Full documentation about RAMP data and data processing methods is available from the references below.
Complete descriptive metadata in simple Dublin Core format for all items from 57 repositories that participate in RAMP. The data are currently stored in a SQL database that is not publicly accessible, but the database has been shared and rebuilt from the schema and CSV data dumps.
Data Type: JSON, tabular, SQL database
Data Size: 100+ GB, 400+ million rows in tabular format.
|University of Rochester (ROC)
|Understanding Data Deposits
|Using APIs, refine data collection techniques and conduct analyses on UR researcher data deposits in disciplinary data repositories. Analyze researcher behaviors to understand who is depositing data, where they are depositing it, how large the datasets are, and what formats are submitted/supported.
|The fellow will be given two data sources to start and will explore API sources for additional data points.
Description: Dataset information from Dimensions filtered by UR researchers with information about data type, repository, date submitted, and researchers.
Data type: CSV file
Description: Data from our data repository, Figshare, including information about deposits with dates, researchers, and file types.
Data type: CSV file
Description: Data from disciplinary repositories discussed during the project and harvested for co-analysis.
Data type: Depends on repository
(Additional partner members may join; contact the MRC at firstname.lastname@example.org)
Please reach out to email@example.com if interested in learning more about joining the LEADING network.