Dryad/HIVE Evaluation


Status

Topical indexing (inputs: Title, Title+Abs, Title+Abs+Kw)
  • Maui results (done)
  • Our algorithm (run, but need to review results)

Taxonomic indexing (inputs: Title, Title+Abs, Title+Abs+Kw)
  • Our algorithm (done)
  • Maui (running now, 6/6)
  • Neti Neti (not done)

Geographic indexing (inputs: Title, Title+Abs, Title+Abs+Kw)
  • Our algorithm (done)
  • Maui (running now, 6/6)
  • Stanford NER (done for BIOSIS only)

Background

This study is concerned with the evaluation of traditional automatic indexing processes using thesauri and other controlled vocabularies. The goal of automatic indexing in the Dryad/HIVE context is to produce a ranked list of indexing terms from one or more controlled vocabularies.

Automatic indexing in the context of the Dryad/HIVE evaluation consists of four steps (a code sketch follows the list):

  1. Identifying candidate strings in text (either through NER or simple thesaurus-based matching)
  2. Matching candidate strings to terms in a controlled vocabulary
  3. Disambiguating ambiguous terms
  4. Filtering and ranking the final list
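
A minimal sketch of this pipeline is given below. It assumes an in-memory vocabulary and frequency-based ranking; every name in it (extract_candidates, Vocabulary, disambiguate, index) is illustrative and is not part of HIVE, Maui, or any other tool mentioned on this page.

```python
# Illustrative sketch of the four-step indexing pipeline described above.
# None of these names come from HIVE or Maui.
import re
from collections import Counter


class Vocabulary:
    """Toy controlled vocabulary: maps labels (including synonyms) to concept records."""
    def __init__(self, label_to_concepts):
        # label_to_concepts: {"label": [{"id": ..., "related": [...]}, ...]}
        self.label_to_concepts = label_to_concepts
        self.labels = set(label_to_concepts)

    def match(self, candidate):
        """Step 2: map a candidate string to zero or more concepts."""
        return self.label_to_concepts.get(candidate, [])


def extract_candidates(text, vocabulary):
    """Step 1: identify candidate strings by simple thesaurus-based matching."""
    tokens = re.findall(r"[a-z][a-z-]+", text.lower())
    # Consider unigrams and bigrams as candidate strings.
    candidates = tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return [c for c in candidates if c in vocabulary.labels]


def disambiguate(concepts, context):
    """Step 3: pick one concept for an ambiguous label (naive context-overlap heuristic)."""
    return max(concepts, key=lambda c: len(set(c.get("related", [])) & set(context)))


def index(text, vocabulary, k=10):
    """Steps 1-4: produce a ranked list of at most k concept IDs."""
    context = re.findall(r"[a-z][a-z-]+", text.lower())
    counts = Counter()
    for candidate in extract_candidates(text, vocabulary):
        concepts = vocabulary.match(candidate)
        if not concepts:
            continue
        concept = concepts[0] if len(concepts) == 1 else disambiguate(concepts, context)
        counts[concept["id"]] += 1  # Step 4: rank, here simply by frequency
    return [cid for cid, _ in counts.most_common(k)]
```

Frequency-based ranking stands in here for whatever scoring the evaluated algorithms actually use.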

There has been a significant amount of work on named-entity recognition (NER) for taxonomic names and geographic placenames in text. The goal of NER is to identify certain types of information in text, and NER techniques often use existing vocabularies to develop or train models. However, NER and automatic indexing are different, though complementary, techniques: NER output can serve as an input to the automatic indexing process.

Goals

  • Simultaneously optimize precision and recall for automated subject term suggestion with multiple vocabularies from article metadata, to inform submission and curation in the Dryad repository.
  • Here, article metadata includes the title, abstract, author-assigned keywords, and deposited data.

Questions

The focus of this research is on techniques for automatic indexing: how to produce the best ranked list of terms from controlled vocabularies.

Questions:

  1. What algorithm(s) are most effective for term recommendation from the supported vocabularies?
  2. How accurate is HIVE at recommending terms from vocabularies?

Operational questions:

  1. How close can we get to reproducing the professional indexing in the “gold set”?

Selected Vocabularies

Question: What are the best vocabularies for indexing in Dryad?

Based on an earlier analysis of Dryad vocabulary needs, the following vocabularies are used in this study:

  • Medical Subject Headings (MeSH) for subject indexing
  • Getty Thesaurus of Geographic Names (TGN) for geographic indexing
  • Integrated Taxonomic Information System (ITIS) for taxonomic indexing.

Test collections

The test collection (“gold standard”) consists of 180 Dryad records with associated indexing in PubMed and BIOSIS Previews. The collection includes Dryad metadata, PubMed metadata, and BIOSIS Previews metadata, with subject, geographic, and taxonomic indexing mapped to the three vocabularies above (MeSH, TGN, and ITIS, respectively). For more information, see Dryad Gold Set. A separate publication describing the test collection creation process is in progress.

The Dryad test collection is intended to support the evaluation of automatic indexing processes and is not useful for strict NER tasks.

Algorithm evaluation

Questions:

  • What is the best algorithm for automatic term suggestion for Dryad vocabularies?
  • Do different algorithms perform better for titles, abstracts, full text, or data?
  • Do different algorithms perform better for a particular vocabulary?

Using the “gold sets” generated in (B), evaluate the performance of different algorithms for automatic term suggestion from controlled vocabularies. Evaluate different approaches to matching the free-text terms assigned by depositors to controlled terms.

Methods

Topical indexing

  • Question: How does Maui perform with MeSH on Dryad records?
  • Evaluate Maui performance on the Dryad test collection
  • Tune parameters to maximize precision@k, recall@k, and F1 (see the tuning sketch after this list)
  • Data: title, abstract, and data (time permitting)
  • Compare results to PubMed/MeSH and BIOSIS/MeSH (mapped)
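
The tuning step might look like the sketch below: a grid search over two placeholder parameters, keeping the setting with the best mean F1@k against the gold MeSH terms. Here run_indexer stands in for an actual invocation of Maui (or any other indexer), and min_occurrence / max_terms are illustrative parameter names, not real Maui options.

```python
# Hypothetical parameter sweep; run_indexer and the parameter names are placeholders.
from itertools import product


def f1_at_k(ranked, gold, k):
    """Exact-match F1 over the top-k suggested terms."""
    tp = len(set(ranked[:k]) & set(gold))
    p = tp / k if k else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if (p + r) else 0.0


def tune(records, run_indexer, k=10):
    """Grid-search placeholder parameters; records is a list of (text, gold_terms) pairs."""
    grid = {"min_occurrence": [1, 2], "max_terms": [10, 20]}
    best_params, best_f1 = None, -1.0
    for values in product(*grid.values()):
        params = dict(zip(grid, values))
        mean_f1 = sum(f1_at_k(run_indexer(text, **params), gold, k)
                      for text, gold in records) / len(records)
        if mean_f1 > best_f1:
            best_params, best_f1 = params, mean_f1
    return best_params, best_f1
```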

Taxonomic indexing

  • Question: How best to index taxonomic names?
  • Maui is not intended to support taxonomic indexing
  • Compare a simple thesaurus-based matching process to NER (NetiNeti?) and possibly to Maui (a matching sketch follows this list)
  • Evaluate disambiguation and ranking processes
  • Data: title, abstract, and data (time permitting)
  • Compare results to PubMed/ITIS (mapped) and BIOSIS/ITIS (mapped)
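
The simple thesaurus-based matching baseline could be sketched as follows. The binomial-shaped regular expression and the in-memory ITIS lookup (lowercased scientific name to TSN) are simplifying assumptions, and the TSN in the example is a placeholder, not a real ITIS value.

```python
# Sketch of simple thesaurus-based matching for taxonomic names.
import re

# Rough pattern for binomial-looking spans: capitalized genus + lowercase epithet.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]{3,})\b")


def match_taxa(text, itis_names):
    """Return (matched_string, TSN) pairs for binomial-looking spans found in the lookup."""
    matches = []
    for m in BINOMIAL.finditer(text):
        name = f"{m.group(1)} {m.group(2)}".lower()
        if name in itis_names:
            matches.append((m.group(0), itis_names[name]))
    return matches


# Toy lookup table; the TSN value is a placeholder.
itis_names = {"drosophila melanogaster": "TSN-PLACEHOLDER"}
print(match_taxa("Mating behaviour in Drosophila melanogaster populations", itis_names))
# -> [('Drosophila melanogaster', 'TSN-PLACEHOLDER')]
```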

Geographic indexing

  • Question: How best to index geographic names?
  • Maui is not intended to support geographic indexing
  • Compare a simple thesaurus-based matching process to NER (Stanford) and possibly to Maui (see the mapping sketch after this list)
  • Evaluate disambiguation and ranking process
  • Data: title, abstract, and data (time permitting)
  • Compare to PubMed/TGN (mapped) and BIOSIS/TGN (mapped)
  • Note: Much of this work was completed in a master’s paper
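
The step from NER output to TGN terms could be sketched as below. The NER stage is taken as given (a list of location strings, e.g. from Stanford NER output), tgn_lookup is a stand-in for real TGN access, and the place-type preference is just one crude disambiguation heuristic.

```python
# Sketch of mapping NER-extracted place names to TGN concepts.
# tgn_lookup: {"name": [{"id": ..., "place_type": ...}, ...]} -- a stand-in for real TGN access.


def map_places_to_tgn(locations, tgn_lookup, prefer="nation"):
    """Map location strings to TGN identifiers, choosing one candidate per string."""
    mapped = []
    for loc in locations:
        candidates = tgn_lookup.get(loc.lower(), [])
        if not candidates:
            continue
        # Crude disambiguation: prefer a given place type, else take the first candidate.
        chosen = next((c for c in candidates if c.get("place_type") == prefer),
                      candidates[0])
        mapped.append((loc, chosen["id"]))
    return mapped
```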

Other (time permitting):

  • Question: How do Dryad author-supplied terms compare to PubMed and BIOSIS indexing? (no automatic indexing)
  • Question: How do PubMed and BIOSIS indexing compare? (no automatic indexing)
  • Compare Dryad author-supplied indexing, PubMed indexing, and BIOSIS indexing independent of automatic indexing.

Measures

  • The first pass will be evaluated using precision@k, recall@k, and f1@k (see the code sketch after this list)
  • Disambiguation will be evaluated using average rank (AR) (or possibly mean reciprocal rank)
  • Time permitting, receiver operator characteristic (ROC) will be used. This requires modification to Maui, which currently supports only p/r/f1@k
  • Time permitting, incorporate a new measure that accounts for thesaurus hierarchy. ROC, precision and recall work with exact matches, whereas thesauri can support partial matches through the selection of broader or narrower terms.
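
A sketch of these first-pass measures, assuming each record's system output is a ranked list of term identifiers and the gold indexing is a set of identifiers from the same vocabulary; only exact identifier matches count, so (as noted above) broader or narrower terms get no credit. Mean reciprocal rank is shown as the rank-based alternative mentioned above.

```python
# First-pass measures: exact-match precision@k, recall@k, F1@k, and mean reciprocal rank.


def precision_at_k(ranked, gold, k):
    return len(set(ranked[:k]) & set(gold)) / k if k else 0.0


def recall_at_k(ranked, gold, k):
    return len(set(ranked[:k]) & set(gold)) / len(gold) if gold else 0.0


def f1_at_k(ranked, gold, k):
    p, r = precision_at_k(ranked, gold, k), recall_at_k(ranked, gold, k)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def mean_reciprocal_rank(runs):
    """runs: iterable of (ranked, gold) pairs; RR is 1/rank of the first correct term, else 0."""
    rrs = []
    for ranked, gold in runs:
        rr = next((1.0 / i for i, term in enumerate(ranked, start=1) if term in gold), 0.0)
        rrs.append(rr)
    return sum(rrs) / len(rrs) if rrs else 0.0
```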