Matching Web Directories

The focus of this task is to evaluate the performance of existing matching tools in a real-world taxonomy integration scenario. Our aim is to show whether ontology matching tools can be applied effectively to the integration of "shallow ontologies".

Data sets

The evaluation dataset was extracted from the Google, Yahoo and Looksmart web directories. The specific characteristics of the dataset are:

More than 4500 node matching tasks, where each node matching task is composed of the paths to the root of the nodes in the web directories.
Expert correspondences for all the matching tasks.
Simple relationships: web directories contain essentially only one type of relationship, the so-called "classification relation".
Vague terminology and modeling principles: The matching tasks incorporate the typical "real world" modeling and terminological errors.
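
To make the notion of a "path to root" concrete, the following minimal sketch rebuilds such a path from child-to-parent links. The function name, the dictionary representation and the category labels are hypothetical and chosen only for illustration; they are not taken from the dataset.

```python
def path_to_root(node, parent_of):
    """Return the labels from the root down to `node`.

    `parent_of` maps each node label to its parent label, or to None for
    the root. This representation is an assumption made for this sketch.
    """
    path = []
    seen = set()
    while node is not None:
        if node in seen:
            # A taxonomy should be acyclic; fail loudly if it is not.
            raise ValueError(f"cycle detected at {node!r}")
        seen.add(node)
        path.append(node)
        node = parent_of.get(node)
    return list(reversed(path))


# Hypothetical labels, not real dataset entries.
example_parents = {"Top": None, "Arts": "Top", "Music": "Arts"}
example_path = path_to_root("Music", example_parents)
```

A node matching task then amounts to comparing one such path from the source directory with one from the target directory.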

This implies that the task will be challenging from a technological point of view, but there is guidance for tuning the matching approach that needs to be taken into account. The paper describing the dataset construction methodology is TaxME 2.

The node matching tasks are represented by pairs of OWL ontologies, where the classification relation is modeled with the OWL subClassOf construct. All OWL ontologies are therefore taxonomies, i.e., they contain only classes (no object or data properties) connected by the subclass relation. For example, the first matching task is to find a correspondence between 1/source.owl and 1/target.owl and to output it to a file called 1/yourname.rdf. The matching tasks are numbered from 1 to 4639. The tasks are the same as in previous years, but randomly shuffled.
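
The directory layout described above can be walked with a short script. This is only a sketch: the function name, the root directory argument and the matcher name are assumptions, while the file names and the task range come from the description above.

```python
from pathlib import Path

# Task numbering as stated in the task description.
FIRST_TASK, LAST_TASK = 1, 4639


def task_files(root, matcher_name, first=FIRST_TASK, last=LAST_TASK):
    """Yield (source, target, output) paths for each numbered task directory.

    `root` is wherever the dataset was unpacked and `matcher_name` is the
    name used for the result file; both are assumptions of this sketch.
    """
    root = Path(root)
    for number in range(first, last + 1):
        task_dir = root / str(number)
        yield (
            task_dir / "source.owl",
            task_dir / "target.owl",
            task_dir / f"{matcher_name}.rdf",
        )


# Only build the paths here; running a matcher on them is left out.
first_two = list(task_files("directory", "yourname", first=1, last=2))
```

Each yielded triple names the two input ontologies of a task and the file the resulting alignment would be written to.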

The directory dataset can be downloaded from here.

A set of partial reference correspondences can be downloaded from here. These reference correspondences are the same as those published in OAEI-2005. However, since the directory dataset has grown in size since 2005, the set of reference alignments disclosed here accounts for 49% of the current set of alignments. These alignments can be used together with AlignAPI to compute recall; an example command can be found here.
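
As a rough illustration of what recall means against a partial reference, the sketch below treats correspondences as (entity1, entity2, relation) triples and counts how many reference triples were found. It is not the AlignAPI evaluator; the function name and the example URIs are hypothetical.

```python
def recall(found, reference):
    """Fraction of reference correspondences that also appear in `found`.

    Both arguments are iterables of (entity1, entity2, relation) triples.
    The triple representation is an assumption of this sketch.
    """
    reference = set(reference)
    if not reference:
        raise ValueError("reference alignment is empty")
    return len(set(found) & reference) / len(reference)


# Hypothetical URIs, for illustration only.
reference_triples = [
    ("http://example.org/source#Music", "http://example.org/target#Music", "="),
    ("http://example.org/source#Arts", "http://example.org/target#Art", "="),
]
found_triples = [
    ("http://example.org/source#Music", "http://example.org/target#Music", "="),
    ("http://example.org/source#Top", "http://example.org/target#Root", "="),
]
example_recall = recall(found_triples, reference_triples)
```

Because the disclosed reference is partial, a found correspondence that is absent from it is not necessarily wrong, which is why only recall is computed here.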

Modalities

The task is to find correspondences between classes in the ontologies. Any information in the two models can be used to find a correspondence. In addition, background knowledge may be used, provided it has not been created specifically for the matching tasks (i.e., no hand-made correspondences between parts of the ontologies). Admissible background knowledge includes "oracles" such as WordNet, Cyc, UMLS, etc. Furthermore, results must not be tuned manually, for instance by removing obviously wrong correspondences. Participants are encouraged to submit their results.

Schedule

The resulting alignments should be represented using the common format for alignments and sent to Juan Pane for evaluation by September 1st. Quantitative indicators of matching quality (Precision and Recall) will be returned to participants by September 15th.

The complete schedule of the event can be found here.

Tool

The resulting alignments should be represented using the common format for alignments.
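
The sketch below shows one way a result file might be produced. It assumes the RDF/XML layout commonly associated with the Alignment API (an Alignment element holding Cell entries with entity1, entity2, relation and measure); the namespace URI, the element names and the function name are assumptions of this sketch and should be checked against the format specification before submitting.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumed namespaces; verify against the alignment format specification.
ALIGN_NS = "http://knowledgeweb.semanticweb.org/heterogeneity/alignment"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"


def render_alignment(onto1, onto2, cells):
    """Return an alignment document as a string.

    `cells` is an iterable of (entity1_uri, entity2_uri, relation, confidence).
    """
    lines = [
        "<?xml version='1.0' encoding='utf-8'?>",
        f"<rdf:RDF xmlns={quoteattr(ALIGN_NS)} xmlns:rdf={quoteattr(RDF_NS)}>",
        "<Alignment>",
        "  <xml>yes</xml>",
        "  <level>0</level>",
        "  <type>**</type>",
        f"  <onto1>{escape(onto1)}</onto1>",
        f"  <onto2>{escape(onto2)}</onto2>",
    ]
    for entity1, entity2, relation, confidence in cells:
        lines += [
            "  <map>",
            "    <Cell>",
            f"      <entity1 rdf:resource={quoteattr(entity1)}/>",
            f"      <entity2 rdf:resource={quoteattr(entity2)}/>",
            # escape() protects relations such as "<" from breaking the XML.
            f"      <relation>{escape(relation)}</relation>",
            f"      <measure rdf:datatype={quoteattr(XSD_FLOAT)}>"
            f"{float(confidence)}</measure>",
            "    </Cell>",
            "  </map>",
        ]
    lines += ["</Alignment>", "</rdf:RDF>"]
    return "\n".join(lines) + "\n"


# Hypothetical URIs, for illustration only.
example_document = render_alignment(
    "http://example.org/1/source.owl",
    "http://example.org/1/target.owl",
    [("http://example.org/1/source.owl#Music",
      "http://example.org/1/target.owl#Music", "=", 1.0)],
)
```

Writing `example_document` to a file such as 1/yourname.rdf would give one result file per task. Building the XML by hand keeps the sketch dependency-free; an RDF or alignment library would be the sturdier choice.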

Contacts

The resulting alignments should be sent to Juan Pane.

Initial location of this page: http://www.disi.unitn.it/~pane/OAEI/2009/directory/