Task 2: Real-World Challenge: Aligning Web Directories

The focus of this task is to evaluate the performance of existing alignment tools in a real-world taxonomy integration scenario. Our aim is to show whether ontology alignment tools can be applied effectively to the integration of "shallow ontologies".

Data sets

The evaluation dataset was extracted from Google, Yahoo and Looksmart web directories. The specific characteristics of the dataset are:

More than 4500 node matching tasks, where each node matching task is composed of the paths to the root of the nodes in the web directories.
Expert mappings for all the matching tasks.
Simple relationships. Web directories contain essentially only one type of relationship, the so-called "classification relation".
Vague terminology and modeling principles: The matching tasks incorporate the typical "real world" modeling and terminological errors.

This implies that the task will be challenging from a technological point of view, but there is guidance for tuning the matching approach that needs to be taken into account. The paper describing the dataset construction methodology is TaxME 2.

The node matching tasks are represented by pairs of OWL ontologies, where the classification relation is modeled as the OWL subClassOf construct. Therefore all OWL ontologies are taxonomies (i.e., they contain only classes, without object or data properties, connected by the subclass relation). The dataset can be downloaded from here. The matching tasks are numbered from 1 to 4639. The tasks are the same as in previous years but randomly shuffled. Thus, for example, the first matching task is to find a mapping between 1/source.owl and 1/target.owl and to output it to a file called 1/yourname.rdf.
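
The file naming convention above can be sketched in Python as follows. This is only a minimal sketch, assuming the downloaded dataset is unpacked so that each task sits in a folder named after its number. The helpers `task_paths` and `all_tasks`, the `root` argument and the `yourname` placeholder are hypothetical names used for illustration.

```python
from pathlib import Path

FIRST_TASK, LAST_TASK = 1, 4639  # task numbering stated in the task description


def task_paths(root, task_id, participant="yourname"):
    """Return (source, target, output) paths for one node matching task."""
    if not FIRST_TASK <= task_id <= LAST_TASK:
        raise ValueError(f"task id {task_id} outside {FIRST_TASK}..{LAST_TASK}")
    task_dir = Path(root) / str(task_id)  # assumed layout: <root>/<task number>/
    return (task_dir / "source.owl",
            task_dir / "target.owl",
            task_dir / f"{participant}.rdf")


def all_tasks(root, participant="yourname"):
    """Yield the path triple for every task, in numeric order."""
    for task_id in range(FIRST_TASK, LAST_TASK + 1):
        yield task_paths(root, task_id, participant)
```

A matcher would typically loop over `all_tasks(...)`, read the first two paths and write its alignment to the third.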

Modalities

The task is to find an alignment between classes in the ontologies. Any information in the two models can be used to find the alignment. In addition, background knowledge may be used, provided it has not been created specifically for the alignment tasks (i.e., no hand-made mappings between parts of the ontologies). Admissible background knowledge includes "oracles" such as WordNet, Cyc, UMLS, etc. Further, results must not be tuned manually, for instance by removing obviously wrong mappings. Participants are encouraged to submit the results for all representations of the dataset (i.e., both for node matching tasks and large scale taxonomies).

Schedule

Results of the alignment should be represented using the common format for alignments and sent to Juan Pane for evaluation by September 1st. Quantitative indicators of matching quality (Precision and Recall) will be returned to participants by September 15th.
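
For orientation, Precision and Recall can be sketched with their standard set-based definitions. This is a minimal sketch, not the organizers' evaluation procedure, which is not described on this page. The function name `precision_recall` and the sample class paths are made up for illustration.

```python
def precision_recall(found, reference):
    """Compute (precision, recall) for two sets of correspondences.

    Each correspondence is a hashable item, here a (source_class, target_class) pair.
    """
    found, reference = set(found), set(reference)
    correct = found & reference
    # Precision: share of returned correspondences that are in the reference.
    precision = len(correct) / len(found) if found else 0.0
    # Recall: share of reference correspondences that were returned.
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Illustrative data only; these class paths are invented.
reference = {("Top/Arts/Music", "Top/Entertainment/Music"),
             ("Top/Science", "Top/Science")}
found = {("Top/Arts/Music", "Top/Entertainment/Music"),
         ("Top/Arts", "Top/Science")}
print(precision_recall(found, reference))  # (0.5, 0.5)
```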

The complete schedule of the event can be found here.

Tool

Results of the alignment should be represented using the common format for alignments.
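
As a rough illustration of what such a result file might look like, the sketch below serializes a list of correspondences in the RDF/XML style used by the Alignment API. The element names and the namespace are written from memory and are assumptions to be checked against the official format description. The function `alignment_rdf` and the example URIs are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumed namespace of the alignment format; verify against the format documentation.
ALIGN_NS = "http://knowledgeweb.semanticweb.org/heterogeneity/alignment#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"


def alignment_rdf(onto1, onto2, cells):
    """Serialize (entity1_uri, entity2_uri, relation, confidence) tuples."""
    lines = [
        "<?xml version='1.0' encoding='utf-8'?>",
        f"<rdf:RDF xmlns={quoteattr(ALIGN_NS)} xmlns:rdf={quoteattr(RDF_NS)}>",
        "<Alignment>",
        "  <xml>yes</xml>",
        "  <level>0</level>",
        "  <type>**</type>",
        f"  <onto1>{escape(onto1)}</onto1>",
        f"  <onto2>{escape(onto2)}</onto2>",
    ]
    for entity1, entity2, relation, confidence in cells:
        lines += [
            "  <map>",
            "    <Cell>",
            f"      <entity1 rdf:resource={quoteattr(entity1)}/>",
            f"      <entity2 rdf:resource={quoteattr(entity2)}/>",
            f"      <relation>{escape(relation)}</relation>",
            f"      <measure rdf:datatype={quoteattr(XSD_FLOAT)}>{confidence:.2f}</measure>",
            "    </Cell>",
            "  </map>",
        ]
    lines += ["</Alignment>", "</rdf:RDF>"]
    return "\n".join(lines) + "\n"


# Hypothetical URIs, for illustration only.
print(alignment_rdf("http://example.org/source.owl",
                    "http://example.org/target.owl",
                    [("http://example.org/source.owl#Music",
                      "http://example.org/target.owl#Music", "=", 1.0)]))
```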

Contacts

Results should be sent to Juan Pane.

Initial location of this page: http://www.disi.unitn.it/~pane/OAEI/2008/directory/.