Matching Web Directories

The focus of this task is to evaluate the performance of existing matching tools in a real-world taxonomy integration scenario. Our aim is to analyze whether ontology matching tools can be applied effectively to the integration of "shallow ontologies". This year we have used two modalities:

Small tasks: this modality corresponds to last year's directory track and aims at testing single matching tasks.
Single task: this modality contains only one matching task. The source and the target directories to be matched contain 2854 and 6555 nodes respectively. The Single task modality in the Directory track was cancelled due to lack of resources needed to cross check the reference alignments.

Data sets

The evaluation datasets for both modalities were extracted from Google, Yahoo and Looksmart web directories. The specific characteristics of the datasets are:

Common characteristics:

Simple relationships: web directories contain only one type of relationship, the so-called "classification relation".
Vague terminology and modeling principles: The matching tasks incorporate the typical "real world" modeling and terminological errors.

Small tasks:

More than 4500 node matching tasks, where each node matching task is composed of the paths to root of the nodes in the web directories.
Reference correspondences for the equivalence relation for all the matching tasks.

Single task:

A single matching task where the aim is to find the correspondences between the nodes of the source and target directories, which contain 2854 and 6555 nodes respectively.
Reference correspondences for the matching task. This task includes, besides the equivalence relation, more general and less general relations.

These characteristics make the tasks challenging from a technological point of view, but there is guidance for tuning the matching approach that needs to be taken into account. The dataset construction methodology for both modalities is described in the TaxME 2 paper.

In both modalities the node matching tasks are represented by pairs of OWL ontologies, where the classification relation is modeled with the OWL subClassOf construct. All the OWL ontologies are therefore taxonomies, i.e., they contain only classes (without object and data properties) connected by the subclass relation.
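As an illustration of what such a taxonomy looks like to a program, the following minimal sketch reads the subclass links from an OWL file and rebuilds the path to root of a node. The sample ontology, the class names and the helper functions are hypothetical, and the sketch assumes that classes are named with rdf:ID or rdf:about and that each class has at most one named parent; the actual task files may be serialized differently.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical miniature directory; the real task files are not reproduced here.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Top"/>
  <owl:Class rdf:ID="Arts">
    <rdfs:subClassOf rdf:resource="#Top"/>
  </owl:Class>
  <owl:Class rdf:ID="Music">
    <rdfs:subClassOf rdf:resource="#Arts"/>
  </owl:Class>
</rdf:RDF>
"""


def local_name(ref):
    """Return the fragment after '#', or the whole string if there is none."""
    return ref.rsplit("#", 1)[-1]


def parent_map(owl_xml):
    """Map each named class to its parent class (None for a root class)."""
    root = ET.fromstring(owl_xml)
    parents = {}
    for cls in root.iter(f"{{{OWL}}}Class"):
        ref = cls.get(f"{{{RDF}}}ID") or cls.get(f"{{{RDF}}}about")
        if not ref:
            continue  # skip anonymous classes
        sub = cls.find(f"{{{RDFS}}}subClassOf")
        target = sub.get(f"{{{RDF}}}resource") if sub is not None else None
        parents[local_name(ref)] = local_name(target) if target else None
    return parents


def path_to_root(parents, node):
    """Follow subclass links upwards; assumes the taxonomy has no cycles."""
    path = [node]
    while parents.get(node):
        node = parents[node]
        path.append(node)
    return path


if __name__ == "__main__":
    print(path_to_root(parent_map(SAMPLE), "Music"))
```

Because the ontologies contain no properties, the parent map is the whole structure: a matcher working on these files has only class names and their position in the hierarchy to rely on.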

Modalities

In the small tasks modality, the first matching task is to find a correspondence between 1/source.owl and 1/target.owl and to output it to a file called 1/yoursystem.rdf. The matching tasks are numbered from 1 to 4639. The tasks in the small tasks modality are the same as last year, since we aim to compare how the participating systems have evolved since then. Freezing the dataset means that we do not introduce any new variable affecting the comparison. The directory dataset for the small tasks modality can be found in the downloads area. A set of partial reference correspondences can also be found in the downloads area. These reference correspondences are the same as those published in OAEI-2005. However, since the directory dataset has grown in size since 2005, the set of reference alignments disclosed here accounts for 49% of the current set of alignments.
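The file naming scheme of the small tasks can be sketched as a simple loop over the task numbers. The function name, the dataset root folder and the default system name below are illustrative assumptions; only the numbering from 1 to 4639 and the file names source.owl, target.owl and yoursystem.rdf come from the description above.

```python
from pathlib import Path

FIRST_TASK, LAST_TASK = 1, 4639  # numbering stated in the track description


def task_files(dataset_root, system_name="yoursystem"):
    """Yield (source, target, output) paths for every small matching task."""
    root = Path(dataset_root)
    for n in range(FIRST_TASK, LAST_TASK + 1):
        task_dir = root / str(n)
        yield (
            task_dir / "source.owl",
            task_dir / "target.owl",
            task_dir / f"{system_name}.rdf",
        )


if __name__ == "__main__":
    # "dataset" is a hypothetical folder holding the unpacked download.
    for source, target, output in task_files("dataset"):
        # A real run would call the matcher here and write its alignment to `output`.
        pass
```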

In the single task modality there is only one source and one target directory, containing 2854 and 6555 nodes respectively. These directories correspond to a superset of all the "small" directories contained in the small tasks modality. The aim of this modality is to test the ability of current matching systems to handle and match big directories. We believe this is a realistic scenario that needs to be supported and that can be found in many application areas, as the Google, Yahoo and Looksmart web directories come from commerce. For this modality only one result compliant with the Alignment API format has to be submitted. The file should be named "yoursystem.rdf". The directory dataset for the single task modality can be found in the downloads area. The Single task modality in the Directory track was cancelled due to lack of resources needed to cross check the reference alignments.

The partial reference alignments can be used together with the Alignment API to compute recall (see an example command). Both datasets are being developed by the S-Match.org open source project.
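For readers who want to see what the recall figure measures, here is a minimal sketch that does not use the Alignment API. It assumes each correspondence has already been loaded as a (source entity, target entity, relation) tuple; the entity names are made up for the example.

```python
def recall(found, reference):
    """Fraction of the reference correspondences that the system also found."""
    found, reference = set(found), set(reference)
    if not reference:
        raise ValueError("reference alignment is empty")
    return len(found & reference) / len(reference)


if __name__ == "__main__":
    # Hypothetical correspondences: (source entity, target entity, relation).
    reference = {
        ("src#Music", "tgt#Music", "="),
        ("src#Arts", "tgt#Art", "="),
        ("src#Cinema", "tgt#Movies", "="),
        ("src#Sports", "tgt#Sport", "="),
    }
    found = {
        ("src#Music", "tgt#Music", "="),
        ("src#Arts", "tgt#Art", "="),
        ("src#Games", "tgt#Toys", "="),
    }
    print(recall(found, reference))
```

With a partial reference, a correspondence that is absent from the reference cannot be counted as wrong, which is presumably why only recall is mentioned for these files.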

Downloads

Dataset for small tasks modality.

Partial reference alignment for the small tasks modality.

Dataset for single task modality.

Submission details

The task is to find correspondences between classes in the ontologies. Any information in the two models can be used to find the correspondences. In addition, it is allowed to use background knowledge that has not been created specifically for the matching tasks (i.e., no hand-made correspondences between parts of the ontologies). Admissible sources of background knowledge are "oracles" such as WordNet, Cyc, UMLS, etc. Furthermore, results must not be tuned manually, for instance by removing obviously wrong correspondences. Participants are encouraged to submit the results as computed by their systems.

We expect a single "zipped" file containing the submission for both modalities with the following directory structure:

A single root folder named "yoursystem".
One sub-folder for each modality named "smalltasks" and "singletask".
The "smalltasks" folder contains a sub-folder for each matching task, numbered from 1 to 4639. Inside each folder, the resulting correspondences should be put in a file named "yoursystem.rdf".
The "singletask" folder contains only one file with the resulting correspondences named "yoursystem.rdf".
All the results should be compliant with the Alignment format.
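The directory structure listed above can be checked before submission with a short script. This is only a sketch: the function names and the archive name are hypothetical, and it assumes the entries are stored in the zip file with forward slashes under the root folder.

```python
import zipfile
from pathlib import PurePosixPath


def expected_entries(system_name, n_tasks=4639):
    """List the result files required by the submission layout."""
    root = PurePosixPath(system_name)
    entries = [
        str(root / "smalltasks" / str(n) / f"{system_name}.rdf")
        for n in range(1, n_tasks + 1)
    ]
    entries.append(str(root / "singletask" / f"{system_name}.rdf"))
    return entries


def missing_entries(archive, system_name, n_tasks=4639):
    """Return the required files that are absent from the zip archive."""
    with zipfile.ZipFile(archive) as zf:
        present = set(zf.namelist())
    return [e for e in expected_entries(system_name, n_tasks) if e not in present]


if __name__ == "__main__":
    # "yoursystem.zip" is a hypothetical archive name.
    for entry in missing_entries("yoursystem.zip", "yoursystem"):
        print("missing:", entry)
```

The check covers file names only; it does not verify that each file is a valid alignment.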

Schedule

The resulting alignments should be represented using the common format for alignments and sent to Juan Pane for evaluation by August 30th. Quantitative indicators of matching quality (Precision and Recall) will be returned to participants by September 15th.

The complete schedule of the event can be found at the OAEI 2010 campaign page.

Format

The resulting alignments should be represented using the common format for alignments.

Contacts

The resulting alignments should be sent to Juan Pane.

Original page: http://www.disi.unitn.it/~pane/OAEI/2010/directory/ [cached: 08/03/2011]