Ontology Alignment Evaluation Initiative - OAEI-2010 Campaign

Matching Web Directories

The focus of this task is to evaluate the performance of existing matching tools in a real-world taxonomy integration scenario. Our aim is to analyze whether ontology matching tools can be applied effectively to the integration of "shallow ontologies". This year we have used two modalities, the small tasks modality and the single task modality, both described under Modalities.

Data sets

The evaluation datasets for both modalities were extracted from Google, Yahoo and Looksmart web directories. The specific characteristics of the datasets are:

These characteristics make the tasks challenging from a technological point of view, but they also provide guidance for tuning the matching approach that needs to be taken into account. The dataset construction methodology for both modalities is described in the TaxME 2 paper.

In both modalities the node matching tasks are represented by pairs of OWL ontologies, where the classification relation is modeled with the OWL subClassOf construct. All OWL ontologies are therefore taxonomies, i.e., they contain only classes (no object or data properties) connected by the subclass relation.
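Because each ontology is a pure taxonomy, reading one reduces to collecting the classes and their subclass edges. The following is a minimal sketch using only the Python standard library. It assumes an RDF/XML serialization in which each class is an owl:Class element with an rdf:about attribute and nested rdfs:subClassOf elements; the sample document and its example.org URIs are invented for illustration and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical two-class taxonomy, not taken from the actual dataset.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/source#Top"/>
  <owl:Class rdf:about="http://example.org/source#Arts">
    <rdfs:subClassOf rdf:resource="http://example.org/source#Top"/>
  </owl:Class>
</rdf:RDF>
"""


def read_taxonomy(xml_text):
    """Return (classes, edges); edges maps a child URI to its set of parent URIs."""
    root = ET.fromstring(xml_text)
    classes = set()
    edges = {}
    for cls in root.iter(f"{{{OWL}}}Class"):
        uri = cls.get(f"{{{RDF}}}about")
        if uri is None:
            continue  # skip anonymous classes
        classes.add(uri)
        for sub in cls.findall(f"{{{RDFS}}}subClassOf"):
            parent = sub.get(f"{{{RDF}}}resource")
            if parent is not None:
                edges.setdefault(uri, set()).add(parent)
    return classes, edges


classes, edges = read_taxonomy(SAMPLE)
```

A real matcher would more likely load the files through an OWL or RDF library; the sketch only shows how little structure a taxonomy-only ontology carries.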

Modalities

In the small tasks modality, the first matching task is to find correspondences between 1/source.owl and 1/target.owl and to output them to a file called 1/yoursystem.rdf. The matching tasks are numbered from 1 to 4639. The tasks in the small tasks modality are the same as last year, since we want to compare how the participating systems have evolved since then. Freezing the dataset means that we do not introduce any new variable affecting the comparison. The directory dataset for the small tasks modality can be found in the downloads area. A set of partial reference correspondences can also be found in the downloads area. These reference correspondences are the same as those published in OAEI-2005. However, since the directory dataset has grown in size since 2005, the set of reference alignments disclosed here accounts for 49% of the current set of alignments.
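The numbering scheme above can be walked with a small helper. This is only a sketch: the root directory name "dataset" and the function name task_files are assumptions made for illustration, while the file names source.owl, target.owl and yoursystem.rdf and the task count come from the description above.

```python
from pathlib import Path

NUM_TASKS = 4639  # task count stated in the modality description


def task_files(root, system_name, num_tasks=NUM_TASKS):
    """Yield (source, target, output) paths for each numbered task directory."""
    root = Path(root)
    for i in range(1, num_tasks + 1):
        task_dir = root / str(i)
        yield (
            task_dir / "source.owl",
            task_dir / "target.owl",
            task_dir / f"{system_name}.rdf",
        )


# "dataset" is a hypothetical location for the unpacked download.
first = next(task_files("dataset", "yoursystem"))
```

A matcher would then be run once per triple, reading the first two paths and writing its alignment to the third.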

In the single task modality there is only one source and one target directory, containing 2854 and 6555 nodes respectively. These directories correspond to a superset of all the "small" directories contained in the small tasks modality. The aim of this modality is to test the ability of current matching systems to handle and match big directories. We believe this is a realistic scenario that needs to be supported and that arises in many application areas, since the Google, Yahoo and Looksmart web directories come from commerce. For this modality only one result compliant with the Alignment API format has to be submitted. The file should be named "yoursystem.rdf". The directory dataset for the single task modality can be found in the downloads area. Note that the single task modality in the Directory track was cancelled due to a lack of the resources needed to cross-check the reference alignments.

The partial reference alignments can be used together with the Alignment API to compute recall (see an example command). Both datasets are being developed by the S-Match.org open source project.
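Because only a partial reference is disclosed, recall is the indicator participants can compute on their own. The sketch below shows the computation in plain Python, independently of the Alignment API. It assumes correspondences are represented as (entity1, entity2) pairs, and the identifiers in the example are invented.

```python
def recall(found, reference):
    """Fraction of reference correspondences that the system also found."""
    reference = set(reference)
    if not reference:
        raise ValueError("reference alignment is empty")
    return len(set(found) & reference) / len(reference)


# Hypothetical correspondences, for illustration only.
reference = {("src#Arts", "tgt#Art"), ("src#Music", "tgt#Music")}
found = {("src#Arts", "tgt#Art"), ("src#Sports", "tgt#Games")}
score = recall(found, reference)
```

Precision is deliberately left out: against a partial reference, a correct correspondence that is simply missing from the reference would be counted as an error.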

Downloads

Submission details

The task is to find correspondences between classes in the ontologies. Any information in the two models can be used to find the correspondences. In addition, background knowledge may be used, provided it has not been created specifically for the matching tasks (i.e., no hand-made correspondences between parts of the ontologies). Admissible background knowledge includes "oracles" such as WordNet, Cyc, UMLS, etc. Furthermore, results must not be tuned manually, for instance by removing obviously wrong correspondences. Participants are encouraged to submit the results as computed by their systems.

We expect a single "zipped" file containing the submission for both modalities with the following directory structure:

Schedule

The resulting alignments should be represented using the common format for alignments and sent to Juan Pane for evaluation by August 30th. Quantitative indicators of matching quality (Precision and Recall) will be returned to participants by September 15th.

The complete schedule of the event can be found at the OAEI 2010 campaign page.

Format

The resulting alignments should be represented using the common format for alignments.
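As a rough illustration of what a result file carries, the sketch below reads correspondence cells from an alignment document. The element names (Alignment, map, Cell, entity1, entity2, relation, measure) and the namespace in the sample are reproduced from memory of the Alignment format and should be checked against the format specification. To stay tolerant of namespace variants, the reader matches on local element names only. The example.org URIs are invented.

```python
import xml.etree.ElementTree as ET

RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"

# Assumed shape of an alignment file; verify against the format specification.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns="http://knowledgeweb.semanticweb.org/heterogeneity/alignment"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Alignment>
    <map>
      <Cell>
        <entity1 rdf:resource="http://example.org/source#Arts"/>
        <entity2 rdf:resource="http://example.org/target#Art"/>
        <relation>=</relation>
        <measure>1.0</measure>
      </Cell>
    </map>
  </Alignment>
</rdf:RDF>
"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_cells(xml_text):
    """Return a list of (entity1, entity2, relation, measure) tuples."""
    cells = []
    for elem in ET.fromstring(xml_text).iter():
        if local(elem.tag) != "Cell":
            continue
        fields = {local(child.tag): child for child in elem}
        cells.append((
            fields["entity1"].get(RDF_RESOURCE),
            fields["entity2"].get(RDF_RESOURCE),
            (fields["relation"].text or "").strip(),
            float(fields["measure"].text),
        ))
    return cells


cells = read_cells(SAMPLE)
```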

Contacts

The resulting alignments should be sent to Juan Pane.

Original page: http://www.disi.unitn.it/~pane/OAEI/2010/directory/ [cached: 08/03/2011]