Ontology Alignment Evaluation Initiative - OAEI-2009 Campaign
The OAEI-2009 results are available here

Ontology Alignment Evaluation Initiative

2009 Campaign

The increasing number of methods available for schema matching and ontology integration makes it necessary to establish a consensus on how to evaluate these methods. Since 2004, OAEI has organized evaluation campaigns aimed at evaluating ontology matching technologies.

The OAEI 2009 campaign is associated with the ISWC Ontology matching workshop, to be held in Fairfax (VA, US), near Washington DC, on Sunday, October 25, 2009.


The 2009 campaign introduces two new specific tracks involving oriented matching, i.e., results with subsumption-like relations, and instance matching, i.e., results whose goal is to match instances and not classes.

Comparison track: benchmark
As in previous campaigns, a systematic benchmark series has been produced. The goal of this benchmark series is to identify the areas in which each alignment algorithm is strong or weak. The test is based on one particular ontology dedicated to the very narrow domain of bibliography and a number of alternative ontologies of the same domain for which alignments are provided.
Expressive ontologies
The anatomy real world case is about matching the Adult Mouse Anatomy (2744 classes) and the NCI Thesaurus (3304 classes) describing the human anatomy.
Participants will be asked to find all correct correspondences (equivalence and/or subsumption correspondences) and/or 'interesting correspondences' within a collection of ontologies describing the domain of organising conferences (a domain that every researcher can readily understand). Results will be evaluated a posteriori, in part manually and in part by data-mining and logical reasoning techniques. There will also be an evaluation against a reference mapping based on a subset of the whole collection.
Directories and thesauri
fishery gears
This test features four different classification schemes, expressed in OWL, adopted by different fishery information systems in the FIM division of FAO. An alignment of these four schemes should be able to identify equivalence, or a degree of similarity, between the fishing gear types and the groups of gears, so as to enable future data aggregation across systems.
The directory real-world case consists of matching web site directories (like the Open Directory or Yahoo's). It comprises more than 4,000 elementary tests.
Three large SKOS subject heading lists for libraries have to be matched using relations from the SKOS vocabulary. Results will be evaluated on the basis of (i) a partial reference alignment and (ii) the use of the mapping to re-index books from one vocabulary to the other.
Oriented matching
This track focuses on the evaluation of alignments that contain mapping relations other than equivalence.
Instance matching
The instance data matching track aims at evaluating tools able to identify similar instances among different datasets. It features Web datasets, as well as a generated benchmark.
Eprints-Rexa-Sweto/DBLP benchmark
three datasets containing instances from the domain of scientific publications
three datasets covering several topics and structured according to different ontologies
A generated benchmark built from one dataset by modifying it according to various criteria.
very large crosslingual resources
The purpose of this task (vlcr) is to match the Thesaurus of the Netherlands Institute for Sound and Vision (called GTAA, see below for more information) to two other resources: the English WordNet from Princeton University and DBpedia.

We summarize below the variation between the results expected by these tests (all results are given in the Alignment format):

test           formalism  relations  confidence  modalities  language  size
benchmarks     OWL        =          [0 1]       open        EN        (36+61)*2*49
anatomy        OWL        =          [0 1]       blind       EN        3k*3k
conference     OWL-DL     =, <=      [0 1]       blind+open  EN
benchmarksubs  OWL        =,<,>      [0 1]       open        EN
eprints        RDF        =          [0 1]       open        EN
tap            RDF        =          [0 1]       open        EN
iimb           RDF        =          [0 1]       open        EN
                                     [0 1]       blind
[0 1] in the 'confidence-column' means that submissions with confidence values in the range [0 1] are preferred, but this does not exclude systems that do not distinguish between different confidence values.

Evaluation process

Each data set has a different evaluation process. They can be roughly divided into four groups:

benchmark: open
benchmark tests are provided with the expected results. Participants must return their obtained results to organisers;
anatomy, conference, library, eprints, tap, iimb: blind
these are blind tests, i.e., participants do not know the results and must return their results to organisers;
fishery, library, benchmarksubs: expert
results themselves are evaluated by experts a posteriori on a sample of the results;
library: task-based
results are evaluated by using them in a final task and evaluating the impact in this final task.

However, the evaluation will be processed in the same three successive steps as before.

Preparatory Phase

Ontologies are described in OWL-DL and serialized in the RDF/XML format. The expected alignments are provided in the Alignment format expressed in RDF/XML.

The ontologies and alignments of the evaluation are provided in advance during the period between May 19th and June 15th. This gives potential participants the opportunity to send observations, bug corrections, remarks and other test cases to the organizers. The goal of this preliminary period is to make sure that the delivered tests make sense to the participants. This feedback is important, so participants should not hesitate to provide it. The tests will certainly change after this period, but only to ensure better participation in the tests. The final test base has been released on July 1st.

Execution Phase

During the execution phase, participants will use their algorithms to automatically match the ontologies. Participants should use only one algorithm and the same set of parameters for all tests in all tracks. Of course, it is fair to select the set of parameters that provides the best results (for the tests where results are known). Besides the parameters, the input of the algorithms must be the two provided ontologies to match and any general-purpose resource available to everyone (that is, no resource especially designed for the test). In particular, participants should not use the data (ontologies and results) from other test sets to help their algorithm. And cheating is not fair...

The deadline for delivering final results is September 26th, sharp. However, it is highly advised that participants send results before (preferably by September 1st) to the organisers so that they can check that they will be able to evaluate the results smoothly and can provide some feedback to participants.

Participants will provide their alignment for each test in the Alignment format. The results will be provided in a zip file containing one directory per test (named after its number) and each directory containing one result file in the RDF/XML Alignment format with always the same name (e.g., participant.rdf replacing "participant" by the name you want your system to appear in the results, limited to 6 alphanumeric characters). This should yield the following structure:

+- benchmarks
|  +- 101
|  |  +- participant.rdf
|  +- 103
|  |  +- participant.rdf
|  + ...
+- anatomy
|  +- 1
|  |  +- participant.rdf
|  +- 2
|  |  +- participant.rdf
|  +- ...
+- directory
|  +- 1
|  |  +- participant.rdf
|  + ...
+ ...
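As an illustration only, the directory layout above could be produced with a short Python script such as the following sketch. The function name `package_results`, the system name "myalgo" and the placeholder file contents are hypothetical and not part of the campaign material; real entries would contain alignments in the RDF/XML Alignment format.

```python
import io
import re
import zipfile


def package_results(results, participant, target):
    """Write each alignment as <track>/<test>/<participant>.rdf into a zip.

    `results` maps (track, test) string pairs to RDF/XML alignment text.
    `target` is a file path or a writable binary file object.
    """
    # The system name is limited to 6 alphanumeric characters.
    if not re.fullmatch(r"[A-Za-z0-9]{1,6}", participant):
        raise ValueError("participant name must be 1 to 6 alphanumeric characters")
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for (track, test), rdf_text in sorted(results.items()):
            archive.writestr(f"{track}/{test}/{participant}.rdf", rdf_text)


# Hypothetical results: placeholder strings stand in for real alignment files.
results = {
    ("benchmarks", "101"): "<!-- alignment for benchmarks 101 -->",
    ("benchmarks", "103"): "<!-- alignment for benchmarks 103 -->",
    ("anatomy", "1"): "<!-- alignment for anatomy 1 -->",
}
buffer = io.BytesIO()
package_results(results, "myalgo", buffer)
names = zipfile.ZipFile(buffer).namelist()
print(names)
```

Passing a file path instead of the in-memory buffer writes the archive to disk; checking the system name up front avoids producing an archive that does not follow the naming rule.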

Participants will also provide, by September 26th, a paper to be published in the proceedings.

All participants are required to provide a link to their program and parameter set. This year, we would like to use this to collect the requirements of all your tools, in order to get an idea of what would be required to offer you automatic evaluation in the years to come.

Apart from the instance matching track, the only interesting alignments are those involving classes and properties of the given ontologies. So these alignments should align neither individuals nor entities from external ontologies.
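A minimal sketch of such a restriction is shown below. It assumes that the URIs of the classes and properties of each ontology have already been extracted into two sets, and that correspondences are held as (entity1, entity2, relation, confidence) tuples; the function name and the example.org URIs are hypothetical.

```python
def keep_schema_correspondences(correspondences, entities1, entities2):
    """Keep correspondences whose two sides are known classes or properties.

    `entities1` and `entities2` are sets of class and property URIs taken
    from the first and second ontology; anything else (individuals, entities
    of imported or external ontologies) is dropped.
    """
    return [
        cell for cell in correspondences
        if cell[0] in entities1 and cell[1] in entities2
    ]


# Hypothetical entity sets and correspondences, for illustration only.
entities1 = {"http://example.org/onto1#Book", "http://example.org/onto1#author"}
entities2 = {"http://example.org/onto2#Volume", "http://example.org/onto2#writer"}
correspondences = [
    ("http://example.org/onto1#Book", "http://example.org/onto2#Volume", "=", 0.9),
    ("http://example.org/onto1#book42", "http://example.org/onto2#Volume", "=", 0.7),
    ("http://example.org/onto1#author", "http://example.org/external#creator", "=", 0.8),
]
kept = keep_schema_correspondences(correspondences, entities1, entities2)
print(kept)
```

Filtering by membership in explicit entity sets, rather than by namespace prefix, removes individuals as well as external entities.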

Evaluation Phase

The organizers will evaluate the results of the algorithms used by the participants and provide comparisons on the basis of the provided alignments.

In order to ensure that the provided results can be processed automatically, participants are requested to provide (preliminary) results by September 1st. In the case of blind tests, only the organizers will do the evaluation with regard to the withheld alignments. In the case of double blind tests, the participants will provide a version of their system and the values of the parameters, if any.

An email with the location of the required zip files must be sent to the contact addresses below.

The standard evaluation measures will be precision and recall computed against the reference alignments. To aggregate the measures, we will use weighted harmonic means (the weight being the size of the reference alignment). Precision/recall graphs will also be computed, so it is advised that participants provide their results with a weight for each correspondence they found (participants can provide two alignment results: <name>.rdf for the selected alignment and <name>-full.rdf for the alignment with weights).
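The following Python sketch illustrates these measures under simplifying assumptions: correspondences are represented as plain tuples, a correspondence counts as correct only if it appears identically in the reference alignment, and the weight of each test is the size of its reference alignment. The function names are hypothetical and this is not the organizers' evaluation code.

```python
def precision_recall(found, reference):
    """Precision and recall of a set of correspondences against a reference."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


def weighted_harmonic_mean(values, weights):
    """Weighted harmonic mean: sum(w) / sum(w / v)."""
    total = sum(weights)
    if total == 0:
        return 0.0
    # A zero value with positive weight drives the harmonic mean to zero.
    if any(v == 0 and w > 0 for v, w in zip(values, weights)):
        return 0.0
    return total / sum(w / v for v, w in zip(values, weights) if w > 0)


def precision_recall_at(weighted, reference, threshold):
    """Precision and recall when keeping correspondences with weight >= threshold."""
    selected = {pair for pair, weight in weighted.items() if weight >= threshold}
    return precision_recall(selected, reference)


# Two hypothetical tests with reference alignments of size 2 and 4.
found_a, ref_a = {"a", "b", "c", "d"}, {"a", "b"}
found_b, ref_b = {"x", "y"}, {"x", "y", "z", "w"}
p_a, r_a = precision_recall(found_a, ref_a)
p_b, r_b = precision_recall(found_b, ref_b)
weights = [len(ref_a), len(ref_b)]
h_precision = weighted_harmonic_mean([p_a, p_b], weights)
h_recall = weighted_harmonic_mean([r_a, r_b], weights)
print(h_precision, h_recall)
```

Calling `precision_recall_at` over a range of thresholds on a weighted alignment (such as the one in <name>-full.rdf) yields the points of a precision/recall graph.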

Furthermore, it is planned to introduce new measures addressing some limitations of precision and recall. These will be presented at the workshop discussion so that participants can provide feedback on whether to use them in a further evaluation.


June 1st
datasets are out
June 22nd
end of commenting period
July 6th
tests are frozen
September 1st
participants send preliminary results (for interoperability-checking)
September 28th
participants send final results and papers
October 5th
organisers publish results for comments
October 25th
final results ready and OM-2009 workshop.
November 15th
participants send final versions of papers to Jérôme Euzenat and Pavel Shvaiko.


Based on the results of the experiments, participants are expected to provide the organisers with a paper to be published in the proceedings of the workshop. The paper must be no more than 8 pages long and formatted using the LNCS Style. To ensure easy comparability among the participants, it has to follow the given outline. A package with LaTeX and Word templates is made available here. The above-mentioned paper must be sent in PDF format before September 28th to Jerome . Euzenat () inrialpes . fr with a copy to Pavel Shvaiko (pavel () dit dot unitn dot it).

Participants may also submit a longer version of their paper, with a length justified by its technical content, to be published online in the CEUR-WS collection and on the OAEI web site (this last paper will be due just before the workshop).

The outline of the paper is as below (see templates for more details):

These papers are not peer-reviewed and are here to keep track of the participations and the description of matchers which took part in the campaign.

The results from both selected participants and organizers will be presented at the Workshop on Ontology matching at ISWC 2009, taking place at Chantilly (VA, USA), near Washington DC, on October 25th, 2009. We hope to see you there.

Tools and material

Here are some tools that may help participants.

Processing tools

Participants may use the Alignment API for generating and manipulating their alignments (in particular for computing evaluation of results).

SKOS conversion tools

The participants may use various options if they need to convert SKOS vocabularies into OWL.

OWL-N3 conversion tools

Vassilis Spiliopoulos pointed out the Altova transformer from OWL to N3 notation, which may be useful to some participants. This is a commercial tool with a 30-day free trial.

$Id: index.html,v 1.26 2012/12/01 09:15:31 euzenat Exp $