Ontology Alignment Evaluation Initiative

2007 Results

Here are the official results of the Ontology Alignment Evaluation 2007. They will be presented in Busan (KR) at the ISWC 2007 Ontology matching workshop.

A synthesis paper in the proceedings of this workshop summarises the main results of the 2007 campaign (the present version has been lightly updated with respect to the proceedings version). Further data and updates to the results are available here. This page constitutes the official results of the evaluation.

The papers provided by the participants have been collected in the ISWC 2007 Ontology matching workshop proceedings (PDF) which are also published as CEUR Workshop Proceedings volume 304.

General summary

This year again, we had more participants than in previous years: 4 in 2004, 7 in 2005, 10 in 2006, and 17 in 2007. We can also observe a common trend: participants who keep on developing their systems improve their evaluation results over the years.

We have not had enough time so far to validate all the results provided by the participants, but we scrutinized some of them, leading to improvements for some participants and retractions from others. Validating these results proved feasible in previous years, so we plan to do it again in the future.

We summarize the list of participants in the table below. As in previous years, not all participants provided results for all tests. They usually entered those which are easier to run, such as benchmark, directory and conference. The variety of tests and the short time available for providing results have certainly prevented participants from considering more tests.

There are two groups of systems: those which can deal with large taxonomies (food, environment, library) and those which cannot. The two new test cases (environment and library) are those with the fewest participants. This can be explained by the size of the ontologies or by their novelty: there are no past results to compare with.

Software          Σ
AgreementMaker    1
AOAS              1
ASMOV             4
DSSim             6
Falcon-AO v0.7    7
Lily              4
OLA2              3
OntoDNA           3
OWL-CM            1
Prior+            4
RiMOM             4
SAMBO             2
SCARLET ?         1
SEMA              2
Silas ?           1
TaxoMap           2
X-SOM             4
Σ = 17 systems    per column: confidence 10, benchmark 13, anatomy 11, directory 9, food 6, environment 2, library 3, conference 6

Table: Participants and the state of their submissions. The Σ column gives the number of test cases (benchmark, anatomy, directory, food, environment, library, conference) entered by each system; the last row gives the number of participants per column. Confidence stands for the type of result returned by a system: it is counted when the confidence has been measured as a non-boolean value.
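As an illustration of what a non-boolean confidence makes possible, the following sketch (in Python, with purely hypothetical correspondences, not OAEI data) shows how an evaluator or an application could cut such graded output at a chosen threshold to obtain a crisp alignment; systems returning boolean results have to make this cut-off internally. This is only an illustrative sketch, not part of the evaluation tools.

    # Illustrative sketch only: hypothetical correspondences, not OAEI data.
    # A correspondence is (entity1, entity2, relation, confidence).
    correspondences = [
        ("onto1#Paper",  "onto2#Article",    "=", 0.92),
        ("onto1#Author", "onto2#Writer",     "=", 0.75),
        ("onto1#Review", "onto2#Evaluation", "=", 0.40),
    ]

    def crisp(alignment, threshold=0.5):
        """Keep only the correspondences whose confidence reaches the threshold."""
        return [c for c in alignment if c[3] >= threshold]

    print(crisp(correspondences, threshold=0.7))
    # Keeps the first two correspondences; a system returning only boolean
    # confidences has already made this decision itself.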

This year we have been able to devote more time to performing the tests and the evaluation (three full months). This is certainly still too little, especially since the period allocated for it falls during the summer. However, it seems that we have avoided the rush of previous years.

Track by track results

The summary of the results, track by track, will be provided in the following six sections as soon as they are made available.

Lessons learnt

The most important lesson applied from last year is that we were able to revise the schedule so that we had more time for evaluation. However, some lessons have still not really been taken into account; we identify them with an asterisk (*). We therefore reiterate the lessons that still apply, together with new ones, including:

A)
There is a clear trend towards more matching systems, and more systems are able to enter such an evaluation. This is very encouraging for the progress of the field.
B)
Many systems have been entering the campaign for several years. This means that we are not dealing with a continuous flow of prototypes but with systems under persistent development. These systems tend to improve over the years.
C*)
The benchmark test case does not discriminate enough between systems. It is still useful for evaluating the strengths and weaknesses of algorithms, but it no longer seems sufficient for comparing them. We will have to look into better alternatives.
D)
We have had more proposals for test cases this year (we had actively looked for them). However, the difficult lesson is that proposing a test case is not enough: a lot of work remains in preparing the evaluation. Fortunately, with tool improvements, it will become easier to perform the evaluation. We would also like to have more test cases for expressive ontologies.
E*)
It would be interesting, and certainly more realistic, to provide some random gradual degradation of the benchmark tests (5%, 10%, 20%, 40%, 60%, 100% random change) instead of a wholesale discarding of features one by one (see the sketch after this list). This has still not been done this year, but we are seriously considering it for next year.
F)
This year, through some random verifications, we detected some submissions which did not strictly comply with the evaluation rules. We may have to be stricter about this in the future.
G)
Contrary to what was noted in 2006, a significant number of systems were unable to output syntactically correct results (i.e., results automatically usable by another program; see the sketch after this list). Since fixing these mistakes by hand is becoming too much work, we plan to move towards automatic evaluation in which participants have to submit correct results.
H)
The systems seem to be partitioned: between those able to deal with large test sets and those unable to do so, and between systems robust across all tracks and those which are specialized (see Table). These observations remain to be analyzed further.
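As a sketch of the gradual degradation mentioned in lesson E, the following Python fragment randomly alters a given fraction of entity labels, producing one degraded test per alteration level. It assumes the labels are available as plain strings; the helper name and the sample labels are hypothetical, and this is not the actual benchmark generator.

    import random

    def degrade_labels(labels, ratio, seed=42):
        """Randomly scramble a given fraction of entity labels.

        labels : list of str -- entity labels of the reference ontology
        ratio  : float       -- fraction of labels to alter (0.05, 0.1, ..., 1.0)
        Returns a new list where `ratio` of the labels have been replaced by
        random strings of the same length, leaving the rest untouched.
        """
        rng = random.Random(seed)
        degraded = list(labels)
        indices = rng.sample(range(len(labels)), k=round(ratio * len(labels)))
        for i in indices:
            degraded[i] = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz")
                                  for _ in labels[i])
        return degraded

    # One degraded test per level of alteration, instead of discarding whole
    # feature categories at once (hypothetical example labels).
    levels = [0.05, 0.10, 0.20, 0.40, 0.60, 1.00]
    reference = ["Conference", "PaperAuthor", "ProgramCommittee", "Review"]
    tests = {p: degrade_labels(reference, p) for p in levels}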
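For lesson G, a first step towards automatic evaluation is simply to reject submissions that cannot be parsed. The sketch below (Python standard library only) assumes that submissions use the RDF/XML Alignment format of the Alignment API with the namespace shown in the code; the element names and the example file name are given only to illustrate the kind of syntactic check meant here, not as the official checker.

    import xml.etree.ElementTree as ET

    # Namespace assumed for the Alignment format (RDF/XML); adjust if a
    # submission declares a different one.
    ALIGN = "{http://knowledgeweb.semanticweb.org/heterogeneity/alignment}"

    def check_submission(path):
        """Return a list of syntactic problems found in one submitted alignment."""
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError as error:
            return ["not well-formed XML: %s" % error]

        problems = []
        ncells = 0
        for cell in root.iter(ALIGN + "Cell"):
            ncells += 1
            for required in ("entity1", "entity2", "relation", "measure"):
                if cell.find(ALIGN + required) is None:
                    problems.append("Cell %d lacks <%s>" % (ncells, required))
        if ncells == 0:
            problems.append("no correspondence (Cell) found")
        return problems

    # Example (hypothetical file name):
    # problems = check_submission("participant-results.rdf")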

Future plans

Future plans for the Ontology Alignment Evaluation Initiative are certainly to go ahead and to improve the functioning of the evaluation campaign.

Of course, the improvements under discussion are only suggestions that will be refined during the coming year.

Conclusions

This year, more systems entered the evaluation campaign, and more systems managed to produce better quality results than in previous years. Each individual test case had more participants than ever. This shows that, as expected, the field of ontology matching is getting stronger (and we hope that evaluation has been contributing to this progress).

On the side of participants, there is clearly a problem with the size of the input that should be addressed in a general way; we would like to see more participation in the large test cases. On the side of the organizers, evaluating matching systems becomes more complex each year.

Most of the participants have provided a description of their systems and of their experience in the evaluation (except SCARLET, whose system is described in a regular ISWC 2007 paper). These OAEI papers, like the present one, have not been peer reviewed. Reading the participants' papers should help people involved in ontology matching to understand what makes these algorithms work and what could be improved.

The Ontology Alignment Evaluation Initiative will continue these tests, improving both the test cases and the testing methodology to make them more accurate. Further information can be found on this site.


http://oaei.ontologymatching.org/2007/results/
