This page presents the results of the OAEI 2014 campaign for the MultiFarm track. Details on this data set can be found at MultiFarm data set. If you notice any kind of error (wrong numbers, incorrect information on a matching system, etc.), do not hesitate to contact us (the e-mail address is given in the last paragraph of this page).
For the 2014 campaign, part of the data set was used for a kind of blind evaluation. This subset includes all the matching tasks involving the edas and ekaw ontologies (resulting in 36x24 matching tasks), which were not used in previous campaigns. In the following, we refer to this evaluation as the edas and ekaw based evaluation. Participants were able to test their systems on the freely available subset of matching tasks (the open evaluation), which includes reference alignments, is available via the SEALS repository, and is composed of 36x25 tasks.
We distinguish two types of matching tasks: (i) those tasks where two different ontologies have been translated into different languages; and (ii) those tasks where the same ontology has been translated into different languages. For the tasks of type (ii), good results depend less on the use of specific techniques for dealing with ontologies in different natural languages than on the ability to exploit the fact that both ontologies have an identical structure.
This year, only 3 systems use specific cross-lingual methods: AML, LogMap and XMap. All of them integrate a translation module in their implementations. LogMap uses the Google Translator API and pre-compiles a local dictionary in order to avoid multiple accesses to the Google server during the matching process. AML and XMap use Microsoft Translator, and AML adopts the same strategy as LogMap, computing a local dictionary. The translation step is performed before the matching step itself.
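The local-dictionary strategy described above can be illustrated with a minimal sketch. It is not the code of LogMap or AML: the function names are hypothetical, and `fake_translate` is a stand-in for a remote translation service, used only so that the example runs offline. The idea shown is simply that each distinct label is translated once, before matching, and that the matcher then reads from the local dictionary.

```python
from typing import Callable, Dict, Iterable


def build_local_dictionary(
    labels: Iterable[str], translate: Callable[[str], str]
) -> Dict[str, str]:
    """Translate each distinct (normalized) label once and store the result."""
    dictionary: Dict[str, str] = {}
    for label in labels:
        key = label.strip().lower()
        if key and key not in dictionary:
            # The only place where the (assumed remote) translator is called.
            dictionary[key] = translate(key)
    return dictionary


def lookup(dictionary: Dict[str, str], label: str) -> str:
    """Return the stored translation, or the label itself if none is stored."""
    key = label.strip().lower()
    return dictionary.get(key, key)


def fake_translate(text: str) -> str:
    """Hypothetical stand-in for a remote translator; counts its own calls."""
    fake_translate.calls += 1
    table = {"artículo": "paper", "autor": "author"}
    return table.get(text, text)


fake_translate.calls = 0

# Labels as they might appear in a translated ontology, with repetitions.
labels = ["Artículo", "autor", "artículo", "Autor "]
local_dict = build_local_dictionary(labels, fake_translate)

# During matching, only the local dictionary is consulted.
translated = [lookup(local_dict, label) for label in labels]
```

With four input labels but only two distinct normalized forms, the stand-in translator is called twice; all later lookups are served locally, which is the point of pre-compiling the dictionary.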
For both settings, the systems were executed on a Debian Linux VM configured with four processors and 20GB of RAM, running on a Dell PowerEdge T610 with 2*Intel Xeon Quad Core 2.26GHz E5607 processors, under Linux ProxMox 2 (Debian). All measurements are based on a single run. Some exceptions were observed for MaasMtch, which could not be executed under the same setting as the other systems. Thus, we do not report execution time for this system.
The table below presents the aggregated results for the open subset of MultiFarm, for the test cases of type (i) and (ii). These results were computed using the Alignment API 4.6. We did not distinguish between empty and erroneous alignments. For all systems, we observe significant differences in precision between the results obtained for each type of matching task, with smaller differences in recall. As expected, the systems implementing specific cross-lingual techniques generate the best results for test cases of type (i). A similar behavior has also been observed for the test cases of type (ii), even though the specific strategies could have less impact there, because the other systems can also exploit the identical structure of the ontologies. For cases of type (i), while LogMap has the best precision (at the expense of recall), AML obtains similar values for precision and recall and outperforms the other systems in terms of F-measure (for both cases). The reader can refer to the OAEI paper for a more detailed discussion of these results.
|Different ontologies (i)||Same ontologies (ii)|
|Specific cross-lingual matchers||AML||11.40||.57||.54||.53||54.89||.95||.62||.48|
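For readers unfamiliar with the measures discussed on this page, the following minimal sketch shows the usual definitions of precision, recall and F-measure for a single matching task. It is an illustration only, not the Alignment API: the function name, the representation of a correspondence as a pair of entity names, and the sample data are all assumptions made for the example. How the per-task values are aggregated into the figures reported on this page is not restated here.

```python
from typing import Iterable, Tuple

Correspondence = Tuple[str, str]  # (entity in ontology 1, entity in ontology 2)


def evaluate(
    found: Iterable[Correspondence], reference: Iterable[Correspondence]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) of `found` against `reference`."""
    found_set, reference_set = set(found), set(reference)
    correct = len(found_set & reference_set)
    # An empty alignment gets precision 0 here; this is a convention of the sketch.
    precision = correct / len(found_set) if found_set else 0.0
    recall = correct / len(reference_set) if reference_set else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Made-up data: four reference correspondences, three found, two of them correct.
reference = [("Paper", "Artigo"), ("Author", "Autor"),
             ("Review", "Revisao"), ("Chair", "Presidente")]
found = [("Paper", "Artigo"), ("Author", "Autor"), ("Review", "Autor")]

precision, recall, f_measure = evaluate(found, reference)
```

In this made-up task the system finds two of the four reference correspondences and returns one wrong one, so precision is 2/3, recall is 1/2, and the F-measure, their harmonic mean, is 4/7.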
The table below presents a comparison, in terms of F-measure, of the systems that have implemented some cross-lingual strategy in at least one OAEI campaign. For the results marked with one *, the corresponding system version did not implement specific strategies in the corresponding year. The best F-measures for cases (i) and (ii) over the years are indicated in bold face.
You can download the complete set of generated alignments. These alignments were generated by executing the tools with the help of the SEALS infrastructure. All results presented above are based on these alignments. You can also download additional tables of results (including precision and recall for each pair of languages), for both types of matching task (i) and (ii).
This year we have included edas and ekaw in a (pseudo) blind setting. In fact, two years ago this subset was available, by mistake, on the MultiFarm web page. Since then, we have removed it from there, and it is also not available to the participants via the SEALS repositories. However, we cannot guarantee that the participants have not used this data set for their tests.
We evaluate this subset on the systems implementing specific cross-lingual strategies. The tools run in the SEALS platform using locally stored ontologies. The table below presents the results for AML and LogMap. In this setting, XMap threw exceptions for most pairs, and its results are not reported for this subset. These internal exceptions occurred because the system exceeded the limit of accesses to the translator. While AML includes in its local dictionaries the automatic translations for the two ontologies, this is not the case for LogMap (real blind case). This can explain the similar results obtained by AML in both settings. LogMap, however, encountered many problems accessing the Google translation server from our server, which explains the decrease in its results and the increase in runtime (besides the fact that this data set is slightly bigger than the open data set in terms of ontology elements). Overall, for cases of type (i), and noting the particular case of AML, the systems maintained their performance with respect to the open setting.
|Different ontologies (i)||Same ontologies (ii)|