MultiFarm

The goal of this track is to evaluate the ability of systems to deal with ontologies in different natural languages. It serves the purpose of evaluating the strengths and the weaknesses of matchers and measuring their progress, with a focus on multilingualism.

Schedule

The schedule is that of OAEI 2015.

Evaluation campaign results

The results of OAEI 2015 for the MultiFarm track are available here.

Data set

The original MultiFarm data set is composed of a set of 7 ontologies of the Conference domain (Cmt, Conference, ConfOf, Edas, Ekaw, Iasted, Sigkdd), translated from English into 8 languages: Chinese (cn), Czech (cz), Dutch (nl), French (fr), German (de), Portuguese (pt), Russian (ru) and Spanish (es). It also includes the corresponding cross-lingual alignments between them. This data set is based on the OntoFarm data set, which has been used successfully for several years in the Conference track of the OAEI campaigns. For details on MultiFarm, please refer to the MultiFarm web page.

This year, the data set has evolved (refer to the MultiFarm web page for details):

Arabic translations have been provided (some problems with ontology identifiers were fixed on 02.09.2015).
Italian translations will be used for blind evaluation.
Bugs and translation issues have been fixed.

The new version of the data set (v2) is available on the SEALS test repository and is accessible through the SEALS client (see instructions here). Alternatively, you can download the ontologies from the MultiFarm web page. To run the MultiFarm test suites, use the following identifiers:

MultiFarm identifiers (testing data set)

Repository: http://repositories.seals-project.eu/tdrs/
Suite-ID: [pair-language]
Version-ID: [pair-language]-v2

The [pair-language] refers to one of the 45 different language pairs: ar-cn, ar-cz, ar-de, ar-en, ar-es, ar-fr, ar-nl, ar-pt, ar-ru, cn-cz, cn-de, cn-en, cn-es, cn-fr, cn-nl, cn-pt, cn-ru, cz-de, cz-en, cz-es, cz-fr, cz-nl, cz-pt, cz-ru, de-en, de-es, de-fr, de-nl, de-pt, de-ru, en-es, en-fr, en-nl, en-pt, en-ru, es-fr, es-nl, es-pt, es-ru, fr-nl, fr-pt, fr-ru, nl-pt, nl-ru, pt-ru. For instance, ar-cn refers to the test cases involving the Arabic and Chinese languages, while cn-cz refers to the test cases involving the Chinese and Czech languages. For each pair, 25 alignments involving the ontologies Cmt, Conference, ConfOf, Iasted and Sigkdd are available. As described below, the Edas and Ekaw ontologies are used for blind evaluation.
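As a quick check on the counts above, the following Python sketch enumerates the language pairs from the ten language codes and the ontology combinations from the five open ontologies. It assumes that the 25 alignments per language pair correspond to all ordered source/target combinations (5 x 5), including the same ontology in two languages; this reading matches the count but is not stated explicitly here. The function names are hypothetical helpers, not part of any SEALS or OAEI tool.

```python
from itertools import combinations, product

# Language codes as they appear in the pair identifiers listed above.
LANGUAGES = ["ar", "cn", "cz", "de", "en", "es", "fr", "nl", "pt", "ru"]

# Ontologies in the open test set (Edas and Ekaw are held out for blind evaluation).
OPEN_ONTOLOGIES = ["Cmt", "Conference", "ConfOf", "Iasted", "Sigkdd"]


def language_pairs(languages=LANGUAGES):
    """Return pair identifiers such as 'ar-cn', with codes in alphabetical order."""
    return [f"{a}-{b}" for a, b in combinations(sorted(languages), 2)]


def ontology_combinations(ontologies=OPEN_ONTOLOGIES):
    """Return all ordered (source, target) combinations (assumption: 5 x 5)."""
    return list(product(ontologies, repeat=2))


if __name__ == "__main__":
    pairs = language_pairs()
    print(len(pairs), pairs[0], pairs[-1])
    print(len(ontology_combinations()))
```

With ten languages, the number of unordered pairs is 10 x 9 / 2 = 45, which agrees with the list above.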

The original version (v1) of the data set, which has been used in previous OAEI campaigns, is still available on the SEALS test repository. To access it, replace v2 in the MultiFarm identifiers above with v1.
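The identifier scheme can be sketched as a small Python helper that builds the repository, Suite-ID and Version-ID values for a given language pair and data set version. The function name and the dictionary keys are hypothetical and chosen for illustration; only the repository URL and the `[pair-language]` / `[pair-language]-v2` patterns come from the text above. The sketch does not check whether a given pair actually exists in v1 (for example, the Arabic pairs are new this year).

```python
# Repository URL as given in the MultiFarm identifiers above.
REPOSITORY = "http://repositories.seals-project.eu/tdrs/"

# Versions mentioned in this document.
KNOWN_VERSIONS = ("v1", "v2")


def multifarm_identifiers(pair, version="v2"):
    """Build the three identifiers for one language pair (hypothetical helper)."""
    if version not in KNOWN_VERSIONS:
        raise ValueError(f"unknown data set version: {version!r}")
    return {
        "repository": REPOSITORY,
        "suite_id": pair,                    # Suite-ID: [pair-language]
        "version_id": f"{pair}-{version}",   # Version-ID: [pair-language]-v2 (or -v1)
    }


if __name__ == "__main__":
    print(multifarm_identifiers("cn-cz"))
    print(multifarm_identifiers("cn-cz", version="v1"))
```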

Evaluation modalities and criteria

As last year, in order to perform a blind evaluation, the translations of the Edas and Ekaw ontologies are not available in the current testing data set described above.

Evaluation is based on the well-known measures of precision, recall and F-measure. We also measure runtime.
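For readers unfamiliar with these measures, here is a minimal Python sketch of how they can be computed from a produced alignment and a reference alignment, each treated as a set of correspondences. It is an illustration only and not the Alignment API evaluator used by the organizers. It assumes the balanced F-measure (F1) and exact matching of correspondences; the entity names in the example are hypothetical.

```python
def precision_recall_f1(found, reference):
    """Compare two alignments given as iterables of hashable correspondences."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    # Convention chosen for this sketch: an empty set yields a score of 0.0.
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical correspondences: (source entity, target entity, relation).
    reference = {
        ("cmt-cn#Paper", "cmt-cz#Paper", "="),
        ("cmt-cn#Author", "cmt-cz#Author", "="),
        ("cmt-cn#Review", "cmt-cz#Review", "="),
        ("cmt-cn#Chairman", "cmt-cz#Chairman", "="),
    }
    found = {
        ("cmt-cn#Paper", "cmt-cz#Paper", "="),
        ("cmt-cn#Author", "cmt-cz#Author", "="),
        ("cmt-cn#Review", "cmt-cz#Person", "="),  # incorrect correspondence
    }
    p, r, f = precision_recall_f1(found, reference)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this example two of the three produced correspondences are correct and two of the four reference correspondences are found, so precision is 2/3, recall is 1/2 and F1 is 4/7.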

Testing your tool

Please refer to the instructions on how to test your tools using the test data. When following those instructions, use the MultiFarm data set identifiers indicated above.

We encourage you to use the Alignment API for manipulating and generating your alignments and, in particular, for evaluating your results. We use the API to compute the evaluation results.

Contacts

This track is organized by Cassia Trojahn dos Santos. If you have any problems working with the ontologies, any questions or suggestions, feel free to write an email to cassia [.] trojahn [at] irit [.] fr.