The goal of this track is to evaluate the ability of systems to deal with ontologies in different natural languages. It serves the purpose of evaluating the strengths and the weaknesses of matchers and measuring their progress, with a focus on multilingualism.
This track follows the OAEI 2018 schedule.
The original MultiFarm data set is composed of a set of 7 ontologies of the Conference domain (Cmt, Conference, ConfOf, Edas, Ekaw, Iasted, Sigkdd), translated into 8 languages in addition to English: Chinese (cn), Czech (cz), Dutch (nl), French (fr), German (de), Portuguese (pt), Russian (ru) and Spanish (es), together with the corresponding cross-lingual alignments between them. This data set is based on the OntoFarm data set, which has been used successfully for several years in the Conference track of the OAEI campaigns. For details on MultiFarm, please refer to the MultiFarm web page.
The current version of the data set is available on the SEALS test repository and is accessible through the SEALS client (see instructions here). Alternatively, you can download the ontologies from the MultiFarm web page. For running the MultiFarm test suites you will have to use the following identifiers:
Here, [pair-language] refers to one of the 45 different language pairs: ar-cn, ar-cz, ar-de, ar-en, ar-es, ar-fr, ar-nl, ar-pt, ar-ru, cn-cz, cn-de, cn-en, cn-es, cn-fr, cn-nl, cn-pt, cn-ru, cz-de, cz-en, cz-es, cz-fr, cz-nl, cz-pt, cz-ru, de-en, de-es, de-fr, de-nl, de-pt, de-ru, en-es, en-fr, en-nl, en-pt, en-ru, es-fr, es-nl, es-pt, es-ru, fr-nl, fr-pt, fr-ru, nl-pt, nl-ru, pt-ru. For instance, ar-cn refers to the test cases involving Arabic and Chinese, while cn-cz refers to the test cases involving Chinese and Czech. For each pair, 25 alignments involving the ontologies Cmt, Conference, ConfOf, Iasted and Sigkdd are available. As described below, the Edas and Ekaw ontologies are used for blind evaluation.
The original version (v1) of the data set, which was used in the OAEI campaigns up to 2014, is still available on the SEALS test repository. To access it, replace v2 with v1 in the MultiFarm identifiers above.
As in previous years, in order to perform a blind evaluation, the translations of the Edas and Ekaw ontologies are not available in the current testing data set described above.
Evaluation is based on the well-known measures of precision, recall and F-measure. We also measure runtime.
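As an illustration of these measures, the following minimal sketch compares a matcher's output against a reference alignment. It is not the evaluation code used by the organizers. It assumes that a correspondence is represented as a pair of entity URIs (relation type and confidence are ignored) and that F-measure means the balanced harmonic mean of precision and recall. The function name `evaluate` and the example URIs are hypothetical.

```python
def evaluate(found, reference):
    """Return (precision, recall, f_measure) for two sets of correspondences.

    Each correspondence is assumed to be a hashable pair
    (source_entity, target_entity).
    """
    found, reference = set(found), set(reference)
    correct = len(found & reference)

    # Guard against empty alignments to avoid division by zero.
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical correspondences between a French and a German ontology.
    reference = {
        ("fr#Auteur", "de#Autor"),
        ("fr#Article", "de#Artikel"),
        ("fr#Relecteur", "de#Gutachter"),
        ("fr#Conference", "de#Konferenz"),
    }
    found = {
        ("fr#Auteur", "de#Autor"),
        ("fr#Article", "de#Artikel"),
        ("fr#Article", "de#Gutachter"),  # incorrect correspondence
    }
    print(evaluate(found, reference))
```

In this example, 2 of the 3 returned correspondences are correct and 2 of the 4 reference correspondences are found, giving a precision of 2/3, a recall of 1/2 and an F-measure of 4/7.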
Please refer to the instructions on how to test your tools using the test data. When following those instructions, use the MultiFarm data set identifiers indicated above.
We encourage you to use the Alignment API for manipulating and generating your alignments and, in particular, for evaluating your results. We use this API to compute the evaluation results.
This track is organized by Cassia Trojahn dos Santos, with the support of Elodie Thieblin and Daniela Schmidt. If you have any problems working with the ontologies, or any questions or suggestions, feel free to write an email to cassia [.] trojahn [at] irit [.] fr.