MultiFarm 2011.5 Dataset

This page describes the MultiFarm dataset, a comprehensive dataset for multilingual ontology matching. The dataset can be downloaded and used for any scientific purpose. Its generation and structure are briefly explained on this page; more details can be found in the following paper.

Christian Meilicke, Raúl García Castro, Fred Freitas, Willem Robert van Hage, Elena Montiel-Ponsoda, Ryan Ribeiro de Azevedo, Heiner Stuckenschmidt, Ondrej Svab-Zamazal, Vojtech Svatek, Andrei Tamilin, Cássia Trojahn, Shenghui Wang. MultiFarm: A Benchmark for Multilingual Ontology Matching. Accepted for publication in the Journal of Web Semantics.

Download the authors' version of the paper


The following enumeration describes modifications that have been applied to the dataset after its first publication.

Evaluation campaigns

The dataset has been used in the following experiments:

Translations in raw format

The dataset has been generated by translating the existing OntoFarm dataset. The results of this first step are available as simply structured text files and can be downloaded from the following table. Please note that all files are UTF-8 encoded. Some characters might be displayed incorrectly if your browser does not detect the encoding correctly.

Ontology    Spanish  German  French  Russian  Portuguese  Czech  Dutch  Chinese
CMT         link     link    link    link     link        link   link   link
CONFERENCE  link     link    link    link     link        link   link   link
CONFOF      link     link    link    link     link        link   link   link
EDAS        -        -       -       -        -           -      -      -
EKAW        -        -       -       -        -           -      -      -
IASTED      link     link    link    link     link        link   link   link
SIGKDD      link     link    link    link     link        link   link   link
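If you process the raw translation files in a script rather than in a browser, it is safest to decode them explicitly as UTF-8 instead of relying on the platform default encoding. A minimal Python sketch; the helper name read_translation_file and the file name in the usage comment are hypothetical, and splitting the content into lines assumes only that the files are line-oriented text.

```python
from pathlib import Path


def read_translation_file(path):
    """Read a raw translation file, decoding it explicitly as UTF-8.

    Hypothetical helper: returns the file content as a list of lines and
    makes no further assumption about how each line is structured.
    """
    text = Path(path).read_text(encoding="utf-8")
    return text.splitlines()


# Usage with a hypothetical local file name; substitute the file you downloaded.
# for line in read_translation_file("cmt-ru.txt"):
#     print(line)
```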

Complete bundle with ontologies and reference alignments

The results of the translation have been used to generate language-specific variants of the existing ontologies and reference alignments for all pairs of ontologies. These files are bundled in a single zip file, which can be downloaded and used in any kind of scenario or experiment.

The zip-file is structured as follows:

   [a directory for each language cn, cz, de, en, es, fr, nl, pt, ru]
   cz/ (contains 7 files)
      [for each ontology cmt, conference, confOf, edas, ekaw, iasted, sigkdd]
   de/ (contains 7 files)
   [a directory for each language pair cn-cz, cn-de, ...]
      [overall 21*2=42+7*1 files]
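One way to read the count 21*2=42+7*1 for a language-pair directory: each of the 7 ontologies in the first language can be matched against each of the 7 ontologies in the second language. That gives the 21 unordered pairs of distinct ontologies in both directions (42) plus 7 pairs of an ontology with its own translation. The Python sketch below only reproduces this arithmetic; the interpretation and the helper name ontology_pairs are assumptions, not taken from the bundle itself.

```python
# Ontology identifiers as listed on this page.
ONTOLOGIES = ["cmt", "conference", "confOf", "edas", "ekaw", "iasted", "sigkdd"]


def ontology_pairs(ontologies):
    """Hypothetical helper: all ordered (source, target) ontology pairs,
    where the source is read in the first language and the target in the
    second language of a language pair (assumed interpretation)."""
    return [(src, tgt) for src in ontologies for tgt in ontologies]


pairs = ontology_pairs(ONTOLOGIES)
different = [p for p in pairs if p[0] != p[1]]  # 21 unordered pairs, both directions
same = [p for p in pairs if p[0] == p[1]]       # an ontology vs. its own translation

print(len(different), len(same), len(pairs))  # 42 7 49
```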

>>> Download the zipped bundle

SEALS Testsuites

The dataset can also be used via the SEALS platform, where we have prepared and stored a testsuite for each language pair, resulting in 36 testsuites. You need an account for the SEALS platform to search and retrieve them from the test data repository.

>>> Link to the SEALS platform

For example, you can find the testsuite for the language pair Czech-German by typing 'cz-de' into the search field of the test data repository.
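With the nine language codes used in the bundle (cn, cz, de, en, es, fr, nl, pt, ru), the 36 testsuites correspond to the unordered language pairs, since 9 * 8 / 2 = 36. The Python sketch below builds search keys of the form 'cz-de'. That the two codes always appear in alphabetical order is an assumption inferred from the examples 'cz-de' and 'cn-cz', and pair_key is a hypothetical helper.

```python
from itertools import combinations

# Language codes as used in the bundle's directory names.
LANGUAGES = ["cn", "cz", "de", "en", "es", "fr", "nl", "pt", "ru"]


def pair_key(lang_a, lang_b):
    """Hypothetical helper: build an 'xx-yy' search key.

    Assumes the codes are always written in alphabetical order.
    """
    first, second = sorted((lang_a, lang_b))
    return f"{first}-{second}"


# One key per unordered language pair, i.e. one per testsuite.
all_keys = [pair_key(a, b) for a, b in combinations(LANGUAGES, 2)]

print(len(all_keys))         # 36
print(pair_key("de", "cz"))  # cz-de
```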

Involved people

The dataset has been generated by a collaborative initiative of the following people.


Contact Christian Meilicke or Cássia Trojahn for further information related to this dataset.

Known Bugs

Some users of the dataset have already detected a few small bugs. We will fix these bugs in the future; for the moment, we simply list them:


