The goal of the benchmark test library is to offer a set of tests that are wide in feature coverage, progressive and stable. It serves two purposes: evaluating the strengths and weaknesses of matchers (by being progressive and wide in coverage) and measuring the progress of matchers (by being stable and reusable over the years).
This year, besides evaluating the compliance of the tools, the focus will be on scalability, i.e., the ability of matchers to deal with data sets with an increasing number of elements. Scalability will be evaluated from two perspectives: on the one hand, we will consider four seed ontologies from different domains and of different sizes; on the other hand, we will consider a single seed ontology and scale it by reducing its size by different factors.
The schedule is that of http://oaei.ontologymatching.org/2012/.
The benchmark test library consists of data sets that are built from reference ontologies of different sizes and from different domains. The bibliographic ontology described below has been the main reference ontology since the beginning of OAEI campaigns. This year, like in OAEI 2011, we will use new systematically generated benchmarks, based on other ontologies than the bibliographic one.
As in previous campaigns, benchmark test suites (or data sets) will be generated from these reference ontologies. The following table summarizes the sizes of the ontologies. For three of these ontologies, the tests will be conducted in a blind fashion, i.e., the participants will have no access to the original ontologies.
| Test set | biblio | 2 | 3 | 4 | finance |
| ontology size | | | | | |
| classes+prop | 97 | 247 | 354 | 472 | 633 |
| instances | 112 | 35 | 681 | 376 | 1113 |
| entities | 209 | 282 | 1035 | 848 | 1746 |
| triples | 1332 | 1562 | 5387 | 4262 | 21979 |
The domain of the first benchmark is bibliographic references. Its reference ontology is based on a subjective view of what a bibliographic ontology should be. There can be many different classifications of publications (based on area, quality, etc.), but we chose the one common among scholars, based on the means of publication; the resulting ontology is reminiscent of BibTeX.
The reference ontology, based on the one of the first EON Ontology Alignment Contest, contains 33 named classes, 24 object properties, 40 data properties, 56 named individuals and 20 anonymous individuals. It has been improved by including a number of circular relations that were missing from the first test. In 2006, we made the UTF-8 version of the tests the standard, the ISO-8859-1 version being optional. In 2007, the tests were the same as in 2006.
The reference ontology is put in the context of the semantic web by using other external resources for expressing non-bibliographic information. It takes advantage of FOAF (http://xmlns.com/foaf/0.1/) and iCalendar (http://www.w3.org/2002/12/cal/) for expressing the People, Organization and Event concepts. Here are the external references used:
This reference ontology is a bit limited in the sense that it does not contain attachment to several classes. Similarly the kind of proposed alignments is still limited: they only match named classes and properties, and they mostly use the "=" relation with confidence of 1.
The complete bibliographic reference ontology is that of test #101.
The Ekaw ontology, one of the ontologies from the conference track, has been used as reference ontology for generating a Benchmark data set used in the benchmark track of OAEI 2011. It contains 74 classes and 33 object properties.
The jerm ontology and the provenance ontology have been used as reference ontologies for generating the benchmark data sets used in the benchmark track of OAEI 2011.5. The first one contains 219 classes and 31 properties, while the second one contains 398 classes and 33 properties.
Each data set is generally composed of 111 individual tests, each confronting a reference ontology with a modified version of it. The tests are systematically generated, starting from the reference ontology and discarding some of its information in order to evaluate how an algorithm behaves when this information is lacking. Generated tests are identified by a number; this numbering (almost) fully preserves the numbering of the first EON contest. The ontologies in the tests are described in OWL-DL and serialized in the RDF/XML format.
There are 6 categories of alteration:
| # | Name | Com | Hier | Inst | Prop | Class | Comment |
| 101 | 0 | 0 | 0 | 0 | 0 | 0 | Reference alignment |
| 102 | | | | | | | Irrelevant ontology |
| 103 | 0 | 0 | 0 | 0 | 0 | 0 | Language generalization |
| 104 | 0 | 0 | 0 | 0 | 0 | 0 | Language restriction |
| 201 | R | 0 | 0 | 0 | 0 | 0 | No names |
| 202 | R | N | 0 | 0 | 0 | 0 | No names, no comments |
| 203 | 0 | N | 0 | 0 | 0 | 0 | No comments (was misspelling) |
| 204 | C | 0 | 0 | 0 | 0 | 0 | Naming conventions |
| 205 | S | 0 | 0 | 0 | 0 | 0 | Synonyms |
| 206 | F | F | 0 | 0 | 0 | 0 | Translation |
| 207 | F | 0 | 0 | 0 | 0 | 0 | |
| 208 | C | N | 0 | 0 | 0 | 0 | |
| 209 | S | N | 0 | 0 | 0 | 0 | |
| 210 | F | N | 0 | 0 | 0 | 0 | |
| 221 | 0 | 0 | N | 0 | 0 | 0 | No specialisation |
| 222 | 0 | 0 | F | 0 | 0 | 0 | Flattened hierarchy |
| 223 | 0 | 0 | E | 0 | 0 | 0 | Expanded hierarchy |
| 224 | 0 | 0 | 0 | N | 0 | 0 | No instance |
| 225 | 0 | 0 | 0 | 0 | R | 0 | No restrictions |
| 226 | | | | | | | No datatypes |
| 227 | | | | | | | Unit difference |
| 228 | 0 | 0 | 0 | 0 | N | 0 | No properties |
| 229 | | | | | | | Class vs instances |
| 230 | 0 | 0 | 0 | 0 | 0 | F | Flattened classes |
| 231* | 0 | 0 | 0 | 0 | 0 | E | Expanded classes |
| 232 | 0 | 0 | N | N | 0 | 0 | |
| 233 | 0 | 0 | N | 0 | N | 0 | |
| 236 | 0 | 0 | 0 | N | N | 0 | |
| 237 | 0 | 0 | F | N | 0 | 0 | |
| 238 | 0 | 0 | E | N | 0 | 0 | |
| 239 | 0 | 0 | F | 0 | N | 0 | |
| 240 | 0 | 0 | E | 0 | N | 0 | |
| 241 | 0 | 0 | N | N | N | 0 | |
| 246 | 0 | 0 | F | N | N | 0 | |
| 247 | 0 | 0 | E | N | N | 0 | |
| 248 | N | N | N | 0 | 0 | 0 | |
| 249 | N | N | 0 | N | 0 | 0 | |
| 250 | N | N | 0 | 0 | N | 0 | |
| 251 | N | N | F | 0 | 0 | 0 | |
| 252 | N | N | E | 0 | 0 | 0 | |
| 253 | N | N | N | N | 0 | 0 | |
| 254 | N | N | N | 0 | N | 0 | |
| 257 | N | N | 0 | N | N | 0 | |
| 258 | N | N | F | N | 0 | 0 | |
| 259 | N | N | E | N | 0 | 0 | |
| 260 | N | N | F | 0 | N | 0 | |
| 261 | N | N | E | 0 | N | 0 | |
| 262 | N | N | N | N | N | 0 | |
| 265 | N | N | F | N | N | 0 | |
| 266 | N | N | E | N | N | 0 | |
| 301 | | | | | | | Real: BibTeX/MIT |
| 302 | | | | | | | Real: BibTeX/UMBC |
| 303 | | | | | | | Real: Karlsruhe |
| 304 | | | | | | | Real: INRIA |
The transformations can be graded by applying the alteration with different percentages. For example, the test 201-4 means that the indicated alteration (replacing names with random strings) has been applied to 40 percent of the entities. The lattice of generated tests is displayed below with their derivation relations. The higher a test is in the lattice, the easier it is supposed to be.
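As a minimal sketch of how such graded identifiers can be read, the following Python function splits an identifier like 201-4 into the base test number and the fraction of altered entities. It assumes, based only on the 201-4 example above, that the suffix counts tenths, and that an identifier without a suffix means the alteration is applied to all entities; the function name `parse_test_id` is hypothetical.

```python
def parse_test_id(test_id: str) -> tuple[int, float]:
    """Split a test identifier such as "201-4" into (base test, altered fraction)."""
    base, sep, grade = test_id.partition("-")
    if not sep:
        # Assumption: no suffix means the alteration is applied to all entities.
        return int(base), 1.0
    # Assumption: the suffix counts tenths, so "4" stands for 40 percent.
    return int(base), int(grade) / 10


print(parse_test_id("201-4"))  # (201, 0.4)
print(parse_test_id("201"))    # (201, 1.0)
```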
 
 
A test data set is made of a set of directories (one per test), each directory containing an ontology (onto.rdf) in OWL. The directories are named according to test numbers; for example, the directory 201 will contain the ontology corresponding to the test 201. Each directory also contains the reference alignments against which the results of the matching process will be evaluated. These alignments follow the Alignment format described here.
Below is the set of tests for the reference bibliographic ontology. As stated before, the reference ontology is in test #101. The matching task consists in aligning each test ontology with that of test #101. The resulting alignment must be provided in the format described here. It will be compared against the reference alignment to produce the compliance measurements (mainly precision and recall) of the matching tool for that test. It is, of course, forbidden to use any of the reference alignments when performing the matching task.
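To make the two main compliance measures concrete, here is a small Python sketch that computes precision and recall from two sets of correspondences. It is only an illustration: correspondences are modelled as (entity1, entity2, relation) tuples, the entity names are hypothetical, and returning 0.0 when a set is empty is a convention chosen here, not something prescribed by the campaign.

```python
def precision_recall(found: set, reference: set) -> tuple[float, float]:
    """Compare a produced alignment with the reference alignment."""
    correct = len(found & reference)
    # Precision: share of the produced correspondences that are in the reference.
    precision = correct / len(found) if found else 0.0
    # Recall: share of the reference correspondences that were produced.
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical correspondences between two ontologies o1 and o2.
reference = {
    ("o1#Book", "o2#Livre", "="),
    ("o1#Article", "o2#Article", "="),
    ("o1#author", "o2#auteur", "="),
    ("o1#title", "o2#titre", "="),
}
found = {
    ("o1#Book", "o2#Livre", "="),
    ("o1#author", "o2#titre", "="),
}
print(precision_recall(found, reference))  # (0.5, 0.25)
```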
The only interesting alignments are those involving classes and properties of the given ontologies. So the alignments should not align individuals, nor entities from the external ontologies.
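One way to apply this restriction to a produced alignment is sketched below in Python. The two namespaces are the FOAF and iCalendar ones mentioned earlier; the function name, the example URIs and the idea of passing the individuals as a set of URIs are assumptions made for illustration.

```python
# Namespaces of the external ontologies used by the reference ontology.
EXTERNAL_NAMESPACES = (
    "http://xmlns.com/foaf/0.1/",
    "http://www.w3.org/2002/12/cal/",
)


def keep_correspondence(entity1: str, entity2: str, individuals=frozenset()) -> bool:
    """Return True if neither entity is external or an individual."""
    for uri in (entity1, entity2):
        # str.startswith accepts a tuple of prefixes.
        if uri.startswith(EXTERNAL_NAMESPACES) or uri in individuals:
            return False
    return True


# Hypothetical URIs, for illustration only.
print(keep_correspondence("http://example.org/o1#Book", "http://example.org/o2#Livre"))
print(keep_correspondence("http://xmlns.com/foaf/0.1/Person", "http://example.org/o2#Personne"))
```

The first call keeps the correspondence, the second rejects it because one side belongs to FOAF.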
There is some chance that the final test will be improved by adding entity expansion and reduction. It is also possible that there will be many more instances in each ontology.
This test compares the ontology to itself.
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
 
This test compares the ontology to a totally irrelevant one.
NOTE: The onto.rdf file is not provided here. It is possible to run the test directly on the actual file of a totally irrelevant ontology. For example, you can use the food ontology given in the OWL guide (verbatim), i.e., http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine.
This test compares the ontology with its generalisation in OWL Lite (i.e., unavailable constraints are replaced by the more general available). The generalization basically removes owl:unionOf and owl:oneOf and the Property types (owl:TransitiveProperty).
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
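The removal part of this generalization can be pictured with the following Python sketch, which works on an ontology given as a list of (subject, predicate, object) triples. It is a simplification under stated assumptions: it only drops the constructs named above and does not replace them by more general ones, it leaves any list nodes that owl:unionOf or owl:oneOf pointed to, and the function name is hypothetical.

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Constructs named in the description of this test.
REMOVED_PREDICATES = {OWL + "unionOf", OWL + "oneOf"}
REMOVED_TYPES = {OWL + "TransitiveProperty"}


def generalize(triples):
    """Drop owl:unionOf / owl:oneOf statements and transitive property typing."""
    kept = []
    for s, p, o in triples:
        if p in REMOVED_PREDICATES:
            continue
        if p == RDF_TYPE and o in REMOVED_TYPES:
            continue
        kept.append((s, p, o))
    return kept
```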
 
This test compares the ontology with its restriction in OWL Lite (where unavailable constraints have been discarded).
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
 
Each label or identifier is replaced by a random one.
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
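A possible way to produce such a renaming, including the graded variants where only a percentage of the entities is altered, is sketched below in Python. This is not the generator used to build the benchmark: the function name, the 8-letter random strings and the fixed seed are assumptions chosen for illustration.

```python
import random
import string


def random_names(names, fraction=1.0, seed=0):
    """Map a fraction of the given names to fresh random strings."""
    rng = random.Random(seed)  # fixed seed so that the renaming is reproducible
    names = sorted(names)
    chosen = rng.sample(names, round(fraction * len(names)))
    used = set(names)
    mapping = {}
    for name in chosen:
        new = name
        # Draw again until the random string clashes with no existing name.
        while new in used:
            new = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
        used.add(new)
        mapping[name] = new
    return mapping


# Rename 40 percent of five hypothetical entity names.
print(random_names(["Book", "Article", "author", "title", "year"], fraction=0.4))
```

Keeping the mapping, rather than renaming in place, makes it possible to rewrite the reference alignment consistently with the altered ontology.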
 
Each label or identifier is replaced by a random one. Comments (rdfs:comment and dc:description) have been suppressed as well.
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
 
A random, but consistent, typo generator should be applied to labels and comments.
Not available in this test (if you know how to do it, contact me).
Different naming conventions (Uppercasing, underscore, dash, etc.) are used for labels. Comments have been suppressed.
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
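The kind of label variation meant here can be illustrated with a short Python sketch that rewrites a label in uppercase, with underscores, or with dashes. The function names and the three style keywords are hypothetical, and the word-splitting rule is a simple heuristic, not the procedure used to build the test.

```python
import re


def split_words(label: str) -> list[str]:
    """Split a label on spaces, underscores, dashes and lower-to-upper boundaries."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    return [w.lower() for w in re.split(r"[\s_\-]+", spaced) if w]


def to_convention(label: str, style: str) -> str:
    """Rewrite a label according to one naming convention."""
    words = split_words(label)
    if style == "upper":
        return "".join(words).upper()
    if style == "underscore":
        return "_".join(words)
    if style == "dash":
        return "-".join(words)
    raise ValueError(f"unknown style: {style}")


print(to_convention("journalName", "underscore"))  # journal_name
print(to_convention("journalName", "dash"))        # journal-name
print(to_convention("journalName", "upper"))       # JOURNALNAME
```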
 
Labels are replaced by synonyms. Comments have been suppressed.
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
 
The complete ontology is translated into a language other than English (French in the current case, but other languages would be fine).
Ontology : [RDF/XML] 
[RDF/XML in ISO-8859-1] 
[HTML]
 
Alignment : [RDF/XML] 
[RDF/XML in ISO-8859-1] 
[HTML]
 
NOTE: Alternatively, you can use the ISO-Latin-1 (ISO-8859-1) version of the tests by renaming them after their UTF-8 version.
Each label or identifier is translated into a language other than English (French in the current case, but other languages would be fine).
Ontology : [RDF/XML] 
[RDF/XML in ISO-8859-1] 
[HTML]
 
Alignment : [RDF/XML] 
[RDF/XML in ISO-8859-1] 
[HTML]
 
NOTE: Alternatively, you can use the ISO-Latin-1 (ISO-8859-1) version of the tests by renaming them after their UTF-8 version.
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
 
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
 
Ontology : [RDF/XML] 
[RDF/XML in ISO-8859-1] 
[HTML]
 
Alignment : [RDF/XML] 
[RDF/XML in ISO-8859-1] 
[HTML]
 
NOTE: Alternatively, you can use the ISO-Latin-1 (ISO-8859-1) version of the tests by renaming them after their UTF-8 version.
All subclass assertions to named classes are suppressed.
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
 
(variation: compile inheritance)
A hierarchy still exists but has been strictly reduced.
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
 
The alignment here contains relations which are not "=" but "<".
Numerous intermediate classes are introduced within the hierarchy.
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
 
All individuals have been suppressed from the ontology.
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
 
All local restrictions on properties have been suppressed from the ontology.
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
 
(variation: no property nor global restrictions on properties)
In this test all datatypes are converted to xsd:string.
Not available in this test
(Measurable) values are expressed in different datatypes.
Not available in this test
Properties and relations between objects have been completely suppressed.
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
 
(variation: leave the properties in instances)
Some classes have become instances.
Not available in this test.
Some components of classes are expanded in the class structure (e.g., year, month, day attributes instead of date).
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
 
Here one limitation of the proposed format is that it does not cover alignments such as: journalName = name o journal.
Some classes are spread over several classes.
Not available in this test.
Tests 232 to 266 combine the alterations above, as indicated in the table. Each of them provides the same set of files:
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
 
Note that the 3xx tests are only here for comparability with previous years. We know that the reference alignments for these tests are not perfect (especially because the ontologies sometimes contain flaws).
For a computer scientist, BibTeX is the starting point for a useful bibliographic ontology. It is of wide use and relatively well thought out. This ontology can be found at and is documented in BibTex in OWL.
This test compares our test ontology with an actual ontology that is simpler and closer to the initial BibTeX ontology. The resulting alignment contains some inclusion (<) relations.
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
 
This ontology is very similar to the previous one, even closer to the genuine BibTeX, with different extensions and naming conventions. It can be found at http://ebiquity.umbc.edu.
The alignment result also contains some inclusion (<) alignment relations.
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
 
This test compares our test ontology with an actual ontology which contains more items than those used in the current ontology.
The Karlsruhe ontology (http://www.aifb.uni-karlsruhe.de/ontology) is used in the Ontoweb portal. It is a refinement of other ontologies such as (KA)2. As such, it does not only define bibliographic items but many other items as well.
The alignment contains < as well as > relations.
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
 
This test compares our test ontology with an actual ontology which is not equivalent but quite close (it can be thought of as a previous version).
This INRIA ontology (fr.inrialpes.exmo.rdf.bib.owl) has been designed by Antoine Zimmermann from the BibTeX in OWL ontology and our Bibliographic XML DTD. Its goal was to easily gather a number of RDF items. These items were BibTeX entries found on the web and transformed into RDF according to this ontology.
The actual hierarchy of this ontology contains classes which are subclasses of several other classes.
Ontology : [RDF/XML] 
[HTML]
 
Alignment : [RDF/XML] 
[HTML]
 
The release notes of the test versions can be found here.
Since 2010, it is no longer necessary to download the data sets (it has always been better to get them on the web). The SEALS platform will provide the data sets. In the future, it is even planned that it will generate the data sets on the fly.
Participants can test their tools using the standard benchmark data set described above, which can be downloaded here. They can strengthen their testing with a subset of data sets built from reference ontologies used in previous campaigns, which are stored in the Test Data Repository accessible through the SEALS portal. Note that two of these data sets were built from the same reference ontologies (biblio and finance) that will be used in OAEI 2012.
All those data sets follow the structure explained in the Example of a complete benchmark data set section, and testing with them can be done by using the SEALS client. This client iterates over the tests of a data set whose identifier is provided as a parameter. In all cases, the ontologies found in the data set directories are matched against the ontology found in 101/onto.rdf. The resulting alignments must be output in the alignment format. They are placed in a local directory that is also given as a parameter to the client.
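The iteration described above can be pictured with the following Python sketch. It is not the SEALS client: the function name `run_dataset`, the `matcher` callable (taking the paths of the reference and test ontologies and returning a serialized alignment as a string) and the output file naming `<test>.rdf` are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable


def run_dataset(dataset_dir, output_dir, matcher: Callable[[Path, Path], str]) -> list[Path]:
    """Match every test ontology against 101/onto.rdf and store the results."""
    dataset = Path(dataset_dir)
    reference = dataset / "101" / "onto.rdf"
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for test_dir in sorted(p for p in dataset.iterdir() if p.is_dir()):
        onto = test_dir / "onto.rdf"
        if not onto.is_file():
            # Some tests (such as 102 above) do not ship an onto.rdf file.
            continue
        # Hypothetical naming: one result file per test directory.
        target = out / f"{test_dir.name}.rdf"
        target.write_text(matcher(reference, onto), encoding="utf-8")
        written.append(target)
    return written
```

Passing the matcher as a function keeps the loop independent of any particular matching tool.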
The whole data sets are available: biblio, jerm, provenance and finance.
The identifiers of data sets for testing with the SEALS client are given below:
We encourage you to use the Alignment API for manipulating and generating your alignments, and, in particular, for computing evaluation of your results.
Many resources have been used for setting up the biblio test:
Various people helped testing or suggested improvements and tests:
Contact address is Jose-Luis : Aguirre # inria : fr