Ontology Alignment Evaluation Initiative - OAEI-2012 Campaign

Benchmark test case

The goal of the benchmark test library is to offer a set of tests that is wide in feature coverage, progressive and stable. It serves to evaluate the strengths and weaknesses of matchers (by being progressive and wide in coverage) and to measure their progress (by being stable and reusable over the years).

This year, besides evaluating compliance of the tools, the focus will be on scalability, i.e., the ability of matchers to deal with data sets containing an increasing number of elements. Scalability will be evaluated from two perspectives: on the one hand, we will consider four seed ontologies from different domains and with different sizes; on the other hand, we will consider the same seed ontology, scaled by reducing its size by different factors.

Schedule

The schedule is that of http://oaei.ontologymatching.org/2012/.

Data sets

The benchmark test library consists of data sets that are built from reference ontologies of different sizes and from different domains. The bibliographic ontology described below has been the main reference ontology since the beginning of OAEI campaigns. This year, as in OAEI 2011, we will use new systematically generated benchmarks, based on ontologies other than the bibliographic one.

As for previous campaigns, Benchmark test suites (or data sets) will be generated from these reference ontologies. The following table summarizes the sizes of the ontologies. For three of these ontologies, the tests will be conducted in a blind fashion, i.e., the participants will have no access to the original ontologies.

Test set        biblio       2       3       4 finance
Ontology size
classes+prop        97     247     354     472     633
instances          112      35     681     376    1113
entities           209     282    1035     848    1746
triples           1332    1562    5387    4262   21979

Reference ontologies

The Bibliographic ontology

The domain of the first benchmark is Bibliographic references. Its reference ontology is based on a subjective view of what a bibliographic ontology must be. There can be many different classifications of publications (based on area, quality, etc.), but we chose the one common among scholars, based on the means of publication; the resulting ontology is reminiscent of BibTeX.

The reference ontology, based on the one of the first EON Ontology Alignment Contest, contains 33 named classes, 24 object properties, 40 data properties, 56 named individuals and 20 anonymous individuals. It has been improved by including a number of circular relations that were missing from the first test. In 2006, we made the UTF-8 version of the tests the standard one, the ISO-8859-1 version being optional. In 2007, the tests were the same as in 2006.

The reference ontology is put in the context of the semantic web by using other external resources for expressing non-bibliographic information. It takes advantage of FOAF (http://xmlns.com/foaf/0.1/) and iCalendar (http://www.w3.org/2002/12/cal/) for expressing the People, Organization and Event concepts. Here are the external references used:

This reference ontology is a bit limited in the sense that it does not contain attachment to several classes. Similarly the kind of proposed alignments is still limited: they only match named classes and properties, and they mostly use the "=" relation with confidence of 1.

The complete bibliographic reference ontology is that of test #101.

Other reference ontologies used in previous campaigns

The Ekaw ontology, one of the ontologies from the conference track, has been used as reference ontology for generating a Benchmark data set used in the benchmark track of OAEI 2011. It contains 74 classes and 33 object properties.

The jerm ontology and the provenance ontology have been used as reference ontologies for generating Benchmark data sets used in the benchmark track of OAEI 2011.5. The first one contains 219 classes and 31 properties, while the second one contains 398 classes and 33 properties.

Tests at a glance

Each data set is generally composed of 111 individual tests, each confronting a reference ontology with a modified version of it. The tests are systematically generated, starting from the reference ontology and discarding some of its information in order to evaluate how the algorithms behave when this information is lacking. Generated tests are identified by a number; this numbering (almost) fully preserves the numbering of the first EON contest. The ontologies in the tests are described in OWL-DL and serialized in the RDF/XML format.

There are 6 categories of alteration:

Name
Names of entities can be replaced by (R/N) random strings, (S)ynonyms, (C) names following different conventions, or (F) strings in a language other than English.
Comments
Comments can be (N) suppressed or (F) translated into another language.
Specialization Hierarchy
The hierarchy can be (N) suppressed, (E)xpanded or (F)lattened.
Instances
Instances can be (N) suppressed.
Properties
Properties can be (N) suppressed or (R) have their restrictions on classes discarded.
Classes
Classes can be (E)xpanded, i.e., replaced by several classes, or (F)lattened.

The table below summarizes what has been retracted in each test from the reference ontology.

#     Name  Com   Hier  Inst  Prop  Class  Comment
101   0     0     0     0     0     0      Reference alignment
102                                        Irrelevant ontology
103   0     0     0     0     0     0      Language generalization
104   0     0     0     0     0     0      Language restriction
201   R     0     0     0     0     0      No names
202   R     N     0     0     0     0      No names, no comments
203   0     N     0     0     0     0      No comments (was misspelling)
204   C     0     0     0     0     0      Naming conventions
205   S     0     0     0     0     0      Synonyms
206   F     F     0     0     0     0      Translation
207   F     0     0     0     0     0
208   C     N     0     0     0     0
209   S     N     0     0     0     0
210   F     N     0     0     0     0
221   0     0     N     0     0     0      No specialisation
222   0     0     F     0     0     0      Flattened hierarchy
223   0     0     E     0     0     0      Expanded hierarchy
224   0     0     0     N     0     0      No instance
225   0     0     0     0     R     0      No restrictions
226                                        No datatypes
227                                        Unit difference
228   0     0     0     0     N     0      No properties
229                                        Class vs instances
230   0     0     0     0     0     F      Flattened classes
231*  0     0     0     0     0     E      Expanded classes
232   0     0     N     N     0     0
233   0     0     N     0     N     0
236   0     0     0     N     N     0
237   0     0     F     N     0     0
238   0     0     E     N     0     0
239   0     0     F     0     N     0
240   0     0     E     0     N     0
241   0     0     N     N     N     0
246   0     0     F     N     N     0
247   0     0     E     N     N     0
248   N     N     N     0     0     0
249   N     N     0     N     0     0
250   N     N     0     0     N     0
251   N     N     F     0     0     0
252   N     N     E     0     0     0
253   N     N     N     N     0     0
254   N     N     N     0     N     0
257   N     N     0     N     N     0
258   N     N     F     N     0     0
259   N     N     E     N     0     0
260   N     N     F     0     N     0
261   N     N     E     0     N     0
262   N     N     N     N     N     0
265   N     N     F     N     N     0
266   N     N     E     N     N     0
301                                        Real: BibTeX/MIT
302                                        Real: BibTeX/UMBC
303                                        Real: Karlsruhe
304                                        Real: INRIA
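
To make the six-letter codes of the table concrete, here is a minimal Python sketch that decodes the code of one row. It assumes that 0 means the feature is left untouched and that C, which the table uses for tests 204 and 208, stands for naming conventions. The names COLUMNS, MEANINGS and decode are ours and are not part of any OAEI tool.

```python
# Column order follows the summary table: Name, Com, Hier, Inst, Prop, Class.
COLUMNS = ("Name", "Com", "Hier", "Inst", "Prop", "Class")

# Meaning of each code letter per column, read from the alteration categories.
# Assumption: "C" (rows 204 and 208) denotes naming conventions.
MEANINGS = {
    "Name": {"R": "random strings", "N": "random strings", "S": "synonyms",
             "C": "naming conventions", "F": "foreign language"},
    "Com": {"N": "suppressed", "F": "translated"},
    "Hier": {"N": "suppressed", "E": "expanded", "F": "flattened"},
    "Inst": {"N": "suppressed"},
    "Prop": {"N": "suppressed", "R": "restrictions discarded"},
    "Class": {"E": "expanded", "F": "flattened"},
}


def decode(code: str) -> dict[str, str]:
    """Return the altered features of a six-character code such as 'NNF0N0'."""
    if len(code) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} characters, got {len(code)}")
    altered = {}
    for column, letter in zip(COLUMNS, code):
        if letter == "0":
            continue  # assumption: '0' means the feature is left untouched
        altered[column] = MEANINGS[column][letter]
    return altered
```

For instance, decode("NNF0N0") (test 260) reports random names, suppressed comments, a flattened hierarchy and suppressed properties.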

The transformations can be graded by applying the alteration to different percentages of the entities. For example, test 201-4 means that the indicated alteration (replacing names with random strings) has been applied to 40 percent of the entities. The lattice of generated tests is displayed below with their derivation relations. The higher a test is in the lattice, the easier it is supposed to be.
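
The identifier convention can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and we assume that an identifier without a suffix (e.g., 201) denotes the alteration applied to all entities, which is not stated explicitly here. The percentage is only meaningful for tests that are actually graded.

```python
import re


def parse_test_id(test_id: str) -> tuple[int, int]:
    """Split an identifier such as '201-4' into (test number, percentage)."""
    match = re.fullmatch(r"(\d{3})(?:-(\d))?", test_id)
    if match is None:
        raise ValueError(f"not a benchmark test identifier: {test_id!r}")
    number = int(match.group(1))
    # Assumption: no suffix means the alteration is applied to 100% of entities.
    percentage = int(match.group(2)) * 10 if match.group(2) else 100
    return number, percentage


def expand_graded(number: int, steps=(2, 4, 6, 8)) -> list[str]:
    """List the identifiers covered by a notation such as 201[-2-4-6-8]."""
    return [f"{number}-{step}" for step in steps] + [str(number)]
```

With these definitions, parse_test_id("201-4") gives (201, 40), and expand_graded(201) lists 201-2, 201-4, 201-6, 201-8 and 201, from the least to the most altered version.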

Example of a complete benchmark data set

A test data set is made of a set of directories (one per test), each directory containing an ontology (onto.rdf) in OWL. The directories are named according to test numbers; for example, the directory 201 will contain the ontology corresponding to the test 201. Each directory also contains the reference alignments against which the results of the matching process will be evaluated. These alignments follow the Alignment format described here.

The set of tests for the reference Bibliographic ontology is provided below. As stated before, the reference ontology is in test #101. The matching task consists in aligning each test ontology with that of test #101. The resulting alignment must be provided in the format described here. It will be compared against the reference alignment to produce the compliance measurements (mainly precision and recall) for the matching tool for that test. It is, of course, forbidden to use any of the reference alignments for performing the matching task.
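
As an illustration of the compliance measurements, the following Python sketch computes the standard precision and recall of an alignment against a reference, both seen as sets of correspondences. The entity names are made up, confidence values are ignored, and returning 0.0 for an empty alignment is a convention chosen for this sketch; the actual evaluation may differ in these details.

```python
def precision_recall(found: set, reference: set) -> tuple[float, float]:
    """Compute precision and recall of a found alignment against a reference."""
    correct = len(found & reference)
    # Convention of this sketch: an empty set yields 0.0 rather than an error.
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical correspondences: (entity in test 101, entity in the test ontology, relation)
reference = {("Book", "Book", "="), ("Article", "Article", "="), ("author", "author", "=")}
found = {("Book", "Book", "="), ("Article", "Paper", "="), ("author", "author", "=")}

precision, recall = precision_recall(found, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three returned correspondences are correct and two of the three expected ones are found, so both measures equal 2/3.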

The only interesting alignments are those involving classes and properties of the given ontologies. So the alignments should not align individuals, nor entities from the external ontologies.
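
A matcher can enforce part of this rule with a simple namespace filter, sketched below in Python. The two namespaces are those of FOAF and iCalendar given above; the example.org URIs and the function name are hypothetical. Excluding individuals would additionally require knowing the type of each entity, which this sketch does not cover.

```python
# Namespaces of the external ontologies used by the reference ontology.
EXTERNAL_NAMESPACES = (
    "http://xmlns.com/foaf/0.1/",
    "http://www.w3.org/2002/12/cal/",
)


def keep_internal(correspondences):
    """Drop correspondences in which either entity comes from an external ontology."""
    return [
        (entity1, entity2, relation)
        for (entity1, entity2, relation) in correspondences
        if not entity1.startswith(EXTERNAL_NAMESPACES)
        and not entity2.startswith(EXTERNAL_NAMESPACES)
    ]


# Hypothetical URIs, for illustration only.
pairs = [
    ("http://example.org/101#Book", "http://example.org/201#Book", "="),
    ("http://xmlns.com/foaf/0.1/Person", "http://xmlns.com/foaf/0.1/Person", "="),
]
print(keep_internal(pairs))
```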

There is some chance that the final test will be improved by adding entity expansion and reduction. It is also possible that there will be many more instances in each ontology.


101) Concept test: Id

This test compares the ontology to itself.

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


102) Concept test: ?

This test compares the ontology to a totally irrelevant one.

NOTE: The onto.rdf file is not provided here. It is possible to run the test directly on the true file of a totally irrelevant ontology. For example, you can use the food ontology given in the OWL guide (verbatim), i.e., http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine.


103) Concept test: Language generalisation

This test compares the ontology with its generalisation in OWL Lite (i.e., unavailable constraints are replaced by the more general ones that are available). The generalization basically removes owl:unionOf and owl:oneOf and the Property types (owl:TransitiveProperty).

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


104) Concept test: Language restriction

This test compares the ontology with its restriction in OWL Lite (where unavailable constraints have been discarded).

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


201[-2-4-6-8]) Systematic: No names

Each label or identifier is replaced by a random one.

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


202[-2-4-6-8]) Systematic: No names, no comment

Each label or identifier is replaced by a random one. Comments (rdfs:comment and dc:description) have been suppressed as well.

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]
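
To illustrate what suppressing comments amounts to at the RDF/XML level, here is a small Python sketch using only the standard library. It is not the generator used to build the tests. It assumes that dc: is bound to the Dublin Core elements 1.1 namespace and that comments appear as child elements; the function name is ours.

```python
import xml.etree.ElementTree as ET

RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# Assumption: the dc prefix refers to the Dublin Core elements 1.1 namespace.
DC = "http://purl.org/dc/elements/1.1/"
COMMENT_TAGS = {f"{{{RDFS}}}comment", f"{{{DC}}}description"}


def strip_comments(rdf_xml: str) -> str:
    """Return the RDF/XML document with rdfs:comment and dc:description removed."""
    root = ET.fromstring(rdf_xml)
    # Materialize the element list first, since children are removed while looping.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in COMMENT_TAGS:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Labels and all other statements are left in place; only the two comment properties disappear from the output.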


203) Systematic: Misspelling

A random, but consistent, typo generator should be applied to labels and comments.

Not available in this test (if you know how to do it, contact me).


204) Systematic: Naming conventions

Different naming conventions (uppercasing, underscores, dashes, etc.) are used for labels. Comments have been suppressed.

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]
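
On the matcher side, differences in naming conventions are often handled by normalizing labels before comparing them. The Python sketch below shows one possible normalization; it is an illustration under our own assumptions about which separators occur, not a description of how this test was generated.

```python
import re


def normalize_label(label: str) -> str:
    """Reduce a label to lowercase words separated by single spaces."""
    # Insert a space at camelCase boundaries: "journalName" -> "journal Name".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    # Treat underscores, dashes and dots as word separators.
    spaced = re.sub(r"[_\-.]+", " ", spaced)
    return " ".join(spaced.lower().split())
```

With this function, journalName, JOURNAL_NAME and journal-name all normalize to the same string, "journal name", and can then be compared directly.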


205) Systematic: Synonyms

Labels are replaced by synonyms. Comments have been suppressed.

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


206) Systematic: Foreign names

The complete ontology is translated to a language other than English (French in the current case, but other languages would be fine).

Ontology : [RDF/XML] [RDF/XML in ISO-8859-1] [HTML]
Alignment : [RDF/XML] [RDF/XML in ISO-8859-1] [HTML]

NOTE: Alternatively, you can use the ISO-Latin-1 (ISO-8859-1) version of the tests by renaming them after their UTF-8 version.


207) Systematic:

Each label or identifier is translated to a language other than English (French in the current case, but other languages would be fine).

Ontology : [RDF/XML] [RDF/XML in ISO-8859-1] [HTML]
Alignment : [RDF/XML] [RDF/XML in ISO-8859-1] [HTML]

NOTE: Alternatively, you can use the ISO-Latin-1 (ISO-8859-1) version of the tests by renaming them after their UTF-8 version.


208) Systematic:

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


209) Systematic:

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


210) Systematic:

Ontology : [RDF/XML] [RDF/XML in ISO-8859-1] [HTML]
Alignment : [RDF/XML] [RDF/XML in ISO-8859-1] [HTML]

NOTE: Alternatively, you can use the ISO-Latin-1 (ISO-8859-1) version of the tests by renaming them after their UTF-8 version.


221) Systematic: No hierarchy

All subclass assertions to named classes are suppressed.

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]

(variation: compile inheritance)


222) Systematic: Flattened hierarchy

A hierarchy still exists but has been strictly reduced.

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]

The alignment here contains relations which are not "=" but "<".


223) Systematic: Expanded hierarchy

Numerous intermediate classes are introduced within the hierarchy.

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


224) Systematic: No instances

All individuals have been suppressed from the ontology.

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


225) Systematic: No restrictions

All local restrictions on properties have been suppressed from the ontology.

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]

(variation: no property nor global restrictions on properties)


226) Systematic: No datatypes

In this test all datatypes are converted to xsd:string.

Not available in this test


227) Systematic: Unit differences

(Measurable) values are expressed in different datatypes.

Not available in this test


228) Systematic: No properties

Properties and relations between objects have been completely suppressed.

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]

(variation: leave the properties in instances)


229) Systematic: Class vs instances

Some classes have become instances.

Not available in this test.


230) Systematic: Flattening entities

Some components of classes are expanded in the class structure (e.g., year, month, day attributes instead of date).

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]

Here one limitation of the proposed format is that it does not cover alignments such as: journalName = name o journal.


231) Systematic: Multiplying entities

Some classes are spread over several classes.

Not available in this test.


232) Systematic: no hierarchy + no instance

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


233) Systematic: no hierarchy + no property

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


236) Systematic: no instance + no property

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


237) Systematic: flattened hierarchy + no instance

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


238) Systematic: expanded hierarchy + no instance

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


239) Systematic: flattened hierarchy + no property

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


240) Systematic: expanded hierarchy + no property

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


241) Systematic: no hierarchy + no instance + no property

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


246) Systematic: flattened hierarchy + no instance + no property

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


247) Systematic: expanded hierarchy + no instance + no property

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


248[-2-4-6-8]) Systematic: scrambled labels + no comments + no hierarchy

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


249[-2-4-6-8]) Systematic: scrambled labels + no comments + no instance

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


250[-2-4-6-8]) Systematic: scrambled labels + no comments + no property

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


251[-2-4-6-8]) Systematic: scrambled labels + no comments + flattened hierarchy

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


252[-2-4-6-8]) Systematic: scrambled labels + no comments + expanded hierarchy

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


253[-2-4-6-8]) Systematic: scrambled labels + no comments + no hierarchy + no instance

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


254[-2-4-6-8]) Systematic: scrambled labels + no comments + no hierarchy + no property

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


257[-2-4-6-8]) Systematic: scrambled labels + no comments + no instance + no property

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


258[-2-4-6-8]) Systematic: scrambled labels + no comments + flattened hierarchy + no instance

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


259[-2-4-6-8]) Systematic: scrambled labels + no comments + expanded hierarchy + no instance

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


260[-2-4-6-8]) Systematic: scrambled labels + no comments + flattened hierarchy + no property

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


261[-2-4-6-8]) Systematic: scrambled labels + no comments + expanded hierarchy + no property

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


262[-2-4-6-8]) Systematic: scrambled labels + no comments + no hierarchy + no instance + no property

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


265) Systematic: scrambled labels + no comments + flattened hierarchy + no instance + no property

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


266) Systematic: scrambled labels + no comments + expanded hierarchy + no instance + no property

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]

Note that the 3xx tests are only here for comparability with previous years. We know that the reference alignments for these tests are not perfect (especially because the ontologies sometimes contain flaws).


301) Real ontology: BibTeX/MIT

For a computer scientist, BibTeX is the starting point for a useful bibliographic ontology. It is of wide use and relatively well thought out. This ontology can be found at and is documented in BibTex in OWL.

This is a test of comparing our test ontology with an actual ontology, simpler and closer to the initial BibTeX ontology. The alignment result contains some inclusion (<) alignment relations.

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


302) Real ontology: BibTeX/UMBC

This ontology is very similar to the previous one, even closer to the genuine BibTeX, with different extensions and naming conventions. It can be found at http://ebiquity.umbc.edu.

The alignment result also contains some inclusion (<) alignment relations.

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


303) Real ontology: Karlsruhe

This is a test of comparing our test ontology with an actual ontology which contains more items than the actual items used in the current ontology.

The Karlsruhe ontology (http://www.aifb.uni-karlsruhe.de/ontology) is used in the Ontoweb portal. It is a refinement of other ontologies such as (KA)2. As such, it defines not only bibliographic items but also many other items.

The alignment contains < as well as > relations.

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


304) Real ontology: INRIA

This is a test of comparing our test ontology with an actual ontology which is not equivalent but quite close (it can be thought of as a previous version).

This INRIA ontology (fr.inrialpes.exmo.rdf.bib.owl) has been designed by Antoine Zimmermann from the BibTeX in OWL ontology and our Bibliographic XML DTD. Its goal was to easily gather a number of RDF items. These items were BibTeX entries found on the web and transformed into RDF according to this ontology.

The actual hierarchy of this ontology contains classes which are subclasses of several other classes.

Ontology : [RDF/XML] [HTML]
Alignment : [RDF/XML] [HTML]


Versions

The release notes of the test versions can be found here.

Testing your tool

Since 2010, it is no longer necessary to download the data sets (it has always been better to get them from the web). The SEALS platform will provide the data sets. In the future, it is even planned that it will generate the data sets on the fly.

Participants can test their tools using the standard benchmark data set described above, which can be downloaded here. They can also extend their testing with a subset of data sets built with reference ontologies used in previous campaigns, which are stored in the Test Data Repository accessible through the SEALS portal. Note that two of these data sets were built based on the same reference ontologies (biblio and finance) which will be used in OAEI 2012.

All those data sets maintain the structure explained in the Example of a complete benchmark data set section, and testing with those data sets can be done by using the SEALS client. This client iterates over the tests in a data set whose identifier is provided as a parameter. In all cases, the ontologies found in the data set directories are matched against the ontology found in 101/onto.rdf. The resulting alignments must be output in the alignment format. They are placed in a local directory, which is also given as a parameter to the client.
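
For participants who want to reproduce this loop locally on a downloaded data set, the following Python sketch mimics the behaviour described above. It is not the SEALS client: the function name, the matcher callback and the output file name alignment.rdf are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable


def run_dataset(dataset_dir: str, output_dir: str,
                matcher: Callable[[Path, Path], str]) -> list[str]:
    """Match every test ontology against 101/onto.rdf and store the results."""
    root = Path(dataset_dir)
    reference = root / "101" / "onto.rdf"
    out_root = Path(output_dir)
    done = []
    for test_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        ontology = test_dir / "onto.rdf"
        if not ontology.is_file():
            continue  # skip tests whose onto.rdf is not shipped
        target = out_root / test_dir.name
        target.mkdir(parents=True, exist_ok=True)
        # 'alignment.rdf' is a placeholder name, not one mandated by SEALS.
        result = matcher(reference, ontology)
        (target / "alignment.rdf").write_text(result, encoding="utf-8")
        done.append(test_dir.name)
    return done
```

The matcher callback receives the path of the reference ontology and the path of the test ontology, and is expected to return the alignment serialized as a string in the alignment format.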

The whole data sets are available: biblio, jerm, provenance and finance.

The identifiers of data sets for testing with the SEALS client are given below:

Biblio data set

Jerm data set

Provenance data set

Finance data set

We encourage you to use the Alignment API for manipulating and generating your alignments, and, in particular, for computing evaluation of your results.

Acknowledgements

Many resources have been used for setting up the biblio test:

Various people helped testing or suggested improvements and tests:

Contacts

Contact address is Jose-Luis : Aguirre # inria : fr