Ontology Alignment Evaluation Initiative - OAEI-2018 Campaign

Complex track

General description

Complex alignments are more expressive than simple alignments as their correspondences can contain logical constructors or transformation functions of literal values.

For example, given two ontologies o1 and o2:

With this track, we evaluate systems which can generate such correspondences.

The complex track contains 4 datasets about 4 different domains: Conference, Hydrography, GeoLink and Taxon. Each dataset and its evaluation method are presented below.

The participants of the track should output their (complex) correspondences in the EDOAL format. This format is supported by the Alignment API. The evaluation will be supported by the SEALS platform. The participants have to wrap their tool for the SEALS client as described at SEALS evaluation for OAEI 2018. The parameters for executing the tasks of each dataset (repository, suite-id, version-id) are listed in the boxes below.

The numbers of ontologies, simple (1:1) correspondences and complex (1:n and m:n) correspondences in each dataset of this track are summarized in the following table.

Dataset      #Ontologies  #(1:1)  #(1:n)  #(m:n)
Conference   3            78      79      0
Hydrography  4            113     69      15
GeoLink      2            24      15      72
Taxon        4            6       17      3

Schedule

The schedule is available at the OAEI main page.

Datasets and Evaluation Modalities

Conference dataset

Ontologies and correspondences

This dataset is based on the OntoFarm dataset [1] used in the Conference track of the OAEI campaigns. It is composed of 16 ontologies on the conference organisation domain and simple reference alignments between 7 of them. Here, we consider 3 out of the 7 ontologies from the reference alignments (cmt, conference and ekaw), resulting in 3 alignment pairs.

Conference Testsuite

The correspondences were manually curated by 3 experts following the query rewriting methodology in [2]. For each pair o1-o2 of ontologies, the following steps were applied:

4 experts assessed the curated correspondences to reach a consensus.

Evaluation modalities

The complex correspondences output by the systems will be manually compared to the ones of the consensus alignment.

For this first evaluation, only equivalence correspondences will be evaluated, and the confidence of the correspondences will not be taken into account.

The systems can take the ra1 simple alignments as input. The ra1 alignments can be downloaded here.

Hydrography dataset

Ontologies and correspondences

The hydrography dataset is composed of four source ontologies (Hydro3, HydrOntology_native, HydrOntology_translated, and Cree), each of which should be aligned to a single target ontology, the Surface Water Ontology (SWO). The source ontologies vary in their similarity to the target ontology: Hydro3 is similar in both language and structure, HydrOntology is similar in structure but is in Spanish rather than English, and Cree is very different in terms of both language and structure.

Hydrography

The alignments were created by a geologist and an ontologist, in consultation with a native Spanish speaker regarding HydrOntology, and consist of logical relations.

Tasks

There are three subtasks in the Hydrography complex alignment track:

  1. Entity Identification

    The researchers are asked to identify the entities (classes and properties) involved in a simple or complex alignment.

    For example, this is a complex alignment between HydrOntology_translated and SWO:

    Forall x, hydrOntology_translated:Aguas_Corrientes(x) -> swo:SurfaceFeature(x) ∧ swo:Waterbody(x) ∧ swo:hasFlow(x,y) ∧ swo:Flow(y).

    The goal in this task is to find the SWO entities most closely related to the class hydrOntology_translated:Aguas_Corrientes. In this case, the best output would be swo:SurfaceFeature, swo:Waterbody, swo:hasFlow, and swo:Flow.

  2. Relationship Identification

    For each source ontology entity that is involved in a relation, the system will be given the related entities from the target ontology. The system should then endeavor to find the concrete relationships, such as equivalence, subsumption, intersection, value restriction, and so on, that hold between the entities.

  3. Full complex alignment Identification

    This task is a combination of the two previous tasks.
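To make the three subtasks concrete, the following is a minimal Python sketch of one way the example correspondence given for task 1 could be held in memory. The `Atom` and `Rule` classes, the `target_entities` and `is_property` helpers, and the `"subsumption"` label for the `->` arrow are illustrative assumptions, not part of the track's tooling or of the EDOAL format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """One atom of a rule, e.g. swo:hasFlow(x, y). Hypothetical helper type."""
    predicate: str
    args: tuple


@dataclass(frozen=True)
class Rule:
    """A correspondence: source atom, relation, conjunction of target atoms."""
    source: Atom
    relation: str   # assumed label for the "->" in the example
    target: tuple   # atoms joined by conjunction


example = Rule(
    source=Atom("hydrOntology_translated:Aguas_Corrientes", ("x",)),
    relation="subsumption",
    target=(
        Atom("swo:SurfaceFeature", ("x",)),
        Atom("swo:Waterbody", ("x",)),
        Atom("swo:hasFlow", ("x", "y")),
        Atom("swo:Flow", ("y",)),
    ),
)


def target_entities(rule):
    """Task 1 view: target entities in order of appearance, without duplicates."""
    seen = []
    for atom in rule.target:
        if atom.predicate not in seen:
            seen.append(atom.predicate)
    return seen


def is_property(atom):
    """In this sketch, binary atoms are properties and unary atoms are classes."""
    return len(atom.args) == 2


print(target_entities(example))
print([a.predicate for a in example.target if is_property(a)])
```

Under this representation, task 1 asks for the list returned by `target_entities`, task 2 starts from that list and asks for the `relation` and the structure of `target`, and task 3 asks for the whole `Rule`.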

Evaluation modalities

Once the results from the matching systems are collected, we will evaluate their performance manually. For task 1, we plan to use the traditional F-measure as the metric. For tasks 2 and 3, semantic precision and recall [5] will be applied. In addition, we will shortly post the evaluation scripts so that system developers can evaluate the performance of their systems themselves. The reference alignment can be downloaded from here.
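As an illustration of the task 1 metric, here is a minimal Python sketch of precision, recall and F-measure over sets of entities. It assumes the balanced F-measure (F1) and a set-based comparison per source entity; it is not the organizers' evaluation script. The function name `precision_recall_f1` is hypothetical, and `swo:Stream` is an invented wrong answer used only to make the scores non-trivial.

```python
def precision_recall_f1(predicted, reference):
    """Set-based precision, recall and balanced F-measure (assumed to be F1)."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Reference entities for hydrOntology_translated:Aguas_Corrientes (from the task 1 example).
reference = {"swo:SurfaceFeature", "swo:Waterbody", "swo:hasFlow", "swo:Flow"}

# Hypothetical system output: three correct entities and one wrong one.
predicted = {"swo:Waterbody", "swo:hasFlow", "swo:Flow", "swo:Stream"}

p, r, f = precision_recall_f1(predicted, reference)
print(f"precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

The semantic precision and recall of [5] used for tasks 2 and 3 rely on logical entailment between alignments and are not covered by this set-based sketch.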

GeoLink dataset

Ontologies and correspondences

This dataset is from the GeoLink project, which was funded under the U.S. National Science Foundation's EarthCube initiative. It is composed of two ontologies: the GeoLink Base Ontology (GBO) and the GeoLink Modular Ontology (GMO). The GeoLink project is a real-world use case of ontologies, and instance data is available. The alignment between the two ontologies was developed in consultation with domain experts from several geoscience research institutions. More details can be found in the paper [4].

GeoLink

Tasks

The same three tasks as described for the hydrography dataset also apply to this dataset.

Evaluation modalities

The evaluation of the systems will be performed as for the hydrography dataset. The reference alignment can be downloaded from here.

Taxon dataset

Ontologies and correspondences

The Taxon dataset is composed of 4 ontologies which describe the classification of species: AgronomicTaxon, Agrovoc, DBpedia and TaxRef-LD. All the ontologies are populated. The common scope of these ontologies is plant taxonomy. The alignments were manually created with the help of one expert and involve only logical constructors. This dataset extends the one proposed in [3] by adding the TaxRef-LD ontology.

Evaluation modalities

The evaluation of this dataset is task-oriented. We will evaluate the generated correspondences using a SPARQL query rewriting system and manually measure their ability to answer a set of queries over each dataset. The alignments have to be in EDOAL. The systems will be evaluated on a subset of the dataset (common scope). The evaluation is blind.
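The idea of rewriting a query with a complex correspondence can be sketched with a toy Python example. This is not the rewriting system used for the evaluation, which is not described here. The `src:` and `tgt:` prefixes, the class `src:Genus`, the property `tgt:hasRank`, the individual `tgt:GenusRank` and the `rewrite` function are all hypothetical, and the sketch only handles triple patterns of the form `?var a prefix:Class .`.

```python
import re

# Hypothetical correspondence: instances of src:Genus correspond to resources
# whose tgt:hasRank value is tgt:GenusRank (one class mapped to a value restriction).
CORRESPONDENCES = {
    "src:Genus": "{var} tgt:hasRank tgt:GenusRank .",
}

# Matches a type pattern such as "?t a src:Genus ." and captures variable and class.
TYPE_PATTERN = re.compile(r"(\?\w+)\s+a\s+([\w-]+:[\w-]+)\s*\.")


def rewrite(query, correspondences):
    """Replace each known type pattern by the target pattern of its correspondence."""
    def replace(match):
        var, cls = match.group(1), match.group(2)
        template = correspondences.get(cls)
        if template is None:
            return match.group(0)  # no correspondence: leave the pattern untouched
        return template.format(var=var)
    return TYPE_PATTERN.sub(replace, query)


source_query = "SELECT ?t WHERE { ?t a src:Genus . }"
print(rewrite(source_query, CORRESPONDENCES))
```

A query whose class has no correspondence is returned unchanged, which in a task-oriented evaluation would show up as a query the alignment cannot answer on the target dataset.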

Organizers

References

[1] Ondřej Zamazal, Vojtěch Svátek. The Ten-Year OntoFarm and its Fertilization within the Onto-Sphere. Web Semantics: Science, Services and Agents on the World Wide Web, 43, 46-53. 2017.

[2] Élodie Thiéblin, Ollivier Haemmerlé, Nathalie Hernandez, Cassia Trojahn. Task-Oriented Complex Ontology Alignment: Two Alignment Evaluation Sets. In: European Semantic Web Conference. Springer, Cham, 655-670, 2018.

[3] Élodie Thiéblin, Fabien Amarger, Nathalie Hernandez, Catherine Roussey, Cassia Trojahn. Cross-querying LOD datasets using complex alignments: an application to agronomic taxa. In: Research Conference on Metadata and Semantics Research. Springer, Cham, 25-37, 2017.

[4] Lu Zhou, Michelle Cheatham, Adila Krisnadhi, Pascal Hitzler. A Complex Alignment Benchmark: GeoLink Dataset. In: International Semantic Web Conference. Springer, 2018.

[5] Jérôme Euzenat. Semantic Precision and Recall for Ontology Alignment Evaluation. In: International Joint Conference on Artificial Intelligence, 348-353, 2007.