Ontology Alignment Evaluation Initiative - OAEI-2015 Campaign

Interactive Track

Description

The growth of the ontology alignment area over the past ten years has led to the development of many ontology alignment tools. After several years of experience in the OAEI, we have observed that results improve only slightly from year to year in terms of alignment quality (precision, recall, and F-measure). Based on this insight, it is clear that fully automatic ontology matching approaches are slowly reaching an upper bound on the alignment quality they can achieve. (Jimenez-Ruiz et al., 2012) showed that simulating user interactions with a 30% error rate during the alignment process led to the same results as non-interactive matching. Thus, in addition to the validation of automatically generated alignments by domain experts, we believe there is further room for improving the quality of the generated alignments by incorporating user interaction. User involvement during the matching process has been identified by (Shvaiko et al., 2013) as one of the challenges facing the ontology alignment community, and user interaction with a system is an integral part of it.
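
For reference, the alignment quality measure used throughout this track, the F-measure, is the harmonic mean of precision and recall. A minimal sketch (the example numbers are illustrative, not taken from any OAEI result):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative example: a system proposes 100 mappings, 80 of which are
# correct, and the reference alignment contains 120 mappings in total.
precision = 80 / 100   # fraction of proposed mappings that are correct
recall = 80 / 120      # fraction of reference mappings that were found
print(round(f_measure(precision, recall), 3))  # 0.727
```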

As ontology sizes grow, so does the alignment problem. It is not feasible for a user to, for instance, validate all candidate mappings generated by a system, so tool developers should aim at reducing unnecessary user interventions. The effort required from the user has to be taken into account, and it has to be in an appropriate proportion to the resulting quality gain. Thus, besides the quality of the alignment, measures such as the number of interactions are interesting and meaningful for deciding which matching system is best suited for a certain matching task. To date, all OAEI tracks have focused on fully automatic matching; semi-automatic matching is not evaluated, although such systems already exist (see the overview in (Ivanova et al., 2015)). As long as the evaluation of such systems is not driven forward, it is hardly possible to systematically compare the quality of interactive matching approaches.

Goal

The OAEI's interactive track aims at offering a systematic and automated evaluation of matching systems with user interaction, in order to compare the quality of interactive matching approaches in terms of F-measure and the number of required interactions. To this end, we rely on the OAEI 2015 Conference, Anatomy, and LargeBio datasets. We use the reference alignments of each track as oracles to simulate the interaction with a domain expert (see (Jimenez-Ruiz et al., 2012) and (Paulheim et al., 2013)).

In this track we currently focus on one of the challenges regarding user interaction. The goal of this track is to show that exploiting user interaction allows further improving the results of ontology matching systems in terms of F-measure. We would also like to see which semi-automatic methods exist, which ones perform best, and which ones need the smallest number of interactions, i.e., make the best use of the scarce resource of users' time. Besides the number of user interactions, the type of interaction and the time of involvement are also of interest. Do matching systems involve the user before or during the matching process? Do they ask the user to verify only single correspondences, or complete alignments? Altogether, we aim to promote the development of semi-automatic ontology matching systems and of methods that overcome the limitations of fully automatic techniques. Furthermore, the track will encourage a discussion of different interactive matching techniques as well as of a set of relevant interaction primitives. Currently, this track does not evaluate the user experience or the user interfaces of the systems.

Evaluation

The evaluation of this track will also be run with the support of SEALS. This requires that you wrap your matching system so that it can be executed on the SEALS platform (see the OAEI 2015 evaluation details). Note that in this track we allow systems to interact with the SEALS client to check whether a given alignment is correct or not (see the additional methods for the interactive track).
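
Conceptually, the interaction works like an oracle built from the reference alignment: the system submits a candidate correspondence and receives a yes/no answer. The sketch below illustrates this idea only; the class and method names are hypothetical and do not reflect the actual SEALS client API:

```python
# Hypothetical oracle over a reference alignment. Names and the triple
# representation are illustrative, not the actual SEALS client interface.
class Oracle:
    def __init__(self, reference_alignment):
        # reference_alignment: iterable of (entity1, entity2, relation) triples
        self.reference = set(reference_alignment)

    def check(self, entity1, entity2, relation="="):
        """Return True iff the queried correspondence is in the reference."""
        return (entity1, entity2, relation) in self.reference

# Usage: the matcher queries the oracle for individual correspondences.
reference = {("mouse:Lung", "human:Lung", "=")}
oracle = Oracle(reference)
print(oracle.check("mouse:Lung", "human:Lung"))   # True
print(oracle.check("mouse:Lung", "human:Liver"))  # False
```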

We will also simulate domain experts with variable error rates (see (Jimenez-Ruiz et al., 2012)), reflecting a more realistic scenario in which a (simulated) user does not always provide a correct answer. In these scenarios, asking the user a large number of questions may also have a negative impact. The error rates (in the range 0..1) that will be simulated are 0.1, 0.2, and 0.3. The error rate can be set as an input parameter when running the SEALS client; please refer to the OAEI 2015 tutorial.
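
An erroneous oracle of this kind can be sketched as flipping the correct answer with the given probability. This is a simplified illustration of the idea, not the actual implementation used in the evaluation:

```python
import random

def erroneous_answer(true_answer, error_rate, rng=random):
    """Return the oracle's answer, flipped with probability error_rate
    to simulate a domain expert who sometimes errs."""
    if rng.random() < error_rate:
        return not true_answer
    return true_answer

# With error_rate = 0.2, roughly one in five answers is wrong.
rng = random.Random(42)
answers = [erroneous_answer(True, 0.2, rng) for _ in range(1000)]
print(answers.count(False))  # close to 200 wrong answers
```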

Data sets

The interactive track relies on the OAEI 2015 Conference, Anatomy, and LargeBio datasets.

If you use the 2015 version of the SEALS OMT client, please refer to the identifiers of each of the tracks (conference, anatomy, and largebio) and use the -i parameter as described in the OAEI 2015 tutorial.

This is the required input if you use the 2014 version of the SEALS OMT client:

Conference

This dataset covers 16 ontologies describing the domain of conference organization. Over the last years, the quality of the generated alignments has constantly increased, but only by a small amount (a few percentage points). In 2013, the best system according to F-measure (YAM++) achieved a value of 71%. This shows that there is significant room for improvement, which could be filled by interactive means.

Id of the task if you use the 2014 version of the SEALS OMT client:

Anatomy

The Anatomy track consists of finding an alignment between the Adult Mouse Anatomy and a part of the NCI Thesaurus (describing the human anatomy).

Id of the task if you use the 2014 version of the SEALS OMT client:

LargeBio: FMA-NCI

This task consists of matching the whole FMA and NCI ontologies, which contain 78,989 and 66,724 classes, respectively. The best results without interaction in terms of F-measure were achieved by YAM++ (87%) in 2013.

Id of the task if you use the 2014 version of the SEALS OMT client:

LargeBio: FMA-SNOMED

This task consists of matching the whole FMA, which contains 78,989 classes, with a large SNOMED fragment containing 122,464 classes (40% of SNOMED). The best results without interaction in terms of F-measure were achieved by YAM++ (82%) in 2013.

Id of the task if you use the 2014 version of the SEALS OMT client:

LargeBio: SNOMED-NCI

This task consists of matching the whole NCI, which contains 66,724 classes, with a large SNOMED fragment containing 122,464 classes (40% of SNOMED). The best results without interaction in terms of F-measure were achieved by AML (75%) in 2014.

Id of the task if you use the 2014 version of the SEALS OMT client:

References

Heiko Paulheim, Sven Hertling, Dominique Ritze. "Towards Evaluating Interactive Ontology Matching Tools". ESWC 2013.

Ernesto Jimenez-Ruiz, Bernardo Cuenca Grau, Yujiao Zhou, Ian Horrocks. "Large-scale Interactive Ontology Matching: Algorithms and Implementation". ECAI 2012.

Valentina Ivanova, Patrick Lambrix, Johan Åberg. "Requirements for and evaluation of user support for large-scale ontology alignment". ESWC 2015.

Pavel Shvaiko, Jérôme Euzenat. "Ontology matching: state of the art and future challenges". IEEE Transactions on Knowledge and Data Engineering, 2013.

Acknowledgements

We thank Dominique Ritze and Heiko Paulheim, the organisers of the 2013 and 2014 editions of this track, who were very helpful in setting up the 2015 edition.

The track is partially supported by the Optique project.

Contact

This track is currently organized by Zlatan Dragisic, Daniel Faria, Valentina Ivanova, Ernesto Jimenez Ruiz, Patrick Lambrix, and Catia Pesquita.

If you have any problems working with the ontologies or any suggestions related to this track, feel free to write an email to ernesto [at] cs [.] ox [.] ac [.] uk or ernesto [.] jimenez [.] ruiz [at] gmail [.] com