Complex track - Populated Enslaved subtrack - Evaluation

Three alignment systems were enrolled in the complex track: AMLC, AROA, and CANARD. In addition to these three, we also evaluated other alignment systems. The systems were run on the Populated Enslaved dataset [1], and the output alignments were evaluated as described below.

Evaluation Methodology

In this subtrack, the alignments are evaluated automatically over the Populated Enslaved dataset, which contains around 74k items of instance data from real-world datasets. The quality of a mapping can be assessed along two dimensions. The first is whether the mapping contains the correct entities, i.e., those that the reference alignment says should be involved. The second is the relationship between these entities, e.g., equivalence or subsumption. Accordingly, we break the evaluation procedure down into three subtasks: entity identification, relationship identification, and full complex alignment identification.

Entity Identification
For each entity in the source ontology, the alignment system is asked to list all of the entities in the target ontology that are related to it in some way.

For example:
owl:equivalentClasses(ont1:A1 owl:intersectionOf(ont2:B1 owl:someValuesFrom(ont2:B2 ont2:B3)))

The goal of this task is to find the entities in ont2 that are most relevant to the class ont1:A1. In this case, the best output would be ont2:B1, ont2:B2, and ont2:B3.

The result is evaluated based on precision, recall, and f-measure.
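As a minimal sketch of how these three scores can be computed for a single source entity, the snippet below compares a set of found entities with the reference set from the example above. The function name entity_scores and the system output (including ont2:B4) are hypothetical and only serve as illustration; how scores are aggregated over all source entities is not specified here.

```python
def entity_scores(found, reference):
    """Return (precision, recall, f-measure) of a found entity set
    against a reference entity set."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Reference entities for ont1:A1, taken from the example above.
reference = {"ont2:B1", "ont2:B2", "ont2:B3"}
# Hypothetical system output: two correct entities and one spurious one.
found = {"ont2:B1", "ont2:B2", "ont2:B4"}

precision, recall, f_measure = entity_scores(found, reference)
print(precision, recall, f_measure)
```

With two of three found entities correct and two of three reference entities retrieved, precision, recall, and f-measure all come out to 2/3.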

Relationship Identification

For each alignment, the system should then try to find the concrete relationship that holds between the entities, such as equivalence, subsumption, intersection, or value restriction. In the example above, an alignment system eventually needs to determine that the relationship between the two sides is equivalence. Table 1 shows the similarity values used in the evaluation for the different combinations of found and correct relations. We do not penalize an incorrect relationship with a similarity of zero, because that would discard the entity identification output entirely, without considering whether it is a reasonable result or a completely incorrect one.

Table 1. Similarity for Relationship Identification

Found Relation	Correct Relation	Similarity	Comment
=	=	1.0	correct relation
⊂	⊂	1.0	correct relation
⊃	⊃	1.0	correct relation
⊂	=	0.8	return less information, but correct
=	⊃	0.8	return less information, but correct
⊃	=	0.6	return more information, but incorrect
=	⊂	0.6	return more information, but incorrect
⊂	⊃	0.3	incorrect relation
⊃	⊂	0.3	incorrect relation
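One way to apply Table 1 in code is a plain lookup keyed by the (found relation, correct relation) pair. The sketch below transcribes the table as given; the names RELATION_SIMILARITY and relation_similarity are hypothetical, and pairs that do not appear in the table are deliberately left undefined rather than guessed.

```python
# (found relation, correct relation) -> similarity, transcribed from Table 1.
RELATION_SIMILARITY = {
    ("=", "="): 1.0,
    ("⊂", "⊂"): 1.0,
    ("⊃", "⊃"): 1.0,
    ("⊂", "="): 0.8,
    ("=", "⊃"): 0.8,
    ("⊃", "="): 0.6,
    ("=", "⊂"): 0.6,
    ("⊂", "⊃"): 0.3,
    ("⊃", "⊂"): 0.3,
}


def relation_similarity(found, correct):
    """Look up the similarity for a found/correct relation pair.

    Raises KeyError for pairs that Table 1 does not list.
    """
    return RELATION_SIMILARITY[(found, correct)]


print(relation_similarity("⊂", "="))
```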

Full Complex Alignment Identification
This task combines the two previous steps. We multiply the entity identification results by the similarity of the relationship to obtain the relaxed precision, recall, and f-measure. Strictly speaking, aggregation functions other than multiplication could also have been used, and might have been a better choice [2].

relaxed_precision = entity_precision * similarity of relationship

relaxed_recall = entity_recall * similarity of relationship

relaxed_f-measure = 2 * relaxed_precision * relaxed_recall / (relaxed_precision + relaxed_recall)
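The three formulas above can be sketched as a single function. The function name relaxed_scores and the input values are hypothetical: they assume an entity precision of 1.0, an entity recall of 0.5, and a found relation ⊂ where the correct relation is = (similarity 0.8 in Table 1).

```python
def relaxed_scores(entity_precision, entity_recall, similarity):
    """Scale entity precision and recall by the relationship similarity,
    then combine them into the relaxed f-measure."""
    relaxed_precision = entity_precision * similarity
    relaxed_recall = entity_recall * similarity
    total = relaxed_precision + relaxed_recall
    # Guard against division by zero when both relaxed scores are 0.
    relaxed_f = 2 * relaxed_precision * relaxed_recall / total if total else 0.0
    return relaxed_precision, relaxed_recall, relaxed_f


# Hypothetical inputs for illustration only.
rp, rr, rf = relaxed_scores(1.0, 0.5, 0.8)
print(rp, rr, round(rf, 2))
```

Because the same similarity factor scales both precision and recall, the relaxed f-measure in this scheme is simply the entity f-measure multiplied by that factor.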

Results

The output alignments, as well as the detailed results of the systems on the Populated Enslaved dataset, are downloadable here.

AMLC, AROA, CANARD, LogMap, LogMapBio, LogMapKG, and LogMapLt were able to generate alignments on the Populated Enslaved dataset. Of these, AMLC, AROA, and CANARD were the only systems that generated complex alignments on this benchmark. AROA and CANARD both make use of the instance data, while AMLC relies on lexical and terminological techniques. Table 2 shows the final performance of each system, including the number of correct simple and complex correspondences found by each system.

Table 2. The Performance of All Alignment Systems on the Populated Enslaved Benchmark

System	simple (1:1)	complex (m:n)	relaxed precision	relaxed recall	relaxed f-measure
reference alignment	15	83	-	-	-
AMLC	12	18	0.73	0.28	0.40
AROA	11	32	0.80	0.38	0.51
CANARD	3	16	0.42	0.13	0.19

Discussion

Seven systems were able to produce alignments on the Populated Enslaved dataset. Among the alignments found, all correspondences from LogMap, LogMapBio, LogMapKG, and LogMapLt between the Enslaved Ontology and the Enslaved Wikibase repository are 1-to-1 equivalences. Since the output of these systems contains only a small number of simple alignments, we do not report their results this year. In contrast, AMLC, AROA, and CANARD produce complex alignments. The relaxed precision of AMLC and AROA is fair (0.73 and 0.80), while CANARD reports a lower relaxed precision (0.42). AROA detected the largest number of complex correspondences among the three systems, while AMLC output the largest number of simple correspondences. The low recall of all three systems is not surprising: it reflects that current ontology alignment systems are still unable to identify many of the complex relations, which we hope will improve in the coming years.

References

[1] Lu Zhou, Cogan Shimizu, Pascal Hitzler, Alicia M. Sheill, Seila Gonzalez Estrecha, Catherine Foley, Duncan Tarr, Dean Rehberger. The Enslaved Dataset: A Real-world Complex Ontology Alignment Benchmark using Wikibase. In: Conference on Information and Knowledge Management, ACM, 2020.

[2] Marc Ehrig, Jérôme Euzenat. Relaxed precision and recall for ontology matching. In: K-CAP 2005 Workshop on Integrating Ontologies, Banff, Canada, 2005.