Complex track - Evaluation

General description

The complex track aims to evaluate systems that can generate complex correspondences. A detailed description of each dataset can be found on the OAEI Complex track page.

Conference dataset

Only one system participated: Matcha. Matcha delivered simple equivalences, complex correspondences, and subsumptions; within this track, only the complex correspondences were evaluated, and no true positives (TPs) were identified among them. All of Matcha's complex correspondences contained intersections, both of classes with classes and of classes with properties. While the intersection of classes is an important construct, EDOAL does not allow a class and a property to be intersected directly. Instead, class restrictions such as AttributeDomainRestriction should be applied.
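To make the EDOAL constraint concrete, here is a minimal sketch, using only Python's standard xml.etree module, that flags edoal:and constructors whose direct operands mix classes and properties. The namespace URIs are the ones EDOAL documents use; the split of constructs into CLASS_TAGS and PROPERTY_TAGS is a simplification (other restriction types are ignored), and the function name and example.org IRIs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as used in EDOAL documents.
EDOAL = "http://ns.inria.org/edoal/1.0/#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Simplified grouping (assumption): constructs denoting classes vs. properties.
CLASS_TAGS = {f"{{{EDOAL}}}Class", f"{{{EDOAL}}}AttributeDomainRestriction"}
PROPERTY_TAGS = {f"{{{EDOAL}}}Property", f"{{{EDOAL}}}Relation"}


def mixed_intersections(entity_xml: str) -> list:
    """Return each edoal:and whose direct operands mix classes and properties."""
    root = ET.fromstring(entity_xml)
    flagged = []
    for conjunction in root.iter(f"{{{EDOAL}}}and"):
        operand_tags = {child.tag for child in conjunction}
        if operand_tags & CLASS_TAGS and operand_tags & PROPERTY_TAGS:
            flagged.append(conjunction)
    return flagged


NS = f'xmlns:edoal="{EDOAL}" xmlns:rdf="{RDF}"'

# Not allowed: a class intersected directly with a relation.
INVALID = f"""<edoal:Class {NS}>
  <edoal:and rdf:parseType="Collection">
    <edoal:Class rdf:about="http://example.org/conf#Paper"/>
    <edoal:Relation rdf:about="http://example.org/conf#hasAuthor"/>
  </edoal:and>
</edoal:Class>"""

# Allowed: the relation is wrapped in an AttributeDomainRestriction.
VALID = f"""<edoal:Class {NS}>
  <edoal:and rdf:parseType="Collection">
    <edoal:Class rdf:about="http://example.org/conf#Paper"/>
    <edoal:AttributeDomainRestriction>
      <edoal:onAttribute>
        <edoal:Relation rdf:about="http://example.org/conf#hasAuthor"/>
      </edoal:onAttribute>
      <edoal:exists>
        <edoal:Class rdf:about="http://example.org/conf#Person"/>
      </edoal:exists>
    </edoal:AttributeDomainRestriction>
  </edoal:and>
</edoal:Class>"""
```

Running mixed_intersections on INVALID flags one conjunction, while VALID yields an empty list, because the relation now sits inside the restriction rather than being a direct operand of edoal:and.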

Alignments from Matcha:

Other datasets

The following results relate to seven datasets: Populated Conference, Hydrography, GeoLink, Populated GeoLink, Populated Enslaved, Taxon, and Biomedical. All datasets and references can be found at https://github.com/liseda-lab/complex-OM-benchmark.

This year, two systems participated: CMatch and Matcha. CMatch produces simple and complex 1:n mappings, while Matcha produces simple, subsumption, and complex 1:n mappings. Matcha did not format its mappings in the expected EDOAL format. Results from three systems that competed in previous editions, AMLC, AROA, and CANARD, are presented as baselines, using their alignments from OAEI 2020.

All alignments and evaluation results can be consulted here.

Graph Edit Distance (GED) evaluation

This evaluation converts both the mapping and the reference into graphs and calculates the edit distance between them. The metric evaluates simple 1:1 mappings and complex 1:n mappings, regardless of the relation type. The best result in each row by F-measure is in bold.
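The page does not specify how correspondences are encoded as graphs. As a minimal sketch, assuming a hypothetical encoding in which the source entity links to an operator node that links to each target entity, NetworkX's graph_edit_distance can compute an exact edit distance between a predicted and a reference correspondence. The helper names and the cmt:/conf: labels are illustrative only.

```python
import networkx as nx


def correspondence_graph(source: str, operator: str, targets: list) -> nx.Graph:
    """Hypothetical encoding: source entity -> operator node -> each target entity."""
    graph = nx.Graph()
    graph.add_node("source", label=source)
    graph.add_node("op", label=operator)
    graph.add_edge("source", "op")
    for i, target in enumerate(targets):
        node = f"target{i}"
        graph.add_node(node, label=target)
        graph.add_edge("op", node)
    return graph


def same_label(attrs1: dict, attrs2: dict) -> bool:
    # Substituting a node is free only when both nodes carry the same label.
    return attrs1["label"] == attrs2["label"]


def edit_distance(predicted: nx.Graph, reference: nx.Graph) -> float:
    # Exact GED with NetworkX's default unit costs for insertions, deletions
    # and label mismatches; exponential in general, fine for tiny graphs.
    return nx.graph_edit_distance(predicted, reference, node_match=same_label)


reference = correspondence_graph("cmt:Paper", "and", ["conf:Paper", "conf:Submitted"])
exact = correspondence_graph("cmt:Paper", "and", ["conf:Paper", "conf:Submitted"])
wrong_target = correspondence_graph("cmt:Paper", "and", ["conf:Paper", "conf:Accepted"])
missing_target = correspondence_graph("cmt:Paper", "and", ["conf:Paper"])
```

Here the exact prediction is at distance 0, the wrong target costs one label substitution (distance 1), and the missing target costs one node plus one edge insertion (distance 2). How such distances are turned into the precision and recall values in the table is not described on this page.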

Matcha achieved the best F-measure overall, possibly because it produced more simple mappings than the other systems, which improved its overall performance. CANARD comes right behind: it produces very consistent results and is less computationally demanding than Matcha, which uses language models in its strategy. CMatch achieves solid precision but lower recall, resulting in lower F-measure values. Interestingly, AROA outperforms all other systems in the single task in which it participated (gbo-gmo). Only Matcha participated in the complex multi-ontology matching task of the biomedical domain (hp, mp, and wbp), so it will be interesting to see how other systems tackle this new challenge in the future.

	AMLC (2020)			AROA (2020)			CANARD (2020)			CMatch (2025)			Matcha (2025)
	Precision	Recall	F-measure	Precision	Recall	F-measure	Precision	Recall	F-measure	Precision	Recall	F-measure	Precision	Recall	F-measure
(populated) cmt-conference	-	-	-	-	-	-	0.284	0.464	0.352	-	-	-	-	-	-
(populated) cmt-confOf	-	-	-	-	-	-	0.292	0.496	0.367	-	-	-	-	-	-
(populated) cmt-edas	-	-	-	-	-	-	0.275	0.487	0.352	-	-	-	-	-	-
(populated) cmt-ekaw	-	-	-	-	-	-	0.275	0.330	0.300	-	-	-	-	-	-
(populated) conference-confOf	-	-	-	-	-	-	0.388	0.483	0.430	-	-	-	-	-	-
(populated) conference-edas	-	-	-	-	-	-	0.397	0.423	0.410	-	-	-	-	-	-
(populated) conference-ekaw	-	-	-	-	-	-	0.252	0.256	0.254	-	-	-	-	-	-
cmt-conference	0.557	0.080	0.139	-	-	-	-	-	-	1.000	0.029	0.056	0.526	0.365	0.431
cmt-confOf	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
cmt-edas	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
cmt-ekaw	0.255	0.060	0.097	-	-	-	-	-	-	0.615	0.181	0.280	0.529	0.327	0.404
conference-confOf	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
conference-edas	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
conference-ekaw	0.374	0.108	0.167	-	-	-	-	-	-	-	-	-	0.587	0.595	0.591
cree-swo	-	-	-	-	-	-	-	-	-	0.000	0.000	0.000	0.500	0.064	0.113
hydro3-swo	-	-	-	-	-	-	-	-	-	0.319	0.154	0.208	0.645	0.507	0.568
hydroOntology_native-swo	-	-	-	-	-	-	-	-	-	-	-	-	0.372	0.062	0.106
hydroOntology_translated-swo	-	-	-	-	-	-	-	-	-	0.307	0.118	0.170	0.421	0.303	0.353
gbo-gmo	0.000	0.000	0.000	0.472	0.492	0.482	-	-	-	0.644	0.176	0.276	0.287	0.467	0.355
popgbo-popgmo	0.000	0.000	0.000	-	-	-	0.481	0.453	0.467	-	-	-	0.245	0.391	0.301
enslaved wikidata	-	-	-	-	-	-	0.049	0.188	0.078	-	-	-	0.048	0.688	0.091
hp	-	-	-	-	-	-	-	-	-	-	-	-	0.465	0.465	0.465
mp	-	-	-	-	-	-	-	-	-	-	-	-	0.529	0.529	0.529
wbp	-	-	-	-	-	-	-	-	-	-	-	-	0.554	0.554	0.554
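As a sketch, the best system in a row of the table above can be identified by taking the highest F-measure, where F = 2PR / (P + R) is the harmonic mean of precision P and recall R. The snippet copies three rows from the table (GED_ROWS, f_measure, and best_by_f are illustrative names) and checks that each published F-measure matches the harmonic mean of the published precision and recall up to rounding.

```python
# Three rows copied from the GED table: system -> (precision, recall, F-measure).
GED_ROWS = {
    "cmt-conference": {
        "AMLC": (0.557, 0.080, 0.139),
        "CMatch": (1.000, 0.029, 0.056),
        "Matcha": (0.526, 0.365, 0.431),
    },
    "gbo-gmo": {
        "AMLC": (0.000, 0.000, 0.000),
        "AROA": (0.472, 0.492, 0.482),
        "CMatch": (0.644, 0.176, 0.276),
        "Matcha": (0.287, 0.467, 0.355),
    },
    "popgbo-popgmo": {
        "AMLC": (0.000, 0.000, 0.000),
        "CANARD": (0.481, 0.453, 0.467),
        "Matcha": (0.245, 0.391, 0.301),
    },
}


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_by_f(row: dict) -> str:
    # Pick the system with the highest published F-measure in this row.
    return max(row, key=lambda system: row[system][2])


for task, row in GED_ROWS.items():
    print(task, best_by_f(row))
```

Because the published precision and recall are rounded to three decimals, recomputing F from them can differ from the published F in the third decimal (for example, Matcha on gbo-gmo), so comparisons should allow a small tolerance.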

Tree Edit Distance (TED) evaluation

This metric compares the alignment proposed by a matcher with the reference alignment using Tree Edit Distance (TED), and uses an assignment algorithm to find the most similar pairs of correspondences. The metric also satisfies the desired properties for evaluating complex matching against reference alignments.
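The page gives no implementation details, so the following is only a sketch under stated assumptions. Correspondences are written as hypothetical ordered trees of (label, children) tuples; TED uses unit costs and the standard recursive forest-distance recurrence (the one underlying Zhang and Shasha's algorithm, here memoised without their keyroot optimisation); distances are normalised into a similarity in [0, 1]; and a brute-force search over permutations stands in for a real assignment algorithm such as the Hungarian method. Defining precision and recall as the matched similarity divided by the number of predicted and reference correspondences is likewise an assumption.

```python
from functools import lru_cache
from itertools import permutations

# A tree is (label, children), children being a tuple of trees; a forest is a
# tuple of trees. Tuples keep everything hashable so results can be cached.


@lru_cache(maxsize=None)
def forest_distance(f: tuple, g: tuple) -> int:
    """Ordered tree edit distance with unit costs (recursive forest recurrence)."""
    if not f and not g:
        return 0
    if not g:  # delete the rightmost root of f, promoting its children
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + 1
    if not f:  # insert the rightmost root of g
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + 1
    (lv, kv), (lw, kw) = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + kv, g) + 1,  # delete v
        forest_distance(f, g[:-1] + kw) + 1,  # insert w
        forest_distance(kv, kw) + forest_distance(f[:-1], g[:-1]) + (lv != lw),  # map v to w
    )


def size(tree: tuple) -> int:
    return 1 + sum(size(child) for child in tree[1])


def similarity(t1: tuple, t2: tuple) -> float:
    # Hypothetical normalisation: the TED never exceeds size(t1) + size(t2).
    return 1 - forest_distance((t1,), (t2,)) / (size(t1) + size(t2))


def matched_similarity(predicted: list, reference: list) -> float:
    """Brute-force best one-to-one pairing; only practical for a few items."""
    small, large = sorted((predicted, reference), key=len)
    best = 0.0
    for chosen in permutations(large, len(small)):
        best = max(best, sum(similarity(a, b) for a, b in zip(small, chosen)))
    return best


def ted_scores(predicted: list, reference: list) -> tuple:
    matched = matched_similarity(predicted, reference)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(reference) if reference else 0.0
    return precision, recall


def leaf(label: str) -> tuple:
    return (label, ())


ref = ("equiv", (leaf("cmt:Paper"), ("and", (leaf("conf:Paper"), leaf("conf:Submitted")))))
pred_wrong = ("equiv", (leaf("cmt:Paper"), ("and", (leaf("conf:Paper"), leaf("conf:Accepted")))))
unrelated = ("equiv", (leaf("cmt:Review"), leaf("conf:Review")))
```

With these toy trees, pred_wrong differs from ref by one relabelling (TED 1, similarity 1 - 1/10 = 0.9), so pairing [pred_wrong, unrelated] against [ref] gives a recall of 0.9 and a precision of 0.45, since the unmatched prediction contributes nothing.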

	AMLC (2020)			AROA (2020)			CANARD (2020)			CMatch (2025)			Matcha (2025)
	Precision	Recall	F-measure	Precision	Recall	F-measure	Precision	Recall	F-measure	Precision	Recall	F-measure	Precision	Recall	F-measure
(populated) cmt-conference	-	-	-	-	-	-	0.183	0.566	0.277	-	-	-	-	-	-
(populated) cmt-confOf	-	-	-	-	-	-	0.270	0.588	0.370	-	-	-	-	-	-
(populated) cmt-edas	-	-	-	-	-	-	0.226	0.712	0.343	-	-	-	-	-	-
(populated) cmt-ekaw	-	-	-	-	-	-	0.218	0.468	0.297	-	-	-	-	-	-
(populated) conference-confOf	-	-	-	-	-	-	0.271	0.467	0.343	-	-	-	-	-	-
(populated) conference-edas	-	-	-	-	-	-	0.228	0.510	0.315	-	-	-	-	-	-
(populated) conference-ekaw	-	-	-	-	-	-	0.171	0.506	0.256	-	-	-	-	-	-
cmt-conference	0.000	0.000	0.000	-	-	-	-	-	-	0.000	0.000	0.000	0.000	0.000	0.000
cmt-confOf	-	-	-	-	-	-	-	-	-	0.750	0.088	0.158	-	-	-
cmt-edas	-	-	-	-	-	-	-	-	-	0.667	0.077	0.138	-	-	-
cmt-ekaw	0.000	0.000	0.000	-	-	-	-	-	-	0.667	0.176	0.279	0.000	0.000	0.000
conference-confOf	-	-	-	-	-	-	-	-	-	0.000	0.000	0.000	-	-	-
conference-edas	-	-	-	-	-	-	-	-	-	1.000	0.040	0.077	-	-	-
conference-ekaw	0.000	0.000	0.000	-	-	-	-	-	-	0.000	0.000	0.000	0.000	0.000	0.000
cree-swo	-	-	-	-	-	-	-	-	-	0.286	0.013	0.025	0.077	0.011	0.019
hydro3-swo	-	-	-	-	-	-	-	-	-	0.561	0.228	0.325	0.000	0.000	0.000
hydroOntology_native-swo	-	-	-	-	-	-	-	-	-	0.553	0.033	0.062	0.079	0.004	0.008
hydroOntology_translated-swo	-	-	-	-	-	-	-	-	-	-	-	-	0.569	0.202	0.299
gbo-gmo	0.013	0.004	0.006	0.706	0.267	0.388	-	-	-	0.773	0.056	0.104	0.003	0.002	0.002
popgbo-popgmo	-	-	-	-	-	-	0.448	0.182	0.258	-	-	-	0.002	0.004	0.003
enslaved wikidata	0.005	0.002	0.003	-	-	-	0.218	0.166	0.189	0.000	0.000	0.000	0.001	0.002	0.001
taxon-agrovoc	-	-	-	-	-	-	0.580	0.281	0.379	0.071	0.002	0.004	-	-	-
taxon-dbpedia	-	-	-	-	-	-	0.211	0.286	0.243	0.333	0.040	0.071	-	-	-
taxon-taxref	-	-	-	-	-	-	0.309	0.295	0.302	0.000	0.000	0.000	-	-	-

Class evaluation

In this evaluation, the classes used in each mapping were compared against the classes used in the reference. This metric evaluates all mappings: simple 1:1, complex 1:n, and complex n:m. False positives are not presented in this summarised table.
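The five categories are not defined precisely on this page. A minimal sketch, assuming each predicted mapping has already been paired with one reference mapping, that "contains" means the predicted class set strictly includes the reference set, and that "contained" means the reverse; classify and tally are hypothetical names.

```python
from collections import Counter


def classify(predicted: set, reference: set) -> str:
    """Hypothetical reading of the five class-set categories."""
    if not predicted:
        return "incorrect"
    if predicted == reference:
        return "correct"
    if predicted > reference:  # strict superset of the reference classes
        return "contains"
    if predicted < reference:  # strict subset of the reference classes
        return "contained"
    if predicted & reference:  # classes shared, but neither set includes the other
        return "overlap"
    return "incorrect"


def tally(pairs) -> Counter:
    # pairs: iterable of (predicted_classes, reference_classes), already aligned.
    return Counter(classify(p, r) for p, r in pairs)
```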

In general, the results are largely consistent with those obtained in the other evaluations. All systems obtain a fair number of fully correct class sets in the tasks they participate in. Interestingly, CANARD is the only system that consistently finds more class sets in the "contains" category, while the "contained" and "overlap" categories have lower counts across all systems. Counting all "correct" hits (and discarding the biomedical tasks, in which only Matcha competed), AMLC achieves 0 across 5 tasks, AROA 18 in 1 task, CANARD 90 across 9 tasks, CMatch 17 across 7 tasks, and Matcha 100 across 10 tasks.

Overall, Matcha and CANARD obtain broadly comparable results. Matcha finds fewer "correct" and "contains" class sets in the conference tasks, but also fewer "incorrect" ones; the inverse holds for the GeoLink and Populated Enslaved tasks, where it finds more "incorrect" class sets and mostly equal counts in the other categories.

	AMLC (2020)					AROA (2020)					CANARD (2020)					CMatch (2025)					Matcha (2025)
	Correct	Contains	Contained	Overlap	Incorrect	Correct	Contains	Contained	Overlap	Incorrect	Correct	Contains	Contained	Overlap	Incorrect	Correct	Contains	Contained	Overlap	Incorrect	Correct	Contains	Contained	Overlap	Incorrect
cmt-conference	0	0	0	2	2	-	-	-	-	-	14	4	1	4	41	1	0	0	0	0	8	1	2	0	7
cmt-confOf	-	-	-	-	-	-	-	-	-	-	7	2	0	0	12	-	-	-	-	-	-	-	-	-	-
cmt-edas	-	-	-	-	-	-	-	-	-	-	12	8	0	2	29	-	-	-	-	-	-	-	-	-	-
cmt-ekaw	0	0	0	0	5	-	-	-	-	-	7	10	1	0	31	5	0	0	0	1	8	1	2	0	8
conference-confOf	-	-	-	-	-	-	-	-	-	-	10	2	0	0	13	-	-	-	-	-	-	-	-	-	-
conference-edas	-	-	-	-	-	-	-	-	-	-	9	1	0	0	13	-	-	-	-	-	-	-	-	-	-
conference-ekaw	0	0	0	2	3	-	-	-	-	-	14	5	1	1	35	-	-	-	-	-	14	0	1	0	10
cree-swo	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	0	0	0	0	0	3	0	0	0	6
hydro3-swo	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	5	0	1	0	3	13	0	1	0	4
hydroOntology_native-swo	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	0	1	0	0	13	2	0	0	0	5
hydroOntology_translated-swo	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	13	1	0	3	28
gbo-gmo	0	0	0	0	0	18	0	1	3	12	-	-	-	-	-	6	0	0	0	4	17	0	0	0	14
popgbo-popgmo	0	0	0	0	0	-	-	-	-	-	14	14	0	2	23	-	-	-	-	-	11	6	44	0	126
enslaved wikidata	-	-	-	-	-	-	-	-	-	-	3	9	0	0	44	0	0	0	2	0	11	0	0	0	112
hp	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	826	156	393	2366	2246
mp	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	2441	689	543	3131	3318
wbp	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	56	101	14	401	390
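The "correct" totals quoted in the discussion above can be re-derived from the table. This sketch copies the "Correct" column for each system into dictionaries, excluding the biomedical tasks (hp, mp, wbp); CORRECT and totals are illustrative names.

```python
# "Correct" counts per system and task, copied from the class evaluation table.
CORRECT = {
    "AMLC": {"cmt-conference": 0, "cmt-ekaw": 0, "conference-ekaw": 0,
             "gbo-gmo": 0, "popgbo-popgmo": 0},
    "AROA": {"gbo-gmo": 18},
    "CANARD": {"cmt-conference": 14, "cmt-confOf": 7, "cmt-edas": 12,
               "cmt-ekaw": 7, "conference-confOf": 10, "conference-edas": 9,
               "conference-ekaw": 14, "popgbo-popgmo": 14, "enslaved wikidata": 3},
    "CMatch": {"cmt-conference": 1, "cmt-ekaw": 5, "cree-swo": 0,
               "hydro3-swo": 5, "hydroOntology_native-swo": 0,
               "gbo-gmo": 6, "enslaved wikidata": 0},
    "Matcha": {"cmt-conference": 8, "cmt-ekaw": 8, "conference-ekaw": 14,
               "cree-swo": 3, "hydro3-swo": 13, "hydroOntology_native-swo": 2,
               "hydroOntology_translated-swo": 13, "gbo-gmo": 17,
               "popgbo-popgmo": 11, "enslaved wikidata": 11},
}


def totals(table: dict = CORRECT) -> dict:
    """Map each system to (sum of correct class sets, number of tasks)."""
    return {system: (sum(tasks.values()), len(tasks)) for system, tasks in table.items()}


for system, (hits, n_tasks) in totals().items():
    print(f"{system}: {hits} correct, {n_tasks} task(s)")
```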

Organizers

Cassia Trojahn (IRIT, Toulouse, France), cassia [.] trojahn [at] irit [.] fr
Ondřej Zamazal (Prague University of Economics and Business), ondrej [.] zamazal [at] vse [.] cz
Marta Silva (University of Lisbon), mcdsilva [at] fc [.] ul [.] pt
Guilherme Sousa (IRIT, Toulouse, France), Guilherme [.] Santos [-] Sousa [at] irit [.] fr
