Ontology Alignment Evaluation Initiative - OAEI-2021 Campaign

Evaluation results for the Biodiv track at OAEI 2021


This year, 7 of the 17 systems participating in OAEI 2021 took part in the track. All seven (AML, ATMatcher, LogMap, LogMapBio, LogMapLt, ALOD2Vec and KGMatcher) generated an output for at least one of the track tasks.

Experimental setting

We conducted the experiments using the MELT client. We executed each system with its standard settings and calculated precision, recall and F-measure; systems are ordered by F-measure. Execution times cover the whole process pipeline, starting from ontology upload and environment preparation.

We ran the evaluation on two machines: a Windows 10 (64-bit) desktop with an Intel Core i7-4770 CPU @ 3.40GHz x 4, allocating 16GB of RAM, and a macOS laptop with a 2 GHz Quad-Core Intel Core i5, also allocating 16GB of RAM.
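The precision, recall and F-measure scores reported in this track can be understood by treating a system alignment and the reference alignment as sets of correspondences. The sketch below is a minimal illustration, not the MELT implementation: it assumes each mapping is a (source, target, relation) triple and that a mapping counts as correct only if it appears verbatim in the reference. The `Mapping` type and the example identifiers are hypothetical.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    """A correspondence between two entities (hypothetical representation)."""
    source: str
    target: str
    relation: str = "="


def evaluate(alignment: set[Mapping], reference: set[Mapping]) -> tuple[float, float, float]:
    """Return (precision, recall, f_measure) of an alignment against a reference."""
    correct = len(alignment & reference)  # true positives: mappings also in the reference
    precision = correct / len(alignment) if alignment else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical toy data: 4 reference mappings, 3 system mappings (2 correct).
    reference = {
        Mapping("ex:envo/A", "ex:sweet/A"),
        Mapping("ex:envo/B", "ex:sweet/B"),
        Mapping("ex:envo/C", "ex:sweet/C"),
        Mapping("ex:envo/D", "ex:sweet/D"),
    }
    alignment = {
        Mapping("ex:envo/A", "ex:sweet/A"),
        Mapping("ex:envo/B", "ex:sweet/B"),
        Mapping("ex:envo/C", "ex:sweet/X"),  # false positive
    }
    print(evaluate(alignment, reference))
```

With this toy data, precision is 2/3, recall is 1/2 and the F-measure is 4/7 (about 0.571). Real evaluations may treat relations and confidence thresholds differently than this exact-match assumption.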


1. Results for the ENVO-SWEET matching task

Five systems could handle this task. Apart from KGMatcher, the systems with the highest precision (LogMapLt and ATMatcher) achieved the lowest recall. AML generated a larger alignment set containing a high number of subsumption mappings, yet it still achieved the best F-measure for the task. It is worth noting that, due to the specific structure of the SWEET ontology, many of the false positives come from homonyms. KGMatcher generated only two mappings, which resulted in a very low F-measure.

System | Time (HH:MM:SS) | # Mappings | Precision | Recall | F-measure
AML | 00:00:47 | 986 | 0,745 | 0,895 | 0,813
LogMap | 00:00:13 | 675 | 0,782 | 0,643 | 0,705
LogMapLt | 00:12:12 | 576 | 0,829 | 0,568 | 0,684
ATMatcher | 00:00:06 | 572 | 0,817 | 0,569 | 0,671
KGMatcher | 00:00:32 | 2 | 1,000 | 0,002 | 0,005
Table 1: Results for ENVO-SWEET.
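Why homonyms produce false positives can be illustrated with a deliberately naive matcher that maps every pair of classes sharing a normalized label. This is a hypothetical sketch, not how AML or any other participant works, and the class identifiers, labels and reference alignment below are invented: "spring" stands for a water source in one ontology and a season in the other, so the label match is lexically exact but semantically wrong.

```python
from collections import defaultdict


def normalize(label: str) -> str:
    """Lower-case and collapse whitespace (a deliberately simple normalization)."""
    return " ".join(label.lower().split())


def exact_label_matches(source: dict[str, str], target: dict[str, str]) -> set[tuple[str, str]]:
    """Map every source class to every target class with the same normalized label.

    source/target map a class identifier to its label.
    """
    index: dict[str, list[str]] = defaultdict(list)
    for cls, label in target.items():
        index[normalize(label)].append(cls)
    return {
        (src, tgt)
        for src, label in source.items()
        for tgt in index.get(normalize(label), [])
    }


# Hypothetical classes: "spring" is a homonym across the two ontologies.
source = {"src:Spring_water_source": "spring", "src:Lake": "lake"}
target = {"tgt:Spring_season": "Spring", "tgt:Lake": "Lake"}
# Hypothetical reference alignment: only the lake classes correspond.
reference = {("src:Lake", "tgt:Lake")}

found = exact_label_matches(source, target)
false_positives = found - reference
print(sorted(false_positives))
```

The label-only matcher returns both pairs, and the "spring" pair is the only false positive; distinguishing such cases requires context beyond the label itself, such as the surrounding class hierarchy.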

2. Results for the ANAEETHES-GEMET matching task

The thesauri in this task are developed in SKOS. AML, LogMap and LogMapLt could handle the files in their original format. AML achieves the best results. ATMatcher achieved a higher recall than AML with an acceptable precision. LogMap and LogMapBio took much longer because they downloaded 10 mediating ontologies from BioPortal, yet this brought no significant gain in performance. ALOD2Vec generated a huge set of non-meaningful mappings and KGMatcher a very small number of mappings; both led to a very low F-measure.

System | Time (HH:MM:SS) | # Mappings | Precision | Recall | F-measure
AML | 00:00:21 | 359 | 0,976 | 0,764 | 0,839
ATMatcher | 00:00:08 | 486 | 0,631 | 0,919 | 0,748
LogMapLt | 00:00:10 | 184 | 0,840 | 0,458 | 0,593
LogMapBio | 00:19:03 | 1844 | 0,177 | 0,982 | 0,301
LogMap | 00:21:58 | 1844 | 0,177 | 0,982 | 0,301
ALOD2Vec | 00:01:43 | 5890 | 0,055 | 0,973 | 0,104
KGMatcher | 00:00:32 | 12 | 0,916 | 0,033 | 0,063
Table 2: Results for ANAEETHES-GEMET.
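The contrast between ALOD2Vec and KGMatcher can be made concrete with two identities that follow from the definitions of the scores: the number of correct mappings is TP = P x |A| (precision times alignment size), and the reference size is |R| = TP / R. The sketch below applies them to the Table 2 values for these two systems. Because the published scores are rounded, the derived counts are approximations, not reported results; the helper names are illustrative.

```python
def correct_mappings(n_mappings: int, precision: float) -> float:
    """True positives implied by alignment size and precision: TP = P * |A|."""
    return precision * n_mappings


def implied_reference_size(n_mappings: int, precision: float, recall: float) -> float:
    """Reference alignment size implied by the scores: |R| = TP / recall."""
    return correct_mappings(n_mappings, precision) / recall


# (# mappings, precision, recall) copied from Table 2, decimal commas as points.
rows = {
    "ALOD2Vec": (5890, 0.055, 0.973),
    "KGMatcher": (12, 0.916, 0.033),
}
for system, (n, p, r) in rows.items():
    tp = correct_mappings(n, p)
    print(
        f"{system}: ~{tp:.0f} correct of {n}, ~{n - tp:.0f} incorrect, "
        f"implied reference size ~{implied_reference_size(n, p, r):.0f}"
    )
```

Both rows imply a reference alignment of roughly 333 mappings: ALOD2Vec recovers almost all of them but adds more than 5,500 incorrect ones, while KGMatcher's 12 mappings are mostly correct but cover only about 3% of the reference. Both failure modes end in a low F-measure.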

3. Results for the AGROVOC-NALT matching task

Only AML could handle this task; all other systems failed to generate mappings for both the SKOS and the OWL versions of the thesauri. AML achieves good results, with a very high recall. It generated more mappings than the curated reference alignment contains, so we manually assessed a subset of these mappings to re-evaluate precision and F-measure. This year's results are slightly better than those of last year's evaluation.

System | Time (HH:MM:SS) | # Mappings | Precision | Recall | F-measure
AML | 00:03:16 | 18102 | 0,853 | 0,904 | 0,877
Table 3: Results for AGROVOC-NALT.
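When a system produces more mappings than the reference alignment contains, some of the extra mappings may be correct, and a manual check of a sample can adjust the precision estimate. The sketch below shows one plausible way to do this, under stated assumptions; it is not necessarily the procedure used for AGROVOC-NALT. Mappings in the reference count as correct, a random sample of the remaining mappings is passed to a `judge` function standing in for the manual assessment, and the fraction judged correct is extrapolated to all non-reference mappings. The function names and the toy data are hypothetical.

```python
import random
from typing import Callable

Mapping = tuple[str, str]  # (source entity, target entity); hypothetical representation


def estimate_precision(
    alignment: set[Mapping],
    reference: set[Mapping],
    judge: Callable[[Mapping], bool],
    sample_size: int,
    seed: int = 0,
) -> float:
    """Estimate precision when the reference alignment is known to be incomplete."""
    if not alignment:
        return 0.0
    in_reference = alignment & reference
    outside = sorted(alignment - reference)  # sorted for a reproducible sample
    if not outside:
        return len(in_reference) / len(alignment)
    rng = random.Random(seed)
    sample = rng.sample(outside, min(sample_size, len(outside)))
    judged_correct = sum(1 for m in sample if judge(m))
    # Extrapolate the sampled correctness rate to every mapping outside the reference.
    estimated_extra = judged_correct / len(sample) * len(outside)
    return (len(in_reference) + estimated_extra) / len(alignment)


def toy_judge(mapping: Mapping) -> bool:
    """Hypothetical stand-in for a human assessor: accepts even-numbered mappings."""
    return int(mapping[0].split(":")[1]) % 2 == 0


if __name__ == "__main__":
    # Toy data: 6 mappings confirmed by the reference, 4 outside it.
    alignment = {(f"src:{i}", f"tgt:{i}") for i in range(10)}
    reference = {(f"src:{i}", f"tgt:{i}") for i in range(6)}
    print(estimate_precision(alignment, reference, toy_judge, sample_size=4))  # prints 0.8
```

The resulting estimate depends on the sample size and on the assessor's decisions; a larger sample narrows the uncertainty of the extrapolated count.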

4. Results for the NCBITAXON-TAXREFLD matching task

No system could handle this task, due to the large size of the ontologies involved. We plan to provide targeted subsets of the ontologies for the next edition of OAEI.


This evaluation has been run by Naouel Karam, Alsayed Algergawy and Amir Laadhar. If you have any problems working with the ontologies, any questions related to tool wrapping, or any suggestions related to the Biodiv track, feel free to write an email to: naouel [.] karam [at] fokus [.] fraunhofer [.] de