We have collected all generated alignments and made them available in a zip file via the following link. These alignments are the raw results on which the following report is based.
We conducted experiments by executing each system in its standard setting and compared precision, recall, F-measure and recall+. The measure recall+ indicates the share of detected non-trivial correspondences; the matched entities in a non-trivial correspondence do not have the same normalized label.
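To make these measures concrete, the following minimal sketch (not the official OAEI evaluation code) computes precision, recall, F-measure and recall+ from sets of correspondences; the `normalize` helper and the `labels` dictionary are simplifying assumptions standing in for the actual label handling in the ontologies.

```python
# Illustrative sketch only: a correspondence is modeled as a
# (source_entity, target_entity) URI pair.

def normalize(label):
    """Toy label normalization: lowercase and strip non-alphanumerics."""
    return "".join(c for c in label.lower() if c.isalnum())

def evaluate(system, reference, labels):
    """system, reference: sets of (src, tgt) pairs.
    labels: dict mapping each URI to its label (assumed available)."""
    tp = system & reference
    precision = len(tp) / len(system) if system else 0.0
    recall = len(tp) / len(reference) if reference else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # recall+ considers only the non-trivial reference correspondences,
    # i.e., those whose entities do NOT share the same normalized label.
    non_trivial = {(s, t) for (s, t) in reference
                   if normalize(labels[s]) != normalize(labels[t])}
    recall_plus = (len(system & non_trivial) / len(non_trivial)
                   if non_trivial else 0.0)
    return precision, recall, f_measure, recall_plus
```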
We ran the matchers on a machine with 16 GB of RAM. This year, we used the MELT platform to execute our evaluations for all systems except ALIN, AML and AMD, for which we used the SEALS client.
As in earlier years, we slightly changed the way precision and recall are computed; as a consequence, the results generated by the MELT and SEALS clients vary in some cases by 0.5% compared to the results presented below.
In particular, we removed trivial correspondences in the oboInOwl namespace, such as
http://...oboInOwl#Synonym = http://...oboInOwl#Synonym
as well as correspondences expressing relations different from equivalence.
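As an illustration, a minimal filtering sketch for this step might look as follows; the `(source, target, relation)` tuple layout is an assumption for readability and not the format of the actual evaluation pipeline (OAEI alignments are typically exchanged in the Alignment API RDF format).

```python
OBO_IN_OWL = "http://www.geneontology.org/formats/oboInOwl#"

def keep(cell):
    source, target, relation = cell
    if relation != "=":          # drop relations other than equivalence
        return False
    if source.startswith(OBO_IN_OWL) and source == target:
        return False             # drop trivial oboInOwl self-mappings
    return True

def filter_alignment(cells):
    return [cell for cell in cells if keep(cell)]
```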
Using the Pellet reasoner, we also checked whether each generated alignment is coherent, i.e., whether there are no unsatisfiable concepts when the ontologies are merged with the alignment.
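A rough sketch of such a check, using owlready2's Pellet binding rather than the organizers' actual tooling, is shown below; it assumes a hypothetical file merged.owl that already contains both input ontologies together with the alignment rendered as owl:equivalentClass axioms.

```python
from owlready2 import get_ontology, sync_reasoner_pellet, default_world

# Load the (assumed) merged ontology: both inputs plus alignment axioms.
onto = get_ontology("file://merged.owl").load()
with onto:
    sync_reasoner_pellet()  # classify with Pellet

# The alignment is coherent if no named class became unsatisfiable
# (i.e., equivalent to owl:Nothing) in the merged ontology.
unsat = list(default_world.inconsistent_classes())
print("coherent" if not unsat
      else f"incoherent: {len(unsat)} unsatisfiable classes")
```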
In the following, we analyze all participating systems that could generate an alignment. The listing comprises 15 entries. LogMap participated with different versions, namely LogMap, LogMapBio, and a lightweight version, LogMapLite, that uses only a few core components, as in previous years. Five systems (TOM, Fine-TOM, LSMatch, OTMapOnto and AMD) participated in the anatomy track this year for the first time. The remaining systems have participated in OAEI for more than two years. ALIN, AML, Lily and LogMap (all versions) most recently participated in the anatomy track last year. LogMap has been a constant participant since 2011, AML joined the track in 2013, and ALIN joined in 2016. For more details, we refer the reader to the papers presenting the systems. Thus, this year 12 different systems (not counting different versions) generated an alignment. For GMap, we used results obtained with a different configuration, since the system needs more RAM; the execution time for GMap (marked with * in the table) is therefore not fully comparable to that of the other systems.
This year, 6 out of 15 systems were able to complete the alignment task in less than 100 seconds: AML, LogMapLite, LogMap, AMD, LSMatch and OTMapOnto. In 2020 and 2019, 4 out of 11 and 5 out of 12 systems, respectively, generated an alignment within this time frame. As in the last nine years, LogMapLite has the shortest run time. Depending on the specific system and version, run times range between 2 and 98 seconds. The table shows that there is no correlation between run time and the quality of the generated alignment for specific metrics, a result that has also been observed in previous OAEI campaigns. Additionally, for long-term participants such as AML and LogMap, we observe that execution times did not change noticeably with the shift from SEALS to MELT.
The table also shows the results for precision, F-measure, recall+ and the size of the alignments. Regarding F-measure, the top three systems are AML, Lily and LogMapBio, with AML achieving the highest F-measure (0.941). AML and the different versions of LogMap show results similar to those from 2020. Regarding recall+, AML, LogMapLite, Lily and Wiktionary show results similar to earlier years. LogMapBio decreased from 0.801 in 2019 to 0.74 in 2020, but increased again to 0.773 in 2021. ALIN increased its recall+ from 0.382 in 2020 to 0.438 in 2021. The new systems do not show high values for recall+. Regarding the number of correspondences, some long-term participants such as LogMapLite and AML computed a similar number of correspondences as last year, while LogMapBio generated 42 more correspondences than last year.
This year, 13 out of 15 systems achieved an F-measure higher than the baseline, which is based on (normalized) string equivalence (StringEquiv in the table). Among these 13 systems, AMD, ATMatcher, TOM and Fine-TOM are new participants. Three systems produced coherent alignments this year: AML, LogMapBio and LogMap.
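For reference, the StringEquiv baseline can be approximated by the following sketch, under the same simplifying assumption as before that entity labels are available in dictionaries keyed by URI:

```python
def normalize(label):
    # Same toy normalization as in the earlier sketch.
    return "".join(c for c in label.lower() if c.isalnum())

def string_equiv(labels_a, labels_b):
    """Baseline: match entities whose normalized labels coincide."""
    index = {}
    for uri, label in labels_b.items():
        index.setdefault(normalize(label), []).append(uri)
    return {(a, b)
            for a, label in labels_a.items()
            for b in index.get(normalize(label), [])}
```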
The number of participating systems varies between the years; in 2021, there are four more participants than in 2020. As noted earlier, the field comprises both newly joined systems and long-term participants.
As last year, AML achieves the top result for the anatomy track with respect to F-measure, followed by Lily with the second-best and LogMapBio with the third-best F-measure.
This track is organized by Huanyu Li, Mina Abd Nikooie Pour, Ying Li, and Patrick Lambrix. If you have any problems working with the ontologies, any questions related to tool wrapping, or any suggestions related to the anatomy track, feel free to write an email to oaei-anatomy [at] ida [.] liu [.] se.