We have collected all generated alignments and made them available in a zip file via the following link. These alignments are the raw results on which the following report is based.
>>> download raw results
We conducted experiments by executing each system in its standard setting, and we compare precision, recall, F-measure and recall+. The measure recall+ indicates the number of detected non-trivial correspondences; in a non-trivial correspondence, the matched entities do not have the same normalized label.
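The measures above can be sketched as follows. This is a minimal illustration, not the evaluation code actually used: it assumes alignments are represented as sets of (entity1, entity2) pairs, and the `normalize` helper is a hypothetical label normalizer.

```python
def normalize(label: str) -> str:
    """Hypothetical label normalization: lowercase, keep alphanumerics only."""
    return "".join(c for c in label.lower() if c.isalnum())

def evaluate(system, reference, labels):
    """Compute precision, recall, F-measure and recall+.

    `system` and `reference` are sets of (entity1, entity2) pairs;
    `labels` maps each entity to its label. A correspondence is trivial
    when both entities share the same normalized label.
    """
    correct = system & reference
    precision = len(correct) / len(system) if system else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # recall+ measures recall on the non-trivial part of the reference only.
    non_trivial_ref = {(a, b) for (a, b) in reference
                       if normalize(labels[a]) != normalize(labels[b])}
    recall_plus = (len(system & non_trivial_ref) / len(non_trivial_ref)
                   if non_trivial_ref else 0.0)
    return precision, recall, f_measure, recall_plus
```

For example, a system that only finds correspondences between entities with identical normalized labels would score a recall+ of 0, however high its ordinary recall.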
For ALIN, we allocated 16GB of RAM, as it needs around 12GB. The other systems were run on a server with 8GB of RAM allocated to each matching system. Further, we used the SEALS client to execute our evaluation. For OntoConnect, we used the results obtained from the HOBBIT platform.
As before, we slightly changed the way precision and recall are computed; as a consequence, the results generated by the SEALS client may vary in some cases by 0.5% compared to the results presented below.
In particular, we removed trivial correspondences in the oboInOwl namespace, such as
http://...oboInOwl#Synonym = http://...oboInOwl#Synonym
as well as correspondences expressing relations other than equivalence. Using the Pellet reasoner, we also checked whether the generated alignment is coherent, i.e., whether there are no unsatisfiable concepts when the ontologies are merged with the alignment.
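The filtering step described above can be sketched as follows. This is an illustrative sketch, not the actual evaluation code: it assumes each correspondence is a (source, target, relation) triple with "=" denoting equivalence, and that trivial oboInOwl correspondences map a URI to itself.

```python
# Assumed oboInOwl namespace prefix for the trivial self-mappings.
OBO_IN_OWL = "http://www.geneontology.org/formats/oboInOwl#"

def filter_alignment(correspondences):
    """Drop non-equivalence correspondences and trivial oboInOwl self-mappings."""
    kept = []
    for source, target, relation in correspondences:
        if relation != "=":
            continue  # keep only equivalence correspondences
        if source == target and source.startswith(OBO_IN_OWL):
            continue  # trivial correspondence in the oboInOwl namespace
        kept.append((source, target, relation))
    return kept
```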
In the following, we analyze all participating systems that could generate an alignment. The listing comprises 11 entries. LogMap participated with different versions, namely LogMap, LogMapBio, and the lightweight version LogMapLite, which, as in previous years, uses only some core components. Three systems, ATBox, DESKMatcher and OntoConnect, participate in the anatomy track for the first time this year. Wiktionary and ALOD2Vec participate in the anatomy track for the second time (their previous participations were in 2019 and 2018, respectively). ALIN, AML, LogMap (all versions) and Lily last participated in the anatomy track last year. LogMap has been a constant participant since 2011, AML joined the track in 2013, and ALIN and Lily joined in 2016. For more details, we refer the reader to the papers presenting the systems. Thus, this year we have 9 different systems (not counting different versions) that generated an alignment.
This year, 4 out of 11 systems were able to complete the alignment task in less than 100 seconds: AML, LogMapLite, Wiktionary and LogMap. In 2019 and 2018, 5 out of 12 and 6 out of 14 systems, respectively, generated an alignment within this time frame. As in the last eight years, LogMapLite has the shortest runtime. Depending on the specific system and version, these systems require between 2 and 65 seconds to match the ontologies. The table shows that there is no correlation between the runtime of a system and the quality of the generated alignment with respect to any specific metric. This has also been observed in previous OAEI campaigns.
The table also shows the results for F-measure, recall+ and the size of the alignments. Regarding F-measure, the top three ranked systems are AML, Lily and LogMapBio. Among these, AML achieved the highest F-measure (0.941). AML and the different versions of LogMap show results similar to those from 2019. Lily increased its F-measure from 0.833 in 2019 to 0.901 in 2020. ALIN had a notable increase in F-measure from 0.506 in 2017 to 0.813 last year, and this year it increased further to 0.832. Regarding recall+, LogMap and LogMapLite show results similar to earlier years. LogMapBio increased from 0.756 in 2017 to 0.801 in 2019, but decreased to 0.74 in 2020. ALIN shows a slight increase in recall+ from 0.365 in 2019 to 0.382 in 2020. The systems new in 2020 do not show high results for recall+. Regarding the number of correspondences, some long-term participants such as LogMap, LogMapLite and AML computed a similar number of correspondences as last year. Compared with last year's results, Lily generated 138 more correspondences, ALIN generated 21 more, and LogMapBio generated 63 fewer. This year, 10 out of 11 systems achieved an F-measure higher than the baseline, which is based on (normalized) string equivalence (StringEquiv in the table). Among these 10 systems, ATBox and OntoConnect are new participants.
This year, four systems produced coherent alignments: AML, ALIN, LogMapBio and LogMap.
The number of participating systems varies between the years. In 2020, there is one fewer participant than in 2019. As noted earlier, there are newly joined systems as well as long-term participants.
Similarly to last year, AML sets the top result for the anatomy track with respect to the F-measure. Lily and LogMapBio achieve the second and third best F-measure, respectively.
This track is organized by Huanyu Li, Mina Abd Nikooie Pour, Ying Li, and Patrick Lambrix. If you have any problems working with the ontologies, any questions related to tool wrapping, or any suggestions related to the anatomy track, feel free to write an email to oaei-anatomy [at] ida [.] liu [.] se.