We have collected all generated alignments and made them available as a zip file via the following link. These alignments are the raw results on which the following report is based.
>>> download raw results
We conducted experiments by executing each system in its standard setting and compared precision, recall, F-measure, and recall+. The measure recall+ indicates the proportion of detected non-trivial correspondences. The matched entities in a non-trivial correspondence do not have the same normalized label. An approach that generates only trivial correspondences is denoted as the baseline StringEquiv in the following section.
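As a rough illustration, these metrics can be expressed as set operations over correspondence sets. The following Python sketch assumes a simplified representation in which each correspondence is an (entity1, entity2) pair and `trivial` is the subset of the reference alignment whose entities share the same normalized label; it is an illustrative assumption, not the actual SEALS evaluation code.

```python
def evaluate(system, reference, trivial):
    """Compare a system alignment against a reference alignment.

    All three arguments are sets of correspondences, here modeled as
    (entity1, entity2) tuples (a hypothetical representation).
    `trivial` is the subset of `reference` whose matched entities
    share the same normalized label.
    """
    tp = len(system & reference)
    precision = tp / len(system) if system else 0.0
    recall = tp / len(reference)
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # recall+ restricts recall to the non-trivial correspondences,
    # i.e. those a simple string-equality matcher cannot find.
    non_trivial = reference - trivial
    recall_plus = (len(system & non_trivial) / len(non_trivial)
                   if non_trivial else 0.0)
    return precision, recall, f_measure, recall_plus
```

For example, a system alignment containing one of two reference correspondences, where the found one is trivial, scores precision 0.5, recall 0.5, F-measure 0.5, and recall+ 0.0.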
For SANOM, we used the results obtained from the Hobbit platform in the evaluation.
The other systems were run on a server with 3.46 GHz (6 cores) and 8 GB RAM allocated to each matching system.
Further, we used the SEALS client to execute our evaluation.
However, we slightly changed the way precision and recall are computed, i.e., the results generated by the SEALS client vary in some cases by 0.5% compared to the results presented below.
In particular, we removed trivial correspondences in the oboInOwl namespace such as
http://...oboInOwl#Synonym = http://...oboInOwl#Synonym
as well as correspondences expressing relations different from equivalence. Using the Pellet reasoner, we also checked whether the generated alignment is coherent, i.e., whether there are no unsatisfiable concepts when the ontologies are merged with the alignment.
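The filtering step described above can be sketched as follows. The (entity1, entity2, relation) triple representation and the `filter_alignment` helper are illustrative assumptions rather than the actual evaluation code; the namespace URI shown is the standard oboInOwl namespace.

```python
# Standard oboInOwl namespace (the report abbreviates it as "http://...oboInOwl#").
OBO_IN_OWL = "http://www.geneontology.org/formats/oboInOwl#"

def filter_alignment(correspondences):
    """Drop correspondences that should not count in the evaluation:
    trivial self-matches in the oboInOwl namespace and any
    correspondence whose relation is not equivalence.

    Each correspondence is assumed to be an (entity1, entity2, relation)
    triple; this representation is illustrative only.
    """
    kept = []
    for e1, e2, rel in correspondences:
        if rel != "=":
            continue  # keep only equivalence relations
        if e1 == e2 and e1.startswith(OBO_IN_OWL):
            continue  # trivial oboInOwl self-match, e.g. Synonym = Synonym
        kept.append((e1, e2, rel))
    return kept
```

A correspondence such as oboInOwl#Synonym = oboInOwl#Synonym, or a subsumption correspondence, would be removed before computing the metrics.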
In the following, we analyze all participating systems that could generate an alignment. The listing comprises 14 entries. LogMap participated with different versions, namely LogMap, LogMapBio, and a lightweight version LogMapLite that, as in previous years, uses only some core components. Three systems, ALOD2Vec, DOME, and Holontology, participated in the anatomy track for the first time this year. Five systems participated for the second time: POMAP++, whose previous version POMap participated in 2017; FCAMapX, whose previous version FCA_Map participated in 2016; Lily, which first participated in the anatomy track in 2016; and KEPLER and SANOM, which both participated in the anatomy track for the first time last year. ALIN, AML, LogMap (all versions), and XMap last participated in the anatomy track last year. LogMap has been a constant participant since 2011, AML and XMap joined the track in 2013, and ALIN joined in 2016. For more details, we refer the reader to the papers presenting the systems. Thus, this year we have 12 different systems (not counting different versions) that generated an alignment.
This year, 6 out of 14 systems were able to complete the alignment task in less than 100 seconds: LogMapLite, DOME, LogMap, XMap, AML, and ALOD2Vec. In 2017 and 2016, 5 out of 11 systems and 4 out of 13 systems, respectively, generated an alignment in this time frame. As in the last six years, LogMapLite has the shortest run time. Depending on the specific version, these systems require between 18 and 75 seconds to match the ontologies. The table shows that there is no correlation between the run time and the quality of the generated alignment in terms of any specific metric. This has also been observed in previous OAEI campaigns.
The table also shows the results for F-measure, recall+, and the size of the alignments. Regarding F-measure, the top five ranked systems are AML, LogMapBio, POMAP++, XMap, and LogMap. Among these, AML achieved the highest F-measure (0.943). All long-term participants in the track achieved F-measures comparable to their results last year and at least as good as the results of the best systems in OAEI 2007-2010. ALIN shows a notable increase in F-measure, from 0.506 in 2017 to 0.758 in 2018. Regarding recall+, AML, LogMap, and LogMapLite show similar results as before. LogMapBio increased slightly from 0.728 in 2016 to 0.733 in 2017 and further to 0.756 in 2018. The new systems and the systems with new versions in 2018 do not show high recall+ values. Regarding the number of correspondences, some long-term participants computed a similar number of correspondences as last year: AML and LogMap generated the same number, LogMapBio generated 16 more, LogMapLite generated 1 less, and XMap generated 1 more. On the other hand, ALIN generated 412 more correspondences because it dropped the additional criteria for the automatic classification of mappings at the beginning of its execution.

This year, 11 out of 14 systems achieved an F-measure higher than the baseline, which is based on (normalized) string equivalence (StringEquiv in the table). Among these 11 systems, ALOD2Vec is a new participant. Regarding the systems participating in the anatomy track for the second time, SANOM increased its F-measure from 0.828 to 0.865 and its recall+ from 0.419 to 0.632, KEPLER and Lily maintained the same performance, and both POMAP++ (POMap in 2017) and FCAMapX (FCA_Map in 2016) decreased in F-measure and recall+.
This year, five systems produced coherent alignments, which is the same number as last year.
The number of participating systems varies between the years. In 2018, there were three more participants than in 2017. As noted earlier, the participants include both newly joined systems and long-term participants.
As last year, AML sets the top result for the anatomy track with respect to F-measure. LogMapBio, with a slight increase compared with last year, achieves the second-best score, followed by POMAP++ and XMap with very close results.
This track is organized by Huanyu Li and Patrick Lambrix. If you have any problems working with the ontologies, any questions related to tool wrapping, or any suggestions related to the anatomy track, feel free to write an email to oaei-anatomy [at] ida [.] liu [.] se.