We have collected all generated alignments and made them available in a zip file via the following link. These alignments are the raw results on which the following report is based.
We conducted experiments by executing each system in its standard setting, and we compare precision, recall, F-measure and recall+. The measure recall+ indicates the proportion of detected non-trivial correspondences: the matched entities in a non-trivial correspondence do not have the same normalized label. The approach that generates only trivial correspondences is denoted as the baseline StringEquiv in the following section.
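The four measures can be illustrated with a small sketch. This is a minimal, self-contained illustration assuming alignments are represented as sets of (entity1, entity2) pairs; the function and variable names are ours, not those of the actual evaluation tooling, and the label normalization is a toy stand-in for whatever normalization the track uses.

```python
def normalize(label):
    """Toy label normalization: lowercase, keep only alphanumerics."""
    return "".join(c for c in label.lower() if c.isalnum())

def evaluate(system, reference, labels):
    """Compute precision, recall, F-measure and recall+.

    system, reference: sets of (uri1, uri2) correspondences.
    labels: dict mapping each URI to its label.
    A correspondence is trivial if both labels normalize to the
    same string; recall+ is recall restricted to the non-trivial
    part of the reference alignment.
    """
    correct = system & reference
    precision = len(correct) / len(system) if system else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)

    def trivial(corr):
        return normalize(labels[corr[0]]) == normalize(labels[corr[1]])

    nontrivial_ref = {c for c in reference if not trivial(c)}
    nontrivial_found = {c for c in correct if not trivial(c)}
    recall_plus = (len(nontrivial_found) / len(nontrivial_ref)
                   if nontrivial_ref else 0.0)
    return precision, recall, f_measure, recall_plus
```

Under these definitions, a matcher that only pairs entities with identical normalized labels (the StringEquiv baseline) has recall+ of 0 by construction.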
We ran the systems on a server with 3.46 GHz (6 cores) and 8GB RAM allocated to each matching system.
Further, we used the SEALS client to execute our evaluation.
However, we slightly changed the way precision and recall are computed, i.e., the results generated by the SEALS client vary in some cases by 0.5% compared to the results presented below.
In particular, we removed trivial correspondences in the oboInOwl namespace such as
http://...oboInOwl#Synonym = http://...oboInOwl#Synonym
as well as correspondences expressing relations different from equivalence.
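This filtering step can be sketched as follows. The sketch assumes correspondences are (entity1, entity2, relation) triples with "=" marking equivalence; the namespace constant and function name are illustrative, not part of the SEALS client.

```python
# Assumed oboInOwl namespace prefix (illustrative constant).
OBO_IN_OWL = "http://www.geneontology.org/formats/oboInOwl#"

def filter_alignment(correspondences):
    """Drop (a) trivial self-mappings of oboInOwl annotation
    properties and (b) correspondences whose relation is not
    equivalence; keep everything else unchanged."""
    kept = []
    for e1, e2, rel in correspondences:
        if rel != "=":
            continue  # keep only equivalence correspondences
        if e1 == e2 and e1.startswith(OBO_IN_OWL):
            continue  # trivial oboInOwl self-mapping
        kept.append((e1, e2, rel))
    return kept
```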
Using the Pellet reasoner, we also checked whether the generated alignment is coherent, i.e., whether there are no unsatisfiable concepts when the ontologies are merged with the alignment.
In the following, we analyze all participating systems that could generate an alignment. The listing comprises 11 entries. LogMap participated with different versions, namely LogMap, LogMapBio and a lightweight version, LogMapLite, that uses only some core components, as in previous years. A number of systems participate in the anatomy track for the first time: KEPLER, POMap, SANOM, WikiV2 and YAM-BIO. ALIN, AML, LogMap (all versions) and XMap also participated last year. LogMap has been a constant participant since 2011, while AML and XMap joined the track in 2013. For more details, we refer the reader to the papers presenting the systems. Thus, this year we have 9 different systems (not counting different versions) which generated an alignment.
This year 5 out of 11 systems were able to complete the alignment task in less than 100 seconds: LogMapLite, LogMap, XMap, AML and YAM-BIO. In 2016 and 2015, 4 out of 13 and 6 out of 15 systems, respectively, generated an alignment in this time frame. As in the last five years, LogMapLite has the shortest runtime. Depending on the specific version, these systems require between 19 and 70 seconds to match the ontologies. The table shows that there is no correlation between the quality of the generated alignment, in terms of precision and recall, and the required runtime. This has also been observed in previous OAEI campaigns.
The table also shows the results for F-measure, recall+ and the size of the alignments. Regarding F-measure, the top five ranked systems are AML, YAM-BIO, POMap, LogMapBio and XMap. Among these, AML achieved the highest F-measure (0.943). All long-term participants in the track showed F-measure results comparable to last year's and at least as good as the results of the best systems in OAEI 2007-2010. Regarding recall+, AML, LogMap and LogMapLite showed similar results as before. LogMapBio increased slightly from 0.728 in 2016 to 0.733 in 2017, while XMap decreased slightly from 0.647 to 0.639. Two new participants obtained good results for recall+: POMap scored 0.824 (second place), followed by YAM-BIO with 0.794 (third place). In terms of alignment size, long-term participants computed a similar number of correspondences as last year: AML and LogMap generated the same number, LogMapBio generated 3 more, LogMapLite 1 more, ALIN 6 more and XMap 1 less.
This year 10 out of 11 systems achieved an F-measure higher than the baseline, which is based on (normalized) string equivalence (StringEquiv in the table). This is a slightly better result (percentage-wise) than last year, when 9 out of 13 systems did so. This year five systems produced coherent alignments, which is comparable to the last two years, when 7 out of 13 and 5 out of 10 systems achieved this. Two of the five best systems with respect to F-measure (YAM-BIO and POMap) produced incoherent alignments.
The number of participating systems varies between the years; this year it is lower than in 2016 and 2015 but one higher than in 2014. As noted earlier, there are newly-joined systems as well as long-term participants.
The systems that participated in the previous edition in 2016 scored similarly to their earlier results. As last year, the AML system achieved the top result for the anatomy track with respect to F-measure. Two of the newly-joined systems (YAM-BIO and POMap) achieved the second- and third-best F-measure scores this year.
This track is organized by Huanyu Li, Zlatan Dragisic, Valentina Ivanova and Patrick Lambrix. If you have any problems working with the ontologies, any questions related to tool wrapping, or any suggestions related to the anatomy track, feel free to write an email to oaei-anatomy [at] ida [.] liu [.] se.