We have collected all generated alignments and make them available in a zip-file via the following link. These alignments are the raw results that the following report is based on.
>>> download raw results
We conducted experiments by executing each system in its standard setting and compared precision, recall, F-measure and recall+. The measure recall+ indicates the amount of detected non-trivial correspondences, i.e., correspondences whose matched entities do not have the same normalized label. The approach that generates only trivial correspondences is depicted as the baseline StringEquiv in the following section.
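As an illustration of how these four measures relate, here is a minimal sketch in Python. It is not the evaluation code used for this track. The function names, the label dictionary and the toy identifiers are hypothetical, and the normalization (lowercasing and dropping non-alphanumeric characters) is an assumption, since the exact normalization is not specified here. Recall+ is computed as the share of non-trivial reference correspondences that the system found.

```python
def normalize(label):
    # Assumed normalization: lowercase and keep only letters and digits.
    return "".join(ch for ch in label.lower() if ch.isalnum())


def evaluate(alignment, reference, labels):
    """Compare a system alignment with a reference alignment.

    alignment, reference: iterables of (entity1, entity2) pairs.
    labels: dict mapping every entity in the reference to its label.
    """
    alignment, reference = set(alignment), set(reference)
    correct = alignment & reference

    precision = len(correct) / len(alignment) if alignment else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    if precision + recall > 0:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0

    # Non-trivial reference correspondences: normalized labels differ.
    non_trivial = {
        (a, b) for (a, b) in reference
        if normalize(labels[a]) != normalize(labels[b])
    }
    if non_trivial:
        recall_plus = len(correct & non_trivial) / len(non_trivial)
    else:
        recall_plus = 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "recall_plus": recall_plus,
    }


# Toy data (hypothetical identifiers and labels).
labels = {
    "m1": "Heart", "h1": "heart",
    "m2": "urinary bladder", "h2": "Bladder",
    "m3": "Lung", "h3": "lung",
}
reference = {("m1", "h1"), ("m2", "h2"), ("m3", "h3")}
alignment = {("m1", "h1"), ("m2", "h2"), ("m4", "h4")}
scores = evaluate(alignment, reference, labels)
```

In this toy case only one reference correspondence is non-trivial (the labels "urinary bladder" and "Bladder" differ after normalization), so a matcher based purely on string equivalence would find none of the non-trivial correspondences and obtain a recall+ of 0.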
We ran the systems on a server with 3.46 GHz (6 cores) and 8GB RAM allocated to each matching system. Further, we used the SEALS client to execute our evaluation. However, we slightly changed the way precision and recall are computed, i.e., the results generated by the SEALS client vary in some cases by 0.5% compared to the results presented below. In particular, we removed trivial correspondences in the oboInOwl namespace like
http://...oboInOwl#Synonym = http://...oboInOwl#Synonym
as well as correspondences expressing relations different from equivalence. Using the Pellet reasoner we also checked whether the generated alignment is coherent, i.e., there are no unsatisfiable concepts when the ontologies are merged with the alignment.
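The filtering step described above could look roughly like the following Python sketch. It is an assumption-laden illustration, not the code used in the evaluation: the Correspondence type, the clean_alignment function and the example URIs under example.org are hypothetical, and the oboInOwl namespace is detected by a simple substring check because the full namespace URI is elided in the text.

```python
from typing import NamedTuple


class Correspondence(NamedTuple):
    entity1: str
    entity2: str
    relation: str  # "=" denotes equivalence


# Assumption: an entity belongs to the oboInOwl namespace
# if its URI contains this marker.
OBO_IN_OWL_MARKER = "oboInOwl#"


def clean_alignment(alignment):
    """Keep only equivalence correspondences outside the oboInOwl namespace."""
    kept = []
    for c in alignment:
        if c.relation != "=":
            continue  # drop relations different from equivalence
        if OBO_IN_OWL_MARKER in c.entity1 and OBO_IN_OWL_MARKER in c.entity2:
            continue  # drop trivial oboInOwl correspondences
        kept.append(c)
    return kept


# Hypothetical example alignment.
raw = [
    Correspondence("http://...oboInOwl#Synonym", "http://...oboInOwl#Synonym", "="),
    Correspondence("http://example.org/mouse#A", "http://example.org/human#B", "="),
    Correspondence("http://example.org/mouse#C", "http://example.org/human#D", "<"),
]
cleaned = clean_alignment(raw)
```

Only the second correspondence survives: the first is a trivial oboInOwl match and the third expresses a relation other than equivalence.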
In the following, we analyze all participating systems that could generate an alignment. The listing comprises 13 entries. As in previous years, some of the systems participated with different versions. LogMap participated with LogMap, LogMapBio and a lightweight version, LogMapLite, that uses only some core components. Similarly, DKP-AOM participated with two versions, DKP-AOM and DKP-AOM-Lite. A number of systems participate in the anatomy track for the first time. These are Alin, FCA_Map, LPHOM and LYAM. While every year there are several new participants in this track, there are also systems that have participated for several years in a row. LogMap has been a constant participant since 2011. AML and XMap joined the track in 2013. DKP-AOM, Lily and CroMatcher participate for the second year in a row in this track. Lily also participated in the track back in 2011. CroMatcher participated in 2013 but did not produce an alignment within the given time frame back then. For more details, we refer the reader to the papers presenting the systems. Thus, this year we have 10 different systems (not counting different versions) which generated an alignment.
Unlike the last two editions of the track, when 6 systems generated an alignment in less than 100 seconds, this year only 4 systems were able to complete the alignment task in this time frame. These are AML, XMap, LogMap and LogMapLite. As in the last 4 years, LogMapLite has the shortest runtime, followed by LogMap, XMap and AML. Depending on the specific version of the systems, they require between 20 and 50 seconds to match the ontologies. The table shows that there is no correlation between the quality of the generated alignment in terms of precision and recall and the required runtime. This result has also been observed in previous OAEI campaigns.
The table also shows the results for precision, recall and F-measure. In terms of F-measure, the top 5 ranked systems are AML, CroMatcher, XMap, LogMapBio and FCA_Map. LogMap is sixth with an F-measure very close to that of FCA_Map. All of the long-term participants in the track showed results comparable (in terms of F-measure) to their last year's results and at least as good as the results of the best systems in OAEI 2007-2010. LogMap and XMap generated nearly the same number of correspondences in their alignments (XMap generated one correspondence more). AML and LogMapBio generated a slightly different number: 16 correspondences more for AML and 18 fewer for LogMapBio. The results for the DKP-AOM systems are identical this year; in comparison, last year the lite version performed significantly better in terms of the observed measures. After Lily improved its results in 2015 in comparison to 2011 (precision: from 0.814 to 0.870, recall: from 0.734 to 0.793 and F-measure: from 0.772 to 0.830), this year it performed similarly to last year. CroMatcher improved its results in comparison to last year. Out of all systems participating in the anatomy track, CroMatcher showed the largest improvement in the observed measures in comparison to its values from the previous edition of the track. Comparing the F-measures of the new systems, FCA_Map scored (0.882) very close to one of the track's long-term participants, LogMap. Another of the new systems, LYAM, also achieved a good F-measure (0.869), which puts it in sixth place. Of the other two new systems, LPHOM achieved a slightly lower F-measure than the baseline (StringEquiv); Alin also scored lower than the baseline. This year 9 out of 13 systems achieved an F-measure higher than the baseline, which is based on (normalized) string equivalence (StringEquiv in the table).
This is a slightly better result (percentage-wise) compared to last year, when 9 out of 15 systems did so, and similar to 2014, when 7 out of 10 systems produced alignments with an F-measure higher than the baseline. Two of the new participants in the track and the two DKP-AOM systems achieved an F-measure lower than the baseline. LPHOM scored under the StringEquiv baseline but at the same time it is the system that produced the highest number of correspondences. Its precision is significantly lower than that of the other three systems which scored under the baseline and generated only trivial correspondences.
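The percentage-wise comparison can be verified with a few lines of Python. The counts are taken from the text above; the variable names are only illustrative, and the year 2016 for the current edition is inferred from the reference to the previous edition in 2015.

```python
# (systems above the StringEquiv baseline, systems producing an alignment)
above_baseline = {
    2014: (7, 10),
    2015: (9, 15),
    2016: (9, 13),
}

# Share of systems above the baseline, in percent, rounded to one decimal.
shares = {
    year: round(100 * above / total, 1)
    for year, (above, total) in above_baseline.items()
}

for year in sorted(shares):
    print(year, f"{shares[year]}%")
```

The shares are 70.0% for 2014, 60.0% for 2015 and 69.2% for 2016, which matches the statement that this year's result is better than last year's and similar to that of 2014.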
This year seven systems produced coherent alignments, which is comparable to the last two years, when 5 out of 10 and 7 out of 15 systems achieved this. Of the five best systems, only FCA_Map produced an incoherent alignment.
The number of participating systems varies between the years; this year it is lower than in 2015 and 2013 but higher than in 2014. As noted earlier, there are newly joined systems as well as long-term participants.
The systems that participated in the previous edition in 2015 scored similarly to their previous results. Two of the newly joined systems (FCA_Map and LYAM) achieved the 4th and 6th best scores with respect to the F-measure this year.
We would like to thank Christian Meilicke for his advice and support with the organization of this track.
This track is organized by Zlatan Dragisic, Huanyu Li, Valentina Ivanova and Patrick Lambrix. If you have any problems working with the ontologies, any questions related to tool wrapping, or any suggestions related to the anatomy track, feel free to write an email to oaei-anatomy [at] ida [.] liu [.] se.