We have collected all generated alignments and made them available in a zip file via the following link. These alignments are the raw results on which the following report is based.
>>> download raw results
We conducted experiments by executing each system in its standard setting and compared precision, recall, F-measure and recall+. The measure recall+ indicates the proportion of detected non-trivial correspondences. The matched entities in a non-trivial correspondence do not have the same normalized label. An approach that generates only trivial correspondences is reported as the baseline StringEquiv in the following section.
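The measures above can be summarized in a small sketch. The data structures are hypothetical (real alignments are RDF files processed by the evaluation client), and the normalization function is a simplified stand-in for whatever label normalization the track actually applies:

```python
# Sketch of the evaluation measures (precision, recall, F-measure, recall+).
# Correspondences are modeled as sets of (entity1, entity2) pairs and labels
# as a dict mapping entities to label strings -- hypothetical structures.

def normalize(label):
    """Simplified label normalization: lowercase, keep alphanumerics only."""
    return "".join(ch for ch in label.lower() if ch.isalnum())

def evaluate(system, reference, labels):
    """Return (precision, recall, f_measure, recall_plus)."""
    correct = system & reference
    precision = len(correct) / len(system) if system else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # recall+ is recall restricted to non-trivial reference correspondences,
    # i.e. pairs whose normalized labels differ.
    nontrivial = {(a, b) for (a, b) in reference
                  if normalize(labels[a]) != normalize(labels[b])}
    recall_plus = (len(system & nontrivial) / len(nontrivial)
                   if nontrivial else 0.0)
    return precision, recall, f_measure, recall_plus
```

The StringEquiv baseline corresponds to a system whose alignment contains exactly the trivial pairs, so its recall+ is 0 by construction.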
We ran the systems on a server with 3.46 GHz (6 cores) and 8GB RAM allocated to each matching system. Further, we used the SEALS client to execute our evaluation. However, we slightly changed the way precision and recall are computed, i.e., the results generated by the SEALS client vary in some cases by 0.5% compared to the results presented below. In particular, we removed trivial correspondences in the oboInOwl namespace like
http://...oboInOwl#Synonym = http://...oboInOwl#Synonym
as well as correspondences expressing relations other than equivalence. Using the Pellet reasoner we also checked whether the generated alignment is coherent, i.e., whether there are no unsatisfiable concepts when the ontologies are merged with the alignment.
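The post-processing described above can be sketched as a simple filter. The tuple representation and the namespace constant are assumptions for illustration (the namespace URI is the commonly used oboInOwl one, and real alignments use the Alignment API format rather than plain tuples):

```python
# Sketch of the alignment post-processing applied before evaluation:
# drop non-equivalence correspondences and trivial self-mappings in the
# oboInOwl namespace. Tuple format (entity1, entity2, relation) is assumed.

OBO_IN_OWL = "http://www.geneontology.org/formats/oboInOwl#"  # assumed URI

def filter_alignment(correspondences):
    """Keep only equivalence correspondences, excluding trivial
    self-mappings of oboInOwl annotation properties."""
    kept = []
    for entity1, entity2, relation in correspondences:
        if relation != "=":
            continue  # drop subsumption and other non-equivalence relations
        if entity1 == entity2 and entity1.startswith(OBO_IN_OWL):
            continue  # drop trivial oboInOwl self-correspondences
        kept.append((entity1, entity2, relation))
    return kept
```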
In the following, we analyze all participating systems that could generate an alignment. The listing comprises 15 entries. LogMap participated with different versions, namely LogMap, LogMap-Bio, LogMap-C and a lightweight version LogMapLite that uses only some core components. Similarly, DKP-AOM participated with two versions, DKP-AOM and DKP-AOM-lite, where DKP-AOM additionally performs coherence analysis. A number of systems participated in the anatomy track for the first time: COMMAND, DKP-AOM, DKP-AOM-lite, GMap and JarvisOM. AML, LogMap (all versions), RSDLWB and XMap last participated in the anatomy track last year, while Lily and CroMatcher last participated in 2011 and 2013, respectively. However, CroMatcher did not produce an alignment within the given timeframe in 2013. For more details, we refer the reader to the papers presenting the systems. Thus, this year we have 11 different systems (not counting different versions) which generated an alignment.
This year there were three systems (COMMAND, GMap and Mamba) which ran out of memory and could not finish execution with the allocated amount of memory. They were therefore run on a different configuration with 14 GB of RAM allocated (Mamba additionally had database connection problems). Consequently, the execution times for COMMAND and GMap (marked with * and ** in the table) are not fully comparable to those of the other systems. As last year, we have 6 systems which finished their execution in less than 100 seconds. The top systems in terms of runtime are LogMap, RSDLWB and AML. Depending on the specific version, these systems require between 20 and 40 seconds to match the ontologies. The table shows that there is no correlation between the quality of the generated alignment in terms of precision and recall and the required runtime. This has also been observed in previous OAEI campaigns.
The table also shows the results for precision, recall and F-measure. In terms of F-measure, the top-ranked systems are AML, XMap, LogMap-Bio and LogMap. The results of these four systems are at least as good as the results of the best systems in OAEI 2007-2010. AML, LogMap and LogMap-Bio produce alignments very similar to those of the last years. For example, AML's and LogMap's alignments each contained only one correspondence less than last year. Of the systems which participated in previous years, only Lily showed improvement: its precision improved from 0.814 to 0.870, recall from 0.734 to 0.793 and F-measure from 0.772 to 0.830. This is also the first time that CroMatcher successfully produced an alignment within the set timeframe, and its result is the 6th best with respect to the F-measure. This year 9 out of 15 systems achieved an F-measure higher than the baseline which is based on (normalized) string equivalence (StringEquiv in the table). This is a slightly worse result (percentage-wise) than in previous years, when 7 out of 10 (2014) and 13 out of 17 systems (2012) produced alignments with an F-measure higher than the baseline. The list of systems which achieved an F-measure lower than the baseline consists mostly of newly competing systems. The only exception is RSDLWB, which competed last year and also achieved a lower-than-baseline result then.
Moreover, nearly all systems find many non-trivial correspondences. The exceptions are RSDLWB and DKP-AOM, which generate only trivial correspondences.
This year seven systems produced coherent alignments, which is comparable to last year when five out of ten systems achieved this.
This year we have again experienced an increase in the number of competing systems. The list of competing systems comprises both systems which participated in previous years and new systems.
The evaluation of the systems has shown that most of the systems which participated in previous years did not improve their results and in most cases achieved slightly worse results. The only exception is Lily, which showed some improvement compared to the previous time it competed. Of the newly participating systems, GMap displayed the best performance, achieving the 5th best result with respect to the F-measure this year.
We would like to thank Christian Meilicke for his advice and support with the organization of this track.
This track is organized by Zlatan Dragisic, Valentina Ivanova and Patrick Lambrix. If you have any problems working with the ontologies, any questions related to tool wrapping, or any suggestions related to the anatomy track, feel free to write an email to oaei-anatomy [at] ida [.] liu [.] se.