Ontology Alignment Evaluation Initiative - OAEI-2012 Campaign

Large biomedical ontology results

We ran the evaluation on a high-performance server with 16 CPUs and 15 GB of allocated RAM. In total, 15 out of 23 participating systems/configurations were able to cope with at least one of the track's matching tasks. Optima and MEDLEY failed to complete the smallest task within the 24-hour timeout, while OMR, OntoK, ASE and WeSeE threw an exception during the matching process. CODI was evaluated in a different setting using only 7 GB and threw an exception related to insufficient memory when processing the smallest matching task. TOAST was not evaluated since it was only configured for the Anatomy track and required a complex installation. LogMapLt, a string matcher that builds an inverted file to compute correspondences efficiently, has been used as the baseline.
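
To make the baseline concrete, the following minimal Python sketch illustrates the kind of inverted-file string matching that LogMapLt performs; the normalisation, data structures and identifiers are illustrative assumptions, not LogMapLt's actual code.

```python
from collections import defaultdict

def normalise(label):
    """Lowercase a label, drop commas/underscores, and return its token set."""
    return frozenset(label.lower().replace(',', ' ').replace('_', ' ').split())

def string_match(labels1, labels2):
    """Exact-label correspondences between two {class_id: [labels]} maps.

    The inverted file maps each normalised label of the first ontology to
    the classes carrying it, so the second ontology is matched in a single
    linear scan instead of a quadratic pairwise comparison.
    """
    index = defaultdict(set)                   # normalised label -> O1 classes
    for cls1, labs in labels1.items():
        for lab in labs:
            index[normalise(lab)].add(cls1)
    return {(cls1, cls2)
            for cls2, labs in labels2.items()  # one pass over O2
            for lab in labs
            for cls1 in index.get(normalise(lab), ())}

# Tiny usage example with hypothetical identifiers and labels.
o1 = {'FMA:7088': ['Heart'], 'FMA:7101': ['Cardiac valve']}
o2 = {'NCI:C12727': ['Heart'], 'NCI:C49296': ['Valve, Cardiac']}
print(string_match(o1, o2))   # both pairs match after normalisation
```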

Together with precision, recall, F-measure and runtimes, we have also evaluated the coherence of the alignments. We report (1) the number of unsatisfiable classes obtained when reasoning (using HermiT) over the input ontologies together with the computed mappings, (2) the ratio/degree of unsatisfiable classes with respect to the size of the merged ontology (based on the Unsatisfiability Measure proposed in [1]), and (3) an approximation of the number of root unsatisfiable classes. The root unsatisfiability aims at providing a more precise count of the errors, since many of the unsatisfiabilities may be derived (i.e., a subclass of an unsatisfiable class will also be reported as unsatisfiable). The provided approximation is based on LogMap's (incomplete) repair facility and shows the number of classes that this facility needed to repair in order to solve (most of) the unsatisfiabilities [2].
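
For reference, the incoherence degree of (2) can be written as follows; this is a reconstruction from the description above and the measure of [1], with O1, O2 the input ontologies and M the computed mappings interpreted as OWL axioms:

```latex
\[
\mathrm{UnsatDegree}(\mathcal{M}) \;=\;
  \frac{\bigl|\{\, C : \mathcal{O}_1 \cup \mathcal{O}_2 \cup \mathcal{M}
        \models C \sqsubseteq \bot \,\}\bigr|}
       {\bigl|\mathrm{Classes}(\mathcal{O}_1 \cup \mathcal{O}_2 \cup \mathcal{M})\bigr|}
\]
```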

Precision, recall and F-measure have been computed with respect to the available UMLS-based alignments. Systems are ordered by their average F-measure.
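
Concretely, with M the mapping set computed by a system and R the UMLS-based reference alignment, the standard definitions apply:

```latex
\[
P = \frac{|\mathcal{M} \cap \mathcal{R}|}{|\mathcal{M}|}, \qquad
Rec = \frac{|\mathcal{M} \cap \mathcal{R}|}{|\mathcal{R}|}, \qquad
F = \frac{2 \cdot P \cdot Rec}{P + Rec}
\]
```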

Note that GOMMA has also been evaluated with a configuration that exploits specialised background knowledge (GOMMA-bk). GOMMA-bk applies mapping composition techniques and reuses mappings from FMA-UMLS and NCI-UMLS. LogMap, MaasMatch and YAM++ also use different kinds of background knowledge: LogMap uses normalisations and spelling variants from the UMLS Lexicon, while YAM++ and MaasMatch use the general-purpose background knowledge provided by WordNet.
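
As an illustration of the mapping composition idea behind GOMMA-bk, the sketch below derives FMA-NCI candidates by joining FMA-UMLS and UMLS-NCI mappings on the shared UMLS concept; the simple join and the example identifiers are illustrative assumptions, not GOMMA's actual implementation.

```python
def compose(m1, m2):
    """Join mapping sets on the middle ontology: (fma, cui) x (cui, nci)."""
    by_cui = {}
    for fma, cui in m1:
        by_cui.setdefault(cui, []).append(fma)
    return {(fma, nci) for cui, nci in m2 for fma in by_cui.get(cui, ())}

# Hypothetical example: Heart in FMA and NCI share the UMLS CUI C0018787.
print(compose({('FMA:7088', 'C0018787')},
              {('C0018787', 'NCI:C12727')}))   # {('FMA:7088', 'NCI:C12727')}
```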

LogMap has also been evaluated with two configurations: LogMap's default algorithm computes an estimation of the overlapping between the input ontologies before the matching process, while LogMap-noe has this feature deactivated.
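
The following sketch conveys the general idea of such an overlapping estimation, under the (illustrative) assumption that the overlap is approximated from classes sharing a normalised label, extended with their ancestors; LogMap's actual procedure is described in [2].

```python
def overlap_fragment(labels, shared, parents):
    """Approximate the fragment of one ontology that overlaps the other.

    labels : {class_id: set of normalised labels}
    shared : normalised labels occurring in both input ontologies
    parents: {class_id: set of direct superclass ids}
    """
    frag = {c for c, labs in labels.items() if labs & shared}  # lexical anchors
    queue = list(frag)
    while queue:                        # close the fragment under ancestors
        for p in parents.get(queue.pop(), ()):
            if p not in frag:
                frag.add(p)
                queue.append(p)
    return frag
```

Matching is then run over the two estimated fragments only, which explains both the speed-up and the occasional misses reported below: unsatisfiable classes falling outside the fragments cannot be detected or repaired.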

The error-free "Large BioMed 2012 silver standard" reference alignment, computed by "harmonising" the outputs of the participating matching systems, will be available soon. We will also debug all mapping outputs using Alcomo [3] and LogMap's repair facility [2].

Results OAEI 2012 FMA-NCI matching problem

FMA-NCI small fragments

This year we obtained a very high level of participation: 11 systems/configurations obtained, on average, an F-measure over 0.80 for the matching problem involving the small fragments of FMA and NCI. GOMMA-bk obtained the best results in terms of both recall and F-measure, while ServOMap provided the most precise alignments. LogMap and LogMap-noe provided the same results since the input ontologies are already small fragments of FMA and NCI and thus the overlapping estimation performed by LogMap had no impact. In general, as expected, precision increases when comparing against the original UMLS mapping set, while recall decreases.

Our baseline provided very good results in terms of F-measure and outperformed 8 of the participating systems. MaasMatch and Hertuda provided competitive results in terms of recall, but their low precision hurt the final F-measure. MapSSS and AUTOMSv2 provided mapping sets with high precision; however, their F-measures suffered due to low recall.

The runtimes were very positive in general and 8 systems completed the task in less than 2 minutes. MapSSS required less than 10 minutes, while Hertuda and HotMatch required around 1 hour. Finally, MaasMatch, AUTOMSv2 and Wmatch needed 8, 17 and 18 hours to complete the task, respectively.

Regarding mapping coherence, only LogMap (with its two variants) generated an almost clean output. The table shows that even the most precise mapping sets (ServOMap or YAM++) lead to a huge number of unsatisfiable classes when reasoning together with the input ontologies, which underlines the importance of using techniques to assess the coherence of the generated alignments. Unfortunately, LogMap and CODI are the only systems participating in OAEI 2012 that have been shown to use such techniques.

System Time (s) # Mappings Original UMLS Refined UMLS (LogMap) Refined UMLS (Alcomo) Average Incoherence Analysis
Precision  Recall  F-measure Precision  Recall  F-measure Precision  Recall  F-measure Precision  Recall  F-measure All Unsat. Degree Root Unsat.
GOMMA-bk 26 2,843 0.961 0.903 0.931 0.932 0.914 0.923 0.914 0.922 0.918 0.936 0.913 0.924 6,204 60.92% 193
YAM++ 78 2,614 0.980 0.848 0.909 0.959 0.865 0.910 0.933 0.866 0.898 0.958 0.859 0.906 2,352 23.10% 92
LogMap/LogMap-noe 18 2,740 0.952 0.863 0.905 0.934 0.883 0.908 0.908 0.883 0.895 0.932 0.876 0.903 2 0.02% 0
GOMMA 26 2,626 0.973 0.845 0.904 0.945 0.856 0.898 0.928 0.865 0.896 0.949 0.855 0.900 2,130 20.92% 127
ServOMapL 20 2,468 0.988 0.806 0.888 0.964 0.821 0.887 0.936 0.819 0.873 0.962 0.815 0.883 5,778 56.74% 79
LogMapLt 8 2,483 0.969 0.796 0.874 0.942 0.807 0.869 0.924 0.814 0.866 0.945 0.806 0.870 2,104 20.66% 116
ServOMap 25 2,300 0.990 0.753 0.855 0.969 0.769 0.857 0.949 0.774 0.853 0.969 0.765 0.855 5,597 54.96% 50
HotMatch 4,271 2,280 0.971 0.732 0.835 0.951 0.748 0.838 0.947 0.766 0.847 0.957 0.749 0.840 285 2.78% 65
Wmatch 65,399 3,178 0.811 0.852 0.831 0.786 0.862 0.823 0.767 0.864 0.813 0.788 0.860 0.822 3,168 31.11% 482
AROMA 63 2,571 0.876 0.745 0.805 0.854 0.758 0.803 0.837 0.764 0.799 0.856 0.756 0.803 7,196 70.66% 421
Hertuda 3,327 4,309 0.598 0.852 0.703 0.578 0.860 0.691 0.564 0.862 0.682 0.580 0.858 0.692 2,675 26.27% 277
MaasMatch 27,157 3,696 0.622 0.765 0.686 0.606 0.778 0.681 0.597 0.788 0.679 0.608 0.777 0.682 9,598 94.25% 3,113
AUTOMSv2 62,407 1,809 0.821 0.491 0.615 0.802 0.501 0.617 0.709 0.507 0.618 0.804 0.500 0.616 5,346 52.49% 392
MapSSS 561 1,483 0.860 0.422 0.566 0.840 0.430 0.568 0.829 0.436 0.571 0.843 0.429 0.569 565 5.55% 94

FMA-NCI big fragments

AUTOMSv2, HotMatch, Hertuda, Wmatch and MaasMatch failed to complete the task involving the big fragments of FMA and NCI after more than 24 hours of execution. Runtimes were in line with the small matching task, apart from those of MapSSS and AROMA, which increased considerably.

YAM++ provided the best results in terms of F-measure, whereas GOMMA-bk and ServOMap got the best recall and precision, respectively. F-measures decreased considerably with respect to the small matching task, mostly because this task involves many more candidate mappings than the previous one. Nevertheless, seven systems outperformed our baseline and provided high-quality mapping sets in terms of both precision and recall. Only MapSSS and AROMA provided worse results than LogMapLt in terms of both precision and recall.

Regarding mapping coherence, as in the previous task, only LogMap (with its two variants) generated an almost clean output, where the mappings together with the input ontologies lead to only 5 unsatisfiable classes.

System Time (s) # Mappings Original UMLS Refined UMLS (LogMap) Refined UMLS (Alcomo) Average Incoherence Analysis
Precision  Recall  F-measure Precision  Recall  F-measure Precision  Recall  F-measure Precision  Recall  F-measure All Unsat. Degree Root Unsat.
YAM++ 245 2,688 0.923 0.821 0.869 0.904 0.838 0.870 0.878 0.838 0.857 0.902 0.832 0.866 22,402 35.49% 102
ServOMapL 95 2,640 0.914 0.798 0.852 0.892 0.812 0.850 0.866 0.811 0.838 0.891 0.807 0.847 22,315 35.41% 143
GOMMA 69 2,810 0.876 0.814 0.844 0.856 0.830 0.843 0.840 0.837 0.838 0.857 0.827 0.842 2,398 4.40% 116
GOMMA-bk 83 3,116 0.832 0.857 0.844 0.814 0.875 0.843 0.796 0.880 0.836 0.814 0.871 0.841 4,609 8.46% 146
LogMap-noe 74 2,663 0.888 0.782 0.832 0.881 0.809 0.843 0.848 0.801 0.824 0.872 0.798 0.833 5 0.01% 0
LogMap 77 2,656 0.887 0.779 0.829 0.877 0.803 0.838 0.846 0.797 0.821 0.870 0.793 0.830 5 0.01% 0
ServOMap 98 2,413 0.933 0.744 0.828 0.913 0.760 0.829 0.894 0.766 0.825 0.913 0.757 0.828 21,688 34.03% 86
LogMapLt 29 3,219 0.748 0.796 0.771 0.726 0.807 0.764 0.713 0.814 0.760 0.729 0.806 0.766 12,682 23.29% 443
AROMA 7,538 3,856 0.541 0.689 0.606 0.526 0.700 0.601 0.514 0.703 0.594 0.527 0.698 0.600 20,054 24.07% 1,600
MapSSS 30,575 2,584 0.392 0.335 0.362 0.384 0.342 0.362 0.377 0.345 0.360 0.384 0.341 0.361 21,893 40.21% 358
HotMatch - - - - - - - - - - - - - - - - -
Wmatch - - - - - - - - - - - - - - - - -
Hertuda - - - - - - - - - - - - - - - - -
MaasMatch - - - - - - - - - - - - - - - - -
AUTOMSv2 - - - - - - - - - - - - - - - - -

FMA-NCI whole ontologies

AROMA and MapSSS failed to complete the matching task involving the whole FMA and NCI ontologies in less than 24 hours.

As in the previous task, the remaining 7 matching systems generated high-quality mapping sets. YAM++ provided the best results in terms of F-measure, whereas GOMMA-bk and ServOMap got the best recall and precision, respectively. LogMap, with its two configurations, provided an almost clean output: only 9 classes were unsatisfiable after reasoning with the input ontologies and the computed mappings.

Runtimes were also very positive. All systems other than YAM++ produced their outputs in less than 5 minutes; YAM++ was slightly slower and required around 20 minutes to complete the task.

System Time (s) # Mappings Original UMLS Refined UMLS (LogMap) Refined UMLS (Alcomo) Average Incoherence Analysis
Precision  Recall  F-measure Precision  Recall  F-measure Precision  Recall  F-measure Precision  Recall  F-measure All Unsat. Degree Root Unsat.
YAM++ 1,304 2,738 0.907 0.821 0.862 0.887 0.838 0.862 0.862 0.838 0.850 0.885 0.832 0.858 50,550 28.56% 141
GOMMA 217 2,843 0.865 0.813 0.839 0.846 0.830 0.837 0.829 0.836 0.833 0.847 0.826 0.836 5,574 3.83% 139
ServOMapL 251 2,700 0.891 0.796 0.841 0.869 0.810 0.839 0.844 0.808 0.826 0.868 0.805 0.835 50,334 28.48% 164
GOMMA-bk 231 3,165 0.818 0.856 0.837 0.800 0.874 0.836 0.783 0.879 0.828 0.801 0.870 0.834 12,939 8.88% 245
LogMap-noe 206 2,646 0.882 0.771 0.823 0.875 0.799 0.835 0.842 0.790 0.815 0.866 0.787 0.825 9 0.01% 0
LogMap 131 2,652 0.875 0.768 0.818 0.868 0.795 0.830 0.836 0.786 0.810 0.860 0.783 0.819 9 0.01% 0
ServOMap 204 2,465 0.912 0.743 0.819 0.892 0.759 0.820 0.873 0.764 0.815 0.892 0.755 0.818 48,743 27.31% 114
LogMapLt 55 3,466 0.695 0.796 0.742 0.675 0.807 0.735 0.662 0.814 0.730 0.677 0.806 0.736 26,429 8.68% 778
AROMA - - - - - - - - - - - - - - - - -
MapSSS - - - - - - - - - - - - - - - - -
HotMatch - - - - - - - - - - - - - - - - -
Wmatch - - - - - - - - - - - - - - - - -
Hertuda - - - - - - - - - - - - - - - - -
MaasMatch - - - - - - - - - - - - - - - - -
AUTOMSv2 - - - - - - - - - - - - - - - - -

Results OAEI 2012 FMA-SNOMED matching problem

As the following tables show, the FMA-SNOMED matching problem was harder than the FMA-NCI problem in both size and complexity. Thus, matching systems required more time to complete the tasks and provided, in general, worse results in terms of F-measure. Furthermore, MaasMatch, Wmatch and AUTOMSv2, which were able to complete the small FMA-NCI task, failed to complete the small FMA-SNOMED task in less than 24 hours.

FMA-SNOMED small fragments

Six systems provided, on average, an F-measure greater than 0.75. However, the other 6 systems that completed the task (including our baseline) failed to provide a recall higher than 0.4. GOMMA-bk provided the best results in terms of both recall and F-measure, while the baseline LogMapLt provided the best precision, closely followed by ServOMapL. GOMMA-bk is well ahead of the other systems since it managed to provide a mapping set with very high recall; the use of background knowledge was key in this matching task.

As in the FMA-NCI matching problem, precision tends to increase when comparing against the original UMLS mapping set, while recall decreases.

The runtimes were also very positive in general and 8 systems completed the task in less than 6 minutes. MapSSS required almost 1 hour, while Hertuda, HotMatch and AROMA needed 5, 9 and 14 hours to complete the task, respectively.

LogMap, unlike LogMap-noe, failed to detect and repair two unsatisfiable classes since they were outside the computed ontology fragments (the overlapping). The rest of the systems, even those providing highly precise mappings such as ServOMapL, generated mapping sets with a high incoherence degree.

System Time (s) # Mappings Original UMLS Refined UMLS (LogMap) Refined UMLS (Alcomo) Average Incoherence Analysis
Precision  Recall  F-measure Precision  Recall  F-measure Precision  Recall  F-measure Precision  Recall  F-measure All Unsat. Degree Root Unsat.
GOMMA-bk 148 8,598 0.958 0.914 0.935 0.860 0.912 0.885 0.862 0.912 0.886 0.893 0.913 0.903 13,685 58.06% 4,674
ServOMapL 39 6,346 0.985 0.694 0.814 0.884 0.691 0.776 0.892 0.696 0.782 0.920 0.694 0.791 10,584 44.91% 3,056
YAM++ 326 6,421 0.972 0.693 0.809 0.870 0.688 0.769 0.879 0.694 0.776 0.907 0.692 0.785 14,534 61.67% 3,150
LogMap-noe 63 6,363 0.964 0.681 0.799 0.877 0.688 0.771 0.889 0.696 0.781 0.910 0.688 0.784 0 0% 0
LogMap 65 6,164 0.965 0.660 0.784 0.876 0.666 0.756 0.889 0.674 0.767 0.910 0.667 0.769 2 0.01% 2
ServOMap 46 6,008 0.985 0.657 0.788 0.880 0.652 0.749 0.888 0.656 0.755 0.918 0.655 0.764 8,165 34.64% 2,721
GOMMA 54 3,667 0.926 0.377 0.536 0.834 0.377 0.520 0.865 0.390 0.538 0.875 0.381 0.531 2,058 8.73% 206
MapSSS 3,129 3,458 0.798 0.306 0.442 0.719 0.307 0.430 0.737 0.313 0.440 0.751 0.309 0.438 9,084 38.54% 389
AROMA 51,191 5,227 0.555 0.322 0.407 0.507 0.327 0.397 0.519 0.333 0.406 0.527 0.327 0.404 21,083 89.45% 2,296
HotMatch 31,718 2,139 0.875 0.208 0.336 0.812 0.214 0.339 0.842 0.222 0.351 0.843 0.214 0.342 907 3.85% 104
LogMapLt 14 1,645 0.975 0.178 0.301 0.902 0.183 0.304 0.936 0.189 0.315 0.938 0.183 0.307 773 3.28% 21
Hertuda 17,625 3,051 0.578 0.196 0.292 0.533 0.201 0.292 0.555 0.208 0.303 0.555 0.201 0.296 1,020 4.33% 47

FMA-SNOMED big fragments

MapSSS, HotMatch and Hertuda failed to complete the task involving the big fragments of FMA and SNOMED after more than 24 hours of execution.

ServOMapL provided the best results in terms of F-measure and precision, whereas GOMMA-bk got the best recall. As in the FMA-NCI matching task involving big fragments, the F-measures, in general, decreased with respect to the small matching task. The largest variations affected GOMMA-bk and GOMMA, whose average precision decreased from 0.893 and 0.875 to 0.571 and 0.389, respectively. This is an interesting fact: the background knowledge used by GOMMA-bk kept recall high, but could not prevent the drop in precision. Furthermore, runtimes were 4 to 10 times higher for all the systems, with the exception of AROMA's, which only increased from 14 to 17 hours.

LogMap (with its two variants) generated a clean output where the mappings together with the input ontologies did not lead to any unsatisfiable class.

System Time (s) # Mappings Original UMLS Refined UMLS (LogMap) Refined UMLS (Alcomo) Average Incoherence Analysis
Precision  Recall  F-measure Precision  Recall  F-measure Precision  Recall  F-measure Precision  Recall  F-measure All Unsat. Degree Root Unsat.
ServOMapL 234 6,563 0.945 0.689 0.797 0.847 0.686 0.758 0.857 0.692 0.766 0.883 0.689 0.774 55,970 32.36% 1,192
ServOMap 315 6,272 0.941 0.655 0.773 0.841 0.650 0.734 0.849 0.655 0.740 0.877 0.654 0.749 143,316 82.85% 1,320
YAM++ 3,780 7,003 0.879 0.684 0.769 0.787 0.679 0.729 0.797 0.686 0.737 0.821 0.683 0.746 69,345 40.09% 1,360
LogMap-noe 521 6,450 0.886 0.635 0.740 0.805 0.640 0.713 0.821 0.651 0.726 0.837 0.642 0.727 0 0% 0
LogMap 484 6,292 0.883 0.617 0.726 0.800 0.621 0.699 0.815 0.631 0.711 0.833 0.623 0.712 0 0% 0
GOMMA-bk 636 12,614 0.613 0.858 0.715 0.548 0.852 0.667 0.551 0.855 0.670 0.571 0.855 0.684 75,910 43.88% 3,344
GOMMA 437 5,591 0.412 0.256 0.316 0.370 0.255 0.302 0.386 0.265 0.314 0.389 0.259 0.311 7,343 4.25% 480
AROMA 62,801 2,497 0.684 0.190 0.297 0.638 0.197 0.300 0.660 0.203 0.310 0.661 0.196 0.303 54,459 31.48% 271
LogMapLt 96 1,819 0.882 0.178 0.296 0.816 0.183 0.299 0.846 0.189 0.309 0.848 0.183 0.302 2,994 1.73% 24
MapSSS - - - - - - - - - - - - - - - - -
HotMatch - - - - - - - - - - - - - - - - -
Hertuda - - - - - - - - - - - - - - - - -

FMA-SNOMED whole ontologies

AROMA failed to complete the matching task involving the whole FMA and SNOMED ontologies in less than 24 hours.

The results in terms of both precision and recall did not change significantly and, as in the previous task, ServOMapL provided the best results in terms of F-measure and precision while GOMMA-bk got the best recall.

Runtimes for ServOMap, ServOMapL, LogMapLt and LogMap (with its two variants) were in line with the previous matching task; the computation times for GOMMA, GOMMA-bk and YAM++, however, increased considerably. GOMMA (with its two variants) required more than 30 minutes, while YAM++ required more than 6 hours.

LogMap and LogMap-noe mappings, as in previous tasks, had a very low incoherence degree.

System Time (s) # Mappings Original UMLS Refined UMLS (LogMap) Refined UMLS (Alcomo) Average Incoherence Analysis
Precision  Recall  F-measure Precision  Recall  F-measure Precision  Recall  F-measure Precision  Recall  F-measure All Unsat. Degree Root Unsat.
ServOMapL 517 6,605 0.939 0.688 0.794 0.842 0.686 0.756 0.851 0.691 0.763 0.877 0.688 0.772 99,726 25.86% 2,862
ServOMap 532 6,320 0.933 0.655 0.770 0.835 0.650 0.731 0.842 0.655 0.737 0.870 0.653 0.746 273,242 70.87% 2,617
YAM++ 23,900 7,044 0.872 0.682 0.765 0.780 0.678 0.725 0.791 0.685 0.734 0.814 0.681 0.742 106,107 27.52% 3,393
LogMap 612 6,312 0.877 0.615 0.723 0.795 0.619 0.696 0.811 0.629 0.708 0.828 0.621 0.710 10 0.003% 0
LogMap-noe 791 6,406 0.866 0.616 0.720 0.782 0.617 0.690 0.801 0.631 0.706 0.816 0.621 0.706 10 0.003% 0
GOMMA-bk 1,893 12,829 0.602 0.858 0.708 0.538 0.852 0.660 0.542 0.855 0.663 0.561 0.855 0.677 119,657 31.03% 5,289
LogMapLt 171 1,823 0.880 0.178 0.296 0.814 0.183 0.299 0.844 0.189 0.309 0.846 0.183 0.301 4,938 1.28% 37
GOMMA 1,994 5,823 0.370 0.239 0.291 0.332 0.239 0.278 0.347 0.248 0.289 0.350 0.242 0.286 10,752 2.79% 609
AROMA - - - - - - - - - - - - - - - - -
MapSSS - - - - - - - - - - - - - - - - -
HotMatch - - - - - - - - - - - - - - - - -
Hertuda - - - - - - - - - - - - - - - - -

Results OAEI 2012 SNOMED-NCI matching problem

The matching outputs in the SNOMED-NCI matching problem have only been compared against the original UMLS mapping set and the refined subset computed by LogMap's repair facility. We could not compute a refined UMLS alignment set with the Alcomo debugging system since, at the time of creating the datasets, it could not cope with the integration of SNOMED and NCI via mappings. The new version of Alcomo, however, is able to provide such a refined set.

Since currently no OWL 2 reasoner has been shown to cope with the integration of SNOMED and NCI via mappings [url], the satisfiability results have been estimated using the Dowling-Gallier algorithm [url] for propositional Horn satisfiability (as implemented in LogMap's repair facility).
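
The Dowling-Gallier algorithm decides satisfiability of a propositional Horn formula in linear time, which is what makes this estimation feasible at the scale of SNOMED and NCI. A compact illustrative version is sketched below; the clause format and the toy encoding of subsumptions, mappings and disjointness are assumptions for illustration, not LogMap's internal representation.

```python
from collections import defaultdict, deque

def horn_sat(clauses):
    """Dowling-Gallier style unit propagation over Horn clauses.

    Each clause is (body, head): body is a set of atoms; head is an atom,
    or None for a goal clause (body -> false). Facts have an empty body.
    Returns the atoms forced true, or None if the formula is unsatisfiable.
    """
    watch = defaultdict(list)        # atom -> clauses whose body contains it
    remaining = []                   # remaining[i] = unsatisfied body atoms
    agenda, true_atoms = deque(), set()
    for i, (body, head) in enumerate(clauses):
        remaining.append(len(body))
        for atom in body:
            watch[atom].append(i)
        if not body:                 # a fact: its head is immediately true
            if head is None:
                return None          # the empty clause
            agenda.append(head)
    while agenda:
        atom = agenda.popleft()
        if atom in true_atoms:
            continue
        true_atoms.add(atom)
        for i in watch[atom]:
            remaining[i] -= 1
            if remaining[i] == 0:    # whole body satisfied: fire the clause
                if clauses[i][1] is None:
                    return None      # a goal clause fired: contradiction
                agenda.append(clauses[i][1])
    return true_atoms

# Toy unsatisfiability test: assert class C, two subsumptions C -> D1 and
# C -> D2 (the latter, say, induced by a mapping), and a disjointness
# D1 and D2 -> false. The result is None, i.e., C is unsatisfiable.
clauses = [
    (set(), 'C'),
    ({'C'}, 'D1'),
    ({'C'}, 'D2'),
    ({'D1', 'D2'}, None),
]
print(horn_sat(clauses))             # None
```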

SNOMED-NCI small fragments

The SNOMED-NCI matching problem represents a further step up in difficulty with respect to the FMA-SNOMED matching problem and, in general, runtimes and results are slightly worse. Furthermore, Hertuda and HotMatch, which were able to complete the small FMA-NCI and small FMA-SNOMED tasks, failed to complete the small SNOMED-NCI task in less than 24 hours.

Six systems provided an F-measure higher than our baseline LogMapLt, and their F-measures were very close to one another. On the other hand, GOMMA, MapSSS and AROMA failed to improve on LogMapLt's results. LogMap-noe provided the best results in terms of recall and F-measure, while ServOMap generated the most precise mappings.

As in the FMA-NCI and FMA-SNOMED matching problems, precision tends to increase when comparing against the original UMLS mapping set, while recall decreases.

The runtimes were also positive in general and 7 systems completed the task in less than 4 minutes. YAM++ required more than 30 minutes, while AROMA and MapSSS needed 4 and 8 hours to complete the task, respectively.

LogMap (with its two variants) generated output mappings that did not lead to any unsatisfiable class when reasoning (using the Dowling-Gallier algorithm) together with the input ontologies. The rest of the systems generated mapping sets leading to an incoherence degree greater than 50%.

System Time (s) # Mappings Original UMLS Refined UMLS (LogMap) Average Incoherence Analysis
Precision  Recall  F-measure Precision  Recall  F-measure Precision  Recall  F-measure All Unsat. Degree Root Unsat.
LogMap-noe 211 13,525 0.897 0.644 0.750 0.893 0.659 0.758 0.895 0.652 0.754 0 0% 0
LogMap 221 13,454 0.899 0.642 0.749 0.895 0.657 0.758 0.897 0.649 0.753 0 0% 0
GOMMA-bk 226 12,294 0.946 0.617 0.747 0.931 0.625 0.748 0.939 0.621 0.747 48,681 64.83% 863
YAM++ 1,901 11,961 0.951 0.604 0.739 0.940 0.614 0.743 0.946 0.609 0.741 50,089 66.71% 471
ServOMapL 147 11,730 0.960 0.598 0.737 0.947 0.606 0.739 0.954 0.602 0.738 62,367 83.06% 657
ServOMap 153 10,829 0.972 0.558 0.709 0.959 0.567 0.713 0.965 0.563 0.711 51,020 67.95% 467
LogMapLt 54 10,947 0.953 0.554 0.700 0.938 0.560 0.701 0.945 0.557 0.701 61,269 81.60% 801
GOMMA 197 10,555 0.948 0.531 0.680 0.931 0.536 0.680 0.939 0.533 0.680 42,813 57.02% 851
AROMA 15,624 11,783 0.861 0.538 0.662 0.848 0.545 0.664 0.854 0.542 0.663 70,491 93.88% 1,286
MapSSS 27,381 9,608 0.795 0.405 0.537 0.783 0.411 0.539 0.789 0.408 0.538 46,083 61.37% 794

SNOMED-NCI big fragments

MapSSS and AROMA failed to complete the task involving the big fragments of SNOMED and NCI after more than 24 hours of execution.

There were, in general, no big differences in terms of F-measure with respect to the small SNOMED-NCI task. Only LogMap, whose recall decreased, lost its second position, and GOMMA-bk, which generated less precise mappings, was relegated to the sixth position. As in the previous task, LogMap-noe provided the best results in terms of recall and F-measure, while ServOMap generated the most precise mappings.

Runtimes were roughly 2 to 3 times higher than in the small task, but in most cases the task was completed in less than 10 minutes.

Regarding mapping coherence, LogMap-noe provided a clean output, while LogMap, since it computes an estimation of the overlapping (fragments) between the input ontologies, failed to detect and repair 3 unsatisfiable classes, which were outside the computed fragments.

System Time (s) # Mappings Original UMLS Refined UMLS (LogMap) Average Incoherence Analysis
Precision  Recall  F-measure Precision  Recall  F-measure Precision  Recall  F-measure All Unsat. Degree Root Unsat.
LogMap-noe 575 13,184 0.882 0.617 0.726 0.877 0.631 0.734 0.879 0.624 0.730 0 0% 0
YAM++ 6,127 13,083 0.864 0.600 0.708 0.854 0.610 0.712 0.859 0.605 0.710 104,492 60.66% 618
ServOMapL 363 12,784 0.870 0.590 0.703 0.858 0.599 0.705 0.864 0.594 0.704 136,909 79.48% 1,101
LogMap 514 12,142 0.877 0.565 0.687 0.872 0.578 0.695 0.874 0.571 0.691 3 0.002% 2
ServOMap 282 11,632 0.896 0.553 0.684 0.885 0.562 0.687 0.891 0.558 0.686 110,253 64.00% 820
GOMMA-bk 638 15,644 0.730 0.606 0.662 0.718 0.613 0.662 0.724 0.610 0.662 116,451 67.60% 2,741
LogMapLt 104 12,741 0.819 0.553 0.660 0.805 0.560 0.661 0.812 0.557 0.661 131,073 76.09% 2,201
GOMMA 527 12,320 0.802 0.524 0.634 0.787 0.529 0.633 0.795 0.527 0.634 96,945 56.28% 1,621
AROMA - - - - - - - - - - - - - -
MapSSS - - - - - - - - - - - - - -

SNOMED-NCI whole ontologies

Precision and recall slightly decreased for all systems, and none of them reached an F-measure of 0.7. YAM++ produced the best mapping set in terms of F-measure, while ServOMap and GOMMA-bk generated the mappings with the best precision and recall, respectively. LogMap-noe lost its first position since it provided less comprehensive mappings.

ServOMap, ServOMapL and LogMap were the fastest tools and required 11, 12 and 16 minutes, respectively. GOMMA (with its two variants) required more than 30 minutes, while YAM++ required more than 8 hours.

As in the previous task, LogMap-noe provided a clean output, while LogMap failed to detect and repair a few unsatisfiable classes due to the computation of the overlapping between the input ontologies.

System Time (s) # Mappings Original UMLS Refined UMLS (LogMap) Average Incoherence Analysis
Precision  Recall  F-measure Precision  Recall  F-measure Precision  Recall  F-measure All Unsat. Degree Root Unsat.
YAM++ 30,155 14,103 0.794 0.594 0.680 0.785 0.604 0.683 0.790 0.599 0.681 238,593 63.91% 979
ServOMapL 738 13,964 0.796 0.590 0.678 0.785 0.598 0.679 0.791 0.594 0.678 286,790 76.82% 1,557
LogMap 955 13,011 0.816 0.564 0.667 0.812 0.577 0.674 0.814 0.570 0.671 16 0.004% 10
LogMap-noe 1,505 13,058 0.813 0.563 0.666 0.809 0.577 0.673 0.811 0.570 0.670 0 0% 0
ServOMap 654 12,462 0.835 0.552 0.664 0.824 0.560 0.667 0.829 0.556 0.666 230,055 61.63% 1,546
GOMMA-bk 1,940 17,045 0.669 0.605 0.635 0.658 0.612 0.634 0.663 0.608 0.635 239,708 64.21% 4,297
LogMapLt 178 14,043 0.743 0.553 0.634 0.731 0.560 0.634 0.737 0.557 0.634 305,648 81.87% 3,160
GOMMA 1,820 13,693 0.720 0.523 0.606 0.707 0.528 0.605 0.714 0.526 0.606 215,959 57.85% 2,614
AROMA - - - - - - - - - - - - - -
MapSSS - - - - - - - - - - - - - -

Result summary (Top 8 systems)

The following table summarises the results for the 8 systems that completed all 9 tasks in the Large BioMed Track. The table shows the average precision, recall, F-measure and incoherence degree, as well as the total time to complete the tasks.

The systems have been ordered according to their average F-measure. YAM++ obtained the best average F-measure, GOMMA-bk the best recall, and ServOMap computed the most precise mappings. The first 6 systems obtained very close results in terms of F-measure, with a gap of only 0.024 between the first (YAM++) and the sixth (ServOMap).

Regarding mapping incoherence, LogMap and LogMap-noe were the only systems providing mapping sets that lead to a small number of unsatisfiable classes.

Finally, LogMapLt, since it implements basic and efficient string similarity techniques, was the fastest system. The rest of the tools, apart from YAM++, were also very fast and needed only between 38 and 97 minutes to complete the tasks. YAM++ was the exception and required almost 19 hours to complete the nine tasks.

System Total Time (s) Average
Precision  Recall  F-measure Incoherence
YAM++ 67,817 0.876 0.710 0.782 45.30%
ServOMapL 2,405 0.890 0.699 0.780 51.46%
LogMap-noe 3,964 0.869 0.695 0.770 0.004%
GOMMA-bk 5,821 0.767 0.791 0.768 45.32%
LogMap 3,077 0.869 0.684 0.762 0.006%
ServOMap 2,310 0.903 0.657 0.758 55.36%
GOMMA 5,341 0.746 0.553 0.625 24.01%
LogMapLt 711 0.831 0.515 0.586 33.17%

Harmonisation of the mapping outputs

Mapping repair evaluation

References

[1] Christian Meilicke and Heiner Stuckenschmidt. Incoherence as a basis for measuring the quality of ontology mappings. In Proc. of 3rd International Workshop on Ontology Matching (OM), 2008. [url]

[2] Ernesto Jimenez-Ruiz and Bernardo Cuenca Grau. LogMap: Logic-based and scalable ontology matching. In Proc. of 10th International Semantic Web Conference (ISWC), 2011. [url]

[3] Christian Meilicke. Alignment Incoherence in Ontology Matching. PhD thesis, University of Mannheim, 2011. [url]

Contact

This track is organised by Ernesto Jimenez Ruiz, Bernardo Cuenca Grau and Ian Horrocks, and supported by the SEALS and LogMap projects. If you have any questions or suggestions related to the results of this track, feel free to write an email to ernesto [at] cs [.] ox [.] ac [.] uk or ernesto [.] jimenez [.] ruiz [at] gmail [.] com

Original page: http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/2012/results2012.html [cached: 24/06/2014]