We have run the evaluation on a high-performance server with 16 CPUs and 15 GB of allocated RAM. In total, 15 out of 23 participating systems/configurations were able to cope with at least one of the track's matching tasks. Optima and MEDLEY failed to complete the smallest task within the 24-hour timeout, while OMR, OntoK, ASE and WeSeE threw an exception during the matching process. CODI was evaluated in a different setting using only 7 GB and threw an exception related to insufficient memory when processing the smallest matching task. TOAST was not evaluated since it was only configured for the Anatomy track and required a complex installation. LogMapLt, a string matcher that builds an inverted file to compute correspondences efficiently, has been used as the baseline.
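To make the baseline concrete, the sketch below shows how an inverted-file string matcher can compute candidate correspondences: labels from one ontology are normalised and indexed once, and labels from the other ontology are then looked up directly instead of being compared pairwise. This is purely illustrative Java under our own assumptions (class, method and entity names are hypothetical), not LogMapLt's actual code.

```java
import java.util.*;

public class InvertedFileMatcher {

    /** Maps a normalised label to the entity IDs that carry it. */
    private final Map<String, List<String>> index = new HashMap<>();

    private static String normalise(String label) {
        // Lower-case and collapse punctuation/whitespace: a crude stand-in
        // for the lexical normalisations a real matcher would apply.
        return label.toLowerCase().replaceAll("[^a-z0-9]+", " ").trim();
    }

    /** Index every label of the first ontology: one pass over its labels. */
    public void indexEntity(String entityId, String label) {
        index.computeIfAbsent(normalise(label), k -> new ArrayList<>()).add(entityId);
    }

    /** Probe the index with a label from the second ontology. */
    public List<String> match(String label) {
        return index.getOrDefault(normalise(label), Collections.emptyList());
    }

    public static void main(String[] args) {
        InvertedFileMatcher m = new InvertedFileMatcher();
        m.indexEntity("FMA:7088", "Heart");   // hypothetical entity IDs
        m.indexEntity("FMA:50801", "Brain");
        System.out.println(m.match("heart")); // prints: [FMA:7088]
    }
}
```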
Together with precision, recall, F-measure and runtimes, we have also evaluated the coherence of the alignments. We report (1) the number of unsatisfiable classes when reasoning (using HermiT) over the input ontologies together with the computed mappings, (2) the ratio/degree of unsatisfiable classes with respect to the size of the merged ontology (based on the unsatisfiability measure proposed in [1]), and (3) an approximation of the root unsatisfiability. The root unsatisfiability aims at providing a more precise count of the errors, since many of the unsatisfiabilities may be derived (i.e., a subclass of an unsatisfiable class will also be reported as unsatisfiable). The provided approximation is based on LogMap's (incomplete) repair facility and shows the number of classes that this facility needed to repair in order to solve (most of) the unsatisfiabilities [2].
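As a rough illustration of how measures (1) and (2) can be obtained, the sketch below merges two ontologies with a set of mapping axioms and asks HermiT for the unsatisfiable classes. It assumes the OWL API (3.x-style method names) and HermiT on the classpath, and that the mappings have been serialised as OWL axioms; the file names are placeholders, and this is not the exact evaluation harness used for the track.

```java
import java.io.File;
import java.util.Set;
import org.semanticweb.HermiT.ReasonerFactory;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;
import org.semanticweb.owlapi.reasoner.OWLReasoner;

public class IncoherenceCheck {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager man = OWLManager.createOWLOntologyManager();
        OWLOntology onto1 = man.loadOntologyFromOntologyDocument(new File("fma.owl"));
        OWLOntology onto2 = man.loadOntologyFromOntologyDocument(new File("nci.owl"));
        OWLOntology mappings = man.loadOntologyFromOntologyDocument(new File("mappings.owl"));

        // Merge the two input ontologies and the mapping axioms.
        OWLOntology merged = man.createOntology(IRI.create("http://example.org/merged"));
        man.addAxioms(merged, onto1.getAxioms());
        man.addAxioms(merged, onto2.getAxioms());
        man.addAxioms(merged, mappings.getAxioms());

        // (1) Number of unsatisfiable classes according to HermiT.
        OWLReasoner hermit = new ReasonerFactory().createReasoner(merged);
        Set<OWLClass> unsat = hermit.getUnsatisfiableClasses().getEntitiesMinusBottom();

        // (2) Incoherence degree: unsatisfiable classes over the merged signature size.
        double degree = 100.0 * unsat.size() / merged.getClassesInSignature().size();
        System.out.printf("All Unsat.: %d  Degree: %.2f%%%n", unsat.size(), degree);
    }
}
```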
Precision, recall and F-measure have been computed with respect to the available UMLS-based alignments. Systems are ordered by average F-measure.
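For reference, precision, recall and F-measure over mapping sets reduce to simple set arithmetic. A minimal sketch, assuming mappings are represented as comparable strings such as "FMA:7088=NCI:C12727" (a hypothetical encoding):

```java
import java.util.HashSet;
import java.util.Set;

public class MatchingMetrics {
    /** P = |S ∩ R| / |S|, R = |S ∩ R| / |R|, F = harmonic mean of P and R. */
    public static double[] evaluate(Set<String> system, Set<String> reference) {
        Set<String> correct = new HashSet<>(system);
        correct.retainAll(reference);                 // S ∩ R: system mappings in the reference
        double p = (double) correct.size() / system.size();
        double r = (double) correct.size() / reference.size();
        double f = (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
        return new double[] {p, r, f};
    }
}
```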
Note that GOMMA has also been evaluated with a configuration that exploits specialised background knowledge (GOMMA-bk). The background knowledge of GOMMA-bk involves the application of mapping composition techniques and the reuse of mappings from FMA-UMLS and NCI-UMLS. LogMap, MaasMatch and YAM++ also use different kinds of background knowledge. LogMap uses normalisations and spelling variants from the UMLS Lexicon. YAM++ and MaasMatch use the general purpose background knowledge provided by WordNet.
LogMap has also been evaluated with two configurations. LogMap's default algorithm computes an estimation of the overlapping between the input ontologies before the matching process, while LogMap-noe has this feature deactivated.
The error-free "Large BioMed 2012 silver standard" reference alignment, computed by "harmonising" the output of the participating matching systems, will be available soon. We will also debug all mapping outputs using Alcomo [3] and LogMap's repair facility [2].
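The exact harmonisation procedure is not described here, but a common way to build such a silver standard is majority voting over the participating systems' outputs; the following is a minimal sketch under that assumption (the minVotes threshold and string encoding of mappings are hypothetical):

```java
import java.util.*;

public class SilverStandard {
    /** Keep every mapping proposed by at least minVotes of the given system outputs. */
    public static Set<String> harmonise(List<Set<String>> outputs, int minVotes) {
        Map<String, Integer> votes = new HashMap<>();
        for (Set<String> output : outputs)
            for (String mapping : output)
                votes.merge(mapping, 1, Integer::sum);   // count one vote per system

        Set<String> silver = new HashSet<>();
        for (Map.Entry<String, Integer> e : votes.entrySet())
            if (e.getValue() >= minVotes)                // enough systems agree
                silver.add(e.getKey());
        return silver;
    }
}
```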
This year we obtained a very high level of participation, and 11 systems/configurations obtained, on average, an F-measure over 0.80 for the matching problem involving the small fragments of FMA and NCI. GOMMA-bk obtained the best results in terms of both recall and F-measure, while ServOMap provided the most precise alignments. LogMap and LogMap-noe provided the same results since the input ontologies are already small fragments of FMA and NCI, and thus the overlapping estimation performed by LogMap did not have any impact. In general, as expected, precision increases when comparing against the original UMLS mapping set, while recall decreases.
Our baseline provided very good results in terms of F-measure and outperformed 8 of the participating systems. MaasMatch and Hertuda provided competitive results in terms of recall, but their low precision hurt the final F-measure. MapSSS and AUTOMSv2 provided mapping sets with high precision, but their F-measures suffered from low recall.
The runtimes were very positive in general and 8 systems completed the task in less than 2 minutes. MapSSS required less than 10 minutes, while Hertuda and HotMatch required around 1 hour. Finally, MaasMatch, AUTOMSv2 and Wmatch needed 8, 17 and 18 hours to complete the task, respectively.
Regarding mapping coherence, only LogMap (with its two variants) generates an almost clean output. The table shows that even the most precise mappings (ServOMap or YAM++) lead to a huge number of unsatisfiable classes when reasoning together with the input ontologies, which demonstrates the importance of using techniques to assess the coherence of the generated alignments. Unfortunately, LogMap and CODI are the only systems participating in OAEI 2012 that have been shown to use such techniques.
System | Time (s) | # Mappings | Original UMLS | Refined UMLS (LogMap) | Refined UMLS (Alcomo) | Average | Incoherence Analysis
| | | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | All Unsat. | Degree | Root Unsat.
GOMMA-bk | 26 | 2,843 | 0.961 | 0.903 | 0.931 | 0.932 | 0.914 | 0.923 | 0.914 | 0.922 | 0.918 | 0.936 | 0.913 | 0.924 | 6,204 | 60.92% | 193
YAM++ | 78 | 2,614 | 0.980 | 0.848 | 0.909 | 0.959 | 0.865 | 0.910 | 0.933 | 0.866 | 0.898 | 0.958 | 0.859 | 0.906 | 2,352 | 23.10% | 92 |
LogMap/LogMap-noe | 18 | 2,740 | 0.952 | 0.863 | 0.905 | 0.934 | 0.883 | 0.908 | 0.908 | 0.883 | 0.895 | 0.932 | 0.876 | 0.903 | 2 | 0.02% | 0
GOMMA | 26 | 2,626 | 0.973 | 0.845 | 0.904 | 0.945 | 0.856 | 0.898 | 0.928 | 0.865 | 0.896 | 0.949 | 0.855 | 0.900 | 2,130 | 20.92% | 127 |
ServOMapL | 20 | 2,468 | 0.988 | 0.806 | 0.888 | 0.964 | 0.821 | 0.887 | 0.936 | 0.819 | 0.873 | 0.962 | 0.815 | 0.883 | 5,778 | 56.74% | 79 |
LogMapLt | 8 | 2,483 | 0.969 | 0.796 | 0.874 | 0.942 | 0.807 | 0.869 | 0.924 | 0.814 | 0.866 | 0.945 | 0.806 | 0.870 | 2,104 | 20.66% | 116 |
ServOMap | 25 | 2,300 | 0.990 | 0.753 | 0.855 | 0.969 | 0.769 | 0.857 | 0.949 | 0.774 | 0.853 | 0.969 | 0.765 | 0.855 | 5,597 | 54.96% | 50 |
HotMatch | 4,271 | 2,280 | 0.971 | 0.732 | 0.835 | 0.951 | 0.748 | 0.838 | 0.947 | 0.766 | 0.847 | 0.957 | 0.749 | 0.840 | 285 | 2.78% | 65 |
Wmatch | 65,399 | 3,178 | 0.811 | 0.852 | 0.831 | 0.786 | 0.862 | 0.823 | 0.767 | 0.864 | 0.813 | 0.788 | 0.860 | 0.822 | 3,168 | 31.11% | 482 |
AROMA | 63 | 2,571 | 0.876 | 0.745 | 0.805 | 0.854 | 0.758 | 0.803 | 0.837 | 0.764 | 0.799 | 0.856 | 0.756 | 0.803 | 7,196 | 70.66% | 421 |
Hertuda | 3,327 | 4,309 | 0.598 | 0.852 | 0.703 | 0.578 | 0.860 | 0.691 | 0.564 | 0.862 | 0.682 | 0.580 | 0.858 | 0.692 | 2,675 | 26.27% | 277 |
MaasMatch | 27,157 | 3,696 | 0.622 | 0.765 | 0.686 | 0.606 | 0.778 | 0.681 | 0.597 | 0.788 | 0.679 | 0.608 | 0.777 | 0.682 | 9,598 | 94.25% | 3,113 |
AUTOMSv2 | 62,407 | 1,809 | 0.821 | 0.491 | 0.615 | 0.802 | 0.501 | 0.617 | 0.709 | 0.507 | 0.618 | 0.804 | 0.500 | 0.616 | 5,346 | 52.49% | 392 |
MapSSS | 561 | 1,483 | 0.860 | 0.422 | 0.566 | 0.840 | 0.430 | 0.568 | 0.829 | 0.436 | 0.571 | 0.843 | 0.429 | 0.569 | 565 | 5.55% | 94 |
AUTOMSv2, HotMatch, Hertuda, Wmatch and MaasMatch failed to complete the task involving the big fragments of FMA and NCI after more than 24 hours of execution. Runtimes were in line with the small matching task, apart from those of MapSSS and AROMA, which increased considerably.
YAM++ provided the best results in terms of F-measure, whereas GOMMA-bk and ServOMap got the best recall and precision, respectively. F-measures decreased considerably with respect to the small matching task, mostly because this task involves many more candidate mappings than the previous one. Nevertheless, seven systems outperformed our baseline and provided high-quality mapping sets in terms of both precision and recall. Only MapSSS and AROMA provided worse results than LogMapLt in terms of both precision and recall.
Regarding mapping coherence, as in the previous task, only LogMap (with its two variants) generates an almost clean output where the mappings together with the input ontologies only lead to 5 unsatisfiable classes.
System | Time (s) | # Mappings | Original UMLS | Refined UMLS (LogMap) | Refined UMLS (Alcomo) | Average | Incoherence Analysis
| | | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | All Unsat. | Degree | Root Unsat.
YAM++ | 245 | 2,688 | 0.923 | 0.821 | 0.869 | 0.904 | 0.838 | 0.870 | 0.878 | 0.838 | 0.857 | 0.902 | 0.832 | 0.866 | 22,402 | 35.49% | 102 |
ServOMapL | 95 | 2,640 | 0.914 | 0.798 | 0.852 | 0.892 | 0.812 | 0.850 | 0.866 | 0.811 | 0.838 | 0.891 | 0.807 | 0.847 | 22,315 | 35.41% | 143 |
GOMMA | 69 | 2,810 | 0.876 | 0.814 | 0.844 | 0.856 | 0.830 | 0.843 | 0.840 | 0.837 | 0.838 | 0.857 | 0.827 | 0.842 | 2,398 | 4.40% | 116 |
GOMMA-bk | 83 | 3,116 | 0.832 | 0.857 | 0.844 | 0.814 | 0.875 | 0.843 | 0.796 | 0.880 | 0.836 | 0.814 | 0.871 | 0.841 | 4,609 | 8.46% | 146
LogMap-noe | 74 | 2,663 | 0.888 | 0.782 | 0.832 | 0.881 | 0.809 | 0.843 | 0.848 | 0.801 | 0.824 | 0.872 | 0.798 | 0.833 | 5 | 0.01% | 0 |
LogMap | 77 | 2,656 | 0.887 | 0.779 | 0.829 | 0.877 | 0.803 | 0.838 | 0.846 | 0.797 | 0.821 | 0.870 | 0.793 | 0.830 | 5 | 0.01% | 0 |
ServOMap | 98 | 2,413 | 0.933 | 0.744 | 0.828 | 0.913 | 0.760 | 0.829 | 0.894 | 0.766 | 0.825 | 0.913 | 0.757 | 0.828 | 21,688 | 34.03% | 86 |
LogMapLt | 29 | 3,219 | 0.748 | 0.796 | 0.771 | 0.726 | 0.807 | 0.764 | 0.713 | 0.814 | 0.760 | 0.729 | 0.806 | 0.766 | 12,682 | 23.29% | 443 |
AROMA | 7,538 | 3,856 | 0.541 | 0.689 | 0.606 | 0.526 | 0.700 | 0.601 | 0.514 | 0.703 | 0.594 | 0.527 | 0.698 | 0.600 | 20,054 | 24.07% | 1,600
MapSSS | 30,575 | 2,584 | 0.392 | 0.335 | 0.362 | 0.384 | 0.342 | 0.362 | 0.377 | 0.345 | 0.360 | 0.384 | 0.341 | 0.361 | 21,893 | 40.21% | 358 |
HotMatch | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
Wmatch | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
Hertuda | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
MaasMatch | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
AUTOMSv2 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
AROMA and MapSSS failed to complete the matching task involving the whole FMA and NCI ontologies in less than 24 hours.
As in the previous task, the remaining 7 matching systems generated high-quality mapping sets. YAM++ provided the best results in terms of F-measure, whereas GOMMA-bk and ServOMap got the best recall and precision, respectively. LogMap, with its two configurations, provided an almost clean output: only 9 classes were unsatisfiable after reasoning with the input ontologies and the computed mappings.
Runtimes were also very positive. All systems produced their outputs in less than 5 minutes except YAM++, which required around 20 minutes to complete the task.
System | Time (s) | # Mappings | Original UMLS | Refined UMLS (LogMap) | Refined UMLS (Alcomo) | Average | Incoherence Analysis
| | | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | All Unsat. | Degree | Root Unsat.
YAM++ | 1,304 | 2,738 | 0.907 | 0.821 | 0.862 | 0.887 | 0.838 | 0.862 | 0.862 | 0.838 | 0.850 | 0.885 | 0.832 | 0.858 | 50,550 | 28.56% | 141 |
GOMMA | 217 | 2,843 | 0.865 | 0.813 | 0.839 | 0.846 | 0.830 | 0.837 | 0.829 | 0.836 | 0.833 | 0.847 | 0.826 | 0.836 | 5,574 | 3.83% | 139 |
ServOMapL | 251 | 2,700 | 0.891 | 0.796 | 0.841 | 0.869 | 0.810 | 0.839 | 0.844 | 0.808 | 0.826 | 0.868 | 0.805 | 0.835 | 50,334 | 28.48% | 164 |
GOMMA-bk | 231 | 3,165 | 0.818 | 0.856 | 0.837 | 0.800 | 0.874 | 0.836 | 0.783 | 0.879 | 0.828 | 0.801 | 0.870 | 0.834 | 12,939 | 8.88% | 245
LogMap-noe | 206 | 2,646 | 0.882 | 0.771 | 0.823 | 0.875 | 0.799 | 0.835 | 0.842 | 0.790 | 0.815 | 0.866 | 0.787 | 0.825 | 9 | 0.01% | 0 |
LogMap | 131 | 2,652 | 0.875 | 0.768 | 0.818 | 0.868 | 0.795 | 0.830 | 0.836 | 0.786 | 0.810 | 0.860 | 0.783 | 0.819 | 9 | 0.01% | 0 |
ServOMap | 204 | 2,465 | 0.912 | 0.743 | 0.819 | 0.892 | 0.759 | 0.820 | 0.873 | 0.764 | 0.815 | 0.892 | 0.755 | 0.818 | 48,743 | 27.31% | 114 |
LogMapLt | 55 | 3,466 | 0.695 | 0.796 | 0.742 | 0.675 | 0.807 | 0.735 | 0.662 | 0.814 | 0.730 | 0.677 | 0.806 | 0.736 | 26,429 | 8.68% | 778 |
AROMA | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
MapSSS | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
HotMatch | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
Wmatch | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
Hertuda | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
MaasMatch | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
AUTOMSv2 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
As the following tables show, the FMA-SNOMED matching problem was harder than the FMA-NCI problem in both size and complexity. Thus, matching systems required more time to complete the task and provided, in general, worse results in terms of F-measure. Furthermore, MaasMatch, Wmatch and AUTOMSv2, which were able to complete the small FMA-NCI task, failed to complete the small FMA-SNOMED task in less than 24 hours.
Six systems provided, on average, an F-measure greater than 0.75. However, the other 6 systems that completed the task (including our baseline) failed to provide a recall higher than 0.4. GOMMA-bk provided the best results in terms of both recall and F-measure, while the baseline LogMapLt provided the best precision, closely followed by ServOMapL. GOMMA-bk is well ahead of the other systems since it managed to provide a mapping set with very high recall; the use of background knowledge was key in this matching task.
As in the FMA-NCI matching problem, precision tends to increase when comparing against the original UMLS mapping set, while recall decreases.
The runtimes were also very positive in general and 8 systems completed the task in less than 6 minutes. MapSSS required almost 1 hour, while Hertuda, HotMatch and AROMA needed 5, 9 and 14 hours to complete the task, respectively.
LogMap, unlike LogMap-noe, failed to detect and repair two unsatisfiable classes since they were outside the computed ontology fragments (overlapping). The remaining systems, even those providing highly precise mappings such as ServOMapL, generated mapping sets with a high incoherence degree.
System | Time (s) | # Mappings | Original UMLS | Refined UMLS (LogMap) | Refined UMLS (Alcomo) | Average | Incoherence Analysis
| | | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | All Unsat. | Degree | Root Unsat.
GOMMA-bk | 148 | 8,598 | 0.958 | 0.914 | 0.935 | 0.860 | 0.912 | 0.885 | 0.862 | 0.912 | 0.886 | 0.893 | 0.913 | 0.903 | 13,685 | 58.06% | 4,674
ServOMapL | 39 | 6,346 | 0.985 | 0.694 | 0.814 | 0.884 | 0.691 | 0.776 | 0.892 | 0.696 | 0.782 | 0.920 | 0.694 | 0.791 | 10,584 | 44.91% | 3,056 |
YAM++ | 326 | 6,421 | 0.972 | 0.693 | 0.809 | 0.870 | 0.688 | 0.769 | 0.879 | 0.694 | 0.776 | 0.907 | 0.692 | 0.785 | 14,534 | 61.67% | 3,150 |
LogMap-noe | 63 | 6,363 | 0.964 | 0.681 | 0.799 | 0.877 | 0.688 | 0.771 | 0.889 | 0.696 | 0.781 | 0.910 | 0.688 | 0.784 | 0 | 0% | 0 |
LogMap | 65 | 6,164 | 0.965 | 0.660 | 0.784 | 0.876 | 0.666 | 0.756 | 0.889 | 0.674 | 0.767 | 0.910 | 0.667 | 0.769 | 2 | 0.01% | 2 |
ServOMap | 46 | 6,008 | 0.985 | 0.657 | 0.788 | 0.880 | 0.652 | 0.749 | 0.888 | 0.656 | 0.755 | 0.918 | 0.655 | 0.764 | 8,165 | 34.64% | 2,721 |
GOMMA | 54 | 3,667 | 0.926 | 0.377 | 0.536 | 0.834 | 0.377 | 0.520 | 0.865 | 0.390 | 0.538 | 0.875 | 0.381 | 0.531 | 2,058 | 8.73% | 206 |
MapSSS | 3,129 | 3,458 | 0.798 | 0.306 | 0.442 | 0.719 | 0.307 | 0.430 | 0.737 | 0.313 | 0.440 | 0.751 | 0.309 | 0.438 | 9,084 | 38.54% | 389 |
AROMA | 51,191 | 5,227 | 0.555 | 0.322 | 0.407 | 0.507 | 0.327 | 0.397 | 0.519 | 0.333 | 0.406 | 0.527 | 0.327 | 0.404 | 21,083 | 89.45% | 2,296 |
HotMatch | 31,718 | 2,139 | 0.875 | 0.208 | 0.336 | 0.812 | 0.214 | 0.339 | 0.842 | 0.222 | 0.351 | 0.843 | 0.214 | 0.342 | 907 | 3.85% | 104 |
LogMapLt | 14 | 1,645 | 0.975 | 0.178 | 0.301 | 0.902 | 0.183 | 0.304 | 0.936 | 0.189 | 0.315 | 0.938 | 0.183 | 0.307 | 773 | 3.28% | 21 |
Hertuda | 17,625 | 3,051 | 0.578 | 0.196 | 0.292 | 0.533 | 0.201 | 0.292 | 0.555 | 0.208 | 0.303 | 0.555 | 0.201 | 0.296 | 1,020 | 4.33% | 47 |
MapSSS, HotMatch and Hertuda failed to complete the task involving the big fragments of FMA and SNOMED after more than 24 hours of execution.
ServOMapL provided the best results in terms of F-measure and precision, whereas GOMMA-bk got the best recall. As in the FMA-NCI matching task involving big fragments, F-measures generally decreased with respect to the small matching task. The largest variations affected GOMMA-bk and GOMMA, whose average precision dropped from 0.893 and 0.875 to 0.571 and 0.389, respectively. This is an interesting fact, since the background knowledge used by GOMMA-bk could not prevent the drop in precision while keeping a high recall. Furthermore, runtimes were 4 to 10 times higher for all systems, with the exception of AROMA, whose runtime increased from 14 to 17 hours.
LogMap (with its two variants) generated a clean output where the mappings together with the input ontologies did not lead to any unsatisfiable class.
System | Time (s) | # Mappings | Original UMLS | Refined UMLS (LogMap) | Refined UMLS (Alcomo) | Average | Incoherence Analysis
| | | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | All Unsat. | Degree | Root Unsat.
ServOMapL | 234 | 6,563 | 0.945 | 0.689 | 0.797 | 0.847 | 0.686 | 0.758 | 0.857 | 0.692 | 0.766 | 0.883 | 0.689 | 0.774 | 55,970 | 32.36% | 1,192 |
ServOMap | 315 | 6,272 | 0.941 | 0.655 | 0.773 | 0.841 | 0.650 | 0.734 | 0.849 | 0.655 | 0.740 | 0.877 | 0.654 | 0.749 | 143,316 | 82.85% | 1,320 |
YAM++ | 3,780 | 7,003 | 0.879 | 0.684 | 0.769 | 0.787 | 0.679 | 0.729 | 0.797 | 0.686 | 0.737 | 0.821 | 0.683 | 0.746 | 69,345 | 40.09% | 1,360 |
LogMap-noe | 521 | 6,450 | 0.886 | 0.635 | 0.740 | 0.805 | 0.640 | 0.713 | 0.821 | 0.651 | 0.726 | 0.837 | 0.642 | 0.727 | 0 | 0% | 0 |
LogMap | 484 | 6,292 | 0.883 | 0.617 | 0.726 | 0.800 | 0.621 | 0.699 | 0.815 | 0.631 | 0.711 | 0.833 | 0.623 | 0.712 | 0 | 0% | 0 |
GOMMA-bk | 636 | 12,614 | 0.613 | 0.858 | 0.715 | 0.548 | 0.852 | 0.667 | 0.551 | 0.855 | 0.670 | 0.571 | 0.855 | 0.684 | 75,910 | 43.88% | 3,344
GOMMA | 437 | 5,591 | 0.412 | 0.256 | 0.316 | 0.370 | 0.255 | 0.302 | 0.386 | 0.265 | 0.314 | 0.389 | 0.259 | 0.311 | 7,343 | 4.25% | 480 |
AROMA | 62,801 | 2,497 | 0.684 | 0.190 | 0.297 | 0.638 | 0.197 | 0.300 | 0.660 | 0.203 | 0.310 | 0.661 | 0.196 | 0.303 | 54,459 | 31.48% | 271 |
LogMapLt | 96 | 1,819 | 0.882 | 0.178 | 0.296 | 0.816 | 0.183 | 0.299 | 0.846 | 0.189 | 0.309 | 0.848 | 0.183 | 0.302 | 2,994 | 1.73% | 24 |
MapSSS | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
HotMatch | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
Hertuda | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
AROMA failed to complete the matching task involving the whole FMA and SNOMED ontologies in less than 24 hours.
The results in terms of both precision and recall did not change significantly and, as in the previous task, ServOMapL provided the best results in terms of F-measure and precision, while GOMMA-bk got the best recall.
Runtimes for ServOMap, ServOMapL, LogMapLt and LogMap (with its two variants) were in line with the previous matching task; the computation times for GOMMA, GOMMA-bk and YAM++, however, increased considerably. GOMMA (with its two variants) required more than 30 minutes, while YAM++ required more than 6 hours.
LogMap and LogMap-noe mappings, as in previous tasks, had a very low incoherence degree.
System | Time (s) | # Mappings | Original UMLS | Refined UMLS (LogMap) | Refined UMLS (Alcomo) | Average | Incoherence Analysis
| | | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | All Unsat. | Degree | Root Unsat.
ServOMapL | 517 | 6,605 | 0.939 | 0.688 | 0.794 | 0.842 | 0.686 | 0.756 | 0.851 | 0.691 | 0.763 | 0.877 | 0.688 | 0.772 | 99,726 | 25.86% | 2,862 |
ServOMap | 532 | 6,320 | 0.933 | 0.655 | 0.770 | 0.835 | 0.650 | 0.731 | 0.842 | 0.655 | 0.737 | 0.870 | 0.653 | 0.746 | 273,242 | 70.87% | 2,617 |
YAM++ | 23,900 | 7,044 | 0.872 | 0.682 | 0.765 | 0.780 | 0.678 | 0.725 | 0.791 | 0.685 | 0.734 | 0.814 | 0.681 | 0.742 | 106,107 | 27.52% | 3,393 |
LogMap | 612 | 6,312 | 0.877 | 0.615 | 0.723 | 0.795 | 0.619 | 0.696 | 0.811 | 0.629 | 0.708 | 0.828 | 0.621 | 0.710 | 10 | 0.003% | 0 |
LogMap-noe | 791 | 6,406 | 0.866 | 0.616 | 0.720 | 0.782 | 0.617 | 0.690 | 0.801 | 0.631 | 0.706 | 0.816 | 0.621 | 0.706 | 10 | 0.003% | 0 |
GOMMA-bk | 1,893 | 12,829 | 0.602 | 0.858 | 0.708 | 0.538 | 0.852 | 0.660 | 0.542 | 0.855 | 0.663 | 0.561 | 0.855 | 0.677 | 119,657 | 31.03% | 5,289
LogMapLt | 171 | 1,823 | 0.880 | 0.178 | 0.296 | 0.814 | 0.183 | 0.299 | 0.844 | 0.189 | 0.309 | 0.846 | 0.183 | 0.301 | 4,938 | 1.28% | 37 |
GOMMA | 1,994 | 5,823 | 0.370 | 0.239 | 0.291 | 0.332 | 0.239 | 0.278 | 0.347 | 0.248 | 0.289 | 0.350 | 0.242 | 0.286 | 10,752 | 2.79% | 609 |
AROMA | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
MapSSS | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
HotMatch | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
Hertuda | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
The matching outputs in the SNOMED-NCI matching problem have only been compared against the original UMLS mappings and the refined subset computed by LogMap's repair facility. We could not compute a refined UMLS alignment set with the Alcomo debugging system since, at the time of creating the datasets, it could not cope with the integration of SNOMED and NCI via mappings. The new version of Alcomo, however, has been shown to be able to provide such a refined set.
Since currently no OWL 2 reasoner has been shown to cope with the integration of SNOMED and NCI via mappings [url], the satisfiability results have been estimated using the Dowling-Gallier algorithm [url] for propositional Horn satisfiability (as implemented in LogMap's repair facility).
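For context, the Dowling-Gallier algorithm decides satisfiability of a set of propositional Horn clauses in time linear in the total size of the clauses, by forward-propagating facts and counting, per clause, how many body atoms remain underived. The sketch below is a self-contained illustration of the algorithm, not LogMap's actual implementation (which additionally encodes classes and mappings as Horn clauses).

```java
import java.util.*;

public class DowlingGallier {
    /** A Horn clause body => head; head == -1 encodes "false" (a goal clause). */
    static final class Clause {
        final int[] body; final int head;
        Clause(int[] body, int head) { this.body = body; this.head = head; }
    }

    /** Decides satisfiability of Horn clauses over atoms 0..nAtoms-1 in linear time. */
    public static boolean satisfiable(List<Clause> clauses, int nAtoms) {
        int[] remaining = new int[clauses.size()];     // underived body atoms per clause
        List<List<Integer>> watch = new ArrayList<>(); // atom -> clauses whose body contains it
        for (int a = 0; a < nAtoms; a++) watch.add(new ArrayList<>());
        boolean[] derived = new boolean[nAtoms];
        Deque<Integer> queue = new ArrayDeque<>();     // newly derived atoms to propagate

        for (int i = 0; i < clauses.size(); i++) {
            Clause c = clauses.get(i);
            remaining[i] = c.body.length;
            for (int a : c.body) watch.get(a).add(i);
            if (remaining[i] == 0) {                   // a fact: empty body
                if (c.head == -1) return false;        // "=> false" is immediately unsatisfiable
                if (!derived[c.head]) { derived[c.head] = true; queue.add(c.head); }
            }
        }
        while (!queue.isEmpty()) {
            int a = queue.poll();
            for (int i : watch.get(a)) {
                if (--remaining[i] == 0) {             // whole body of clause i now derived
                    Clause c = clauses.get(i);
                    if (c.head == -1) return false;    // goal clause fires: unsatisfiable
                    if (!derived[c.head]) { derived[c.head] = true; queue.add(c.head); }
                }
            }
        }
        return true;                                   // no contradiction is derivable
    }

    public static void main(String[] args) {
        // Atoms: A=0, B=1.  Clauses: => A;  A => B;  A and B => false.
        List<Clause> cs = List.of(
                new Clause(new int[]{}, 0),
                new Clause(new int[]{0}, 1),
                new Clause(new int[]{0, 1}, -1));
        System.out.println(satisfiable(cs, 2));        // prints: false
    }
}
```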
The SNOMED-NCI matching problem represents a further step up in difficulty with respect to the FMA-SNOMED matching problem and, in general, runtimes and results are slightly worse. Furthermore, Hertuda and HotMatch, which were able to complete the small FMA-NCI and small FMA-SNOMED tasks, failed to complete the small SNOMED-NCI task in less than 24 hours.
Six systems provided an F-measure higher than our baseline LogMapLt, and their F-measures were very close to each other. On the other hand, GOMMA, MapSSS and AROMA failed to improve on LogMapLt's results. LogMap-noe provided the best results in terms of recall and F-measure, while ServOMap generated the most precise mappings.
As in the FMA-NCI and FMA-SNOMED matching problems, precision tends to increase when comparing against the original UMLS mapping set, while recall decreases.
The runtimes were also positive in general and 7 systems completed the task in less than 4 minutes. YAM++ required more than 30 minutes, while AROMA and MapSSS needed 4 and 8 hours to complete the task, respectively.
LogMap (with its two variants) generated a set of output mappings that did not lead to any unsatisfiable class when reasoning (using the Dowling-Gallier algorithm) together with the input ontologies. The remaining systems generated mapping sets leading to a degree of incoherence greater than 50%.
System | Time (s) | # Mappings | Original UMLS | Refined UMLS (LogMap) | Average | Incoherence Analysis
| | | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | All Unsat. | Degree | Root Unsat.
LogMap-noe | 211 | 13,525 | 0.897 | 0.644 | 0.750 | 0.893 | 0.659 | 0.758 | 0.895 | 0.652 | 0.754 | 0 | 0% | 0 |
LogMap | 221 | 13,454 | 0.899 | 0.642 | 0.749 | 0.895 | 0.657 | 0.758 | 0.897 | 0.649 | 0.753 | 0 | 0% | 0 |
GOMMA-bk | 226 | 12,294 | 0.946 | 0.617 | 0.747 | 0.931 | 0.625 | 0.748 | 0.939 | 0.621 | 0.747 | 48,681 | 64.83% | 863
YAM++ | 1,901 | 11,961 | 0.951 | 0.604 | 0.739 | 0.940 | 0.614 | 0.743 | 0.946 | 0.609 | 0.741 | 50,089 | 66.71% | 471 |
ServOMapL | 147 | 11,730 | 0.960 | 0.598 | 0.737 | 0.947 | 0.606 | 0.739 | 0.954 | 0.602 | 0.738 | 62,367 | 83.06% | 657 |
ServOMap | 153 | 10,829 | 0.972 | 0.558 | 0.709 | 0.959 | 0.567 | 0.713 | 0.965 | 0.563 | 0.711 | 51,020 | 67.95% | 467 |
LogMapLt | 54 | 10,947 | 0.953 | 0.554 | 0.700 | 0.938 | 0.560 | 0.701 | 0.945 | 0.557 | 0.701 | 61,269 | 81.60% | 801 |
GOMMA | 197 | 10,555 | 0.948 | 0.531 | 0.680 | 0.931 | 0.536 | 0.680 | 0.939 | 0.533 | 0.680 | 42,813 | 57.02% | 851 |
AROMA | 15,624 | 11,783 | 0.861 | 0.538 | 0.662 | 0.848 | 0.545 | 0.664 | 0.854 | 0.542 | 0.663 | 70,491 | 93.88% | 1,286 |
MapSSS | 27,381 | 9,608 | 0.795 | 0.405 | 0.537 | 0.783 | 0.411 | 0.539 | 0.789 | 0.408 | 0.538 | 46,083 | 61.37% | 794 |
MapSSS and AROMA failed to complete the task involving the big fragments of SNOMED and NCI after more than 24 hours of execution.
There were no big differences, in general, in terms of F-measure with respect to the small SNOMED-NCI task. Only LogMap decreased its recall and lost its second position, and GOMMA-bk generated less precise mappings and was relegated to sixth position. As in the previous task, LogMap-noe provided the best results in terms of recall and F-measure, while ServOMap generated the most precise mappings.
Runtimes were roughly 2 to 3 times higher than in the small task, but in most cases the task was finished in less than 10 minutes.
Regarding mapping coherence, LogMap-noe provided a clean output, while LogMap, since it computes an estimation of the overlapping (fragments) between the input ontologies, failed to detect and repair 3 unsatisfiable classes that were outside the computed fragments.
System | Time (s) | # Mappings | Original UMLS | Refined UMLS (LogMap) | Average | Incoherence Analysis
| | | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | All Unsat. | Degree | Root Unsat.
LogMap-noe | 575 | 13,184 | 0.882 | 0.617 | 0.726 | 0.877 | 0.631 | 0.734 | 0.879 | 0.624 | 0.730 | 0 | 0% | 0 |
YAM++ | 6,127 | 13,083 | 0.864 | 0.600 | 0.708 | 0.854 | 0.610 | 0.712 | 0.859 | 0.605 | 0.710 | 104,492 | 60.66% | 618 |
ServOMapL | 363 | 12,784 | 0.870 | 0.590 | 0.703 | 0.858 | 0.599 | 0.705 | 0.864 | 0.594 | 0.704 | 136,909 | 79.48% | 1,101 |
LogMap | 514 | 12,142 | 0.877 | 0.565 | 0.687 | 0.872 | 0.578 | 0.695 | 0.874 | 0.571 | 0.691 | 3 | 0.002% | 2 |
ServOMap | 282 | 11,632 | 0.896 | 0.553 | 0.684 | 0.885 | 0.562 | 0.687 | 0.891 | 0.558 | 0.686 | 110,253 | 64.00% | 820 |
GOMMA-bk | 638 | 15,644 | 0.730 | 0.606 | 0.662 | 0.718 | 0.613 | 0.662 | 0.724 | 0.610 | 0.662 | 116,451 | 67.60% | 2,741
LogMapLt | 104 | 12,741 | 0.819 | 0.553 | 0.660 | 0.805 | 0.560 | 0.661 | 0.812 | 0.557 | 0.661 | 131,073 | 76.09% | 2,201 |
GOMMA | 527 | 12,320 | 0.802 | 0.524 | 0.634 | 0.787 | 0.529 | 0.633 | 0.795 | 0.527 | 0.634 | 96,945 | 56.28% | 1,621 |
AROMA | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
MapSSS | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
Precision and recall decreased slightly for all systems, and none of them reached an F-measure of 0.7. YAM++ produced the best mapping set in terms of F-measure, while ServOMap and GOMMA-bk generated the mappings with the best precision and recall, respectively. LogMap-noe lost its first position since it provided less comprehensive mappings.
ServOMap, ServOMapL and LogMap were the fastest tools, requiring 11, 12 and 16 minutes, respectively. GOMMA (with its two variants) required more than 30 minutes, while YAM++ required more than 8 hours.
As in the previous task, LogMap-noe provided a clean output, while LogMap failed to detect and repair a few unsatisfiable classes due to the computation of the overlapping between the input ontologies.
System | Time (s) | # Mappings | Original UMLS | Refined UMLS (LogMap) | Average | Incoherence Analysis
| | | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | All Unsat. | Degree | Root Unsat.
YAM++ | 30,155 | 14,103 | 0.794 | 0.594 | 0.680 | 0.785 | 0.604 | 0.683 | 0.790 | 0.599 | 0.681 | 238,593 | 63.91% | 979 |
ServOMapL | 738 | 13,964 | 0.796 | 0.590 | 0.678 | 0.785 | 0.598 | 0.679 | 0.791 | 0.594 | 0.678 | 286,790 | 76.82% | 1,557 |
LogMap | 955 | 13,011 | 0.816 | 0.564 | 0.667 | 0.812 | 0.577 | 0.674 | 0.814 | 0.570 | 0.671 | 16 | 0.004% | 10 |
LogMap-noe | 1,505 | 13,058 | 0.813 | 0.563 | 0.666 | 0.809 | 0.577 | 0.673 | 0.811 | 0.570 | 0.670 | 0 | 0% | 0 |
ServOMap | 654 | 12,462 | 0.835 | 0.552 | 0.664 | 0.824 | 0.560 | 0.667 | 0.829 | 0.556 | 0.666 | 230,055 | 61.63% | 1,546 |
GOMMA-bk | 1,940 | 17,045 | 0.669 | 0.605 | 0.635 | 0.658 | 0.612 | 0.634 | 0.663 | 0.608 | 0.635 | 239,708 | 64.21% | 4,297
LogMapLt | 178 | 14,043 | 0.743 | 0.553 | 0.634 | 0.731 | 0.560 | 0.634 | 0.737 | 0.557 | 0.634 | 305,648 | 81.87% | 3,160 |
GOMMA | 1,820 | 13,693 | 0.720 | 0.523 | 0.606 | 0.707 | 0.528 | 0.605 | 0.714 | 0.526 | 0.606 | 215,959 | 57.85% | 2,614 |
AROMA | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
MapSSS | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
The following table summarises the results for the 8 systems that completed all 9 tasks in the Large BioMed Track. The table shows the average precision, recall, F-measure and incoherence degree, as well as the total time to complete the tasks.
Systems are ordered by average F-measure. YAM++ obtained the best average F-measure, GOMMA-bk the best recall, and ServOMap computed the most precise mappings. The first 6 systems obtained very close results in terms of F-measure, with a gap of only 0.024 between the first (YAM++) and the sixth (ServOMap).
Regarding mapping incoherence, LogMap and LogMap-noe were the only systems providing mapping sets leading to a small number of unsatisfiable classes.
Finally, LogMapLt, since it implements basic and efficient string similarity techniques, was the fastest system. The rest of the tools, apart from YAM++, were also fast and needed only between 38 and 97 minutes to complete all tasks. YAM++ was the exception and required almost 19 hours to complete the nine tasks.
System | Total Time (s) | Average
| | Precision | Recall | F-measure | Incoherence
YAM++ | 67,817 | 0.876 | 0.710 | 0.782 | 45.30% |
ServOMapL | 2,405 | 0.890 | 0.699 | 0.780 | 51.46% |
LogMap-noe | 3,964 | 0.869 | 0.695 | 0.770 | 0.004% |
GOMMA-bk | 5,821 | 0.767 | 0.791 | 0.768 | 45.32%
LogMap | 3,077 | 0.869 | 0.684 | 0.762 | 0.006% |
ServOMap | 2,310 | 0.903 | 0.657 | 0.758 | 55.36% |
GOMMA | 5,341 | 0.746 | 0.553 | 0.625 | 24.01% |
LogMapLt | 711 | 0.831 | 0.515 | 0.586 | 33.17% |
[1] Christian Meilicke and Heiner Stuckenschmidt. Incoherence as a basis for measuring the quality of ontology mappings. In Proc. of 3rd International Workshop on Ontology Matching (OM), 2008. [url]
[2] Ernesto Jimenez-Ruiz and Bernardo Cuenca Grau. LogMap: Logic-based and scalable ontology matching. In Proc. of 10th International Semantic Web Conference (ISWC), 2011. [url]
[3] Christian Meilicke. Alignment Incoherence in Ontology Matching. PhD thesis, University of Mannheim, 2011. [url]
This track is organised by Ernesto Jimenez Ruiz, Bernardo Cuenca Grau and Ian Horrocks, and supported by the SEALS and LogMap projects. If you have any questions or suggestions related to the results of this track, feel free to write an email to ernesto [at] cs [.] ox [.] ac [.] uk or ernesto [.] jimenez [.] ruiz [at] gmail [.] com