We have run the evaluation on an Ubuntu laptop with an Intel Core i7-4600U CPU @ 2.10GHz x 4 and 15 GB of allocated RAM.
Precision, Recall and F-measure have been computed with respect to a UMLS-based reference alignment. Systems have been ordered by F-measure.
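For reference, given a system mapping set S and the reference alignment R, Precision = |S ∩ R| / |S|, Recall = |S ∩ R| / |R|, and F-measure is their harmonic mean. The following minimal Java sketch illustrates these computations; the "source=target" string encoding of mappings and the toy values are our own illustrative assumptions, not the track's actual tooling.

```java
import java.util.HashSet;
import java.util.Set;

/** Minimal sketch of Precision/Recall/F-measure over mapping sets. */
public class AlignmentScores {

    // Intersection size |S ∩ R| of system mappings S and reference R.
    static int correct(Set<String> system, Set<String> reference) {
        Set<String> tp = new HashSet<>(system);
        tp.retainAll(reference);
        return tp.size();
    }

    static double precision(Set<String> s, Set<String> r) {
        return s.isEmpty() ? 0.0 : (double) correct(s, r) / s.size();
    }

    static double recall(Set<String> s, Set<String> r) {
        return r.isEmpty() ? 0.0 : (double) correct(s, r) / r.size();
    }

    static double fMeasure(double p, double r) {
        return (p + r == 0.0) ? 0.0 : 2.0 * p * r / (p + r);  // harmonic mean
    }

    public static void main(String[] args) {
        // Mappings encoded as "sourceIRI=targetIRI" strings (illustrative only).
        Set<String> system = Set.of("fma:A=nci:A", "fma:B=nci:C");
        Set<String> reference = Set.of("fma:A=nci:A", "fma:B=nci:B");
        double p = precision(system, reference);
        double r = recall(system, reference);
        System.out.printf("P=%.3f R=%.3f F=%.3f%n", p, r, fMeasure(p, r));
    }
}
```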
In the OAEI 2014 largebio track, 11 out of the 14 participating OAEI 2014 systems were able to cope with at least one of the track's tasks.
RiMOM-IM, InsMT and InsMTL focus on the instance matching track and did not produce any alignment for the largebio track.
LogMap-Bio uses BioPortal as a mediating-ontology provider, that is, it retrieves from BioPortal the top-5 ontologies most suitable for the matching task.
Together with Precision, Recall, F-measure and runtimes, we have also evaluated the coherence of the alignments. We report (1) the number of unsatisfiable classes obtained when reasoning with the input ontologies together with the computed mappings, and (2) the ratio/degree of unsatisfiable classes with respect to the size of the union of the input ontologies.
We have used the OWL 2 reasoner HermiT to compute the number of unsatisfiable classes. For the cases in which HermiT could not process the input ontologies and the mappings within 2 hours, we provide a lower bound on the number of unsatisfiable classes (indicated by ≥) computed with the OWL 2 EL reasoner ELK.
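As an illustration, the sketch below counts unsatisfiable classes with the OWL API and HermiT. It assumes OWL API 4 and HermiT on the classpath, that the computed mappings have been serialised as OWL axioms (OAEI alignments are normally delivered in the RDF Alignment format, so a conversion step is assumed), and uses placeholder file names.

```java
import java.io.File;
import org.semanticweb.HermiT.ReasonerFactory;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.OWLClass;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
import org.semanticweb.owlapi.reasoner.Node;
import org.semanticweb.owlapi.reasoner.OWLReasoner;

/** Sketch: merge the input ontologies with the computed mappings and
 *  count unsatisfiable classes with HermiT. File names are placeholders. */
public class IncoherenceCheck {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager m = OWLManager.createOWLOntologyManager();
        OWLOntology merged = m.loadOntologyFromOntologyDocument(new File("onto1.owl"));
        OWLOntology onto2 = m.loadOntologyFromOntologyDocument(new File("onto2.owl"));
        OWLOntology mappings = m.loadOntologyFromOntologyDocument(new File("mappings.owl"));
        m.addAxioms(merged, onto2.getAxioms());     // union of the input ontologies
        m.addAxioms(merged, mappings.getAxioms());  // plus the computed mappings

        OWLReasoner reasoner = new ReasonerFactory().createReasoner(merged);
        Node<OWLClass> bottom = reasoner.getUnsatisfiableClasses();
        int unsat = bottom.getEntitiesMinusBottom().size();  // owl:Nothing excluded
        int total = merged.getClassesInSignature().size();
        System.out.printf("Unsatisfiable: %d of %d classes (degree %.2f%%)%n",
                unsat, total, 100.0 * unsat / total);
        reasoner.dispose();
    }
}
```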
In this OAEI edition, only two systems have mapping repair facilities, namely AML and LogMap (including its LogMap-Bio and LogMap-C variants). The results show that even the most precise alignment sets may lead to a large number of unsatisfiable classes, which highlights the importance of using techniques to assess the coherence of the generated alignments.
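To make the idea of mapping repair concrete, here is a deliberately naive sketch: it greedily drops mapping axioms whose removal reduces the number of unsatisfiable classes, re-classifying after each tentative removal. This would be far too slow for ontologies of this size and is not how AML or LogMap actually work; their repair algorithms are much more sophisticated. File names are placeholders.

```java
import java.io.File;
import java.util.ArrayList;
import org.semanticweb.HermiT.ReasonerFactory;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.OWLAxiom;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
import org.semanticweb.owlapi.reasoner.OWLReasoner;

/** Deliberately naive repair sketch: greedily drop mapping axioms whose
 *  removal reduces unsatisfiability. For illustration only. */
public class NaiveRepair {

    static int unsatCount(OWLOntology ont) {
        OWLReasoner r = new ReasonerFactory().createReasoner(ont);
        int n = r.getUnsatisfiableClasses().getEntitiesMinusBottom().size();
        r.dispose();
        return n;
    }

    public static void main(String[] args) throws Exception {
        OWLOntologyManager m = OWLManager.createOWLOntologyManager();
        // "merged.owl" is assumed to already contain the union of the inputs.
        OWLOntology merged = m.loadOntologyFromOntologyDocument(new File("merged.owl"));
        OWLOntology mappings = m.loadOntologyFromOntologyDocument(new File("mappings.owl"));
        m.addAxioms(merged, mappings.getAxioms());

        int unsat = unsatCount(merged);
        for (OWLAxiom candidate : new ArrayList<>(mappings.getLogicalAxioms())) {
            if (unsat == 0) break;                    // alignment is coherent
            m.removeAxiom(merged, candidate);         // tentatively drop a mapping
            int after = unsatCount(merged);
            if (after < unsat) unsat = after;         // keep the removal
            else m.addAxiom(merged, candidate);       // removal did not help; restore
        }
        System.out.println("Remaining unsatisfiable classes: " + unsat);
    }
}
```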
Table 1 shows which systems were able to complete each of the matching tasks in less than 10 hours, together with the required computation times. Systems have been ordered by the number of completed tasks and the average time required to complete them. Times are reported in seconds.
The last column reports the number of tasks that each system could complete; for example, 6 systems were able to complete all six tasks. The last row shows the number of systems that could finish each of the tasks. The tasks involving SNOMED were harder with respect to both computation times and the number of systems that completed them.
Table 1. System runtimes (s) and number of completed tasks. Tasks 1–2 belong to the FMA-NCI matching problem, Tasks 3–4 to FMA-SNOMED, and Tasks 5–6 to SNOMED-NCI.

| System | Task 1 | Task 2 | Task 3 | Task 4 | Task 5 | Task 6 | Average | # Tasks |
|---|---|---|---|---|---|---|---|---|
| LogMapLite | 5 | 44 | 13 | 90 | 76 | 89 | 53 | 6 |
| XMap | 17 | 144 | 35 | 390 | 182 | 490 | 210 | 6 |
| LogMap | 14 | 106 | 63 | 388 | 263 | 917 | 292 | 6 |
| AML | 27 | 112 | 126 | 251 | 831 | 497 | 307 | 6 |
| LogMap-C | 81 | 289 | 119 | 571 | 2,723 | 2,548 | 1,055 | 6 |
| LogMap-Bio | 975 | 1,226 | 1,060 | 1,449 | 1,379 | 2,545 | 1,439 | 6 |
| OMReasoner | 82 | 36,369 | 691 | - | 5,206 | - | 10,587 | 4 |
| MaasMatch | 1,460 | - | 4,605 | - | - | - | 3,033 | 2 |
| RSDLWB | 2,216 | - | - | - | - | - | 2,216 | 1 |
| AOT | 9,341 | - | - | - | - | - | 9,341 | 1 |
| AOTL | 20,908 | - | - | - | - | - | 20,908 | 1 |
| # Systems | 11 | 7 | 8 | 6 | 7 | 6 | 4,495 | 45 |
The following tables summarize the results for the tasks in the FMA-NCI matching problem.
LogMap-Bio and AML provided the best results in terms of Recall and F-measure, respectively, in both Task 1 and Task 2. OMReasoner provided the best results in terms of Precision, although its Recall was below average. Among last year's participants, XMap and MaasMatch considerably improved their performance with respect to both runtime and F-measure. AML and LogMap again obtained very good results. LogMap-Bio improves LogMap's Recall in both tasks; however, Precision suffers, especially in Task 2.
Note that efficiency decreased in Task 2 with respect to Task 1. This is mostly because larger ontologies involve more candidate alignments, which makes it harder to keep Precision high without damaging Recall, and vice versa. Furthermore, AOT, AOTL, RSDLWB and MaasMatch could not complete Task 2: the first three did not finish within 10 hours, while MaasMatch raised an "out of memory" exception.
Task 1:

| System | Time (s) | # Mappings | Precision | Recall | F-measure | Unsat. | Degree |
|---|---|---|---|---|---|---|---|
| AML | 27 | 2,690 | 0.960 | 0.899 | 0.928 | 2 | 0.02% |
| LogMap | 14 | 2,738 | 0.946 | 0.897 | 0.921 | 2 | 0.02% |
| LogMap-Bio | 975 | 2,892 | 0.914 | 0.918 | 0.916 | 467 | 4.5% |
| XMap | 17 | 2,657 | 0.932 | 0.848 | 0.888 | 3,905 | 38.0% |
| LogMapLite | 5 | 2,479 | 0.967 | 0.819 | 0.887 | 2,103 | 20.5% |
| LogMap-C | 81 | 2,153 | 0.962 | 0.724 | 0.826 | 2 | 0.02% |
| MaasMatch | 1,460 | 2,981 | 0.808 | 0.840 | 0.824 | 8,767 | 85.3% |
| Average | 3,193 | 2,287 | 0.910 | 0.704 | 0.757 | 2,277 | 22.2% |
| AOT | 9,341 | 3,696 | 0.662 | 0.855 | 0.746 | 8,373 | 81.4% |
| OMReasoner | 82 | 1,362 | 0.995 | 0.466 | 0.635 | 56 | 0.5% |
| RSDLWB | 2,216 | 728 | 0.962 | 0.236 | 0.380 | 22 | 0.2% |
| AOTL | 20,908 | 790 | 0.902 | 0.237 | 0.375 | 1,356 | 13.2% |
Task 2:

| System | Time (s) | # Mappings | Precision | Recall | F-measure | Unsat. | Degree |
|---|---|---|---|---|---|---|---|
| AML | 112 | 2,931 | 0.832 | 0.856 | 0.844 | 10 | 0.007% |
| LogMap | 106 | 2,678 | 0.863 | 0.808 | 0.834 | 13 | 0.009% |
| LogMap-Bio | 1,226 | 3,412 | 0.724 | 0.874 | 0.792 | 40 | 0.027% |
| XMap | 144 | 2,571 | 0.835 | 0.745 | 0.787 | 9,218 | 6.3% |
| Average | 5,470 | 2,655 | 0.824 | 0.746 | 0.768 | 5,122 | 3.5% |
| LogMap-C | 289 | 2,124 | 0.877 | 0.650 | 0.747 | 9 | 0.006% |
| LogMapLite | 44 | 3,467 | 0.675 | 0.819 | 0.740 | 26,441 | 18.1% |
| OMReasoner | 36,369 | 1,403 | 0.964 | 0.466 | 0.628 | 123 | 0.084% |
The following tables summarize the results for the tasks in the FMA-SNOMED matching problem.
AML provided the best results in terms of F-measure in both Task 3 and Task 4. AML also provided the best Recall in Task 3 and the best Precision in Task 4, while LogMapLite provided the best Precision in Task 3 and LogMap-Bio the best Recall in Task 4.
Overall, the results were less positive than in the FMA-NCI matching problem. As in the FMA-NCI matching problem, efficiency also decreases as the ontology size increases. The largest variations were in the Precision of LogMapLite and XMap. Furthermore, AOT, AOTL and RSDLWB could not complete either Task 3 or Task 4 within 10 hours. MaasMatch raised an "out of memory" exception in Task 4, while OMReasoner could not complete Task 4 within the permitted time.
Task 3:

| System | Time (s) | # Mappings | Precision | Recall | F-measure | Unsat. | Degree |
|---|---|---|---|---|---|---|---|
| AML | 126 | 6,791 | 0.926 | 0.742 | 0.824 | 0 | 0.0% |
| LogMap-Bio | 1,060 | 6,444 | 0.932 | 0.710 | 0.806 | 0 | 0.0% |
| LogMap | 63 | 6,242 | 0.950 | 0.695 | 0.803 | 0 | 0.0% |
| XMap | 35 | 7,443 | 0.858 | 0.737 | 0.793 | 13,429 | 56.9% |
| LogMap-C | 119 | 4,536 | 0.958 | 0.508 | 0.664 | 0 | 0.0% |
| MaasMatch | 4,605 | 8,117 | 0.655 | 0.674 | 0.664 | 21,946 | 92.9% |
| Average | 839 | 5,342 | 0.870 | 0.554 | 0.644 | 4,578 | 19.4% |
| LogMapLite | 13 | 1,645 | 0.968 | 0.208 | 0.343 | 773 | 3.3% |
| OMReasoner | 691 | 1,520 | 0.713 | 0.156 | 0.256 | 478 | 2.0% |
Task 4:

| System | Time (s) | # Mappings | Precision | Recall | F-measure | Unsat. | Degree |
|---|---|---|---|---|---|---|---|
| AML | 251 | 6,192 | 0.891 | 0.647 | 0.749 | 0 | 0.0% |
| LogMap | 388 | 6,141 | 0.831 | 0.623 | 0.712 | 0 | 0.0% |
| LogMap-Bio | 1,449 | 6,853 | 0.756 | 0.651 | 0.700 | 0 | 0.0% |
| Average | 523 | 5,760 | 0.790 | 0.540 | 0.617 | 11,823 | 5.9% |
| LogMap-C | 571 | 4,630 | 0.853 | 0.476 | 0.611 | 98 | 0.049% |
| XMap | 390 | 8,926 | 0.558 | 0.633 | 0.593 | 66,448 | 33.0% |
| LogMapLite | 90 | 1,823 | 0.852 | 0.208 | 0.335 | 4,393 | 2.2% |
The following tables summarize the results for the tasks in the SNOMED-NCI matching problem.
AML provided the best results in terms of both Recall and F-measure in Task 5, while OMReasoner provided the best results in terms of Precision. Task 6 was completely dominated by AML.
As in the previous matching problems, efficiency decreases as the ontology size increases. Furthermore, AOT, AOTL and RSDLWB could not complete either Task 5 or Task 6 within 10 hours. MaasMatch raised a "stack overflow" exception in Task 5 and an "out of memory" exception in Task 6, while OMReasoner could not complete Task 6 within the permitted time.
Task 5:

| System | Time (s) | # Mappings | Precision | Recall | F-measure | Unsat. | Degree |
|---|---|---|---|---|---|---|---|
| AML | 831 | 14,131 | 0.917 | 0.724 | 0.809 | ≥0 | ≥0.0% |
| LogMap-Bio | 1,379 | 14,360 | 0.880 | 0.709 | 0.786 | ≥23 | ≥0.031% |
| LogMap | 263 | 14,011 | 0.889 | 0.699 | 0.783 | ≥23 | ≥0.031% |
| XMap | 182 | 14,223 | 0.849 | 0.665 | 0.746 | ≥65,512 | ≥87.1% |
| Average | 1,522 | 12,177 | 0.911 | 0.611 | 0.722 | 23,078 | 30.7% |
| LogMapLite | 76 | 10,962 | 0.948 | 0.567 | 0.710 | ≥60,426 | ≥80.3% |
| LogMap-C | 2,723 | 10,432 | 0.909 | 0.531 | 0.670 | ≥0 | ≥0.0% |
| OMReasoner | 5,206 | 7,120 | 0.983 | 0.383 | 0.551 | ≥35,568 | ≥47.3% |
Task 6:

| System | Time (s) | # Mappings | Precision | Recall | F-measure | Unsat. | Degree |
|---|---|---|---|---|---|---|---|
| AML | 497 | 12,626 | 0.912 | 0.645 | 0.756 | ≥0 | ≥0.0% |
| LogMap-Bio | 2,545 | 12,507 | 0.852 | 0.599 | 0.703 | ≥37 | ≥0.020% |
| LogMap | 917 | 12,167 | 0.863 | 0.590 | 0.701 | ≥36 | ≥0.019% |
| XMap | 490 | 12,525 | 0.843 | 0.584 | 0.690 | ≥134,622 | ≥71.1% |
| Average | 1,181 | 12,024 | 0.858 | 0.575 | 0.687 | 47,578 | 25.1% |
| LogMapLite | 89 | 12,907 | 0.798 | 0.567 | 0.663 | ≥150,776 | ≥79.6% |
| LogMap-C | 2,548 | 9,414 | 0.880 | 0.464 | 0.608 | ≥1 | ≥0.001% |
The following table summarizes the results for the systems that completed all 6 tasks of the Large BioMed Track. The table shows the total time in seconds to complete all tasks and the averages for Precision, Recall, F-measure and incoherence degree. Systems have been ordered according to their average F-measure and incoherence degree.
AML was a step ahead and obtained the best average Recall and F-measure, and the second best average Precision.
LogMap-C obtained the best average Precision while LogMap-Bio obtained the second best average Recall.
Regarding mapping incoherence, AML also computed, on average, the mapping sets leading to the smallest number of unsatisfiable classes. LogMap variants also obtained very good results in terms of mapping coherence.
Finally, LogMapLite was the fastest system. The rest of the tools were also very fast, needing only between 21 and 144 minutes to complete all 6 tasks.
| System | Total Time (s) | Avg. Precision | Avg. Recall | Avg. F-measure | Avg. Incoherence |
|---|---|---|---|---|---|
| AML | 1,844 | 0.906 | 0.752 | 0.819 | 0.0045% |
| LogMap | 1,751 | 0.890 | 0.719 | 0.792 | 0.0131% |
| LogMap-Bio | 8,634 | 0.843 | 0.744 | 0.784 | 0.8% |
| XMap | 1,258 | 0.813 | 0.702 | 0.750 | 48.7% |
| LogMap-C | 6,331 | 0.907 | 0.559 | 0.688 | 0.0125% |
| LogMapLite | 317 | 0.868 | 0.532 | 0.613 | 34.0% |