Ontology Alignment Evaluation Initiative - OAEI-2014 Campaign

Large Biomedical Track: results

Evaluation setting

We have run the evaluation on an Ubuntu laptop with an Intel Core i7-4600U CPU @ 2.10GHz x 4, allocating 15 GB of RAM.

Precision, Recall and F-measure have been computed with respect to a UMLS-based reference alignment. Systems have been ordered in terms of F-measure.
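For reference, these measures reduce to simple set operations over the computed mapping set M and the reference alignment R: Precision = |M ∩ R| / |M|, Recall = |M ∩ R| / |R|, and F-measure is their harmonic mean. A minimal sketch in Java (the string-based mapping representation is purely illustrative):

    import java.util.HashSet;
    import java.util.Set;

    public class AlignmentScores {
        // Scores a computed mapping set against a reference alignment.
        // Mappings are encoded as "sourceIRI|targetIRI" strings purely
        // for illustration; real tools compare mapping objects.
        public static double[] score(Set<String> computed, Set<String> reference) {
            Set<String> correct = new HashSet<>(computed);
            correct.retainAll(reference);               // true positives: M ∩ R
            double p = (double) correct.size() / computed.size();
            double r = (double) correct.size() / reference.size();
            double f = 2 * p * r / (p + r);             // harmonic mean
            return new double[]{p, r, f};
        }
    }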

Participation and success

In the OAEI 2014 largebio track, 11 out of the 14 participating systems were able to cope with at least one of its tasks.

RiMOM-IM, InsMT and InsMTL focus on the instance matching track and did not produce any alignment for the largebio track.

Use of background knowledge

LogMap-Bio uses BioPortal as a mediating-ontology provider; that is, it retrieves from BioPortal the top 5 ontologies most suitable for the matching task.
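For reference, BioPortal exposes a public REST Recommender service that ranks ontologies for a given input. A hedged sketch of such a call follows (the query text and API key are placeholders; this is not LogMap-Bio's actual code):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;

    public class BioPortalRecommender {
        public static void main(String[] args) throws Exception {
            // Query the BioPortal Recommender; a valid API key from
            // bioportal.bioontology.org is required (placeholder below).
            String input = URLEncoder.encode("myocardial infarction", "UTF-8");
            URL url = new URL("http://data.bioontology.org/recommender?input="
                    + input + "&apikey=YOUR_API_KEY");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty("Accept", "application/json");
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);  // JSON-encoded ontology ranking
                }
            }
        }
    }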

Alignment coherence

Together with Precision, Recall, F-measure and runtimes, we have also evaluated the coherence of the computed alignments. We report (1) the number of unsatisfiable classes obtained when reasoning with the input ontologies together with the computed mappings, and (2) the degree of unsatisfiability, i.e., the ratio of unsatisfiable classes with respect to the size of the union of the input ontologies.

We have used the OWL 2 reasoner HermiT to compute the number of unsatisfiable classes. For the cases in which HermiT could not process the input ontologies together with the mappings within 2 hours, we provide a lower bound on the number of unsatisfiable classes (indicated by ≥) computed with the OWL 2 EL reasoner ELK.
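A minimal sketch of this check using the OWL API with HermiT (the file path is a placeholder; we assume the computed mappings have already been serialized as OWL axioms and merged with the input ontologies):

    import java.io.File;
    import org.semanticweb.HermiT.ReasonerFactory;
    import org.semanticweb.owlapi.apibinding.OWLManager;
    import org.semanticweb.owlapi.model.OWLOntology;
    import org.semanticweb.owlapi.model.OWLOntologyManager;
    import org.semanticweb.owlapi.reasoner.OWLReasoner;

    public class CoherenceCheck {
        public static void main(String[] args) throws Exception {
            OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
            // Placeholder file: union of the two input ontologies plus the
            // computed mappings serialized as equivalence/subsumption axioms.
            OWLOntology merged = manager.loadOntologyFromOntologyDocument(
                    new File("merged-with-mappings.owl"));
            OWLReasoner reasoner = new ReasonerFactory().createReasoner(merged);
            // The bottom node always contains owl:Nothing, so subtract it.
            int unsat = reasoner.getUnsatisfiableClasses()
                    .getEntitiesMinusBottom().size();
            int total = merged.getClassesInSignature().size();
            System.out.printf("Unsatisfiable: %d (degree %.2f%%)%n",
                    unsat, 100.0 * unsat / total);
        }
    }

For ontologies that HermiT cannot handle within the time limit, the same code can be run with ELK's ElkReasonerFactory instead, which yields the reported lower bounds.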

In this OAEI edition, only two systems have shown mapping repair facilities, namely AML and LogMap (including its LogMap-Bio and LogMap-C variants). The results show that even the most precise alignment sets may lead to a huge number of unsatisfiable classes, which underscores the importance of techniques to assess the coherence of the generated alignments.

Runtimes and task completion

Table 1 shows which systems were able to complete each of the matching tasks in less than 10 hours, together with the required computation times. Systems have been ordered with respect to the number of completed tasks and the average time required to complete them. Times are reported in seconds.

The last column reports the number of tasks that a system could complete; for example, 6 systems were able to complete all six tasks. The last row shows the number of systems that could finish each of the tasks, together with the overall average runtime and the total number of completed tasks. The tasks involving SNOMED proved the hardest with respect to both computation times and the number of systems that completed them.

                 FMA-NCI            FMA-SNOMED         SNOMED-NCI
System           Task 1    Task 2   Task 3    Task 4   Task 5    Task 6   Average   # Tasks
LogMapLite 5 44 13 90 76 89 53 6
XMap 17 144 35 390 182 490 210 6
LogMap 14 106 63 388 263 917 292 6
AML 27 112 126 251 831 497 307 6
LogMap-C 81 289 119 571 2,723 2,548 1,055 6
LogMap-Bio 975 1,226 1,060 1,449 1,379 2,545 1,439 6
OMReasoner 82 36,369 691 - 5,206 - 10,587 4
MaasMatch 1,460 - 4,605 - - - 3,033 2
RSDLWB 2,216 - - - - - 2,216 1
AOT 9,341 - - - - - 9,341 1
AOTL 20,908 - - - - - 20,908 1
# Systems 11 7 8 6 7 6 4,495 45
Table 1: System runtimes (s) and task completion.

Results of the FMA-NCI matching problem

The following tables summarize the results for the tasks in the FMA-NCI matching problem.

LogMap-Bio provided the best results in terms of Recall, and AML in terms of F-measure, in both Task 1 and Task 2. OMReasoner provided the best results in terms of Precision, although its Recall was below average. Among last year's participants, XMap and MaasMatch considerably improved their performance with respect to both runtime and F-measure. AML and LogMap again obtained very good results. LogMap-Bio improves on LogMap's Recall in both tasks; however, its Precision suffers, especially in Task 2.

Note that the results for Task 2 are worse than those for Task 1. This is mostly because larger ontologies involve more candidate mappings, which makes it harder to keep Precision high without damaging Recall, and vice versa. Furthermore, AOT, AOTL, RSDLWB and MaasMatch could not complete Task 2: the first three did not finish within 10 hours, while MaasMatch raised an "out of memory" exception.

Task 1: FMA-NCI small fragments

System        Time (s)   # Mappings   Precision   Recall   F-measure   Unsat.   Degree
AML 27 2,690 0.960 0.899 0.928 2 0.02%
LogMap 14 2,738 0.946 0.897 0.921 2 0.02%
LogMap-Bio 975 2,892 0.914 0.918 0.916 467 4.5%
XMap 17 2,657 0.932 0.848 0.888 3,905 38.0%
LogMapLite 5 2,479 0.967 0.819 0.887 2,103 20.5%
LogMap-C 81 2,153 0.962 0.724 0.826 2 0.02%
MaasMatch 1,460 2,981 0.808 0.840 0.824 8,767 85.3%
Average 3,193 2,287 0.910 0.704 0.757 2,277 22.2%
AOT 9,341 3,696 0.662 0.855 0.746 8,373 81.4%
OMReasoner 82 1,362 0.995 0.466 0.635 56 0.5%
RSDLWB 2,216 728 0.962 0.236 0.380 22 0.2%
AOTL 20,908 790 0.902 0.237 0.375 1,356 13.2%
Table 2: Results for largebio Task 1.

Task 2: FMA-NCI whole ontologies

System        Time (s)   # Mappings   Precision   Recall   F-measure   Unsat.   Degree
AML 112 2,931 0.832 0.856 0.844 10 0.007%
LogMap 106 2,678 0.863 0.808 0.834 13 0.009%
LogMap-Bio 1,226 3,412 0.724 0.874 0.792 40 0.027%
XMap 144 2,571 0.835 0.745 0.787 9,218 6.3%
Average 5,470 2,655 0.824 0.746 0.768 5,122 3.5%
LogMap-C 289 2,124 0.877 0.650 0.747 9 0.006%
LogMapLite 44 3,467 0.675 0.819 0.740 26,441 18.1%
OMReasoner 36,369 1,403 0.964 0.466 0.628 123 0.084%
Table 3: Results for largebio Task 2.

Results of the FMA-SNOMED matching problem

The following tables summarize the results for the tasks in the FMA-SNOMED matching problem.

AML provided the best results in terms of F-measure in both Task 3 and Task 4. AML also provided the best Recall in Task 3 and the best Precision in Task 4, while LogMapLite provided the best Precision in Task 3 and LogMap-Bio the best Recall in Task 4.

Overall, the results were less positive than in the FMA-NCI matching problem. As in FMA-NCI, efficiency also decreases as the ontology size increases. The largest variations were in the Precision of LogMapLite and XMap. Furthermore, AOT, AOTL and RSDLWB could not complete either Task 3 or Task 4 within 10 hours. MaasMatch raised an "out of memory" exception in Task 4, while OMReasoner could not complete Task 4 within the permitted time.

Task 3: FMA-SNOMED small fragments

System        Time (s)   # Mappings   Precision   Recall   F-measure   Unsat.   Degree
AML 126 6,791 0.926 0.742 0.824 0 0.0%
LogMap-Bio 1,060 6,444 0.932 0.710 0.806 0 0.0%
LogMap 63 6,242 0.950 0.695 0.803 0 0.0%
XMap 35 7,443 0.858 0.737 0.793 13,429 56.9%
LogMap-C 119 4,536 0.958 0.508 0.664 0 0.0%
MaasMatch 4,605 8,117 0.655 0.674 0.664 21,946 92.9%
Average 839 5,342 0.870 0.554 0.644 4,578 19.4%
LogMapLite 13 1,645 0.968 0.208 0.343 773 3.3%
OMReasoner 691 1,520 0.713 0.156 0.256 478 2.0%
Table 4: Results for largebio Task 3.

Task 4: FMA whole ontology with SNOMED large fragment

System        Time (s)   # Mappings   Precision   Recall   F-measure   Unsat.   Degree
AML 251 6,192 0.891 0.647 0.749 0 0.0%
LogMap 388 6,141 0.831 0.623 0.712 0 0.0%
LogMap-Bio 1,449 6,853 0.756 0.651 0.700 0 0.0%
Average 523 5,760 0.790 0.540 0.617 11,823 5.9%
LogMap-C 571 4,630 0.853 0.476 0.611 98 0.049%
XMap 390 8,926 0.558 0.633 0.593 66,448 33.0%
LogMapLite 90 1,823 0.852 0.208 0.335 4,393 2.2%
Table 5: Results for largebio Task 4.

Results of the SNOMED-NCI matching problem

The following tables summarize the results for the tasks in the SNOMED-NCI matching problem.

AML provided the best results in terms of both Recall and F-measure in Task 5, while OMReasoner provided the best results in terms of Precision. Task 6 was completely dominated by AML.

As in the previous matching problems, efficiency decreases as the ontology size increases. Furthermore, AOT, AOTL and RSDLWB could not complete either Task 5 or Task 6 within 10 hours. MaasMatch raised a "stack overflow" exception in Task 5 and an "out of memory" exception in Task 6, while OMReasoner could not complete Task 6 within the permitted time.

Task 5: SNOMED-NCI small fragments

System        Time (s)   # Mappings   Precision   Recall   F-measure   Unsat.   Degree
AML 831 14,131 0.917 0.724 0.809 ≥0 ≥0.0%
LogMap-Bio 1,379 14,360 0.880 0.709 0.786 ≥23 ≥0.031%
LogMap 263 14,011 0.889 0.699 0.783 ≥23 ≥0.031%
XMap 182 14,223 0.849 0.665 0.746 ≥65,512 ≥87.1%
Average 1,522 12,177 0.911 0.611 0.722 23,078 30.7%
LogMapLite 76 10,962 0.948 0.567 0.710 ≥60,426 ≥80.3%
LogMap-C 2,723 10,432 0.909 0.531 0.670 ≥0 ≥0.0%
OMReasoner 5,206 7,120 0.983 0.383 0.551 ≥35,568 ≥47.3%
Table 6: Results for largebio Task 5.

Task 6: NCI whole ontology with SNOMED large fragment

System        Time (s)   # Mappings   Precision   Recall   F-measure   Unsat.   Degree
AML 497 12,626 0.912 0.645 0.756 ≥0 ≥0.0%
LogMap-Bio 2,545 12,507 0.852 0.599 0.703 ≥37 ≥0.020%
LogMap 917 12,167 0.863 0.590 0.701 ≥36 ≥0.019%
XMap 490 12,525 0.843 0.584 0.690 ≥134,622 ≥71.1%
Average 1,181 12,024 0.858 0.575 0.687 47,578 25.1%
LogMapLite 89 12,907 0.798 0.567 0.663 ≥150,776 ≥79.6%
LogMap-C 2,548 9,414 0.880 0.464 0.608 ≥1 ≥0.001%
Table 7: Results for largebio Task 6.

Summary Results (top systems)

The following table summarizes the results for the systems that completed all 6 tasks of the Large BioMed Track. The table shows the total time in seconds to complete all tasks and the averages for Precision, Recall, F-measure and Incoherence degree. Systems have been ordered according to their average F-measure and Incoherence degree.

AML was a step ahead and obtained the best average Recall and F-measure, and the second best average Precision.

LogMap-C obtained the best average Precision while LogMap-Bio obtained the second best average Recall.

Regarding mapping incoherence, AML also computed, on average, the mapping sets leading to the smallest number of unsatisfiable classes. LogMap variants also obtained very good results in terms of mapping coherence.

Finally, LogMapLite was the fastest system. The rest of the tools were also very fast and only needed between 21 and 144 minutes to complete all 6 tasks.

System        Total Time (s)   Precision   Recall   F-measure   Incoherence
AML 1,844 0.906 0.752 0.819 0.0045%
LogMap 1,751 0.890 0.719 0.792 0.0131%
LogMap-Bio 8,634 0.843 0.744 0.784 0.8%
XMap 1,258 0.813 0.702 0.750 48.7%
LogMap-C 6,331 0.907 0.559 0.688 0.0125%
LogMapLite 317 0.868 0.532 0.613 34.0%

Tasks remaining to be done:

  1. Harmonization of the mapping outputs
  2. Mapping repair evaluation

Contact

If you have any questions or suggestions related to the results of this track, or if you notice any kind of error (wrong numbers, incorrect information about a matching system, etc.), feel free to write an email to ernesto [at] cs [.] ox [.] ac [.] uk or ernesto [.] jimenez [.] ruiz [at] gmail [.] com

Original page: http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/2014/results2014.html [cached: 13/05/2016]