We have run the evaluation on an Ubuntu laptop with an Intel Core i7-4600U CPU @ 2.10GHz x 4 and 15 GB of allocated RAM.
Precision, Recall and F-measure have been computed with respect to a UMLS-based reference alignment. Systems have been ordered by F-measure.
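For reference, given a system mapping set S and the reference alignment R, Precision = |S ∩ R| / |S|, Recall = |S ∩ R| / |R|, and F-measure is their harmonic mean. The following minimal Java sketch illustrates these computations; the "source=target" string encoding of mappings and the toy values are our own illustrative assumptions, not the track's actual tooling.

```java
import java.util.HashSet;
import java.util.Set;

/** Minimal sketch of Precision/Recall/F-measure over mapping sets. */
public class AlignmentScores {

    // Intersection size |S ∩ R| of system mappings S and reference R.
    static int correct(Set<String> system, Set<String> reference) {
        Set<String> tp = new HashSet<>(system);
        tp.retainAll(reference);
        return tp.size();
    }

    static double precision(Set<String> s, Set<String> r) {
        return s.isEmpty() ? 0.0 : (double) correct(s, r) / s.size();
    }

    static double recall(Set<String> s, Set<String> r) {
        return r.isEmpty() ? 0.0 : (double) correct(s, r) / r.size();
    }

    static double fMeasure(double p, double r) {
        return (p + r == 0.0) ? 0.0 : 2.0 * p * r / (p + r);  // harmonic mean
    }

    public static void main(String[] args) {
        // Mappings encoded as "sourceIRI=targetIRI" strings (illustrative only).
        Set<String> system = Set.of("fma:A=nci:A", "fma:B=nci:C");
        Set<String> reference = Set.of("fma:A=nci:A", "fma:B=nci:B");
        double p = precision(system, reference);
        double r = recall(system, reference);
        System.out.printf("P=%.3f R=%.3f F=%.3f%n", p, r, fMeasure(p, r));
    }
}
```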
In the OAEI 2014 largebio track, 11 out of the 14 participating OAEI 2014 systems were able to cope with at least one of the track's tasks.
RiMOM-IM, InsMT and InsMTL focus on the instance matching track and did not produce any alignment for the largebio track.
LogMap-Bio uses BioPortal as a mediating-ontology provider, that is, it retrieves from BioPortal the top-5 ontologies most suitable for the matching task.
Together with Precision, Recall, F-measure and runtimes, we have also evaluated the coherence of the alignments. We report (1) the number of unsatisfiable classes obtained when reasoning with the input ontologies together with the computed mappings, and (2) the ratio/degree of unsatisfiable classes with respect to the size of the union of the input ontologies.
We have used the OWL 2 reasoner HermiT to compute the number of unsatisfiable classes. For the cases in which HermiT could not process the input ontologies and the mappings within 2 hours, we provide a lower bound on the number of unsatisfiable classes (indicated by ≥) computed with the OWL 2 EL reasoner ELK.
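As an illustration, the sketch below counts unsatisfiable classes with the OWL API and HermiT. It assumes OWL API 4 and HermiT on the classpath, that the computed mappings have been serialised as OWL axioms (OAEI alignments are normally delivered in the RDF Alignment format, so a conversion step is assumed), and uses placeholder file names.

```java
import java.io.File;
import org.semanticweb.HermiT.ReasonerFactory;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.OWLClass;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
import org.semanticweb.owlapi.reasoner.Node;
import org.semanticweb.owlapi.reasoner.OWLReasoner;

/** Sketch: merge the input ontologies with the computed mappings and
 *  count unsatisfiable classes with HermiT. File names are placeholders. */
public class IncoherenceCheck {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager m = OWLManager.createOWLOntologyManager();
        OWLOntology merged = m.loadOntologyFromOntologyDocument(new File("onto1.owl"));
        OWLOntology onto2 = m.loadOntologyFromOntologyDocument(new File("onto2.owl"));
        OWLOntology mappings = m.loadOntologyFromOntologyDocument(new File("mappings.owl"));
        m.addAxioms(merged, onto2.getAxioms());     // union of the input ontologies
        m.addAxioms(merged, mappings.getAxioms());  // plus the computed mappings

        OWLReasoner reasoner = new ReasonerFactory().createReasoner(merged);
        Node<OWLClass> bottom = reasoner.getUnsatisfiableClasses();
        int unsat = bottom.getEntitiesMinusBottom().size();  // owl:Nothing excluded
        int total = merged.getClassesInSignature().size();
        System.out.printf("Unsatisfiable: %d of %d classes (degree %.2f%%)%n",
                unsat, total, 100.0 * unsat / total);
        reasoner.dispose();
    }
}
```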
In this OAEI edition, only two systems have mapping repair facilities, namely AML and LogMap (including its LogMap-Bio and LogMap-C variants). The results show that even the most precise alignment sets may lead to a large number of unsatisfiable classes, which highlights the importance of using techniques to assess the coherence of the generated alignments.
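To make the idea of mapping repair concrete, here is a deliberately naive sketch: it greedily drops mapping axioms whose removal reduces the number of unsatisfiable classes, re-classifying after each tentative removal. This would be far too slow for ontologies of this size and is not how AML or LogMap actually work; their repair algorithms are much more sophisticated. File names are placeholders.

```java
import java.io.File;
import java.util.ArrayList;
import org.semanticweb.HermiT.ReasonerFactory;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.OWLAxiom;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
import org.semanticweb.owlapi.reasoner.OWLReasoner;

/** Deliberately naive repair sketch: greedily drop mapping axioms whose
 *  removal reduces unsatisfiability. For illustration only. */
public class NaiveRepair {

    static int unsatCount(OWLOntology ont) {
        OWLReasoner r = new ReasonerFactory().createReasoner(ont);
        int n = r.getUnsatisfiableClasses().getEntitiesMinusBottom().size();
        r.dispose();
        return n;
    }

    public static void main(String[] args) throws Exception {
        OWLOntologyManager m = OWLManager.createOWLOntologyManager();
        // "merged.owl" is assumed to already contain the union of the inputs.
        OWLOntology merged = m.loadOntologyFromOntologyDocument(new File("merged.owl"));
        OWLOntology mappings = m.loadOntologyFromOntologyDocument(new File("mappings.owl"));
        m.addAxioms(merged, mappings.getAxioms());

        int unsat = unsatCount(merged);
        for (OWLAxiom candidate : new ArrayList<>(mappings.getLogicalAxioms())) {
            if (unsat == 0) break;                    // alignment is coherent
            m.removeAxiom(merged, candidate);         // tentatively drop a mapping
            int after = unsatCount(merged);
            if (after < unsat) unsat = after;         // keep the removal
            else m.addAxiom(merged, candidate);       // removal did not help; restore
        }
        System.out.println("Remaining unsatisfiable classes: " + unsat);
    }
}
```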
Table 1 shows which systems were able to complete each of the matching tasks in less than 10 hours, together with the required computation times. Systems have been ordered by the number of completed tasks and the average time required to complete them. Times are reported in seconds.
The last column reports the number of tasks that each system could complete; for example, 6 systems were able to complete all six tasks. The last row shows the number of systems that could finish each of the tasks. The tasks involving SNOMED were harder with respect to both computation times and the number of systems that completed them.
Table 1. System runtimes (s) and number of completed tasks. Tasks 1–2 belong to the FMA-NCI matching problem, Tasks 3–4 to FMA-SNOMED, and Tasks 5–6 to SNOMED-NCI.

| System | Task 1 | Task 2 | Task 3 | Task 4 | Task 5 | Task 6 | Average | # Tasks |
|---|---|---|---|---|---|---|---|---|
| LogMapLite | 5 | 44 | 13 | 90 | 76 | 89 | 53 | 6 |
| XMap | 17 | 144 | 35 | 390 | 182 | 490 | 210 | 6 |
| LogMap | 14 | 106 | 63 | 388 | 263 | 917 | 292 | 6 |
| AML | 27 | 112 | 126 | 251 | 831 | 497 | 307 | 6 |
| LogMap-C | 81 | 289 | 119 | 571 | 2,723 | 2,548 | 1,055 | 6 |
| LogMap-Bio | 975 | 1,226 | 1,060 | 1,449 | 1,379 | 2,545 | 1,439 | 6 |
| OMReasoner | 82 | 36,369 | 691 | - | 5,206 | - | 10,587 | 4 |
| MaasMatch | 1,460 | - | 4,605 | - | - | - | 3,033 | 2 |
| RSDLWB | 2,216 | - | - | - | - | - | 2,216 | 1 |
| AOT | 9,341 | - | - | - | - | - | 9,341 | 1 |
| AOTL | 20,908 | - | - | - | - | - | 20,908 | 1 |
| # Systems | 11 | 7 | 8 | 6 | 7 | 6 | 4,495 | 45 |
The following tables summarize the results for the tasks in the FMA-NCI matching problem.
LogMap-Bio and AML provided the best results in terms of Recall and F-measure, respectively, in both Task 1 and Task 2. OMReasoner provided the best results in terms of Precision, although its Recall was below average. Among last year's participants, XMap and MaasMatch considerably improved their performance with respect to both runtime and F-measure. AML and LogMap again obtained very good results. LogMap-Bio improves LogMap's Recall in both tasks; however, Precision suffers, especially in Task 2.
Note that efficiency decreased in Task 2 with respect to Task 1. This is mostly because larger ontologies involve more candidate alignments, which makes it harder to keep Precision high without damaging Recall, and vice versa. Furthermore, AOT, AOTL, RSDLWB and MaasMatch could not complete Task 2: the first three did not finish within 10 hours, while MaasMatch raised an "out of memory" exception.
Task 1:

| System | Time (s) | # Mappings | Precision | Recall | F-measure | Unsat. | Degree |
|---|---|---|---|---|---|---|---|
| AML | 27 | 2,690 | 0.960 | 0.899 | 0.928 | 2 | 0.02% |
| LogMap | 14 | 2,738 | 0.946 | 0.897 | 0.921 | 2 | 0.02% |
| LogMap-Bio | 975 | 2,892 | 0.914 | 0.918 | 0.916 | 467 | 4.5% |
| XMap | 17 | 2,657 | 0.932 | 0.848 | 0.888 | 3,905 | 38.0% |
| LogMapLite | 5 | 2,479 | 0.967 | 0.819 | 0.887 | 2,103 | 20.5% |
| LogMap-C | 81 | 2,153 | 0.962 | 0.724 | 0.826 | 2 | 0.02% |
| MaasMatch | 1,460 | 2,981 | 0.808 | 0.840 | 0.824 | 8,767 | 85.3% |
| Average | 3,193 | 2,287 | 0.910 | 0.704 | 0.757 | 2,277 | 22.2% |
| AOT | 9,341 | 3,696 | 0.662 | 0.855 | 0.746 | 8,373 | 81.4% |
| OMReasoner | 82 | 1,362 | 0.995 | 0.466 | 0.635 | 56 | 0.5% |
| RSDLWB | 2,216 | 728 | 0.962 | 0.236 | 0.380 | 22 | 0.2% |
| AOTL | 20,908 | 790 | 0.902 | 0.237 | 0.375 | 1,356 | 13.2% |
Task 2:

| System | Time (s) | # Mappings | Precision | Recall | F-measure | Unsat. | Degree |
|---|---|---|---|---|---|---|---|
| AML | 112 | 2,931 | 0.832 | 0.856 | 0.844 | 10 | 0.007% |
| LogMap | 106 | 2,678 | 0.863 | 0.808 | 0.834 | 13 | 0.009% |
| LogMap-Bio | 1,226 | 3,412 | 0.724 | 0.874 | 0.792 | 40 | 0.027% |
| XMap | 144 | 2,571 | 0.835 | 0.745 | 0.787 | 9,218 | 6.3% |
| Average | 5,470 | 2,655 | 0.824 | 0.746 | 0.768 | 5,122 | 3.5% |
| LogMap-C | 289 | 2,124 | 0.877 | 0.650 | 0.747 | 9 | 0.006% |
| LogMapLite | 44 | 3,467 | 0.675 | 0.819 | 0.740 | 26,441 | 18.1% |
| OMReasoner | 36,369 | 1,403 | 0.964 | 0.466 | 0.628 | 123 | 0.084% |
The following tables summarize the results for the tasks in the FMA-SNOMED matching problem.
AML provided the best results in terms of F-measure in both Task 3 and Task 4. AML also provided the best Recall in Task 3 and the best Precision in Task 4, while LogMapLite provided the best Precision in Task 3 and LogMap-Bio the best Recall in Task 4.
Overall, the results were less positive than in the FMA-NCI matching problem. As in the FMA-NCI matching problem, efficiency also decreases as the ontology size increases. The largest variations were in the Precision of LogMapLite and XMap. Furthermore, AOT, AOTL and RSDLWB could not complete either Task 3 or Task 4 within 10 hours. MaasMatch raised an "out of memory" exception in Task 4, while OMReasoner could not complete Task 4 within the permitted time.
Task 3:

| System | Time (s) | # Mappings | Precision | Recall | F-measure | Unsat. | Degree |
|---|---|---|---|---|---|---|---|
| AML | 126 | 6,791 | 0.926 | 0.742 | 0.824 | 0 | 0.0% |
| LogMap-Bio | 1,060 | 6,444 | 0.932 | 0.710 | 0.806 | 0 | 0.0% |
| LogMap | 63 | 6,242 | 0.950 | 0.695 | 0.803 | 0 | 0.0% |
| XMap | 35 | 7,443 | 0.858 | 0.737 | 0.793 | 13,429 | 56.9% |
| LogMap-C | 119 | 4,536 | 0.958 | 0.508 | 0.664 | 0 | 0.0% |
| MaasMatch | 4,605 | 8,117 | 0.655 | 0.674 | 0.664 | 21,946 | 92.9% |
| Average | 839 | 5,342 | 0.870 | 0.554 | 0.644 | 4,578 | 19.4% |
| LogMapLite | 13 | 1,645 | 0.968 | 0.208 | 0.343 | 773 | 3.3% |
| OMReasoner | 691 | 1,520 | 0.713 | 0.156 | 0.256 | 478 | 2.0% |
Task 4:

| System | Time (s) | # Mappings | Precision | Recall | F-measure | Unsat. | Degree |
|---|---|---|---|---|---|---|---|
| AML | 251 | 6,192 | 0.891 | 0.647 | 0.749 | 0 | 0.0% |
| LogMap | 388 | 6,141 | 0.831 | 0.623 | 0.712 | 0 | 0.0% |
| LogMap-Bio | 1,449 | 6,853 | 0.756 | 0.651 | 0.700 | 0 | 0.0% |
| Average | 523 | 5,760 | 0.790 | 0.540 | 0.617 | 11,823 | 5.9% |
| LogMap-C | 571 | 4,630 | 0.853 | 0.476 | 0.611 | 98 | 0.049% |
| XMap | 390 | 8,926 | 0.558 | 0.633 | 0.593 | 66,448 | 33.0% |
| LogMapLite | 90 | 1,823 | 0.852 | 0.208 | 0.335 | 4,393 | 2.2% |
The following tables summarize the results for the tasks in the SNOMED-NCI matching problem.
AML provided the best results in terms of both Recall and F-measure in Task 5, while OMReasoner provided the best results in terms of Precision. Task 6 was completely dominated by AML.
As in the previous matching problems, efficiency decreases as the ontology size increases. Furthermore, AOT, AOTL and RSDLWB could not complete either Task 5 or Task 6 within 10 hours. MaasMatch raised a "stack overflow" exception in Task 5 and an "out of memory" exception in Task 6, while OMReasoner could not complete Task 6 within the permitted time.
Task 5:

| System | Time (s) | # Mappings | Precision | Recall | F-measure | Unsat. | Degree |
|---|---|---|---|---|---|---|---|
| AML | 831 | 14,131 | 0.917 | 0.724 | 0.809 | ≥0 | ≥0.0% |
| LogMap-Bio | 1,379 | 14,360 | 0.880 | 0.709 | 0.786 | ≥23 | ≥0.031% |
| LogMap | 263 | 14,011 | 0.889 | 0.699 | 0.783 | ≥23 | ≥0.031% |
| XMap | 182 | 14,223 | 0.849 | 0.665 | 0.746 | ≥65,512 | ≥87.1% |
| Average | 1,522 | 12,177 | 0.911 | 0.611 | 0.722 | 23,078 | 30.7% |
| LogMapLite | 76 | 10,962 | 0.948 | 0.567 | 0.710 | ≥60,426 | ≥80.3% |
| LogMap-C | 2,723 | 10,432 | 0.909 | 0.531 | 0.670 | ≥0 | ≥0.0% |
| OMReasoner | 5,206 | 7,120 | 0.983 | 0.383 | 0.551 | ≥35,568 | ≥47.3% |
Task 6:

| System | Time (s) | # Mappings | Precision | Recall | F-measure | Unsat. | Degree |
|---|---|---|---|---|---|---|---|
| AML | 497 | 12,626 | 0.912 | 0.645 | 0.756 | ≥0 | ≥0.0% |
| LogMap-Bio | 2,545 | 12,507 | 0.852 | 0.599 | 0.703 | ≥37 | ≥0.020% |
| LogMap | 917 | 12,167 | 0.863 | 0.590 | 0.701 | ≥36 | ≥0.019% |
| XMap | 490 | 12,525 | 0.843 | 0.584 | 0.690 | ≥134,622 | ≥71.1% |
| Average | 1,181 | 12,024 | 0.858 | 0.575 | 0.687 | 47,578 | 25.1% |
| LogMapLite | 89 | 12,907 | 0.798 | 0.567 | 0.663 | ≥150,776 | ≥79.6% |
| LogMap-C | 2,548 | 9,414 | 0.880 | 0.464 | 0.608 | ≥1 | ≥0.001% |
The following table summarizes the results for the systems that completed all 6 tasks of the Large BioMed Track. The table shows the total time in seconds to complete all tasks and the averages for Precision, Recall, F-measure and incoherence degree. Systems have been ordered according to their average F-measure and incoherence degree.
AML was a step ahead and obtained the best average Recall and F-measure, and the second best average Precision.
LogMap-C obtained the best average Precision while LogMap-Bio obtained the second best average Recall.
Regarding mapping incoherence, AML also computed, on average, the mapping sets leading to the smallest number of unsatisfiable classes. LogMap variants also obtained very good results in terms of mapping coherence.
Finally, LogMapLite was the fastest system. The rest of the tools were also very fast, needing only between 21 and 144 minutes to complete all 6 tasks.
| System | Total Time (s) | Avg. Precision | Avg. Recall | Avg. F-measure | Avg. Incoherence |
|---|---|---|---|---|---|
| AML | 1,844 | 0.906 | 0.752 | 0.819 | 0.0045% |
| LogMap | 1,751 | 0.890 | 0.719 | 0.792 | 0.0131% |
| LogMap-Bio | 8,634 | 0.843 | 0.744 | 0.784 | 0.8% |
| XMap | 1,258 | 0.813 | 0.702 | 0.750 | 48.7% |
| LogMap-C | 6,331 | 0.907 | 0.559 | 0.688 | 0.0125% |
| LogMapLite | 317 | 0.868 | 0.532 | 0.613 | 34.0% |