We have run the evaluation on an Ubuntu laptop with an Intel Core i7-4600U CPU @ 2.10 GHz (4 cores) and 15 GB of allocated RAM.
Precision, Recall and F-measure have been computed with respect to a UMLS-based reference alignment. Systems are ordered by F-measure.
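For clarity, the sketch below shows how these metrics can be computed from mapping sets. It is a minimal illustration, assuming an alignment is represented as a set of (source, target) entity pairs; the toy data and function name are ours, not the actual evaluation code.

```python
# Minimal sketch of the metric computation, assuming an alignment is a set
# of (source_entity, target_entity) pairs. Illustrative only: the toy data
# and function below are not the actual OAEI evaluation code.

def precision_recall_fmeasure(system, reference):
    """Compare a system alignment against a reference alignment."""
    true_positives = len(system & reference)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Toy example: two correct mappings, one wrong, one missed.
system = {("fma:Heart", "nci:Heart"), ("fma:Liver", "nci:Liver"),
          ("fma:Hand", "nci:Foot")}
reference = {("fma:Heart", "nci:Heart"), ("fma:Liver", "nci:Liver"),
             ("fma:Lung", "nci:Lung")}
print(precision_recall_fmeasure(system, reference))  # ≈ (0.667, 0.667, 0.667)
```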
In OAEI 2015, 12 out of the 22 participating systems (13 counting the XMAP variant) were able to cope with at least one of the tasks of the largebio track.
Note that RiMOM-IM, InsMT+, STRIM, EXONA, CLONA and LYAM++ are systems focusing on either the instance matching track or the multifarm track, and they did not produce any alignment for the largebio track. COMMAND and Mamba did not finish the smallest LargeBio task within the 12-hour timeout, while GMap and JarvisOM threw an exception when dealing with the smallest LargeBio task.
LogMapBio uses BioPortal as a mediating-ontology provider; that is, it retrieves from BioPortal the top-10 ontologies most suitable for the matching task.
LogMap uses normalisations and spelling variants from the general-purpose (biomedical) UMLS Lexicon.
AML has three sources of background knowledge which can be used as mediators between the input ontologies: the Uber Anatomy Ontology (Uberon), the Human Disease Ontology (DOID) and the Medical Subject Headings (MeSH).
XMAP has been evaluated with two variants: XMAP-BK and XMAP. XMAP-BK uses synonyms provided by the UMLS Metathesaurus, while XMAP has this feature deactivated. Note that matching systems using the UMLS Metathesaurus as background knowledge have a notable advantage, since the largebio reference alignment is also based on the UMLS Metathesaurus. Nevertheless, it is still interesting to evaluate the performance of a system with and without it.
In addition to Precision, Recall, F-measure and runtimes, we have also evaluated the coherence of the computed alignments. We report (1) the number of unsatisfiable classes when reasoning with the input ontologies together with the computed mappings, and (2) the ratio/degree of unsatisfiable classes with respect to the size of the union of the input ontologies.
We have used the HermiT OWL 2 reasoner to compute the number of unsatisfiable classes. For the cases in which HermiT could not process the input ontologies together with the mappings within 2 hours, we provide a lower bound on the number of unsatisfiable classes (indicated by ≥) computed with the OWL 2 EL reasoner ELK.
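As an illustration of this procedure, the sketch below measures incoherence with owlready2, whose sync_reasoner() call runs HermiT. It only approximates the pipeline described above, under the assumptions that the mappings have already been converted to OWL equivalence axioms and that the file paths (placeholders here) point to local copies; the ELK fallback for the hard cases is not shown.

```python
# Sketch of the incoherence measurement using owlready2 (sync_reasoner()
# runs HermiT). Assumptions: the mappings have been converted to OWL
# equivalence axioms, and the file paths below are placeholders.
from owlready2 import get_ontology, sync_reasoner, default_world

onto1 = get_ontology("file:///data/oaei_fma_small.owl").load()
onto2 = get_ontology("file:///data/oaei_nci_small.owl").load()
mappings = get_ontology("file:///data/system_mappings.owl").load()

# Size of the union of the input ontologies (number of named classes).
total_classes = len(set(onto1.classes()) | set(onto2.classes()))

# Classify everything loaded so far; unsatisfiable classes become
# equivalent to owl:Nothing and are reported by inconsistent_classes().
sync_reasoner()
unsat = list(default_world.inconsistent_classes())

print(f"Unsatisfiable classes: {len(unsat)}")
print(f"Incoherence degree: {len(unsat) / total_classes:.1%}")
```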
In this OAEI edition, only two systems include mapping repair facilities, namely AML and LogMap (including its LogMapBio and LogMapC variants). The results show that even the most precise alignment sets may lead to a huge number of unsatisfiable classes, which underlines the importance of using techniques to assess the coherence of the generated alignments.
Table 1 shows which systems were able to complete each of the matching tasks in less than 12 hours, together with the required computation times. Systems are ordered by the number of completed tasks and the average time required to complete them (see the sketch below); times are reported in seconds.
The last column reports the number of tasks that a system could complete; for example, 8 systems were able to complete all six tasks. The last row shows the number of systems that could finish each of the tasks. The tasks involving SNOMED were the hardest with respect to both computation times and the number of systems that completed them.
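The ranking rule is straightforward; the sketch below reproduces it on a small excerpt of the runtimes from the table, sorting by completed tasks (descending) and then by average runtime over the completed tasks (ascending). The excerpt data is copied from the table; everything else is illustrative.

```python
# Reproduces the ordering of Table 1 on an excerpt of its runtimes (s):
# more completed tasks first, then lower average time over completed tasks.
runtimes = {
    "LogMapLite": [16, 213, 36, 419, 212, 427],  # completed all six tasks
    "RSDLWB":     [17, 211, 36, 413, 221, 436],  # completed all six tasks
    "ServOMBI":   [234, 532],                    # completed Tasks 1 and 3
    "Lily":       [740],                         # completed Task 1 only
}

ranked = sorted(runtimes.items(),
                key=lambda kv: (-len(kv[1]), sum(kv[1]) / len(kv[1])))
for system, times in ranked:
    print(f"{system}: {len(times)} tasks, avg {sum(times) / len(times):.1f}s")
# Order: LogMapLite, RSDLWB, ServOMBI, Lily — matching Table 1.
```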
* Uses background knowledge based on the UMLS Metathesaurus, on which the LargeBio reference alignments are also based.
System | FMA-NCI Task 1 | FMA-NCI Task 2 | FMA-SNOMED Task 3 | FMA-SNOMED Task 4 | SNOMED-NCI Task 5 | SNOMED-NCI Task 6 | Average (s) | # Tasks |
LogMapLite | 16 | 213 | 36 | 419 | 212 | 427 | 221 | 6 |
RSDLWB | 17 | 211 | 36 | 413 | 221 | 436 | 222 | 6 |
AML | 36 | 262 | 79 | 509 | 470 | 584 | 323 | 6 |
XMAP | 26 | 302 | 46 | 698 | 394 | 905 | 395 | 6 |
XMAP-BK * | 31 | 337 | 49 | 782 | 396 | 925 | 420 | 6 |
LogMap | 25 | 265 | 78 | 768 | 410 | 1,062 | 435 | 6 |
LogMapC | 106 | 569 | 156 | 1,195 | 3,039 | 3,553 | 1,436 | 6 |
LogMapBio | 1,053 | 1,581 | 1,204 | 3,248 | 3,298 | 3,327 | 2,285 | 6 |
ServOMBI | 234 | - | 532 | - | - | - | 383 | 2 |
CroMatcher | 2,248 | - | 13,057 | - | - | - | 7,653 | 2 |
Lily | 740 | - | - | - | - | - | 740 | 1 |
DKP-AOM | 1,491 | - | - | - | - | - | 1,491 | 1 |
DKP-AOM-Lite | 1,579 | - | - | - | - | - | 1,579 | 1 |
# Systems | 13 | 8 | 10 | 8 | 8 | 8 | 1,353 | 55 |
The following tables summarize the results for the tasks in the FMA-NCI matching problem.
XMAP-BK and AML provided the best results in terms of F-measure in Task 1 and Task 2. Note that the use of background knowledge based on the UMLS Metathesaurus has an important impact on the performance of XMAP-BK. LogMapBio improves LogMap's recall in both tasks, but at the cost of precision, especially in Task 2.
Note that efficiency in Task 2 has decreased with respect to Task 1. This is mostly because larger ontologies involve more candidate mappings, and it is harder to keep precision high without damaging recall, and vice versa. Furthermore, ServOMBI, CroMatcher, Lily, DKP-AOM-Lite and DKP-AOM could not complete Task 2.
* Uses background knowledge based on the UMLS Metathesaurus, on which the LargeBio reference alignments are also based.
Task 1 (FMA-NCI):
System | Time (s) | # Mappings | Precision | Recall | F-measure | Unsat. | Degree |
XMAP-BK * | 31 | 2,714 | 0.971 | 0.902 | 0.935 | 2,319 | 22.6% |
AML | 36 | 2,690 | 0.960 | 0.899 | 0.928 | 2 | 0.019% |
LogMap | 25 | 2,747 | 0.949 | 0.901 | 0.924 | 2 | 0.019% |
LogMapBio | 1,053 | 2,866 | 0.926 | 0.917 | 0.921 | 2 | 0.019% |
LogMapLite | 16 | 2,483 | 0.967 | 0.819 | 0.887 | 2,045 | 19.9% |
ServOMBI | 234 | 2,420 | 0.970 | 0.806 | 0.881 | 3,216 | 31.3% |
XMAP | 26 | 2,376 | 0.970 | 0.784 | 0.867 | 2,219 | 21.6% |
LogMapC | 106 | 2,110 | 0.963 | 0.710 | 0.817 | 2 | 0.019% |
Average | 584 | 2,516 | 0.854 | 0.733 | 0.777 | 2,497 | 24.3% |
Lily | 740 | 3,374 | 0.602 | 0.720 | 0.656 | 9,279 | 90.2% |
DKP-AOM-Lite | 1,579 | 2,665 | 0.640 | 0.603 | 0.621 | 2,139 | 20.8% |
DKP-AOM | 1,491 | 2,501 | 0.653 | 0.575 | 0.611 | 1,921 | 18.7% |
CroMatcher | 2,248 | 2,806 | 0.570 | 0.570 | 0.570 | 9,301 | 90.3% |
RSDLWB | 17 | 961 | 0.964 | 0.321 | 0.482 | 25 | 0.2% |
Task 2 (FMA-NCI):
System | Time (s) | # Mappings | Precision | Recall | F-measure | Unsat. | Degree |
XMAP-BK * | 337 | 2,802 | 0.872 | 0.849 | 0.860 | 1,222 | 0.8% |
AML | 262 | 2,931 | 0.832 | 0.856 | 0.844 | 10 | 0.007% |
LogMap | 265 | 2,693 | 0.854 | 0.802 | 0.827 | 9 | 0.006% |
LogMapBio | 1,581 | 3,127 | 0.773 | 0.848 | 0.809 | 9 | 0.006% |
XMAP | 302 | 2,478 | 0.866 | 0.743 | 0.800 | 1,124 | 0.8% |
Average | 467 | 2,588 | 0.818 | 0.735 | 0.759 | 3,742 | 2.6% |
LogMapC | 569 | 2,108 | 0.879 | 0.653 | 0.750 | 9 | 0.006% |
LogMapLite | 213 | 3,477 | 0.673 | 0.820 | 0.739 | 26,478 | 18.1% |
RSDLWB | 211 | 1,094 | 0.798 | 0.307 | 0.443 | 1,082 | 0.7% |
The following tables summarize the results for the tasks in the FMA-SNOMED matching problem.
XMAP-BK provided the best results in terms of both Recall and F-measure in Task 3 and Task 4. The precision of XMAP-BK in Task 4 was lower than that of the other top systems, but its Recall was much higher.
As in the FMA-NCI tasks, the use of the UMLS Metathesaurus in XMAP-BK has an important impact.
Overall, the results were less positive than in the FMA-NCI matching problem. As in the FMA-NCI matching problem, efficiency also decreases as the ontology size increases. LogMapBio and XMAP suffered the largest drops in precision. Furthermore, Lily, DKP-AOM-Lite and DKP-AOM could complete neither Task 3 nor Task 4, while ServOMBI and CroMatcher could not complete Task 4 within the permitted time.
* Uses background knowledge based on the UMLS Metathesaurus, on which the LargeBio reference alignments are also based.
Task 3 (FMA-SNOMED):
System | Time (s) | # Mappings | Precision | Recall | F-measure | Unsat. | Degree |
XMAP-BK * | 49 | 7,920 | 0.968 | 0.847 | 0.903 | 12,848 | 54.4% |
AML | 79 | 6,791 | 0.926 | 0.742 | 0.824 | 0 | 0.000% |
LogMapBio | 1,204 | 6,485 | 0.935 | 0.700 | 0.801 | 1 | 0.004% |
LogMap | 78 | 6,282 | 0.948 | 0.690 | 0.799 | 1 | 0.004% |
ServOMBI | 532 | 6,329 | 0.960 | 0.664 | 0.785 | 12,155 | 51.5% |
XMAP | 46 | 6,133 | 0.958 | 0.647 | 0.772 | 12,368 | 52.4% |
Average | 1,527 | 5,328 | 0.919 | 0.561 | 0.664 | 5,902 | 25.0% |
LogMapC | 156 | 4,535 | 0.956 | 0.505 | 0.661 | 0 | 0.000% |
CroMatcher | 13,057 | 6,232 | 0.586 | 0.479 | 0.527 | 20,609 | 87.1% |
LogMapLite | 36 | 1,644 | 0.968 | 0.209 | 0.343 | 771 | 3.3% |
RSDLWB | 36 | 933 | 0.980 | 0.128 | 0.226 | 271 | 1.1% |
Task 4 (FMA-SNOMED):
System | Time (s) | # Mappings | Precision | Recall | F-measure | Unsat. | Degree |
XMAP-BK * | 782 | 9,243 | 0.769 | 0.844 | 0.805 | 44,019 | 21.8% |
AML | 509 | 6,228 | 0.889 | 0.650 | 0.751 | 0 | 0.000% |
LogMap | 768 | 6,281 | 0.839 | 0.634 | 0.722 | 0 | 0.000% |
LogMapBio | 3,248 | 6,869 | 0.776 | 0.650 | 0.707 | 0 | 0.000% |
XMAP | 698 | 7,061 | 0.720 | 0.609 | 0.660 | 40,056 | 19.9% |
LogMapC | 1,195 | 4,693 | 0.852 | 0.479 | 0.613 | 98 | 0.049% |
Average | 1,004 | 5,395 | 0.829 | 0.525 | 0.602 | 11,157 | 5.5% |
LogMapLite | 419 | 1,822 | 0.852 | 0.209 | 0.335 | 4,389 | 2.2% |
RSDLWB | 413 | 968 | 0.933 | 0.127 | 0.224 | 698 | 0.3% |
The following tables summarize the results for the tasks in the SNOMED-NCI matching problem.
AML provided the best results in terms of both Recall and F-measure in Tasks 5 and 6, while RSDLWB and XMAP provided the best results in terms of Precision in Task 5 and Task 6, respectively.
Unlike in the FMA-NCI and FMA-SNOMED matching problems, the use of the UMLS Metathesaurus did not have an impact on the performance of XMAP-BK, which obtained almost identical results to XMAP.
As in the previous matching problems, efficiency decreases as the ontology size increases. Furthermore, Lily, DKP-AOM-Lite, DKP-AOM, ServOMBI and CroMatcher could complete neither Task 5 nor Task 6 in less than 12 hours.
* Uses background knowledge based on the UMLS Metathesaurus, on which the LargeBio reference alignments are also based.
Task 5 (SNOMED-NCI):
System | Time (s) | # Mappings | Precision | Recall | F-measure | Unsat. | Degree |
AML | 470 | 14,141 | 0.917 | 0.724 | 0.809 | ≥0 | ≥0.000% |
LogMapBio | 3,298 | 12,855 | 0.940 | 0.674 | 0.785 | ≥0 | ≥0.000% |
LogMap | 410 | 12,384 | 0.958 | 0.663 | 0.783 | ≥0 | ≥0.000% |
XMAP-BK * | 396 | 11,674 | 0.928 | 0.606 | 0.733 | ≥1 | ≥0.001% |
XMAP | 394 | 11,674 | 0.928 | 0.606 | 0.733 | ≥1 | ≥0.001% |
LogMapLite | 212 | 10,942 | 0.949 | 0.567 | 0.710 | ≥60,450 | ≥80.4% |
Average | 1,055 | 11,092 | 0.938 | 0.577 | 0.703 | 12,262 | 16.3% |
LogMapC | 3,039 | 9,975 | 0.914 | 0.510 | 0.655 | ≥0 | ≥0.000% |
RSDLWB | 221 | 5,096 | 0.967 | 0.267 | 0.418 | ≥37,647 | ≥50.0% |
Task 6 (SNOMED-NCI):
System | Time (s) | # Mappings | Precision | Recall | F-measure | Unsat. | Degree |
AML | 584 | 12,821 | 0.904 | 0.650 | 0.756 | ≥2 | ≥0.001% |
LogMapBio | 3,327 | 12,745 | 0.853 | 0.609 | 0.711 | ≥4 | ≥0.002% |
LogMap | 1,062 | 12,222 | 0.870 | 0.596 | 0.708 | ≥4 | ≥0.002% |
XMAP-BK * | 925 | 10,454 | 0.913 | 0.536 | 0.675 | ≥0 | ≥0.000% |
XMAP | 905 | 10,454 | 0.913 | 0.535 | 0.675 | ≥0 | ≥0.000% |
LogMapLite | 427 | 12,894 | 0.797 | 0.567 | 0.663 | ≥150,656 | ≥79.5% |
Average | 1,402 | 10,764 | 0.878 | 0.526 | 0.649 | 29,971 | 15.8% |
LogMapC | 3,553 | 9,100 | 0.882 | 0.450 | 0.596 | ≥2 | ≥0.001% |
RSDLWB | 436 | 5,427 | 0.894 | 0.265 | 0.408 | ≥89,106 | ≥47.0% |
The following table summarizes the results for the systems that completed all six tasks of the Large BioMed Track. It shows the total time in seconds to complete all tasks and the averages for Precision, Recall, F-measure and incoherence degree. Systems are ordered by average F-measure and incoherence degree.
AML and XMAP-BK were a step ahead of the rest, obtaining the best average Recall and F-measure. RSDLWB and LogMapC were the best systems in terms of Precision.
Regarding mapping incoherence, AML and the LogMap variants (excluding LogMapLite) computed mapping sets leading to a very small number of unsatisfiable classes.
Finally, LogMapLite and RSDLWB were the fastest systems. Total computation times were slightly higher this year than in previous years due to the extra overhead of downloading the ontologies from the new SEALS repository.
* Uses background knowledge based on the UMLS Metathesaurus, on which the LargeBio reference alignments are also based.
System | Total Time (s) | Precision | Recall | F-measure | Incoherence |
AML | 1,940 | 0.905 | 0.754 | 0.819 | 0.0046% |
XMAP-BK * | 2,520 | 0.904 | 0.764 | 0.819 | 16.6% |
LogMap | 2,608 | 0.903 | 0.714 | 0.794 | 0.0053% |
LogMapBio | 13,711 | 0.867 | 0.733 | 0.789 | 0.0053% |
XMAP | 2,371 | 0.892 | 0.654 | 0.751 | 15.8% |
LogMapC | 8,618 | 0.907 | 0.551 | 0.682 | 0.0125% |
LogMapLite | 1,323 | 0.868 | 0.532 | 0.613 | 33.9% |
RSDLWB | 1,334 | 0.923 | 0.236 | 0.367 | 16.6% |