We have run the evaluation on a high-performance server with 16 CPUs and 15 GB of allocated RAM. In total, 15 out of 23 participating systems/configurations were able to cope with at least one of the track's matching tasks. Optima and MEDLEY failed to complete the smallest task within the 24-hour timeout, while OMR, OntoK, ASE and WeSeE threw an exception during the matching process. CODI was evaluated in a different setting using only 7 GB and threw an exception related to insufficient memory when processing the smallest matching task. TOAST was not evaluated since it was only configured for the Anatomy track and required a complex installation. LogMapLt, a string matcher that builds an inverted file to compute correspondences efficiently, has been used as the baseline.
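To make the baseline concrete, the sketch below shows how an inverted-file string matcher can compute candidate correspondences: labels from one ontology are normalised and indexed once, and labels from the other ontology are then looked up directly instead of being compared pairwise. This is purely illustrative Java under our own assumptions (class, method and entity names are hypothetical), not LogMapLt's actual code.

```java
import java.util.*;

public class InvertedFileMatcher {

    /** Maps a normalised label to the entity IDs that carry it. */
    private final Map<String, List<String>> index = new HashMap<>();

    private static String normalise(String label) {
        // Lower-case and collapse punctuation/whitespace: a crude stand-in
        // for the lexical normalisations a real matcher would apply.
        return label.toLowerCase().replaceAll("[^a-z0-9]+", " ").trim();
    }

    /** Index every label of the first ontology: one pass over its labels. */
    public void indexEntity(String entityId, String label) {
        index.computeIfAbsent(normalise(label), k -> new ArrayList<>()).add(entityId);
    }

    /** Probe the index with a label from the second ontology. */
    public List<String> match(String label) {
        return index.getOrDefault(normalise(label), Collections.emptyList());
    }

    public static void main(String[] args) {
        InvertedFileMatcher m = new InvertedFileMatcher();
        m.indexEntity("FMA:7088", "Heart");   // hypothetical entity IDs
        m.indexEntity("FMA:50801", "Brain");
        System.out.println(m.match("heart")); // prints: [FMA:7088]
    }
}
```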
Together with precision, recall, F-measure and runtimes, we have also evaluated the coherence of the alignments. We report (1) the number of unsatisfiable classes when reasoning (using HermiT) over the input ontologies together with the computed mappings, (2) the ratio/degree of unsatisfiable classes with respect to the size of the merged ontology (based on the unsatisfiability measure proposed in [1]), and (3) an approximation of the root unsatisfiability. The root unsatisfiability aims at providing a more precise count of the errors, since many of the unsatisfiabilities may be derived (i.e., a subclass of an unsatisfiable class will also be reported as unsatisfiable). The provided approximation is based on LogMap's (incomplete) repair facility and shows the number of classes that this facility needed to repair in order to solve (most of) the unsatisfiabilities [2].
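As a rough illustration of how measures (1) and (2) can be obtained, the sketch below merges two ontologies with a set of mapping axioms and asks HermiT for the unsatisfiable classes. It assumes the OWL API (3.x-style method names) and HermiT on the classpath, and that the mappings have been serialised as OWL axioms; the file names are placeholders, and this is not the exact evaluation harness used for the track.

```java
import java.io.File;
import java.util.Set;
import org.semanticweb.HermiT.ReasonerFactory;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;
import org.semanticweb.owlapi.reasoner.OWLReasoner;

public class IncoherenceCheck {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager man = OWLManager.createOWLOntologyManager();
        OWLOntology onto1 = man.loadOntologyFromOntologyDocument(new File("fma.owl"));
        OWLOntology onto2 = man.loadOntologyFromOntologyDocument(new File("nci.owl"));
        OWLOntology mappings = man.loadOntologyFromOntologyDocument(new File("mappings.owl"));

        // Merge the two input ontologies and the mapping axioms.
        OWLOntology merged = man.createOntology(IRI.create("http://example.org/merged"));
        man.addAxioms(merged, onto1.getAxioms());
        man.addAxioms(merged, onto2.getAxioms());
        man.addAxioms(merged, mappings.getAxioms());

        // (1) Number of unsatisfiable classes according to HermiT.
        OWLReasoner hermit = new ReasonerFactory().createReasoner(merged);
        Set<OWLClass> unsat = hermit.getUnsatisfiableClasses().getEntitiesMinusBottom();

        // (2) Incoherence degree: unsatisfiable classes over the merged signature size.
        double degree = 100.0 * unsat.size() / merged.getClassesInSignature().size();
        System.out.printf("All Unsat.: %d  Degree: %.2f%%%n", unsat.size(), degree);
    }
}
```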
Precision, recall and F-measure have been computed with respect to the available UMLS-based alignments. Systems are ordered by average F-measure.
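For reference, precision, recall and F-measure over mapping sets reduce to simple set arithmetic. A minimal sketch, assuming mappings are represented as comparable strings such as "FMA:7088=NCI:C12727" (a hypothetical encoding):

```java
import java.util.HashSet;
import java.util.Set;

public class MatchingMetrics {
    /** P = |S ∩ R| / |S|, R = |S ∩ R| / |R|, F = harmonic mean of P and R. */
    public static double[] evaluate(Set<String> system, Set<String> reference) {
        Set<String> correct = new HashSet<>(system);
        correct.retainAll(reference);                 // S ∩ R: system mappings in the reference
        double p = (double) correct.size() / system.size();
        double r = (double) correct.size() / reference.size();
        double f = (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
        return new double[] {p, r, f};
    }
}
```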
Note that GOMMA has also been evaluated with a configuration that exploits specialised background knowledge (GOMMA-bk). The background knowledge of GOMMA-bk involves the application of mapping composition techniques and the reuse of mappings from FMA-UMLS and NCI-UMLS. LogMap, MaasMatch and YAM++ also use different kinds of background knowledge. LogMap uses normalisations and spelling variants from the UMLS Lexicon. YAM++ and MaasMatch use the general purpose background knowledge provided by WordNet.
LogMap has also been evaluated with two configurations. LogMap's default algorithm computes an estimation of the overlapping between the input ontologies before the matching process, while LogMap-noe has this feature deactivated.
The error-free "Large BioMed 2012 silver standard" reference alignment, computed by "harmonising" the output of the participating matching systems, will be available soon. We will also debug all mapping outputs using Alcomo [3] and LogMap's repair facility [2].
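The exact harmonisation procedure is not described here, but a common way to build such a silver standard is majority voting over the participating systems' outputs; the following is a minimal sketch under that assumption (the minVotes threshold and string encoding of mappings are hypothetical):

```java
import java.util.*;

public class SilverStandard {
    /** Keep every mapping proposed by at least minVotes of the given system outputs. */
    public static Set<String> harmonise(List<Set<String>> outputs, int minVotes) {
        Map<String, Integer> votes = new HashMap<>();
        for (Set<String> output : outputs)
            for (String mapping : output)
                votes.merge(mapping, 1, Integer::sum);   // count one vote per system

        Set<String> silver = new HashSet<>();
        for (Map.Entry<String, Integer> e : votes.entrySet())
            if (e.getValue() >= minVotes)                // enough systems agree
                silver.add(e.getKey());
        return silver;
    }
}
```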
This year we obtained a very high level of participation, and 11 systems/configurations obtained, on average, an F-measure over 0.80 for the matching problem involving the small fragments of FMA and NCI. GOMMA-bk obtained the best results in terms of both recall and F-measure, while ServOMap provided the most precise alignments. LogMap and LogMap-noe provided the same results since the input ontologies are already small fragments of FMA and NCI, and thus the overlapping estimation performed by LogMap did not have any impact. In general, as expected, precision increases when comparing against the original UMLS mapping set, while recall decreases.
Our baseline provided very good results in terms of F-measure and outperformed 8 of the participating systems. MaasMatch and Hertuda provided competitive results in terms of recall, but their low precision hurt the final F-measure. MapSSS and AUTOMSv2 provided mapping sets with high precision, but their F-measures suffered from low recall.
The runtimes were very positive in general and 8 systems completed the task in less than 2 minutes. MapSSS required less than 10 minutes, while Hertuda and HotMatch required around 1 hour. Finally, MaasMatch, AUTOMSv2 and Wmatch needed 8, 17 and 18 hours to complete the task, respectively.
Regarding mapping coherence, only LogMap (with its two variants) generates an almost clean output. The table shows that even the most precise mappings (ServOMap or YAM++) lead to a huge number of unsatisfiable classes when reasoning together with the input ontologies, which demonstrates the importance of using techniques to assess the coherence of the generated alignments. Unfortunately, LogMap and CODI are the only systems participating in OAEI 2012 that have been shown to use such techniques.
System | Time (s) | # Mappings | Original UMLS | Refined UMLS (LogMap) | Refined UMLS (Alcomo) | Average | Incoherence Analysis
| | | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | All Unsat. | Degree | Root Unsat.
GOMMA-bk | 26 | 2,843 | 0.961 | 0.903 | 0.931 | 0.932 | 0.914 | 0.923 | 0.914 | 0.922 | 0.918 | 0.936 | 0.913 | 0.924 | 6,204 | 60.92% | 193
YAM++ | 78 | 2,614 | 0.980 | 0.848 | 0.909 | 0.959 | 0.865 | 0.910 | 0.933 | 0.866 | 0.898 | 0.958 | 0.859 | 0.906 | 2,352 | 23.10% | 92 |
LogMap/LogMap-noe | 18 | 2,740 | 0.952 | 0.863 | 0.905 | 0.934 | 0.883 | 0.908 | 0.908 | 0.883 | 0.895 | 0.932 | 0.876 | 0.903 | 2 | 0.02% | 0
GOMMA | 26 | 2,626 | 0.973 | 0.845 | 0.904 | 0.945 | 0.856 | 0.898 | 0.928 | 0.865 | 0.896 | 0.949 | 0.855 | 0.900 | 2,130 | 20.92% | 127 |
ServOMapL | 20 | 2,468 | 0.988 | 0.806 | 0.888 | 0.964 | 0.821 | 0.887 | 0.936 | 0.819 | 0.873 | 0.962 | 0.815 | 0.883 | 5,778 | 56.74% | 79 |
LogMapLt | 8 | 2,483 | 0.969 | 0.796 | 0.874 | 0.942 | 0.807 | 0.869 | 0.924 | 0.814 | 0.866 | 0.945 | 0.806 | 0.870 | 2,104 | 20.66% | 116 |
ServOMap | 25 | 2,300 | 0.990 | 0.753 | 0.855 | 0.969 | 0.769 | 0.857 | 0.949 | 0.774 | 0.853 | 0.969 | 0.765 | 0.855 | 5,597 | 54.96% | 50 |
HotMatch | 4,271 | 2,280 | 0.971 | 0.732 | 0.835 | 0.951 | 0.748 | 0.838 | 0.947 | 0.766 | 0.847 | 0.957 | 0.749 | 0.840 | 285 | 2.78% | 65 |
Wmatch | 65,399 | 3,178 | 0.811 | 0.852 | 0.831 | 0.786 | 0.862 | 0.823 | 0.767 | 0.864 | 0.813 | 0.788 | 0.860 | 0.822 | 3,168 | 31.11% | 482 |
AROMA | 63 | 2,571 | 0.876 | 0.745 | 0.805 | 0.854 | 0.758 | 0.803 | 0.837 | 0.764 | 0.799 | 0.856 | 0.756 | 0.803 | 7,196 | 70.66% | 421 |
Hertuda | 3,327 | 4,309 | 0.598 | 0.852 | 0.703 | 0.578 | 0.860 | 0.691 | 0.564 | 0.862 | 0.682 | 0.580 | 0.858 | 0.692 | 2,675 | 26.27% | 277 |
MaasMatch | 27,157 | 3,696 | 0.622 | 0.765 | 0.686 | 0.606 | 0.778 | 0.681 | 0.597 | 0.788 | 0.679 | 0.608 | 0.777 | 0.682 | 9,598 | 94.25% | 3,113 |
AUTOMSv2 | 62,407 | 1,809 | 0.821 | 0.491 | 0.615 | 0.802 | 0.501 | 0.617 | 0.709 | 0.507 | 0.618 | 0.804 | 0.500 | 0.616 | 5,346 | 52.49% | 392 |
MapSSS | 561 | 1,483 | 0.860 | 0.422 | 0.566 | 0.840 | 0.430 | 0.568 | 0.829 | 0.436 | 0.571 | 0.843 | 0.429 | 0.569 | 565 | 5.55% | 94 |
AUTOMSv2, HotMatch, Hertuda, Wmatch and MaasMatch failed to complete the task involving the big fragments of FMA and NCI after more than 24 hours of execution. Runtimes were in line with the small matching task, apart from those of MapSSS and AROMA, which increased considerably.
YAM++ provided the best results in terms of F-measure, whereas GOMMA-bk and ServOMap got the best recall and precision, respectively. F-measures decreased considerably with respect to the small matching task, mostly because this task involves many more candidate mappings than the previous one. Nevertheless, seven systems outperformed our baseline and provided high-quality mapping sets in terms of both precision and recall. Only MapSSS and AROMA provided worse results than LogMapLt in terms of both precision and recall.
Regarding mapping coherence, as in the previous task, only LogMap (with its two variants) generates an almost clean output where the mappings together with the input ontologies only lead to 5 unsatisfiable classes.
System | Time (s) | # Mappings | Original UMLS | Refined UMLS (LogMap) | Refined UMLS (Alcomo) | Average | Incoherence Analysis
| | | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | All Unsat. | Degree | Root Unsat.
YAM++ | 245 | 2,688 | 0.923 | 0.821 | 0.869 | 0.904 | 0.838 | 0.870 | 0.878 | 0.838 | 0.857 | 0.902 | 0.832 | 0.866 | 22,402 | 35.49% | 102 |
ServOMapL | 95 | 2,640 | 0.914 | 0.798 | 0.852 | 0.892 | 0.812 | 0.850 | 0.866 | 0.811 | 0.838 | 0.891 | 0.807 | 0.847 | 22,315 | 35.41% | 143 |
GOMMA | 69 | 2,810 | 0.876 | 0.814 | 0.844 | 0.856 | 0.830 | 0.843 | 0.840 | 0.837 | 0.838 | 0.857 | 0.827 | 0.842 | 2,398 | 4.40% | 116 |
GOMMA-bk | 83 | 3,116 | 0.832 | 0.857 | 0.844 | 0.814 | 0.875 | 0.843 | 0.796 | 0.880 | 0.836 | 0.814 | 0.871 | 0.841 | 4,609 | 8.46% | 146
LogMap-noe | 74 | 2,663 | 0.888 | 0.782 | 0.832 | 0.881 | 0.809 | 0.843 | 0.848 | 0.801 | 0.824 | 0.872 | 0.798 | 0.833 | 5 | 0.01% | 0 |
LogMap | 77 | 2,656 | 0.887 | 0.779 | 0.829 | 0.877 | 0.803 | 0.838 | 0.846 | 0.797 | 0.821 | 0.870 | 0.793 | 0.830 | 5 | 0.01% | 0 |
ServOMap | 98 | 2,413 | 0.933 | 0.744 | 0.828 | 0.913 | 0.760 | 0.829 | 0.894 | 0.766 | 0.825 | 0.913 | 0.757 | 0.828 | 21,688 | 34.03% | 86 |
LogMapLt | 29 | 3,219 | 0.748 | 0.796 | 0.771 | 0.726 | 0.807 | 0.764 | 0.713 | 0.814 | 0.760 | 0.729 | 0.806 | 0.766 | 12,682 | 23.29% | 443 |
AROMA | 7,538 | 3,856 | 0.541 | 0.689 | 0.606 | 0.526 | 0.700 | 0.601 | 0.514 | 0.703 | 0.594 | 0.527 | 0.698 | 0.600 | 20,054 | 24.07% | 1,600
MapSSS | 30,575 | 2,584 | 0.392 | 0.335 | 0.362 | 0.384 | 0.342 | 0.362 | 0.377 | 0.345 | 0.360 | 0.384 | 0.341 | 0.361 | 21,893 | 40.21% | 358 |
HotMatch | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
Wmatch | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
Hertuda | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
MaasMatch | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
AUTOMSv2 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
AROMA and MapSSS failed to complete the matching task involving the whole FMA and NCI ontologies in less than 24 hours.
As in the previous task, the remaining 7 matching systems generated high-quality mapping sets. YAM++ provided the best results in terms of F-measure, whereas GOMMA-bk and ServOMap got the best recall and precision, respectively. LogMap, with its two configurations, provided an almost clean output: only 9 classes were unsatisfiable after reasoning with the input ontologies and the computed mappings.
Runtimes were also very positive. All systems produced their outputs in less than 5 minutes except YAM++, which required around 20 minutes to complete the task.
System | Time (s) | # Mappings | Original UMLS | Refined UMLS (LogMap) | Refined UMLS (Alcomo) | Average | Incoherence Analysis
| | | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | All Unsat. | Degree | Root Unsat.
YAM++ | 1,304 | 2,738 | 0.907 | 0.821 | 0.862 | 0.887 | 0.838 | 0.862 | 0.862 | 0.838 | 0.850 | 0.885 | 0.832 | 0.858 | 50,550 | 28.56% | 141 |
GOMMA | 217 | 2,843 | 0.865 | 0.813 | 0.839 | 0.846 | 0.830 | 0.837 | 0.829 | 0.836 | 0.833 | 0.847 | 0.826 | 0.836 | 5,574 | 3.83% | 139 |
ServOMapL | 251 | 2,700 | 0.891 | 0.796 | 0.841 | 0.869 | 0.810 | 0.839 | 0.844 | 0.808 | 0.826 | 0.868 | 0.805 | 0.835 | 50,334 | 28.48% | 164 |
GOMMA-bk | 231 | 3,165 | 0.818 | 0.856 | 0.837 | 0.800 | 0.874 | 0.836 | 0.783 | 0.879 | 0.828 | 0.801 | 0.870 | 0.834 | 12,939 | 8.88% | 245
LogMap-noe | 206 | 2,646 | 0.882 | 0.771 | 0.823 | 0.875 | 0.799 | 0.835 | 0.842 | 0.790 | 0.815 | 0.866 | 0.787 | 0.825 | 9 | 0.01% | 0 |
LogMap | 131 | 2,652 | 0.875 | 0.768 | 0.818 | 0.868 | 0.795 | 0.830 | 0.836 | 0.786 | 0.810 | 0.860 | 0.783 | 0.819 | 9 | 0.01% | 0 |
ServOMap | 204 | 2,465 | 0.912 | 0.743 | 0.819 | 0.892 | 0.759 | 0.820 | 0.873 | 0.764 | 0.815 | 0.892 | 0.755 | 0.818 | 48,743 | 27.31% | 114 |
LogMapLt | 55 | 3,466 | 0.695 | 0.796 | 0.742 | 0.675 | 0.807 | 0.735 | 0.662 | 0.814 | 0.730 | 0.677 | 0.806 | 0.736 | 26,429 | 8.68% | 778 |
AROMA | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
MapSSS | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
HotMatch | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
Wmatch | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
Hertuda | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
MaasMatch | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
AUTOMSv2 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
As the following tables show, the FMA-SNOMED matching problem was harder than the FMA-NCI problem in both size and complexity. Thus, matching systems required more time to complete the task and provided, in general, worse results in terms of F-measure. Furthermore, MaasMatch, Wmatch and AUTOMSv2, which were able to complete the small FMA-NCI task, failed to complete the small FMA-SNOMED task in less than 24 hours.
Six systems provided, on average, an F-measure greater than 0.75. However, the other 6 systems that completed the task (including our baseline) failed to provide a recall higher than 0.4. GOMMA-bk provided the best results in terms of both recall and F-measure, while the baseline LogMapLt provided the best precision, closely followed by ServOMapL. GOMMA-bk is well ahead of the other systems since it managed to provide a mapping set with very high recall; the use of background knowledge was key in this matching task.
As in the FMA-NCI matching problem, precision tends to increase when comparing against the original UMLS mapping set, while recall decreases.
The runtimes were also very positive in general and 8 systems completed the task in less than 6 minutes. MapSSS required almost 1 hour, while Hertuda, HotMatch and AROMA needed 5, 9 and 14 hours to complete the task, respectively.
LogMap, unlike LogMap-noe, failed to detect and repair two unsatisfiable classes since they were outside the computed ontology fragments (overlapping). The remaining systems, even those providing highly precise mappings such as ServOMapL, generated mapping sets with a high incoherence degree.
System | Time (s) | # Mappings | Original UMLS | Refined UMLS (LogMap) | Refined UMLS (Alcomo) | Average | Incoherence Analysis
| | | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | All Unsat. | Degree | Root Unsat.
GOMMA-bk | 148 | 8,598 | 0.958 | 0.914 | 0.935 | 0.860 | 0.912 | 0.885 | 0.862 | 0.912 | 0.886 | 0.893 | 0.913 | 0.903 | 13,685 | 58.06% | 4,674
ServOMapL | 39 | 6,346 | 0.985 | 0.694 | 0.814 | 0.884 | 0.691 | 0.776 | 0.892 | 0.696 | 0.782 | 0.920 | 0.694 | 0.791 | 10,584 | 44.91% | 3,056 |
YAM++ | 326 | 6,421 | 0.972 | 0.693 | 0.809 | 0.870 | 0.688 | 0.769 | 0.879 | 0.694 | 0.776 | 0.907 | 0.692 | 0.785 | 14,534 | 61.67% | 3,150 |
LogMap-noe | 63 | 6,363 | 0.964 | 0.681 | 0.799 | 0.877 | 0.688 | 0.771 | 0.889 | 0.696 | 0.781 | 0.910 | 0.688 | 0.784 | 0 | 0% | 0 |
LogMap | 65 | 6,164 | 0.965 | 0.660 | 0.784 | 0.876 | 0.666 | 0.756 | 0.889 | 0.674 | 0.767 | 0.910 | 0.667 | 0.769 | 2 | 0.01% | 2 |
ServOMap | 46 | 6,008 | 0.985 | 0.657 | 0.788 | 0.880 | 0.652 | 0.749 | 0.888 | 0.656 | 0.755 | 0.918 | 0.655 | 0.764 | 8,165 | 34.64% | 2,721 |
GOMMA | 54 | 3,667 | 0.926 | 0.377 | 0.536 | 0.834 | 0.377 | 0.520 | 0.865 | 0.390 | 0.538 | 0.875 | 0.381 | 0.531 | 2,058 | 8.73% | 206 |
MapSSS | 3,129 | 3,458 | 0.798 | 0.306 | 0.442 | 0.719 | 0.307 | 0.430 | 0.737 | 0.313 | 0.440 | 0.751 | 0.309 | 0.438 | 9,084 | 38.54% | 389 |
AROMA | 51,191 | 5,227 | 0.555 | 0.322 | 0.407 | 0.507 | 0.327 | 0.397 | 0.519 | 0.333 | 0.406 | 0.527 | 0.327 | 0.404 | 21,083 | 89.45% | 2,296 |
HotMatch | 31,718 | 2,139 | 0.875 | 0.208 | 0.336 | 0.812 | 0.214 | 0.339 | 0.842 | 0.222 | 0.351 | 0.843 | 0.214 | 0.342 | 907 | 3.85% | 104 |
LogMapLt | 14 | 1,645 | 0.975 | 0.178 | 0.301 | 0.902 | 0.183 | 0.304 | 0.936 | 0.189 | 0.315 | 0.938 | 0.183 | 0.307 | 773 | 3.28% | 21 |
Hertuda | 17,625 | 3,051 | 0.578 | 0.196 | 0.292 | 0.533 | 0.201 | 0.292 | 0.555 | 0.208 | 0.303 | 0.555 | 0.201 | 0.296 | 1,020 | 4.33% | 47 |
MapSSS, HotMatch and Hertuda failed to complete the task involving the big fragments of FMA and SNOMED after more than 24 hours of execution.
ServOMapL provided the best results in terms of F-measure and precision, whereas GOMMA-bk got the best recall. As in the FMA-NCI matching task involving big fragments, F-measures generally decreased with respect to the small matching task. The largest variations affected GOMMA-bk and GOMMA, whose average precision dropped from 0.893 and 0.875 to 0.571 and 0.389, respectively. This is an interesting fact, since the background knowledge used by GOMMA-bk could not prevent the drop in precision while keeping a high recall. Furthermore, runtimes were 4 to 10 times higher for all systems, with the exception of AROMA, whose runtime increased from 14 to 17 hours.
LogMap (with its two variants) generated a clean output where the mappings together with the input ontologies did not lead to any unsatisfiable class.
System | Time (s) | # Mappings | Original UMLS | Refined UMLS (LogMap) | Refined UMLS (Alcomo) | Average | Incoherence Analysis
| | | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | All Unsat. | Degree | Root Unsat.
ServOMapL | 234 | 6,563 | 0.945 | 0.689 | 0.797 | 0.847 | 0.686 | 0.758 | 0.857 | 0.692 | 0.766 | 0.883 | 0.689 | 0.774 | 55,970 | 32.36% | 1,192 |
ServOMap | 315 | 6,272 | 0.941 | 0.655 | 0.773 | 0.841 | 0.650 | 0.734 | 0.849 | 0.655 | 0.740 | 0.877 | 0.654 | 0.749 | 143,316 | 82.85% | 1,320 |
YAM++ | 3,780 | 7,003 | 0.879 | 0.684 | 0.769 | 0.787 | 0.679 | 0.729 | 0.797 | 0.686 | 0.737 | 0.821 | 0.683 | 0.746 | 69,345 | 40.09% | 1,360 |
LogMap-noe | 521 | 6,450 | 0.886 | 0.635 | 0.740 | 0.805 | 0.640 | 0.713 | 0.821 | 0.651 | 0.726 | 0.837 | 0.642 | 0.727 | 0 | 0% | 0 |
LogMap | 484 | 6,292 | 0.883 | 0.617 | 0.726 | 0.800 | 0.621 | 0.699 | 0.815 | 0.631 | 0.711 | 0.833 | 0.623 | 0.712 | 0 | 0% | 0 |
GOMMA-bk | 636 | 12,614 | 0.613 | 0.858 | 0.715 | 0.548 | 0.852 | 0.667 | 0.551 | 0.855 | 0.670 | 0.571 | 0.855 | 0.684 | 75,910 | 43.88% | 3,344
GOMMA | 437 | 5,591 | 0.412 | 0.256 | 0.316 | 0.370 | 0.255 | 0.302 | 0.386 | 0.265 | 0.314 | 0.389 | 0.259 | 0.311 | 7,343 | 4.25% | 480 |
AROMA | 62,801 | 2,497 | 0.684 | 0.190 | 0.297 | 0.638 | 0.197 | 0.300 | 0.660 | 0.203 | 0.310 | 0.661 | 0.196 | 0.303 | 54,459 | 31.48% | 271 |
LogMapLt | 96 | 1,819 | 0.882 | 0.178 | 0.296 | 0.816 | 0.183 | 0.299 | 0.846 | 0.189 | 0.309 | 0.848 | 0.183 | 0.302 | 2,994 | 1.73% | 24 |
MapSSS | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
HotMatch | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
Hertuda | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
AROMA failed to complete the matching task involving the whole FMA and SNOMED ontologies in less than 24 hours.
The results in terms of both precision and recall did not change significantly and, as in the previous task, ServOMapL provided the best results in terms of F-measure and precision, while GOMMA-bk got the best recall.
Runtimes for ServOMap, ServOMapL, LogMapLt and LogMap (with its two variants) were in line with the previous matching task; the computation times for GOMMA, GOMMA-bk and YAM++, however, increased considerably. GOMMA (with its two variants) required more than 30 minutes, while YAM++ required more than 6 hours.
LogMap and LogMap-noe mappings, as in previous tasks, had a very low incoherence degree.
System | Time (s) | # Mappings | Original UMLS | Refined UMLS (LogMap) | Refined UMLS (Alcomo) | Average | Incoherence Analysis
| | | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | All Unsat. | Degree | Root Unsat.
ServOMapL | 517 | 6,605 | 0.939 | 0.688 | 0.794 | 0.842 | 0.686 | 0.756 | 0.851 | 0.691 | 0.763 | 0.877 | 0.688 | 0.772 | 99,726 | 25.86% | 2,862 |
ServOMap | 532 | 6,320 | 0.933 | 0.655 | 0.770 | 0.835 | 0.650 | 0.731 | 0.842 | 0.655 | 0.737 | 0.870 | 0.653 | 0.746 | 273,242 | 70.87% | 2,617 |
YAM++ | 23,900 | 7,044 | 0.872 | 0.682 | 0.765 | 0.780 | 0.678 | 0.725 | 0.791 | 0.685 | 0.734 | 0.814 | 0.681 | 0.742 | 106,107 | 27.52% | 3,393 |
LogMap | 612 | 6,312 | 0.877 | 0.615 | 0.723 | 0.795 | 0.619 | 0.696 | 0.811 | 0.629 | 0.708 | 0.828 | 0.621 | 0.710 | 10 | 0.003% | 0 |
LogMap-noe | 791 | 6,406 | 0.866 | 0.616 | 0.720 | 0.782 | 0.617 | 0.690 | 0.801 | 0.631 | 0.706 | 0.816 | 0.621 | 0.706 | 10 | 0.003% | 0 |
GOMMA-bk | 1,893 | 12,829 | 0.602 | 0.858 | 0.708 | 0.538 | 0.852 | 0.660 | 0.542 | 0.855 | 0.663 | 0.561 | 0.855 | 0.677 | 119,657 | 31.03% | 5,289
LogMapLt | 171 | 1,823 | 0.880 | 0.178 | 0.296 | 0.814 | 0.183 | 0.299 | 0.844 | 0.189 | 0.309 | 0.846 | 0.183 | 0.301 | 4,938 | 1.28% | 37 |
GOMMA | 1,994 | 5,823 | 0.370 | 0.239 | 0.291 | 0.332 | 0.239 | 0.278 | 0.347 | 0.248 | 0.289 | 0.350 | 0.242 | 0.286 | 10,752 | 2.79% | 609 |
AROMA | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
MapSSS | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
HotMatch | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
Hertuda | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
The matching outputs in the SNOMED-NCI matching problem have only been compared against the original UMLS mappings and the refined subset computed by LogMap's repair facility. We could not compute a refined UMLS alignment set with the Alcomo debugging system since, at the time of creating the datasets, it could not cope with the integration of SNOMED and NCI via mappings. The new version of Alcomo, however, has been shown to be able to provide such a refined set.
Since currently no OWL 2 reasoner has been shown to cope with the integration of SNOMED and NCI via mappings [url], the satisfiability results have been estimated using the Dowling-Gallier algorithm [url] for propositional Horn satisfiability (as implemented in LogMap's repair facility).
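For context, the Dowling-Gallier algorithm decides satisfiability of a set of propositional Horn clauses in time linear in the total size of the clauses, by forward-propagating facts and counting, per clause, how many body atoms remain underived. The sketch below is a self-contained illustration of the algorithm, not LogMap's actual implementation (which additionally encodes classes and mappings as Horn clauses).

```java
import java.util.*;

public class DowlingGallier {
    /** A Horn clause body => head; head == -1 encodes "false" (a goal clause). */
    static final class Clause {
        final int[] body; final int head;
        Clause(int[] body, int head) { this.body = body; this.head = head; }
    }

    /** Decides satisfiability of Horn clauses over atoms 0..nAtoms-1 in linear time. */
    public static boolean satisfiable(List<Clause> clauses, int nAtoms) {
        int[] remaining = new int[clauses.size()];     // underived body atoms per clause
        List<List<Integer>> watch = new ArrayList<>(); // atom -> clauses whose body contains it
        for (int a = 0; a < nAtoms; a++) watch.add(new ArrayList<>());
        boolean[] derived = new boolean[nAtoms];
        Deque<Integer> queue = new ArrayDeque<>();     // newly derived atoms to propagate

        for (int i = 0; i < clauses.size(); i++) {
            Clause c = clauses.get(i);
            remaining[i] = c.body.length;
            for (int a : c.body) watch.get(a).add(i);
            if (remaining[i] == 0) {                   // a fact: empty body
                if (c.head == -1) return false;        // "=> false" is immediately unsatisfiable
                if (!derived[c.head]) { derived[c.head] = true; queue.add(c.head); }
            }
        }
        while (!queue.isEmpty()) {
            int a = queue.poll();
            for (int i : watch.get(a)) {
                if (--remaining[i] == 0) {             // whole body of clause i now derived
                    Clause c = clauses.get(i);
                    if (c.head == -1) return false;    // goal clause fires: unsatisfiable
                    if (!derived[c.head]) { derived[c.head] = true; queue.add(c.head); }
                }
            }
        }
        return true;                                   // no contradiction is derivable
    }

    public static void main(String[] args) {
        // Atoms: A=0, B=1.  Clauses: => A;  A => B;  A and B => false.
        List<Clause> cs = List.of(
                new Clause(new int[]{}, 0),
                new Clause(new int[]{0}, 1),
                new Clause(new int[]{0, 1}, -1));
        System.out.println(satisfiable(cs, 2));        // prints: false
    }
}
```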
The SNOMED-NCI matching problem represents a further step up in difficulty with respect to the FMA-SNOMED matching problem and, in general, runtimes and results are slightly worse. Furthermore, Hertuda and HotMatch, which were able to complete the small FMA-NCI and small FMA-SNOMED tasks, failed to complete the small SNOMED-NCI task in less than 24 hours.
Six systems provided an F-measure higher than our baseline LogMapLt, and their F-measures were very close to each other. On the other hand, GOMMA, MapSSS and AROMA failed to improve on LogMapLt's results. LogMap-noe provided the best results in terms of recall and F-measure, while ServOMap generated the most precise mappings.
As in the FMA-NCI and FMA-SNOMED matching problems, precision tends to increase when comparing against the original UMLS mapping set, while recall decreases.
The runtimes were also positive in general and 7 systems completed the task in less than 4 minutes. YAM++ required more than 30 minutes, while AROMA and MapSSS needed 4 and 8 hours to complete the task, respectively.
LogMap (with its two variants) generated a set of output mappings that did not lead to any unsatisfiable class when reasoning (using the Dowling-Gallier algorithm) together with the input ontologies. The remaining systems generated mapping sets leading to a degree of incoherence greater than 50%.
System | Time (s) | # Mappings | Original UMLS | Refined UMLS (LogMap) | Average | Incoherence Analysis
| | | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | All Unsat. | Degree | Root Unsat.
LogMap-noe | 211 | 13,525 | 0.897 | 0.644 | 0.750 | 0.893 | 0.659 | 0.758 | 0.895 | 0.652 | 0.754 | 0 | 0% | 0 |
LogMap | 221 | 13,454 | 0.899 | 0.642 | 0.749 | 0.895 | 0.657 | 0.758 | 0.897 | 0.649 | 0.753 | 0 | 0% | 0 |
GOMMA-bk | 226 | 12,294 | 0.946 | 0.617 | 0.747 | 0.931 | 0.625 | 0.748 | 0.939 | 0.621 | 0.747 | 48,681 | 64.83% | 863
YAM++ | 1,901 | 11,961 | 0.951 | 0.604 | 0.739 | 0.940 | 0.614 | 0.743 | 0.946 | 0.609 | 0.741 | 50,089 | 66.71% | 471 |
ServOMapL | 147 | 11,730 | 0.960 | 0.598 | 0.737 | 0.947 | 0.606 | 0.739 | 0.954 | 0.602 | 0.738 | 62,367 | 83.06% | 657 |
ServOMap | 153 | 10,829 | 0.972 | 0.558 | 0.709 | 0.959 | 0.567 | 0.713 | 0.965 | 0.563 | 0.711 | 51,020 | 67.95% | 467 |
LogMapLt | 54 | 10,947 | 0.953 | 0.554 | 0.700 | 0.938 | 0.560 | 0.701 | 0.945 | 0.557 | 0.701 | 61,269 | 81.60% | 801 |
GOMMA | 197 | 10,555 | 0.948 | 0.531 | 0.680 | 0.931 | 0.536 | 0.680 | 0.939 | 0.533 | 0.680 | 42,813 | 57.02% | 851 |
AROMA | 15,624 | 11,783 | 0.861 | 0.538 | 0.662 | 0.848 | 0.545 | 0.664 | 0.854 | 0.542 | 0.663 | 70,491 | 93.88% | 1,286 |
MapSSS | 27,381 | 9,608 | 0.795 | 0.405 | 0.537 | 0.783 | 0.411 | 0.539 | 0.789 | 0.408 | 0.538 | 46,083 | 61.37% | 794 |
MapSSS and AROMA failed to complete the task involving the big fragments of SNOMED and NCI after more than 24 hours of execution.
There were no big differences, in general, in terms of F-measure with respect to the small SNOMED-NCI task. Only LogMap decreased its recall and lost its second position, and GOMMA-bk generated less precise mappings and was relegated to sixth position. As in the previous task, LogMap-noe provided the best results in terms of recall and F-measure, while ServOMap generated the most precise mappings.
Runtimes were roughly 2 to 3 times higher than in the small task, but in most cases the task was finished in less than 10 minutes.
Regarding mapping coherence, LogMap-noe provided a clean output, while LogMap, since it computes an estimation of the overlapping (fragments) between the input ontologies, failed to detect and repair 3 unsatisfiable classes that were outside the computed fragments.
System | Time (s) | # Mappings | Original UMLS | Refined UMLS (LogMap) | Average | Incoherence Analysis
| | | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | All Unsat. | Degree | Root Unsat.
LogMap-noe | 575 | 13,184 | 0.882 | 0.617 | 0.726 | 0.877 | 0.631 | 0.734 | 0.879 | 0.624 | 0.730 | 0 | 0% | 0 |
YAM++ | 6,127 | 13,083 | 0.864 | 0.600 | 0.708 | 0.854 | 0.610 | 0.712 | 0.859 | 0.605 | 0.710 | 104,492 | 60.66% | 618 |
ServOMapL | 363 | 12,784 | 0.870 | 0.590 | 0.703 | 0.858 | 0.599 | 0.705 | 0.864 | 0.594 | 0.704 | 136,909 | 79.48% | 1,101 |
LogMap | 514 | 12,142 | 0.877 | 0.565 | 0.687 | 0.872 | 0.578 | 0.695 | 0.874 | 0.571 | 0.691 | 3 | 0.002% | 2 |
ServOMap | 282 | 11,632 | 0.896 | 0.553 | 0.684 | 0.885 | 0.562 | 0.687 | 0.891 | 0.558 | 0.686 | 110,253 | 64.00% | 820 |
GOMMA-bk | 638 | 15,644 | 0.730 | 0.606 | 0.662 | 0.718 | 0.613 | 0.662 | 0.724 | 0.610 | 0.662 | 116,451 | 67.60% | 2,741
LogMapLt | 104 | 12,741 | 0.819 | 0.553 | 0.660 | 0.805 | 0.560 | 0.661 | 0.812 | 0.557 | 0.661 | 131,073 | 76.09% | 2,201 |
GOMMA | 527 | 12,320 | 0.802 | 0.524 | 0.634 | 0.787 | 0.529 | 0.633 | 0.795 | 0.527 | 0.634 | 96,945 | 56.28% | 1,621 |
AROMA | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
MapSSS | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
Precision and recall decreased slightly for all systems, and none of them reached an F-measure of 0.7. YAM++ produced the best mapping set in terms of F-measure, while ServOMap and GOMMA-bk generated the mappings with the best precision and recall, respectively. LogMap-noe lost its first position since it provided less comprehensive mappings.
ServOMap, ServOMapL and LogMap were the fastest tools, requiring 11, 12 and 16 minutes, respectively. GOMMA (with its two variants) required more than 30 minutes, while YAM++ required more than 8 hours.
As in the previous task, LogMap-noe provided a clean output, while LogMap failed to detect and repair a few unsatisfiable classes due to the computation of the overlapping between the input ontologies.
System | Time (s) | # Mappings | Original UMLS | Refined UMLS (LogMap) | Average | Incoherence Analysis
| | | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure | All Unsat. | Degree | Root Unsat.
YAM++ | 30,155 | 14,103 | 0.794 | 0.594 | 0.680 | 0.785 | 0.604 | 0.683 | 0.790 | 0.599 | 0.681 | 238,593 | 63.91% | 979 |
ServOMapL | 738 | 13,964 | 0.796 | 0.590 | 0.678 | 0.785 | 0.598 | 0.679 | 0.791 | 0.594 | 0.678 | 286,790 | 76.82% | 1,557 |
LogMap | 955 | 13,011 | 0.816 | 0.564 | 0.667 | 0.812 | 0.577 | 0.674 | 0.814 | 0.570 | 0.671 | 16 | 0.004% | 10 |
LogMap-noe | 1,505 | 13,058 | 0.813 | 0.563 | 0.666 | 0.809 | 0.577 | 0.673 | 0.811 | 0.570 | 0.670 | 0 | 0% | 0 |
ServOMap | 654 | 12,462 | 0.835 | 0.552 | 0.664 | 0.824 | 0.560 | 0.667 | 0.829 | 0.556 | 0.666 | 230,055 | 61.63% | 1,546 |
GOMMA-bk | 1,940 | 17,045 | 0.669 | 0.605 | 0.635 | 0.658 | 0.612 | 0.634 | 0.663 | 0.608 | 0.635 | 239,708 | 64.21% | 4,297
LogMapLt | 178 | 14,043 | 0.743 | 0.553 | 0.634 | 0.731 | 0.560 | 0.634 | 0.737 | 0.557 | 0.634 | 305,648 | 81.87% | 3,160 |
GOMMA | 1,820 | 13,693 | 0.720 | 0.523 | 0.606 | 0.707 | 0.528 | 0.605 | 0.714 | 0.526 | 0.606 | 215,959 | 57.85% | 2,614 |
AROMA | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
MapSSS | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
The following table summarises the results for the 8 systems that completed all 9 tasks in the Large BioMed Track. The table shows the average precision, recall, F-measure and incoherence degree, as well as the total time to complete the tasks.
Systems are ordered by average F-measure. YAM++ obtained the best average F-measure, GOMMA-bk the best recall, and ServOMap computed the most precise mappings. The first 6 systems obtained very close results in terms of F-measure, with a gap of only 0.024 between the first (YAM++) and the sixth (ServOMap).
Regarding mapping incoherence, LogMap and LogMap-noe were the only systems providing mapping sets leading to a small number of unsatisfiable classes.
Finally, LogMapLt, since it implements basic and efficient string similarity techniques, was the fastest system. The rest of the tools, apart from YAM++, were also fast and needed only between 38 and 97 minutes to complete all tasks. YAM++ was the exception and required almost 19 hours to complete the nine tasks.
System | Total Time (s) | Average
| | Precision | Recall | F-measure | Incoherence
YAM++ | 67,817 | 0.876 | 0.710 | 0.782 | 45.30% |
ServOMapL | 2,405 | 0.890 | 0.699 | 0.780 | 51.46% |
LogMap-noe | 3,964 | 0.869 | 0.695 | 0.770 | 0.004% |
GOMMA-bk | 5,821 | 0.767 | 0.791 | 0.768 | 45.32%
LogMap | 3,077 | 0.869 | 0.684 | 0.762 | 0.006% |
ServOMap | 2,310 | 0.903 | 0.657 | 0.758 | 55.36% |
GOMMA | 5,341 | 0.746 | 0.553 | 0.625 | 24.01% |
LogMapLt | 711 | 0.831 | 0.515 | 0.586 | 33.17% |
[1] Christian Meilicke and Heiner Stuckenschmidt. Incoherence as a basis for measuring the quality of ontology mappings. In Proc. of 3rd International Workshop on Ontology Matching (OM), 2008. [url]
[2] Ernesto Jimenez-Ruiz and Bernardo Cuenca Grau. LogMap: Logic-based and scalable ontology matching. In Proc. of 10th International Semantic Web Conference (ISWC), 2011. [url]
[3] Christian Meilicke. Alignment Incoherence in Ontology Matching. PhD thesis, University of Mannheim, 2011. [url]
This track is organised by Ernesto Jimenez Ruiz, Bernardo Cuenca Grau and Ian Horrocks, and supported by the SEALS and LogMap projects. If you have any questions or suggestions related to the results of this track, feel free to write an email to ernesto [at] cs [.] ox [.] ac [.] uk or ernesto [.] jimenez [.] ruiz [at] gmail [.] com