Results OAEI 2018::Disease and Phenotype Track

Contact

If you have any questions or suggestions related to the results of this track, or if you notice any kind of error (wrong numbers, incorrect information on a matching system, etc.), feel free to write an email to ernesto [.] jimenez [.] ruiz [at] gmail [.] com or ianharrowconsulting [at] gmail [dot] com

Evaluation setting

We have run the evaluation on an Ubuntu 18 laptop with an Intel Core i9-8950HK CPU @ 2.90GHz x 12, allocating 25 GB of RAM.

Systems have been evaluated according to the following criteria:

Precision and Recall with respect to a voted reference alignment (consensus alignment), automatically built by merging and voting the outputs of the systems participating in 2016, 2017 and 2018. We have used vote=3, that is, a mapping is included in the consensus alignment if at least 3 systems propose it.

We have used the OWL 2 reasoner HermiT to compute the number of unsatisfiable classes that arise when each system's mappings are integrated with the input ontologies.
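The incoherence degree reported in Tables 2 and 3 can be read as a simple ratio. As a hedged sketch, assuming (as in other OAEI biomedical tracks) that the denominator is the number of classes in the union of the input ontologies $\mathcal{O}_1$ and $\mathcal{O}_2$, and writing $\mathcal{M}$ for a system's mappings:

```latex
\mathrm{Degree} = \frac{|\mathrm{Unsat}(\mathcal{O}_1 \cup \mathcal{O}_2 \cup \mathcal{M})|}{|\mathrm{Classes}(\mathcal{O}_1 \cup \mathcal{O}_2)|} \times 100\%
```

Under this reading, a degree of 0% means that adding the mappings to the two ontologies makes no class unsatisfiable.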

Check out the supporting scripts to reproduce the evaluation: https://github.com/ernestojimenezruiz/oaei-evaluation
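The voting step behind the consensus alignment can be sketched as follows. This is a minimal illustration, not code from the linked repository: it assumes mappings are plain (source, target) pairs, ignores relations and confidences, and does not model how outputs from 2016, 2017 and 2018 are pooled. The function name `consensus_alignment`, the `family_of` grouping and the class identifiers are hypothetical.

```python
from collections import defaultdict

# Hypothetical representation: a mapping is a (source_class, target_class)
# pair; relation and confidence values are ignored in this sketch.
Mapping = tuple[str, str]


def consensus_alignment(
    outputs: dict[str, set[Mapping]],
    family_of: dict[str, str],
    vote: int = 3,
) -> set[Mapping]:
    """Keep the mappings proposed by at least `vote` distinct system families.

    outputs:   system or variant name -> mappings it produced
    family_of: system name -> family name (systems missing here form their own family)
    """
    # For each mapping, record which families proposed it, so that several
    # variants of the same family add up to a single vote.
    voters: dict[Mapping, set[str]] = defaultdict(set)
    for system, mappings in outputs.items():
        family = family_of.get(system, system)
        for mapping in mappings:
            voters[mapping].add(family)
    return {m for m, families in voters.items() if len(families) >= vote}


# Toy example with made-up class identifiers.
outputs = {
    "LogMap": {("HP:1", "MP:1"), ("HP:2", "MP:2")},
    "LogMapBio": {("HP:1", "MP:1"), ("HP:2", "MP:2")},
    "AML": {("HP:1", "MP:1")},
    "XMap": {("HP:1", "MP:1"), ("HP:2", "MP:2")},
}
family_of = {"LogMap": "LogMap", "LogMapBio": "LogMap"}
print(consensus_alignment(outputs, family_of))  # {('HP:1', 'MP:1')}
```

In the toy example, ("HP:2", "MP:2") is proposed by three systems but only two families (LogMap and XMap), so it falls below vote=3 once LogMap and LogMapBio are grouped.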

Participation and success

In the OAEI 2018 Disease and Phenotype track, 9 out of 18 participating OAEI 2018 systems were able to complete at least one of the tasks within a 6-hour timeout (see Table 1).

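**Table 1:** System runtimes (s) and task completion. The last row gives the number of systems that completed each task, the mean of the per-system average runtimes, and the total number of completed tasks.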
System	HP-MP	DOID-ORDO	Average	# Tasks
LogMapLt	7	7	7	2
XMap	15	19	17	2
DOME	46	10	28	2
LogMap	31	25	28	2
AML	70	136	103	2
LogMapBio	821	1,891	1,356	2
POMAP++	1,668	2,265	1,967	2
Lily	4,749	2,847	3,798	2
KEPLER	-	2,746	2,746	1
# Systems	8	9	1,117	17

Use of background knowledge

LogMapBio uses BioPortal as a mediating ontology provider; that is, it retrieves from BioPortal the top-10 most suitable ontologies for the matching task.

LogMap uses normalisations and spelling variants from the general-purpose (biomedical) SPECIALIST Lexicon.

AML has three sources of background knowledge which can be used as mediators between the input ontologies: the Uber Anatomy Ontology (Uberon), the Human Disease Ontology (DOID) and the Medical Subject Headings (MeSH).

XMap and Lily use a dictionary of synonyms (pre)extracted from the UMLS Metathesaurus. Lily additionally uses a dictionary of synonyms (pre)extracted from BioPortal.
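How XMap and Lily exploit their dictionaries is not detailed here, so the following is only a generic sketch of synonym-based lexical matching: labels are normalised, expanded with dictionary synonyms, and two classes are matched when their expanded label sets share a term. The dictionary format (normalised term mapped to a set of synonyms), the function names and the example labels are all assumptions for illustration.

```python
import re


def normalise(label: str) -> str:
    """Lower-case a label and collapse punctuation/whitespace into single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", label.lower()).split())


def expand(label: str, synonyms: dict[str, set[str]]) -> set[str]:
    """Return the normalised label plus its dictionary synonyms (keys must be normalised)."""
    key = normalise(label)
    return {key} | {normalise(s) for s in synonyms.get(key, set())}


def match_with_synonyms(
    source: dict[str, str],
    target: dict[str, str],
    synonyms: dict[str, set[str]],
) -> set[tuple[str, str]]:
    """Match source and target classes whose expanded labels share a term.

    source/target: class id -> label; synonyms: normalised term -> synonyms.
    """
    # Index target classes by every term in their expanded label set.
    index: dict[str, set[str]] = {}
    for tid, tlabel in target.items():
        for term in expand(tlabel, synonyms):
            index.setdefault(term, set()).add(tid)
    # Look up each expanded source term in the target index.
    mappings = set()
    for sid, slabel in source.items():
        for term in expand(slabel, synonyms):
            for tid in index.get(term, set()):
                mappings.add((sid, tid))
    return mappings


# Toy example: a hypothetical one-entry synonym dictionary.
synonyms = {"heart attack": {"Myocardial infarction"}}
source = {"A:1": "Heart attack"}
target = {"B:7": "Myocardial infarction", "B:8": "Stroke"}
print(match_with_synonyms(source, target, synonyms))  # {('A:1', 'B:7')}
```

Without the dictionary the two labels share no term, which is the gap that background knowledge such as a UMLS-derived synonym list is meant to close.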

Results against the consensus alignments with vote 3

Tables 2 and 3 show the results achieved by each of the participating systems against the consensus alignment with vote=3. Note that systems participating with different variants only contributed once in the voting, that is, the voting was done by family of systems/variants rather than by individual systems.
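The scores in Tables 2 and 3 are the usual set-based measures. The sketch below assumes alignments are compared as plain sets of mappings (the function names are illustrative) and checks the F-measure reported for LogMap on the HP-MP task.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(system: set, reference: set) -> tuple[float, float, float]:
    """Precision, recall and F-measure of a system alignment against a reference."""
    true_positives = len(system & reference)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall, f_measure(precision, recall)


# LogMap on HP-MP (Table 2): precision 0.875, recall 0.835.
print(round(f_measure(0.875, 0.835), 3))  # 0.855
```

Because the reference here is the consensus alignment rather than a curated gold standard, these numbers measure agreement with the other systems more than absolute correctness.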

Since the consensus alignments only allow us to assess how systems perform in comparison with one another, the proposed ranking is only a reference. Note that, on the one hand, some of the mappings in the consensus alignment may be erroneous (false positives), since it only takes 3 systems agreeing on an erroneous mapping for it to be included. On the other hand, the consensus alignments are not complete: there will likely be correct mappings that no system is able to find, and some mappings found by only one system (and therefore absent from the consensus alignments) may be correct.

Nevertheless, the results with respect to the consensus alignments do provide some insights into the performance of the systems. For example, LogMap is the system whose set of mappings is closest to the consensus with vote=3 (which does not necessarily make it the best system), while AML outputs a large set of unique mappings, that is, mappings that are not proposed by any other system. LogMap has a small set of unique mappings, as most of its mappings are also suggested by its variant LogMapBio and vice versa.
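The "# Unique" column follows the definition above: mappings that no other system proposes, where variants such as LogMap and LogMapBio count as different systems. A minimal sketch, with mappings reduced to hypothetical identifiers `m1` to `m4`:

```python
def unique_mappings(outputs: dict[str, set]) -> dict[str, set]:
    """Map each system to the mappings that no other system proposed.

    Variants of the same family count as other systems here, so mappings
    shared only by LogMap and LogMapBio are unique to neither.
    """
    unique = {}
    for system, mappings in outputs.items():
        # Union of everything produced by the remaining systems.
        others = set().union(*(m for s, m in outputs.items() if s != system))
        unique[system] = mappings - others
    return unique


# Toy example with made-up mapping identifiers.
outputs = {
    "LogMap": {"m1", "m2"},
    "LogMapBio": {"m1", "m2", "m3"},
    "AML": {"m1", "m4"},
}
print(unique_mappings(outputs))
# {'LogMap': set(), 'LogMapBio': {'m3'}, 'AML': {'m4'}}
```

This mirrors the observation above: a system with a close variant can have very few unique mappings even if its output is large.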

HP-MP task

**Table 2:** Results for the HP-MP task.
System	Time (s)	# Mappings	# Unique	Precision	Recall	F-measure	Unsat. classes	Incoherence degree
LogMap	31	2,130	1	0.875	0.835	0.855	0	0%
LogMapBio	821	2,178	37	0.862	0.841	0.851	0	0%
AML	70	2,010	279	0.889	0.801	0.843	0	0%
LogMapLt	7	1,370	3	0.993	0.609	0.755	0	0%
POMAP++	1,668	1,502	214	0.855	0.575	0.688	0	0%
Lily	4,749	2,118	733	0.682	0.647	0.664	0	0%
XMap	20	704	2	0.994	0.314	0.477	0	0%
DOME	46	689	0	0.997	0.308	0.471	0	0%

DOID-ORDO task

**Table 3:** Results for the DOID-ORDO task.
System	Time (s)	# Mappings	# Unique	Precision	Recall	F-measure	Unsat. classes	Incoherence degree
LogMap	25	2,323	0	0.937	0.775	0.848	0	0%
LogMapBio	1,891	2,499	91	0.898	0.799	0.846	0	0%
POMAP++	2,264	2,563	174	0.874	0.798	0.834	0	0%
LogMapLt	7	1,747	16	0.988	0.615	0.758	0	0%
XMap	15	1,587	37	0.969	0.548	0.700	0	0%
KEPLER	2,746	1,824	158	0.883	0.573	0.695	0	0%
Lily	2,847	3,738	1,167	0.589	0.783	0.672	206	1.9%
AML	135	4,749	1,886	0.514	0.870	0.646	0	0%
DOME	10	1,232	2	0.996	0.437	0.607	0	0%

Related publications

Paper describing the experiences and results in the OAEI 2016 Disease and Phenotype track.

Ian Harrow, Ernesto Jimenez-Ruiz, Andrea Splendiani, Martin Romacker, Peter Woollard, Scott Markel, Yasmin Alam-Faruque, Martin Koch, James Malone, and Arild Waaler. Matching Disease and Phenotype Ontologies in the Ontology Alignment Evaluation Initiative. Journal of Biomedical Semantics, 2018.