Ontology Alignment Evaluation Initiative - OAEI-2016 Campaign

Results OAEI 2016::Disease and Phenotype Track


If you have any questions or suggestions related to the results of this track, or if you notice any kind of error (wrong numbers, incorrect information about a matching system, etc.), feel free to write an email to ernesto [.] jimenez [.] ruiz [at] gmail [.] com or ianharrowconsulting [at] gmail [dot] com

Evaluation setting

We have run the evaluation on an Ubuntu laptop with an Intel Core i7-4600U CPU @ 2.10GHz x 4, allocating 15 GB of RAM.

Systems have been evaluated according to the criteria described in the sections below.

We have used the OWL 2 reasoner HermiT to calculate semantic precision and recall. A positive hit means that a mapping in the reference is (explicitly) included in the output mappings, or that it can be inferred by reasoning over the input ontologies together with the output mappings. Using semantic values for precision and recall also allowed us to provide a fair comparison for the systems PhenoMF, PhenoMM and PhenoMP, which discover many subsumption mappings that are not explicitly in the reference alignments but may still be valid.
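As a minimal sketch of this semantic evaluation, the scores can be expressed in terms of an entailment oracle. The triple representation of mappings and the `entails` callback below are illustrative assumptions; in the actual evaluation the oracle role is played by HermiT reasoning over the input ontologies plus a set of mappings.

```python
def semantic_scores(reference, output, entails):
    """Compute semantic precision and recall.

    reference, output: sets of (source, relation, target) mapping triples.
    entails(mappings, m): caller-supplied oracle returning True if mapping
    m follows from the input ontologies together with `mappings` (in the
    track, decided with an OWL 2 reasoner such as HermiT).
    """
    # Positive hit: a reference mapping that is explicit in the output
    # or entailed by the ontologies plus the output mappings.
    hits = {m for m in reference if m in output or entails(output, m)}
    # An output mapping counts as correct if it is explicit in the
    # reference or entailed by the ontologies plus the reference.
    valid = {m for m in output if m in reference or entails(reference, m)}
    precision = len(valid) / len(output) if output else 1.0
    recall = len(hits) / len(reference) if reference else 1.0
    return precision, recall
```

With a trivial oracle that entails nothing, this degenerates to ordinary set-based precision and recall; a reasoning-backed oracle credits subsumption mappings that are implied rather than stated.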

Participation and success

In the OAEI 2016 Disease and Phenotype track, 11 out of the 21 participating OAEI 2016 systems were able to complete at least one of the tasks within a 24-hour timeout.

Use of background knowledge

LogMapBio uses BioPortal as a mediating ontology provider, that is, it retrieves from BioPortal the ten most suitable ontologies for the matching task.

LogMap uses normalisations and spelling variants from the general-purpose (biomedical) UMLS Lexicon.

AML has three sources of background knowledge which can be used as mediators between the input ontologies: the Uber Anatomy Ontology (Uberon), the Human Disease Ontology (DOID) and the Medical Subject Headings (MeSH).

XMAP uses synonyms provided by the UMLS Metathesaurus.

PhenoMM, PhenoMF and PhenoMP rely on different versions of the PhenomeNET ontology of varying complexity.


1. Results against the silver standard with vote 2 and 3

The silver standards for HP-MP with vote 2 and 3 contain 2,308 and 1,588 mappings, respectively, while for DOID-ORDO they include 1,883 and 1,617 mappings, respectively. Table 1 shows the results achieved by each of the participating systems. We deliberately did not rank the systems, since the silver standards only indicate how a system performs relative to the others. Five of the systems (the highlighted ones) produce results relatively close to the silver standards: LogMap is the system closest to the mappings voted by at least 2 systems, while FCA-Map produces results very close to the silver standard with vote 3.

Results against silver standard with vote 2 and 3
Table 1. Results against silver standard with vote 2 and 3.
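The vote-based silver standards above can be sketched as follows: a mapping enters the silver standard with vote n if at least n of the participating systems propose it. The helper below is an illustrative reconstruction of that construction, not the track's actual tooling.

```python
from collections import Counter

def silver_standard(system_outputs, vote):
    """Build a vote-based silver standard.

    system_outputs: iterable of sets of mappings, one set per system.
    vote: minimum number of systems that must propose a mapping for it
    to be included in the silver standard.
    """
    # Count, for each mapping, how many systems proposed it.
    counts = Counter(m for output in system_outputs for m in output)
    return {m for m, n in counts.items() if n >= vote}
```

For example, with three systems proposing {a}, {a, b} and {b, c}, the vote-2 silver standard is {a, b} and the vote-3 silver standard is empty.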

2. Results against manually created mappings

The manually generated mappings for three areas (carbohydrate, obesity and breast cancer) include 29 mappings between HP and MP and 60 mappings between DOID and ORDO, most of them representing subsumption relationships. Table 2 shows the results in terms of recall for each of the systems. PhenoMF, PhenoMP and PhenoMM achieve very good results for HP-MP, since they discover a large number of subsumption mappings. However, for DOID-ORDO only LogMap, LogMapBio and DiSMatch discover some of the mappings in the curated set.

Results against manually created mappings
Table 2. Recall against manually created mappings.

3. Manual assessment of unique mappings

Tables 3 and 4 show the results of the manual assessment (i.e. precision) of the unique mappings generated by the participating systems. Unique mappings are correspondences that no other system (explicitly) provided in its output. We manually evaluated up to 30 mappings, focusing the assessment on unique equivalence mappings.

For example, the DiSMatch output contains 291 unique mappings in the HP-MP task, and the manual assessment revealed an (estimated) precision of 0.8333. To also take into account the number of unique mappings that a system is able to discover, Tables 3 and 4 also report the positive and negative contribution of each system's unique mappings with respect to the total unique mappings discovered by all participating systems.

Unique mappings in the HP-MP task.
Table 3. Unique mappings in the HP-MP task.
Unique mappings in the DOID-ORDO task.
Table 4. Unique mappings in the DOID-ORDO task.
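A sketch of this assessment follows. The `unique_mappings` helper implements the definition given above (correspondences no other system produced); the `contribution` formula is an assumption on our part, since the exact weighting behind the positive/negative contribution columns is not spelled out here. It reads the positive contribution as the estimated-correct share of a system's unique mappings over all systems' unique mappings, and the negative contribution as the estimated-incorrect share.

```python
def unique_mappings(outputs):
    """Mappings each system produced that no other system produced.

    outputs: dict of system name -> set of mappings.
    Returns a dict of system name -> set of its unique mappings.
    """
    uniques = {}
    for name, mapped in outputs.items():
        # Union of every other system's output.
        others = set().union(*(o for n, o in outputs.items() if n != name))
        uniques[name] = mapped - others
    return uniques

def contribution(uniques, name, est_precision):
    """Assumed positive/negative contribution of one system's unique
    mappings, relative to the total unique mappings of all systems."""
    total = sum(len(u) for u in uniques.values())
    n = len(uniques[name])
    positive = est_precision * n / total
    negative = (1 - est_precision) * n / total
    return positive, negative
```

Under this reading, a system with many unique mappings but low estimated precision contributes both strongly positively and strongly negatively to the pool of unique mappings.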

4. Winners of the prize!

AML, FCA-Map, LogMap, LogMapBio and PhenoMF produced the most complete results according to both the automatic and the manual evaluation. LogMap and LogMapBio were not selected for the prize, since Ernesto is one of their developers and is also co-organising this track.

The winners of the prize sponsored by the Pistoia Alliance were (in alphabetical order): AML, FCA-Map and PhenoMF. Each of them received $2,500. Congratulations!