Ontology Alignment Evaluation Initiative - OAEI-2021 Campaign

Disease and Phenotype Track


Contact

If you have any questions or suggestions related to the results of this track, or if you notice any kind of error (wrong numbers, incorrect information about a matching system, etc.), feel free to write an email to ernesto [.] jimenez [.] ruiz [at] gmail [.] com or ianharrowconsulting [at] gmail [dot] com

Evaluation setting

We ran the evaluation on an Ubuntu 20 laptop with an Intel Core i5-6300HQ CPU @ 2.30GHz × 4, allocating 15 GB of RAM.

Systems have been evaluated according to the following criteria:

- Runtime and number of completed tasks (with an 8-hour timeout per task).
- Precision, recall and F-measure with respect to the consensus alignments with vote 3.
- Coherence of the computed alignments: we have used the OWL 2 EL reasoner ELK to compute an approximate number of unsatisfiable classes.

Check out the supporting scripts to reproduce the evaluation: https://github.com/ernestojimenezruiz/oaei-evaluation
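
For reference, below is a minimal sketch of how an approximate unsatisfiable-class count can be obtained with ELK through the OWL API. It is an illustration only, not the code used for the evaluation (see the supporting scripts above); the merged-ontology file name is a hypothetical placeholder, and it assumes the two input ontologies have already been merged with the candidate mappings of a system.

import java.io.File;
import org.semanticweb.elk.owlapi.ElkReasonerFactory;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
import org.semanticweb.owlapi.reasoner.OWLReasoner;

public class UnsatisfiableClassCount {
    public static void main(String[] args) throws Exception {
        // Hypothetical input: the two ontologies of a task already merged
        // with the mappings (as equivalence/subsumption axioms) of a system.
        OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
        OWLOntology merged = manager.loadOntologyFromOntologyDocument(
                new File("hp-mp-merged-with-mappings.owl"));

        // ELK is an OWL 2 EL reasoner, so the count is an approximation
        // (a lower bound) of the real number of unsatisfiable classes.
        OWLReasoner reasoner = new ElkReasonerFactory().createReasoner(merged);
        int unsatisfiable = reasoner.getUnsatisfiableClasses()
                                    .getEntitiesMinusBottom().size();
        int totalClasses = merged.getClassesInSignature().size();

        System.out.printf("Unsatisfiable classes: %d (%.2f%% of %d)%n",
                unsatisfiable, 100.0 * unsatisfiable / totalClasses, totalClasses);
        reasoner.dispose();
    }
}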

Participation and success

In the OAEI 2021 Disease and Phenotype track, 10 participating systems were able to complete at least one of the tasks within an 8-hour timeout (see Table 1). ALOD2Vec and ATMatcher could not complete the DOID-ORDO task due to a runtime error, while GMap and AMD produced an "OutOfMemoryException" and Lily gave an error during the matching process.

System    HP-MP (s)    DOID-ORDO (s)    Average (s)    # Tasks
KGMatcher 13 19 16 2
LogMapLt 21 27 24 2
LogMap 69 52 61 2
AML 117 231 174 2
LogMapBio 2,508 2,176 2,342 2
LSMatch 2,366 2,749 2,558 2
TOM 7,909 4,697 6,303 2
ATMatcher 28 - 28 1
Fine-TOM 306 - 306 1
ALOD2Vec 3,107 - 3,107 1
# Systems 10 7 1,492 17
Table 1: System runtimes (s) and task completion.

Use of background knowledge

LogMapBio uses BioPortal as a mediating-ontology provider, that is, it retrieves from BioPortal the top 10 ontologies most suitable for the matching task.

LogMap uses normalisations and spelling variants from the SPECIALIST Lexicon, a general-purpose biomedical lexicon.

AML has three sources of background knowledge which can be used as mediators between the input ontologies: the Uber Anatomy Ontology (Uberon), the Human Disease Ontology (DOID) and the Medical Subject Headings (MeSH).


Results against the consensus alignments with vote 3

Tables 2 and 3 show the results achieved by each of the participating systems against the consensus alignment with vote=3. Note that systems participating with different variants contributed only once to the voting; that is, votes were counted per family of systems/variants rather than per individual system.
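
As an illustration of this vote-based consensus, the sketch below keeps a mapping only if at least three distinct system families propose it. The string-based mapping keys and the familyOfSystem map are simplifying assumptions for the example, not the representation used by the evaluation scripts.

import java.util.*;

public class ConsensusVoting {
    // A mapping is represented here simply as "sourceIRI|targetIRI";
    // real alignments also carry a relation and a confidence value.
    static Set<String> consensus(Map<String, Set<String>> alignmentsBySystem,
                                 Map<String, String> familyOfSystem,
                                 int minVotes) {
        // Collect, for each mapping, the distinct families proposing it,
        // so that several variants of the same tool contribute one vote.
        Map<String, Set<String>> votes = new HashMap<>();
        alignmentsBySystem.forEach((system, mappings) -> {
            String family = familyOfSystem.getOrDefault(system, system);
            for (String mapping : mappings) {
                votes.computeIfAbsent(mapping, k -> new HashSet<>()).add(family);
            }
        });
        // Keep the mappings with at least minVotes (here, 3) family votes.
        Set<String> result = new HashSet<>();
        votes.forEach((mapping, families) -> {
            if (families.size() >= minVotes) result.add(mapping);
        });
        return result;
    }
}

For example, LogMap, LogMapBio and LogMapLt would all map to the same family, so mappings they share count as a single vote.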

Since the consensus alignments only allow us to assess how systems perform in comparison with one another, the proposed ranking is only a reference. On the one hand, some of the mappings in the consensus alignment may be erroneous (false positives), since it only takes three systems agreeing on the same erroneous mapping for it to enter the consensus. On the other hand, the consensus alignments are not complete: there will likely be correct mappings that no system is able to find, and mappings found by only one system (and therefore absent from the consensus alignments) may nevertheless be correct.

Nevertheless, the results with respect to the consensus alignments do provide some insights into the performance of the systems. For example, LogMap is the system whose set of mappings is closest to the consensus with vote=3 (which does not necessarily make it the best system), while AML outputs a large set of unique mappings, that is, mappings not proposed by any other system. LogMap has a small set of unique mappings because most of its mappings are also suggested by its variant LogMapBio, and vice versa.
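
The precision, recall and F-measure reported in Tables 2 and 3 are the usual set-based measures, computed here against the consensus alignment rather than a curated gold standard. The sketch below, using the same simplified string-based representation as above, also shows one way to derive the "# Unique" column (mappings proposed by no other system); the method names are hypothetical.

import java.util.*;

public class ConsensusScores {
    // Precision, recall and F-measure of a system alignment S against the
    // consensus alignment C: P = |S ∩ C| / |S|, R = |S ∩ C| / |C|,
    // F = 2PR / (P + R).
    static double[] score(Set<String> system, Set<String> consensus) {
        Set<String> intersection = new HashSet<>(system);
        intersection.retainAll(consensus);
        double p = system.isEmpty() ? 0 : (double) intersection.size() / system.size();
        double r = consensus.isEmpty() ? 0 : (double) intersection.size() / consensus.size();
        double f = (p + r) == 0 ? 0 : 2 * p * r / (p + r);
        return new double[]{p, r, f};
    }

    // "# Unique": mappings proposed by this system and by no other system.
    static long uniqueMappings(Set<String> system,
                               Collection<Set<String>> otherSystems) {
        return system.stream()
                .filter(m -> otherSystems.stream().noneMatch(o -> o.contains(m)))
                .count();
    }
}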

HP-MP task

System    Time (s)    # Mappings    # Unique    Precision    Recall    F-measure    Unsat.    Unsat. Degree
LogMap 69 2,136 5 0.900 0.749 0.818 ≥0 ≥0%
LogMapBio 2,508 2,285 125 0.857 0.763 0.807 ≥0 ≥0%
AML 117 2,029 357 0.911 0.720 0.804 ≥0 ≥0%
ATMatcher 28 769 19 0.984 0.295 0.454 ≥0 ≥0%
LogMapLt 21 725 1 0.999 0.282 0.440 ≥0 ≥0%
LSMatch 2,366 685 0 1.000 0.267 0.421 ≥0 ≥0%
Fine-TOM 306 2,997 1,148 0.111 0.130 0.120 ≥0 ≥0%
TOM 306 2,493 676 0.121 0.117 0.119 ≥0 ≥0%
ALOD2Vec 3,107 67,943 66,411 0.024 0.626 0.046 ≥0 ≥0%
KGMatcher 13 3 0 1.000 0.001 0.002 ≥0 ≥0%
Table 2: Results for the HP-MP task.

DOID-ORDO task

System    Time (s)    # Mappings    # Unique    Precision    Recall    F-measure    Unsat.    Unsat. Degree
AML 231 4,781 2,457 0.691 0.833 0.755 ≥0 ≥0%
LogMapBio 2,176 2,684 237 0.903 0.611 0.729 ≥0 ≥0%
LogMap 52 2,287 0 0.974 0.562 0.713 ≥0 ≥0%
LogMapLt 27 1,251 5 0.995 0.314 0.477 ≥0 ≥0%
LSMatch 2,749 1,193 0 1.000 0.301 0.463 ≥0 ≥0%
KGMatcher 19 338 0 1.000 0.085 0.157 ≥0 ≥0%
TOM 21 3,191 2,683 0.169 0.136 0.151 ≥0 ≥0%
Table 3: Results for the DOID-ORDO task.