This track consists of finding alignments between the Foundational Model of Anatomy (FMA), SNOMED CT, and the National Cancer Institute Thesaurus (NCI). These ontologies are semantically rich and contain tens to hundreds of thousands of classes.
The UMLS Metathesaurus has been selected as the basis for the track reference alignments (see oaei2012_umls_reference for details). UMLS is currently the most comprehensive effort to integrate independently developed medical thesauri and ontologies, including FMA, SNOMED CT, and NCI. The integration of new UMLS sources combines automatic techniques, expert assessment, and auditing protocols.
Note that, if you are using the OWL API, the following parameter "-DentityExpansionLimit=100000000" should be given to the JVM in order to be able to load large ontologies.
The Large BioMed Track consists of three matching problems.
Note that the ontologies have been normalised for the OAEI; as a result, the synonyms of concept names are provided as "rdfs:label" annotations.
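Because synonyms are stored as multiple "rdfs:label" annotations on the same class, a matcher can collect all labels of a class and compare them as alternative names. The following is a minimal sketch using only Python's standard XML parser; it assumes an RDF/XML serialisation with top-level owl:Class elements, and the sample class and its example.org IRI are hypothetical. Real files may use other RDF/XML shapes, so a full OWL library (such as the OWL API mentioned above) is more robust in practice.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used in RDF/XML serialisations of OWL ontologies.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]


def labels_by_class(rdf_xml: str) -> dict[str, list[str]]:
    """Map each named owl:Class IRI to the list of its rdfs:label values."""
    root = ET.fromstring(rdf_xml)
    result: dict[str, list[str]] = {}
    # Only direct children of rdf:RDF are inspected (simplifying assumption).
    for cls in root.findall("owl:Class", NS):
        iri = cls.get(RDF_ABOUT)
        if iri is None:
            continue  # skip anonymous classes
        result[iri] = [
            lbl.text.strip() for lbl in cls.findall("rdfs:label", NS) if lbl.text
        ]
    return result


# Hypothetical class with two labels (a preferred name plus a synonym).
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/fma#Heart">
    <rdfs:label>Heart</rdfs:label>
    <rdfs:label>Cor</rdfs:label>
  </owl:Class>
</rdf:RDF>"""

print(labels_by_class(SAMPLE))
```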
We have split each matching problem into three tasks involving fragments of the ontologies of different sizes. The reference alignments will be the same for the three tasks; however, their complexity will differ in terms of both performance and scalability, since larger ontologies also involve more possible candidate mappings.
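To give a rough sense of how task size drives complexity, the sketch below computes a naive upper bound on candidate mappings (every source class paired with every target class) from the class counts given in the task descriptions. This is only an illustration of growth: practical matchers typically prune candidates (e.g. via lexical indexing) rather than comparing all pairs, and the task names used as keys are our own labels.

```python
# Class counts (source, target) per task, taken from the task descriptions.
TASKS = {
    "FMA-NCI small": (3_696, 6_488),
    "FMA-NCI large": (28_861, 25_591),
    "FMA-NCI whole": (78_989, 66_724),
    "FMA-SNOMED small": (10_157, 13_412),
    "FMA-SNOMED large": (50_523, 122_464),
    "FMA-SNOMED whole": (78_989, 122_464),
    "SNOMED-NCI small": (51_128, 23_958),
    "SNOMED-NCI large": (122_464, 49_795),
    "SNOMED-NCI whole": (122_464, 66_724),
}


def naive_candidate_pairs(n_source: int, n_target: int) -> int:
    """Upper bound on candidate mappings if every class pair were compared."""
    return n_source * n_target


for name, (n_src, n_tgt) in TASKS.items():
    print(f"{name:18} {naive_candidate_pairs(n_src, n_tgt):>16,}")
```

For FMA-NCI, for instance, the bound grows from about 24 million pairs for the small fragments to about 5.3 billion pairs for the whole ontologies.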
The complete datasets for the OAEI 2012 campaign can be downloaded as a zip file.
It contains the following three problems, each at three different scales.
There are 4 reference alignments for the FMA-NCI matching tasks. Three of them are UMLS-based. The fourth has been created by harmonising the outputs of the tools participating in the OAEI 2011.5 campaign.
Required input for SEALS OMT client:
This task consists of matching two (relatively) small fragments of FMA and NCI. The FMA fragment contains 3,696 classes (5% of FMA), while the NCI fragment contains 6,488 classes (10% of NCI).
This task consists of matching two (relatively) large fragments of FMA and NCI. The FMA fragment contains 28,861 classes (37% of FMA), while the NCI fragment contains 25,591 classes (38% of NCI).
This task consists of matching the whole FMA and NCI ontologies, which contain 78,989 and 66,724 classes, respectively.
There are 3 UMLS-based reference alignments for the FMA-SNOMED matching tasks.
Required input for SEALS OMT client:
This task consists of matching two (relatively) small fragments of FMA and SNOMED. The FMA fragment contains 10,157 classes (13% of FMA), while the SNOMED fragment contains 13,412 classes (5% of SNOMED).
This task consists of matching two (relatively) large fragments of FMA and SNOMED. The FMA fragment contains 50,523 classes (64% of FMA), while the SNOMED fragment contains 122,464 classes (40% of SNOMED).
This task consists of matching the whole FMA, which contains 78,989 classes, with a large SNOMED fragment containing 122,464 classes (40% of SNOMED).
There are 2 UMLS-based reference alignments for the SNOMED-NCI matching tasks. Note that, at the time of creating the datasets, we could not compute a refined UMLS alignment set with Alcomo. However, the new version of Alcomo has been shown to cope with SNOMED-NCI.
Required input for SEALS OMT client:
This task consists of matching two (relatively) small fragments of SNOMED and NCI. The SNOMED fragment contains 51,128 classes (17% of SNOMED), while the NCI fragment contains 23,958 classes (36% of NCI).
This task consists of matching two (relatively) large fragments of SNOMED and NCI. The SNOMED fragment contains 122,464 classes (40% of SNOMED), while the NCI fragment contains 49,795 classes (75% of NCI).
This task consists of matching the whole NCI, which contains 66,724 classes, with a large SNOMED fragment containing 122,464 classes (40% of SNOMED).
This track has two main objectives. On the one hand, it intends to evaluate the performance of matching systems when matching real-world, large-scale ontologies. On the other hand, it aims to create an error-free "silver standard" reference alignment by "harmonising" the output of different matching and debugging systems together with the current UMLS mapping sets. See OAEI 2011.5 harmonisation.
Regarding the use of background knowledge, the OAEI rules state that a resource (i.e. a third biomedical ontology) especially designed for the test is not allowed. Note, however, that matching systems using UMLS as background knowledge will have an advantage, since the reference alignment is also based on UMLS. Nevertheless, it will be interesting to evaluate the performance of a system with and without specialised background knowledge. Moreover, matching systems using UMLS may be especially helpful in the creation of the proposed "silver standard" reference alignment.
For this modality, the generated alignment should be an optimal solution to the matching problem with respect to both recall and precision; the evaluation will focus on the F-measure. We also encourage the creation of an error-free output, that is, the extracted mappings together with the ontologies should not lead to (many) unsatisfiable classes.
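Precision, recall, and F-measure compare a system alignment with the reference alignment. The following is a minimal sketch under the simplifying assumption that a mapping is just a (source IRI, target IRI) pair, ignoring the relation and confidence value; the prefixed identifiers in the example are hypothetical.

```python
# A mapping is modelled as a (source IRI, target IRI) pair (assumption).
MappingPair = tuple[str, str]


def evaluate(system: set[MappingPair], reference: set[MappingPair]) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) of a system alignment."""
    correct = len(system & reference)  # mappings found in the reference
    precision = correct / len(system) if system else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


REFERENCE = {("fma:A", "nci:A"), ("fma:B", "nci:B"), ("fma:C", "nci:C"), ("fma:D", "nci:D")}
SYSTEM = {("fma:A", "nci:A"), ("fma:B", "nci:B"), ("fma:C", "nci:X")}

print(evaluate(SYSTEM, REFERENCE))
```

Here two of the three system mappings are correct and two of the four reference mappings are found, so precision is 2/3, recall is 1/2, and the F-measure is 4/7 (about 0.571).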
The evaluation of Modality 1 will be run with the support of SEALS. This requires that you wrap your matching system so that we can execute it on the SEALS platform.
Mapping debugging systems are also welcome to provide a revised version of the original UMLS mappings, similar to the refinements currently provided.
We aim to harmonise different revised subsets of the UMLS mappings with the outputs of the Modality 1 participants in order to create an error-free "silver standard" reference alignment. Participant outputs will also be compared against the silver standard to analyse how they differ from those of the other systems.
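The exact harmonisation procedure is not specified here. One simple way to combine several alignments, assumed purely for illustration, is vote counting: keep every mapping proposed by at least a given number of input alignments. The sketch below does this and then lists the mappings of one system that fall outside the result. It deliberately omits the debugging step (removing mappings that cause unsatisfiable classes) that an error-free silver standard would also require; all identifiers are hypothetical.

```python
from collections import Counter

# A mapping is modelled as a (source IRI, target IRI) pair (assumption).
MappingPair = tuple[str, str]


def harmonise(alignments: list[set[MappingPair]], min_votes: int) -> set[MappingPair]:
    """Keep mappings proposed by at least `min_votes` of the input alignments."""
    # Each alignment is a set, so every input contributes at most one vote per mapping.
    votes = Counter(m for alignment in alignments for m in alignment)
    return {m for m, n in votes.items() if n >= min_votes}


# Hypothetical outputs of three systems (or revised UMLS subsets).
ALIGN_A = {("fma:A", "nci:A"), ("fma:B", "nci:B")}
ALIGN_B = {("fma:A", "nci:A"), ("fma:C", "nci:C")}
ALIGN_C = {("fma:A", "nci:A"), ("fma:B", "nci:B"), ("fma:D", "nci:X")}

silver = harmonise([ALIGN_A, ALIGN_B, ALIGN_C], min_votes=2)
print(sorted(silver))
# Mappings of one system that the harmonised alignment does not support.
print(sorted(ALIGN_C - silver))
```

Raising `min_votes` trades recall for precision: with all three votes required, only the mapping shared by every input survives.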
Modality 2 will be optional and will be run in an 'off-line' way.
Ernesto Jimenez-Ruiz: ernesto [at] cs [.] ox [.] ac [.] uk