Large biomedical ontologies

General description

This track consists of finding alignments between the Foundational Model of Anatomy (FMA), SNOMED CT, and the National Cancer Institute Thesaurus (NCI). These ontologies are semantically rich and contain tens of thousands of classes.

The UMLS Metathesaurus has been selected as the basis for the track reference alignments (see oaei2012_umls_reference for details). UMLS is currently the most comprehensive effort for integrating independently developed medical thesauri and ontologies, including FMA, SNOMED CT, and NCI. The integration of new UMLS sources combines automatic techniques, expert assessment, and auditing protocols.

Data sets

Note that, if you are using the OWL API, the parameter "-DentityExpansionLimit=100000000" should be passed to the JVM so that the large ontologies can be loaded.
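As a minimal sketch of where this parameter goes, the following helper builds a JVM command line with the option placed before "-jar" (JVM options must precede the jar or main class). Only the option itself comes from the note above; the jar name, the input file names, and the optional heap setting are hypothetical placeholders.

```python
import shlex

# Value taken from the note above.
ENTITY_EXPANSION_LIMIT = 100_000_000


def build_java_command(jar_path, *args, max_heap=None):
    """Return an argv list that starts a JVM with the entity expansion limit raised."""
    cmd = ["java", f"-DentityExpansionLimit={ENTITY_EXPANSION_LIMIT}"]
    if max_heap:
        # Optional heap size such as "8g"; an assumption, not a track requirement.
        cmd.append(f"-Xmx{max_heap}")
    cmd += ["-jar", jar_path, *args]
    return cmd


# "my-matcher.jar", "fma.owl" and "nci.owl" are hypothetical names.
command = build_java_command("my-matcher.jar", "fma.owl", "nci.owl")
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run as is, which avoids shell quoting issues.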

The Large BioMed Track consists of three matching problems. The complete datasets for the OAEI 2012 campaign can be downloaded as a zip file.

Note that the ontologies have been normalised for the OAEI; as a result, the synonyms of concept names are provided as "rdfs:label" annotations.
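The sketch below shows one way to collect these labels per class. It assumes the ontologies are serialised as RDF/XML and that each "rdfs:label" is a direct child of a named "owl:Class" element; the sample document and its IRI are invented for illustration. For the full ontologies a streaming parser or an OWL library would likely be more appropriate.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
OWL_CLASS = "{%s}Class" % NS["owl"]


def labels_by_class(rdf_xml_text):
    """Map each named owl:Class IRI to the list of its rdfs:label strings."""
    root = ET.fromstring(rdf_xml_text)
    labels = defaultdict(list)
    for cls in root.iter(OWL_CLASS):
        iri = cls.get(RDF_ABOUT)
        if iri is None:
            continue  # skip anonymous class expressions
        for label in cls.findall("rdfs:label", NS):
            if label.text:
                labels[iri].append(label.text.strip())
    return dict(labels)


# Invented sample; the real datasets use their own IRIs and labels.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/onto#Heart">
    <rdfs:label>Heart</rdfs:label>
    <rdfs:label>Cardiac organ</rdfs:label>
  </owl:Class>
</rdf:RDF>"""

print(labels_by_class(SAMPLE))
```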

We have split each matching problem into three tasks involving different fragments of the ontologies. The reference alignments are the same for the three tasks; the complexity differs, however, in terms of both performance and scalability, since larger ontologies also involve more possible candidate mappings.

The zip file contains the following three matching problems (at different scales).

FMA-NCI matching problems

Reference alignments

There are 4 reference alignments for the FMA-NCI matching tasks. Three of them are UMLS-based. The fourth has been created by harmonising the outputs of the tools participating in the OAEI 2011.5 campaign.

Original UMLS mappings: 3,024 mappings ("=")
Refined UMLS mappings (LogMap): 2,898 mappings ("=", "<", ">")
Refined UMLS mappings (Alcomo): 2,819 mappings ("=")
Harmonised mappings OAEI 2011.5: 2,890 mappings ("="). See OAEI 2011.5 harmonisation for details.

Test Suite Information

Required input for SEALS OMT client:

Repository: http://seals-test.sti2.at/tdrs-web/
Suite-ID: cf0378d9-da30-4b58-b937-192028ed4961
Version-ID: see specific task

Task 1: FMA-NCI small fragments

This task consists of matching two (relatively) small fragments of FMA and NCI. The FMA fragment contains 3,696 classes (5% of FMA), while the NCI fragment contains 6,488 classes (10% of NCI).

Version-ID: 725ea909-bf89-432f-96cb-747ac4065a52

Task 2: FMA-NCI large fragments

This task consists of matching two (relatively) large fragments of FMA and NCI. The FMA fragment contains 28,861 classes (37% of FMA), while the NCI fragment contains 25,591 classes (38% of NCI).

Version-ID: 20d79a92-8655-4864-b9ad-8bbb84616a35

Task 3: FMA-NCI whole ontologies

This task consists of matching the whole FMA and NCI ontologies, which contain 78,989 and 66,724 classes, respectively.

Version-ID: 4ac6ed2e-8183-4daa-a0ff-bd50e6b6d307
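To make the difference in scale between the three FMA-NCI tasks concrete, the following sketch recomputes the fragment percentages from the class counts given above and the size of the naive candidate space. Treating every pair of classes as a candidate is an assumption made only for illustration; practical matchers prune this space heavily.

```python
# Class counts as stated in the task descriptions above.
FMA_TOTAL = 78_989
NCI_TOTAL = 66_724

FMA_NCI_TASKS = {
    "Task 1 (small fragments)": (3_696, 6_488),
    "Task 2 (large fragments)": (28_861, 25_591),
    "Task 3 (whole ontologies)": (FMA_TOTAL, NCI_TOTAL),
}


def percent_of(part, whole):
    """Share of `whole` covered by `part`, rounded to a whole percentage."""
    return round(100 * part / whole)


def candidate_pairs(n_source, n_target):
    """Size of the naive search space: every source class paired with every target class."""
    return n_source * n_target


for name, (fma_size, nci_size) in FMA_NCI_TASKS.items():
    print(
        f"{name}: FMA {percent_of(fma_size, FMA_TOTAL)}%, "
        f"NCI {percent_of(nci_size, NCI_TOTAL)}%, "
        f"{candidate_pairs(fma_size, nci_size):,} class pairs"
    )
```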

FMA-SNOMED matching problems

Reference alignments

There are 3 UMLS-based reference alignments for the FMA-SNOMED matching tasks.

Original UMLS mappings: 9,008 mappings ("=")
Refined UMLS mappings (LogMap): 8,111 mappings ("=", "<", ">")
Refined UMLS mappings (Alcomo): 8,132 mappings ("=")

Test Suite Information

Required input for SEALS OMT client:

Repository: http://seals-test.sti2.at/tdrs-web/
Suite-ID: cf0378d9-da30-4b58-b937-192028ed4961
Version-ID: see specific task

Task 1: FMA-SNOMED small fragments

This task consists of matching two (relatively) small fragments of FMA and SNOMED. The FMA fragment contains 10,157 classes (13% of FMA), while the SNOMED fragment contains 13,412 classes (5% of SNOMED).

Version-ID: afa03f95-a864-4c58-95a1-f247011ad613

Task 2: FMA-SNOMED large fragments

This task consists of matching two (relatively) large fragments of FMA and SNOMED. The FMA fragment contains 50,523 classes (64% of FMA), while the SNOMED fragment contains 122,464 classes (40% of SNOMED).

Version-ID: 78811986-3e40-48a7-b75e-d263cfd44f51

Task 3: FMA whole ontology with SNOMED large fragment

This task consists of matching the whole FMA, which contains 78,989 classes, with a large SNOMED fragment that contains 122,464 classes (40% of SNOMED).

Version-ID: c36feab1-5fb2-4c52-8968-88c973938415

SNOMED-NCI matching problems

Reference alignments

There are 2 UMLS-based reference alignments for the SNOMED-NCI matching tasks. Note that, at the time of creating the datasets, we could not compute a refined UMLS alignment set with Alcomo. The new version of Alcomo, however, has been shown to be able to cope with SNOMED-NCI.

Original UMLS mappings: 18,844 mappings ("=")
Refined UMLS mappings (LogMap): 18,324 mappings ("=", "<", ">")

Test Suite Information

Required input for SEALS OMT client:

Repository: http://seals-test.sti2.at/tdrs-web/
Suite-ID: cf0378d9-da30-4b58-b937-192028ed4961
Version-ID: see specific task

Task 1: SNOMED-NCI small fragments

This task consists of matching two (relatively) small fragments of SNOMED and NCI. The SNOMED fragment contains 51,128 classes (17% of SNOMED), while the NCI fragment contains 23,958 classes (36% of NCI).

Version-ID: d4721f5f-0bb1-4b59-8e81-e3c7ad38f06b

Task 2: SNOMED-NCI large fragments

This task consists of matching two (relatively) large fragments of SNOMED and NCI. The SNOMED fragment contains 122,464 classes (40% of SNOMED), while the NCI fragment contains 49,795 classes (75% of NCI).

Version-ID: 611c0450-5230-4b2c-a8fb-80280292e9e5

Task 3: NCI whole ontology with SNOMED large fragment

This task consists of matching the whole NCI, which contains 66,724 classes, with a large SNOMED fragment that contains 122,464 classes (40% of SNOMED).

Version-ID: f85f75d6-2b63-440d-bfb8-bc239fa12f2c

Modalities

This track has two main objectives. On the one hand, it intends to evaluate the performance of matching systems when matching real, large-scale ontologies. On the other hand, it aims at creating an error-free "silver standard" reference alignment by "harmonising" the output of different matching and debugging systems, together with the current UMLS mapping sets. See OAEI 2011.5 harmonisation.

Regarding the use of background knowledge, the OAEI rules state that a resource (e.g. a third biomedical ontology) especially designed for the test is not allowed. In particular, matching systems using UMLS as background knowledge will have an advantage, since the reference alignment is also based on UMLS. Nevertheless, it will be interesting to evaluate the performance of a system with and without specialised background knowledge. Moreover, matching systems using UMLS may be especially helpful in the creation of the proposed "silver standard" reference alignment.

Modality 1: standard matching

For this modality, the generated alignment should be an optimal solution to the matching problem with respect to both recall and precision. In the evaluation we will focus on the F-measure. Furthermore, we also encourage the creation of an error-free output, that is, the extracted mappings together with the ontologies should not lead to (many) unsatisfiabilities.
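The measures mentioned above can be sketched as follows. This is an illustration only: it assumes a mapping is a (source entity, target entity, relation) triple and counts it as correct only when it matches a reference mapping exactly, which may differ from how the official evaluation treats, for instance, "<" and ">" mappings. The entity names are invented.

```python
def evaluate(system, reference):
    """Return (precision, recall, F-measure) of a system alignment against a reference.

    Both arguments are iterables of (source, target, relation) triples.
    """
    system, reference = set(system), set(reference)
    correct = len(system & reference)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Invented toy alignments.
reference = {
    ("fma:Heart", "nci:Heart", "="),
    ("fma:Lung", "nci:Lung", "="),
    ("fma:Liver", "nci:Liver", "="),
    ("fma:Kidney", "nci:Kidney", "="),
}
system = {
    ("fma:Heart", "nci:Heart", "="),
    ("fma:Lung", "nci:Lung", "="),
    ("fma:Liver", "nci:Spleen", "="),
}

precision, recall, f_measure = evaluate(system, reference)
print(f"P={precision:.3f} R={recall:.3f} F={f_measure:.3f}")
```

The F-measure used here is the harmonic mean of precision and recall, so a system cannot score well by maximising only one of the two.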

The evaluation of Modality 1 will be run with support of SEALS. This requires that you wrap your matching system in a way that allows us to execute it on the SEALS platform.

Modality 2: mapping debugging (optional)

Mapping debugging systems are also welcome to provide a revised version of the original UMLS mappings, similar to the refinements currently provided.

We aim at harmonising different revised subsets of the UMLS mappings together with the outputs of the participants from Modality 1 in order to create an error-free "silver standard" reference alignment. Participant outputs will also be compared against the silver standard in order to analyse how different they are w.r.t. the other systems.
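One simple way to picture harmonisation is a vote over the mapping sets, keeping every mapping proposed by at least a given number of systems. The sketch below assumes this voting scheme and an arbitrary threshold purely for illustration; the procedure actually used for the track is described on the OAEI 2011.5 harmonisation page and may differ. System names and mappings are invented.

```python
from collections import Counter


def harmonise(alignments, min_votes=2):
    """Keep the mappings that appear in at least `min_votes` of the given alignments.

    `alignments` is an iterable of collections of hashable mappings,
    for example (source, target, relation) triples.
    """
    votes = Counter()
    for alignment in alignments:
        votes.update(set(alignment))  # each system votes at most once per mapping
    return {mapping for mapping, count in votes.items() if count >= min_votes}


def disagreement(alignment, silver):
    """Mappings a system proposes that are missing from the silver standard, and vice versa."""
    alignment, silver = set(alignment), set(silver)
    return alignment - silver, silver - alignment


# Invented outputs of three hypothetical systems.
system_a = {("fma:Heart", "nci:Heart", "="), ("fma:Lung", "nci:Lung", "=")}
system_b = {("fma:Heart", "nci:Heart", "="), ("fma:Liver", "nci:Liver", "=")}
system_c = {("fma:Heart", "nci:Heart", "="), ("fma:Lung", "nci:Lung", "=")}

silver = harmonise([system_a, system_b, system_c], min_votes=2)
extra, missing = disagreement(system_b, silver)
print(sorted(silver))
print(sorted(extra), sorted(missing))
```

Note that voting alone does not guarantee an error-free result; the harmonised set would still need to be checked for unsatisfiabilities.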

Modality 2 is optional and will be run offline.

Contact

Ernesto Jimenez-Ruiz: ernesto [at] cs [.] ox [.] ac [.] uk

Original page: http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/2012/ [cached: 24/06/2014]