Anatomy

The Anatomy track of the 2009 campaign consists of finding alignments between the Adult Mouse Anatomy and a part of the NCI Thesaurus (describing the human anatomy). At the moment there are four specific subtasks; further subtasks may be defined later on. The task is placed in a domain where we find large, carefully designed ontologies that are described in technical terms. Besides their large size and a conceptualization that is based on natural language only to a limited degree, they also differ from other ontologies in their use of specific annotations and roles, e.g. the extensive use of the partOf relation. The manual harmonization of the ontologies leads to a situation where a high number of rather trivial mappings can be found by simple string comparison techniques. At the same time, a good share of the mappings are non-trivial and require careful analysis and sometimes also medical background knowledge.
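To illustrate what a "trivial" mapping found by simple string comparison might look like, here is a minimal sketch. It assumes that each concept carries a single label and that two concepts match when their labels are identical after lowercasing and unifying separators. The identifiers and labels are hypothetical and are not taken from the actual ontologies; real matching systems typically go well beyond this.

```python
import re


def normalize(label: str) -> str:
    """Lowercase a label and collapse underscores, hyphens and whitespace."""
    return re.sub(r"[\s_\-]+", " ", label.strip().lower())


def trivial_matches(
    mouse_labels: dict[str, str], human_labels: dict[str, str]
) -> set[tuple[str, str]]:
    """Pair concept ids whose normalized labels are identical."""
    # Index the human concepts by normalized label for direct lookup.
    by_label: dict[str, list[str]] = {}
    for concept_id, label in human_labels.items():
        by_label.setdefault(normalize(label), []).append(concept_id)
    return {
        (mouse_id, human_id)
        for mouse_id, label in mouse_labels.items()
        for human_id in by_label.get(normalize(label), [])
    }


# Hypothetical ids and labels, for illustration only.
mouse = {"m1": "urinary bladder", "m2": "Inner_Ear", "m3": "tail"}
human = {"h1": "Urinary Bladder", "h2": "inner ear", "h3": "Lung"}
matches = trivial_matches(mouse, human)
print(sorted(matches))
```

Correspondences that such a comparison misses (synonyms, differing modelling of part-whole structure) are the non-trivial ones referred to above.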

Differences compared to 2008 *

The most important modification is related to fixing several flaws in the reference alignment and in the ontologies to be matched. Furthermore, we carefully added some disjointness axioms to the ontologies. As a result of these modifications, the reference mapping will now contain a small number (fewer than five) of subsumption correspondences. This requires further consideration with respect to the chosen evaluation strategy. The revised datasets are not yet available.

~~Additionally, we plan to introduce an optional 5th subtask which is not yet completely defined.~~ A 5th subtask will not be included in the 2009 evaluation.

Subtracks #1 to #4 were already conducted in the same way in 2008, so if you are a 2008 participant you can skip the explanations related to these subtracks.

Preliminary results

Unfortunately, at the moment (25.08.09) only the 2008 version of the dataset is available. Since there will only be minor modifications, please use this old version for preliminary tests on subtasks #1 to #4 (or rather on those subtracks in which you would like to participate, i.e. at least subtask #1).

September 1st
    participants send preliminary results (for interoperability-checking)

At the moment it looks as though there will be no additional 5th subtask.

Data sets

Use the following ontologies as input for your matching system. All four subtasks are based on matching these ontologies.

Subtask #4 is about matching two ontologies based on a partial reference alignment that has been generated, e.g., by a domain expert. For subtask #4 this mapping has to be specified as additional input to your system. See section 'Modalities' for detailed information on this task.

partial_reference.rdf

This alignment contains all 'trivial correspondences' as well as a small subset of the non-trivial correspondences.
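A system taking part in subtask #4 first has to read the partial reference alignment. The sketch below shows one way this could be done with Python's standard library. It assumes that the file follows the usual cell structure of the alignment format used for OAEI submissions (a Cell element with entity1, entity2, measure and relation children); the namespace URI and the entity URIs in the sample are assumptions for illustration, and the element names are matched by local name so that the exact namespace does not matter.

```python
import xml.etree.ElementTree as ET

# The rdf:resource attribute lives in the standard RDF namespace.
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def read_cells(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (entity1, entity2, relation) for every Cell element found."""
    root = ET.fromstring(xml_text)
    cells = []
    for element in root.iter():
        if local_name(element.tag) != "Cell":
            continue
        fields = {local_name(child.tag): child for child in element}
        cells.append((
            fields["entity1"].get(RDF_RESOURCE),
            fields["entity2"].get(RDF_RESOURCE),
            (fields["relation"].text or "").strip(),
        ))
    return cells


# Hand-written sample with made-up entity URIs (not taken from the real file).
SAMPLE = """<rdf:RDF
    xmlns="http://knowledgeweb.semanticweb.org/heterogeneity/alignment"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Alignment>
    <map>
      <Cell>
        <entity1 rdf:resource="http://example.org/mouse#A"/>
        <entity2 rdf:resource="http://example.org/human#B"/>
        <measure>1.0</measure>
        <relation>=</relation>
      </Cell>
    </map>
  </Alignment>
</rdf:RDF>"""

partial_reference = read_cells(SAMPLE)
print(partial_reference)
```

For the real file, the text would be read from partial_reference.rdf instead of the inline sample.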

Availability of the Reference Alignment

Note again that the complete reference alignment is still not available! Nevertheless, in exceptional cases it is possible to make use of an 'evaluation service': we compare a submitted alignment against the reference alignment for you, provided the results (limited to precision and recall) are useful for a specific research question. Please write an email to ask for more details about this possibility.

Modalities

Subtracks

The anatomy track consists of four subtracks (a 5th subtrack was considered, but will not be included in the 2009 evaluation). Subtrack #1 is obligatory for all participants of the anatomy track, while subtracks #2, #3, and #4 are optional.

Subtracks #1, #2, and #3 are standard matching tasks with respect to the input (two ontologies to be matched). For each of these subtracks your matching system should generate an alignment between the mouse anatomy and the human anatomy; the three alignments should differ with respect to recall and precision.

For subtrack #1 the generated alignment should be, as far as possible, an optimal solution to the matching problem with respect to both recall and precision (i.e. precision and recall are evenly weighted). In the evaluation we will focus on the f-value. You should apply your system with standard parameters, or at least with standard parameters for the biomedical domain.

For subtrack #2 your matching system should generate a result that is optimized for precision. Think of a scenario where the result of your system is used directly, without verification by a domain expert.

For subtrack #3 your matching system should generate a result that is optimized for recall. Think of a scenario where the result of your system serves as a comprehensive candidate mapping that a domain expert revises afterwards by manually removing the incorrect correspondences.

Comparing the results of subtracks #1, #2, and #3 will show to what extent your system can be adjusted or parameterised for certain requirements.
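The measures mentioned above can be sketched as follows. This is a minimal illustration, assuming that alignments are represented as sets of (mouse concept, human concept) pairs and that the f-value is the harmonic mean of precision and recall (which weights both evenly); the exact evaluation procedure of the organizers may differ. The concept names are hypothetical.

```python
def precision_recall_f(found: set, reference: set) -> tuple[float, float, float]:
    """Compute precision, recall and their harmonic mean for an alignment."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Hypothetical correspondences, for illustration only.
reference = {("m1", "h1"), ("m2", "h2"), ("m3", "h3"), ("m4", "h4")}
found = {("m1", "h1"), ("m2", "h2"), ("m5", "h9")}

precision, recall, f_value = precision_recall_f(found, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f={f_value:.3f}")
```

A result tuned for subtrack #2 would typically contain fewer but more reliable correspondences (higher precision, lower recall), and a result tuned for subtrack #3 the opposite.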

Subtrack #4 is part of the anatomy track for the second time. While we expect most systems to solve tasks #1, #2, and #3, we expect only a few systems to solve this task. For this subtrack a part of the reference mapping is available as additional input.

#1: Matcher(Mouse, Human) => Mapping
#2: Matcher(Mouse, Human) => Mapping (increased precision)
#3: Matcher(Mouse, Human) => Mapping (increased recall)
#4: Matcher(Mouse, Human, PartialReferenceMapping) => ExtendedReferenceMapping
#5: ??? (not yet fixed, suggestions are welcome)

Suppose that this part of the reference mapping has been generated by, e.g., a group of domain experts. Your job is to use the information encoded in the mapping to improve the matching process. We believe that the information that certain correspondences are definitely correct can be used in some way within the matching process. In the evaluation we will compare the results of subtrack #1 with the results of #4; in particular, we will compare Mapping \ PartialReferenceMapping to ExtendedReferenceMapping \ PartialReferenceMapping to see whether or not the additional information had positive effects.
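The comparison described above can be sketched with plain set operations. The sketch assumes that all mappings are sets of correspondence pairs and that both reduced mappings are scored against the reference mapping with the partial reference removed; the scoring choice is an assumption made for illustration, not a statement of the organizers' procedure. All names and data are hypothetical.

```python
def score(found: set, target: set) -> tuple[float, float]:
    """Return (precision, recall) of found with respect to target."""
    correct = len(found & target)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(target) if target else 0.0
    return precision, recall


def compare_without_partial(mapping_1: set, mapping_4: set,
                            partial: set, reference: set):
    """Score the results of #1 and #4 after removing the partial reference."""
    target = reference - partial
    return score(mapping_1 - partial, target), score(mapping_4 - partial, target)


# Hypothetical correspondences, for illustration only.
reference = {("a", "a"), ("b", "b"), ("c", "c"), ("d", "d")}
partial = {("a", "a"), ("b", "b")}
mapping_1 = {("a", "a"), ("c", "c"), ("x", "y")}             # result of subtrack #1
mapping_4 = {("a", "a"), ("b", "b"), ("c", "c"), ("d", "d")}  # result of subtrack #4

result_1, result_4 = compare_without_partial(mapping_1, mapping_4, partial, reference)
print("subtrack #1 (precision, recall):", result_1)
print("subtrack #4 (precision, recall):", result_4)
```

Removing the partial reference from both sides ensures that a system is not credited simply for copying the correspondences it was given as input.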

Research Questions

Within the evaluation we try to focus on the following aspects:

Which system performed best (mainly with respect to #1)?
What about the runtime of your system (mainly with respect to #1)?
Can your system be adjusted for certain requirements (comparing results for #1, #2, and #3)?
Which system is best at finding non-trivial correspondences (based on subtasks #1 and #3)?
Can your system solve subtrack #4? How strong are the positive effects of exploiting the partial reference mapping?

Participation Conditions

From our 2007 and 2008 experience we know that certain correspondences in the partial reference mapping are hard for a matching system to detect. If some of these correspondences are part of the submissions for #1, #2 or #3, we will ask the authors of the matching system to explain how these correspondences could be detected by the implemented algorithms. If it cannot be shown, or at least made plausible, how these correspondences were generated automatically, we will exclude the system from taking part in the anatomy track!

We will choose a small sample of matching systems, install these systems, perform some of the matching tasks, and reproduce the results. If your system is chosen, we expect your support in getting it running!

Format of submission

Your submission should contain the following folders and files:

+- anatomy
|  +- 1
|  |  +- participant.rdf
|  |  +- configuration-runtime.txt
|  +- 2
|  |  +- participant.rdf
|  |  +- configuration-runtime.txt
|  +- 3
|  |  +- participant.rdf
|  |  +- configuration-runtime.txt
|  +- 4
|  |  +- participant.rdf
|  |  +- configuration-runtime.txt

The files participant.rdf (replace 'participant' by the name of your system) contain the mappings generated by your system. These files have to follow the format described here (standard format for submissions to the OAEI). The files configuration-runtime.txt should contain a few lines describing the parameter setting, as well as the runtime specified in seconds and a short description of the machine used (CPU + RAM). There is no specific format for these files. If you do not participate in all subtasks, do not include the corresponding folders in your submission. Submissions are only accepted for a certain subtrack if the corresponding folder contains both files!
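Before sending a submission, the layout described above could be checked with a small script along the following lines. This is only a sketch: the function name and the system name "mysystem" are hypothetical, and the demo builds a throwaway directory instead of inspecting a real submission.

```python
import tempfile
from pathlib import Path

CONFIG_FILE = "configuration-runtime.txt"
SUBTRACKS = ("1", "2", "3", "4")


def check_submission(root: Path, system_name: str) -> dict[str, list[str]]:
    """Return the problems found per subtrack folder under root/anatomy."""
    anatomy = root / "anatomy"
    problems: dict[str, list[str]] = {}
    # Subtrack #1 is obligatory, so its folder must be present.
    if not (anatomy / "1").is_dir():
        problems["1"] = ["folder missing (subtrack #1 is obligatory)"]
    for subtrack in SUBTRACKS:
        folder = anatomy / subtrack
        if not folder.is_dir():
            continue  # a missing optional folder means no participation
        missing = [
            name
            for name in (f"{system_name}.rdf", CONFIG_FILE)
            if not (folder / name).is_file()
        ]
        if missing:
            problems[subtrack] = missing
    return problems


# Demo on a temporary directory: folder 2 lacks its configuration file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    layout = {"1": ["mysystem.rdf", CONFIG_FILE], "2": ["mysystem.rdf"]}
    for subtrack, files in layout.items():
        folder = root / "anatomy" / subtrack
        folder.mkdir(parents=True)
        for name in files:
            (folder / name).write_text("placeholder")
    problems = check_submission(root, "mysystem")

print(problems)
```

An empty result means that every included subtrack folder contains both required files.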

The reference mapping contains only equivalence correspondences between concepts of the ontologies. No correspondences between properties (roles) are specified. If your system also creates correspondences between properties, or correspondences that describe subsumption relations, these results will not influence the evaluation (but can nevertheless be part of your submitted results).

Please submit the files (preliminary and final results) directly to the email address given under 'Contacts' (below). Please send the results zipped in a file participant.zip or participant.rar, and include the name of your matching system in the subject line of the mail.

Schedule

The schedule given at https://oaei.ontologymatching.org/2009/index.html#schedule is also binding for the anatomy track.

Acknowledgements

We would like to gratefully thank Martin Ringwald and Terry Hayamizu (Mouse Genome Informatics - http://www.informatics.jax.org/), who provided us with a reference mapping for the matching task of this track.

In addition, we would like to thank all of the participants of the OAEI-07 and OAEI-08 anatomy tracks for hints and discussions with respect to the realization and evaluation of last year's track.

Contacts

This track is organized by Christian Meilicke and Heiner Stuckenschmidt. If you have any problems working with the ontologies, any questions, or any suggestions related to the anatomy track, feel free to write an email to christian [at] informatik [.] uni-mannheim [.] de.

Initial location of this page: http://webrum.uni-mannheim.de/math/lski/anatomy09/index.html