Track: Anatomy - OAEI 2007 - Reference Sample

To give a better understanding of the OAEI 2007 anatomy track and to support further improvement of matching systems, a small sample of the reference mapping is provided for download on this page. While constructing this sample, we conducted some further evaluations that might also be of interest. These are reported here.

Constructing the Sample

To construct a useful sample, we applied the following strategy. We first computed, for each submission, the subset of all correct non-trivial correspondences. These are the correspondences that are both correct (with respect to the reference mapping) and non-trivial (that is, not found by the simple string equality method, briefly described here).

In the next step, we removed from each of these mappings the correspondences that were also detected by any of the other matching systems. That is, for each matching system, we constructed the set of non-trivial correct correspondences that were detected only by this specific matching system and by no other system. We refer to these sets as UNTC(matcher), the unique non-trivial correspondences detected by matching system matcher.

These mappings are quite interesting. If, for example, |UNTC(matcher1)| = 10 while |UNTC(matcher2)| = 0, |UNTC(matcher3)| = 0, and so on, this means that matcher1 has used some source of information or some successful strategy that is very specific to this matcher. From a different point of view: the characteristics of UNTC(matcher1) and the characteristics of matcher1 seem to be a good starting point for finding non-trivial strategies to increase recall.
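The two construction steps can be sketched with plain set operations. This is only a minimal illustration, not the script used for the evaluation: it assumes that a correspondence can be represented as a (source, target) pair of concept identifiers, and the function name untc as well as the toy matcher names and identifiers are hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumption: a correspondence is a (source concept, target concept) pair.
Correspondence = Tuple[str, str]


def untc(
    submissions: Dict[str, Set[Correspondence]],
    reference: Set[Correspondence],
    trivial: Set[Correspondence],
) -> Dict[str, Set[Correspondence]]:
    """Return the unique non-trivial correct correspondences per matcher."""
    # Step 1: keep only correspondences that are correct and non-trivial.
    ntc = {
        name: (found & reference) - trivial
        for name, found in submissions.items()
    }
    # Step 2: drop everything that any other matcher detected as well.
    result = {}
    for name, own in ntc.items():
        others = set().union(
            *(found for other, found in ntc.items() if other != name)
        )
        result[name] = own - others
    return result


# Toy data (hypothetical identifiers, not taken from the anatomy track).
reference = {("a1", "b1"), ("a2", "b2"), ("a4", "b4"), ("a5", "b5")}
trivial = {("a4", "b4")}  # e.g. found by simple string equality
submissions = {
    "matcher1": {("a1", "b1"), ("a2", "b2"), ("a3", "b9")},
    "matcher2": {("a1", "b1"), ("a4", "b4")},
}

unique = untc(submissions, reference, trivial)
for name, correspondences in unique.items():
    print(name, len(correspondences), sorted(correspondences))
```

In this toy setting only matcher1 contributes a unique non-trivial correspondence, because ("a1", "b1") is shared by both matchers and ("a4", "b4") is trivial.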

The Sample

The following table provides a link to the UNTC mappings for all participating systems (only for Lily we had to choose a subset of only 10 correspondences, to keep the sample small).

Matching System	Size of UNTC	Link to UNTC-Mapping
AOAS	6	untc_AOAS.rdf
Sambo	10	untc_SAMBO.rdf
ASMOV	2	untc_ASMOV.rdf
Rimom	5	untc_rimom.rdf
Falcon-AO	0	untc_falcon.rdf
TaxoMap	6	untc_TaxoMap.rdf
AgreementMaker	2	untc_sunna-cruz.rdf
Prior+	7	untc_prior.rdf
Lily	20 (10)	untc_lily.rdf (a subset of size 10)
X-SOM	0	untc_xsom.rdf
DSSim	6	untc_DSSim.rdf
UNION of all UNTCs	54	untc_UNION.rdf

One might be surprised by the relatively small numbers, but note that it is quite hard for a matcher to find correspondences that none of the other 10 systems (some with low precision) detects. Therefore, the results for Lily are particularly interesting. Note that Lily does not use any medical background knowledge.

The whole reference sample is referred to as untc_UNION.rdf in the table.

In the context of computing these UNTCs, we also computed the union of the mappings of all matching systems. For this union mapping we measured 92.3% recall and 20.6% precision. Compared with the 80.4% recall of the best individual system, the high value of this "union recall" shows that there is still some room for improvement, even though it will be hard to sort out the incorrect correspondences.
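The union evaluation can be illustrated as follows. This is a sketch under the assumption that each mapping is a set of (source, target) pairs; the helper precision_recall and the toy data are hypothetical and do not reproduce the anatomy track figures.

```python
def precision_recall(mapping, reference):
    """Precision and recall of a mapping against a reference mapping."""
    correct = len(mapping & reference)
    precision = correct / len(mapping) if mapping else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Toy data (hypothetical identifiers).
reference = {("a1", "b1"), ("a2", "b2"), ("a4", "b4"), ("a5", "b5")}
submissions = {
    "matcher1": {("a1", "b1"), ("a2", "b2"), ("a3", "b9")},
    "matcher2": {("a1", "b1"), ("a4", "b4"), ("a6", "b7")},
}

# Union of all submitted mappings.
union_mapping = set().union(*submissions.values())

union_precision, union_recall = precision_recall(union_mapping, reference)
best_recall = max(precision_recall(m, reference)[1] for m in submissions.values())

print(f"union: precision={union_precision:.2f} recall={union_recall:.2f}")
print(f"best single system recall={best_recall:.2f}")
```

Since every correct correspondence found by any system is also in the union, the union recall can never be lower than the recall of the best single system. The union precision, however, suffers from the incorrect correspondences of all systems together.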

Further Evaluations

The small sample of the non-trivial reference mapping (the UNION of all UNTCs) should be useful for enhancing a matching system, performing a partial evaluation, doing some analysis, and so on. Nevertheless, some of the OAEI 2007 and potential OAEI 2008 anatomy participants might be interested in a full evaluation of their (possibly) improved or first-time participating systems. Therefore, it is possible to send us a submission in the already described format. Since it is still not possible to publish the whole reference mapping, in response to such a submission you will be informed about the precision, recall and f-measure of your submission. To prevent the reference mapping from being reconstructed from precision and recall values (we think this is possible if submissions are sent to us in a systematic way), we will do this only twice for each system (in the period from January to June 2008).
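For orientation, the f-measure can be derived from precision and recall as sketched below. This assumes the common balanced variant (the harmonic mean of precision and recall); the page itself does not state which variant is reported. The function name f_measure is hypothetical, and the input values are simply the union figures quoted above, used for illustration.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced f-measure: harmonic mean of precision and recall.

    Assumption: equal weighting of precision and recall.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustration with the union figures quoted above (20.6% precision, 92.3% recall).
union_f = f_measure(0.206, 0.923)
print(f"f-measure of the union mapping: {union_f:.3f}")
```

The harmonic mean is dominated by the smaller of the two values, so a mapping with high recall but low precision, such as the union mapping, still receives a low f-measure.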

If you have any questions, problems with the data set, or interesting observations from your experience with the data set, please write an email to christian [at] informatik.uni-mannheim.de.

Last updated: 18.12.2007 by Christian Meilicke.