How to interpret OAEI rules
It is not always easy to see whether a certain aspect of a matching system implementation complies with the rules. To enable a better understanding, we present a list of typical examples together with short explanations.
Examples for breaking the rules
- Machine learning techniques are used on subsets of (or complete) OAEI datasets to find an optimal weighting of different similarity measures.
Why incorrect: OAEI datasets cover different types of matching tasks, but this selection is not representative of matching problems in general. Learning from these examples results in unacceptable overfitting. In machine learning, there has to be a clear distinction between test data and training data.
- Several predefined configurations are activated by detecting certain namespaces, e.g. by checking for the occurrence of a specific string like "benchmark".
Why incorrect: No need for an explanation.
- A system is trained on OAEI datasets to automatically choose from a set of predefined settings.
Why incorrect: Such a strategy can learn something that works very well for OAEI datasets, while the results of the learning process might nevertheless be completely unreasonable. Moreover, see the general remark on machine learning above.
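The remark on test and training data can be made concrete with a minimal sketch, assuming that similarity weights are tuned on tasks that are not part of the evaluation. All names here (`tune_weights`, `match`, the task ids) are hypothetical and do not belong to any OAEI tool; the grid search over two weights is only one possible tuning scheme.

```python
from itertools import product


def f_measure(found, reference):
    """Harmonic mean of precision and recall over sets of correspondences."""
    correct = len(found & reference)
    if correct == 0:
        return 0.0
    precision = correct / len(found)
    recall = correct / len(reference)
    return 2 * precision * recall / (precision + recall)


def tune_weights(training_tasks, evaluation_task_ids, match, steps=5):
    """Grid-search two similarity weights on training tasks only.

    training_tasks: dict mapping task id -> (task, reference alignment)
    evaluation_task_ids: ids of tasks reserved for evaluation (e.g. OAEI)
    match: hypothetical callable (task, w_label, w_structure) -> set of pairs
    """
    if not training_tasks:
        raise ValueError("no training tasks given")
    # Refuse to tune on anything that is later used for evaluation.
    overlap = set(training_tasks) & set(evaluation_task_ids)
    if overlap:
        raise ValueError(f"evaluation tasks used for training: {sorted(overlap)}")

    grid = [i / (steps - 1) for i in range(steps)]
    best, best_score = None, -1.0
    for w_label, w_structure in product(grid, grid):
        # Average F-measure over all training tasks for this weight pair.
        score = sum(
            f_measure(match(task, w_label, w_structure), reference)
            for task, reference in training_tasks.values()
        ) / len(training_tasks)
        if score > best_score:
            best, best_score = (w_label, w_structure), score
    return best
```

The explicit overlap check is the point of the sketch: the tuning step fails loudly as soon as an evaluation task appears among the training tasks.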
Examples for not breaking the rules
- A specific large-scale ontology matching strategy is automatically activated if the ontologies to be matched have more than 1000 concepts.
Why correct: This rule of thumb is intended to work well for both OAEI test cases and other ontology matching problems. In general, it is a reasonable distinction, guided by detecting a characteristic of the matching problem that is relevant for solving it.
- The similarity of the ontologies to be matched is measured prior to the core matching process. The higher the similarity is, the more weight is put on similarity flooding.
Why correct: See remark above.
- Labels are analyzed in a preprocessing step. If biomedical terms are detected, UMLS is activated as background knowledge.
Why correct: Again, the same argument as above holds. Given a non-OAEI matching problem, it can be argued that the conditional usage of UMLS will have no negative effects on matching non-biomedical ontologies and probably a positive effect on matching biomedical ontologies.
This list is certainly not complete. Please contact the OAEI organizers if you are thinking about an issue that is not clarified by this list.
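The three acceptable examples share one pattern: the configuration depends on a measurable characteristic of the matching problem, never on the identity of a test case. A minimal sketch of that pattern follows. The class and function names are hypothetical, the linear weighting is an assumption, and taking the larger of the two ontologies for the size check is one possible reading of the 1000-concept rule.

```python
from dataclasses import dataclass
from typing import List

LARGE_SCALE_THRESHOLD = 1000  # concept count taken from the example above


@dataclass
class MatchingProblem:
    """Hypothetical summary of a matching task; not an OAEI data structure."""
    source_concepts: int
    target_concepts: int
    initial_similarity: float  # assumed to lie in [0.0, 1.0]
    labels: List[str]


def use_large_scale_strategy(problem):
    # Depends only on ontology size, never on file names or namespaces.
    largest = max(problem.source_concepts, problem.target_concepts)
    return largest > LARGE_SCALE_THRESHOLD


def similarity_flooding_weight(problem, max_weight=1.0):
    # Higher initial similarity gives more weight (linear mapping assumed).
    clamped = min(max(problem.initial_similarity, 0.0), 1.0)
    return clamped * max_weight


def use_biomedical_background(problem, vocabulary, min_hits=1):
    # vocabulary: hypothetical set of lowercase biomedical terms.
    hits = sum(1 for label in problem.labels if label.lower() in vocabulary)
    return hits >= min_hits
```

None of these functions inspects a namespace or a dataset name, so they behave the same way on OAEI test cases and on any other matching problem with the same characteristics.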
$Id: oaei-rules.1.html,v 1.1 2012/07/04 08:40:06 euzenat Exp $