Ontology Alignment Evaluation Initiative - OAEI-2014 Campaign

Instance Matching Track

The Instance Matching Track evaluates the performance of matching tools whose goal is to detect the degree of similarity between pairs of instances expressed in the form of OWL ABoxes.

The track is organized in two independent tasks, namely Identity Recognition (id-rec task) and Similarity Recognition (sim-rec task).

For each task, participants receive two datasets, called source and target, respectively. The goal is to discover the matching pairs (i.e., mappings) between the instances in the source dataset and the instances in the target dataset. Both tasks are blind, meaning that the set of expected mappings (i.e., the reference alignment) is not given to the participants.

Access to the datasets for both tasks of the OAEI 2014 campaign is provided on the original page linked at the bottom of this document.

Identity Recognition Task

The goal of the id-rec task is to determine whether two OWL instances describe the same real-world entity.

The datasets of the id-rec task were produced by altering a set of original data in order to generate multiple descriptions of the same real-world entities, employing different languages and representation formats.

We provide two ABoxes: the source ABox contains 1330 instances described through 4 classes, 5 datatype properties, and 1 annotation property; the target ABox contains 2649 instances described through 4 classes, 4 datatype properties, 1 object property, and 1 annotation property.

What we expect from participants. Participants are requested to match the instances of the class http://www.instancematching.org/ontologies/oaei2014#Book in the source ABox against the instances of the corresponding class in the target ABox. The task goal is to produce a set of mappings between the pairs of instances found to refer to the same real-world entity. A book instance in the source ABox can have zero, one, or more matching counterparts in the target ABox.

Evaluation strategy. The mappings produced by the participants will be compared against a ground truth in which each instance i in the source dataset is associated with all the instances in the target dataset that represent an altered description of i. Evaluation will be performed through precision, recall, and F-measure.
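For reference, the following is a minimal Python sketch of this kind of evaluation, assuming mappings are modeled as sets of (source URI, target URI) pairs; the function name and toy data are illustrative only, not part of the campaign materials:

    # Minimal sketch of precision, recall, and F-measure over mapping sets.
    def evaluate(produced: set, reference: set):
        true_positives = len(produced & reference)
        precision = true_positives / len(produced) if produced else 0.0
        recall = true_positives / len(reference) if reference else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall > 0 else 0.0)
        return precision, recall, f_measure

    # Toy data: one correct mapping, one spurious, one missed.
    produced = {("src#book1", "tgt#bookA"), ("src#book2", "tgt#bookB")}
    reference = {("src#book1", "tgt#bookA"), ("src#book3", "tgt#bookC")}
    print(evaluate(produced, reference))  # (0.5, 0.5, 0.5)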

Submission procedure. The task evaluation will be executed with the support of SEALS. Participants are requested to adapt their matching tool so that it can be invoked on the SEALS platform (see the OAEI 2014 evaluation details).

Additionally, participants are required to provide the results in TSV format, i.e.,
uri_of_source_instance\turi_of_target_instance\tsimilarity_value\n
and to send a text file containing the mappings to Alfio Ferrara (alfio.ferrara@unimi.it).
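As a concrete illustration, the Python sketch below writes a list of mappings in this TSV format; the example mappings and URIs are invented for illustration and are not taken from the campaign datasets:

    # Sketch: serialize mappings in the required TSV format
    # (source URI, tab, target URI, tab, similarity degree, newline).
    mappings = [
        ("http://example.org/source#book1", "http://example.org/target#bookA", 0.92),
        ("http://example.org/source#book2", "http://example.org/target#bookB", 0.41),
    ]

    with open("mappings.tsv", "w", encoding="utf-8") as out:
        for source_uri, target_uri, similarity in mappings:
            out.write(f"{source_uri}\t{target_uri}\t{similarity}\n")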

Similarity Recognition Task

The goal of the sim-rec task is to evaluate the degree of similarity between two OWL instances, even when the two instances describe different real-world entities.

The datasets of the sim-rec task were produced through crowdsourcing with the Argo system (Italian language). More than 250 workers were involved in the crowdsourcing process to evaluate the degree of similarity between pairs of instances describing real books. Crowdsourcing activities were organized into a set of HITs (Human Intelligence Tasks) assigned to workers for execution. A HIT is a question in which the worker is asked to evaluate the degree of similarity of two given instances. The worker examines the instances (i.e., book descriptions) at a glance and specifies his or her perceived similarity by assigning a degree in the range [0,1].

We provide two ABoxes: the source ABox contains 173 book instances, and the target ABox contains 172 book instances.

What we expect from participants. Participants are requested to match the instances of the class http://www.instancematching.org/ontologies/oaei2014#Book in the source ABox against the instances of the corresponding class in the target ABox. The task goal is to produce a complete set of mappings between every pair of instances: with 173 source and 172 target book instances, we expect participants to provide 173 × 172 = 29756 mappings, each featuring a similarity degree in the range [0,1].
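Since a mapping is expected for every source/target pair, a participant's output amounts to a Cartesian product of the two instance sets. In the sketch below, similarity() is merely a placeholder for the participant's own measure, and the URIs are invented:

    import itertools

    # Sketch: the sim-rec task expects one mapping per source/target pair.
    def similarity(source_uri: str, target_uri: str) -> float:
        return 0.0  # placeholder for the participant's own measure in [0,1]

    source_books = [f"src#book{i}" for i in range(173)]  # illustrative URIs
    target_books = [f"tgt#book{j}" for j in range(172)]

    mappings = [(s, t, similarity(s, t))
                for s, t in itertools.product(source_books, target_books)]
    assert len(mappings) == 173 * 172  # 29756 mappings expected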

Evaluation strategy. The mappings produced by the participants will be compared against the mappings obtained through crowdsourcing. Given a mapping m, we will compare the similarity degree assigned to m by the matching tool against the corresponding similarity degree assigned by the crowdsourcing workers. The evaluation will be performed using the Euclidean distance.
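One way to read this, assuming the tool-assigned and crowd-assigned degrees are paired mapping by mapping into two vectors (an assumption on our part, with invented numbers), is sketched below:

    import math

    # Sketch: Euclidean distance between tool-assigned and crowd-assigned
    # similarity degrees over the same ordered list of mappings.
    tool_degrees = [0.9, 0.2, 0.5]   # illustrative values
    crowd_degrees = [1.0, 0.1, 0.4]

    distance = math.sqrt(sum((t - c) ** 2
                             for t, c in zip(tool_degrees, crowd_degrees)))
    print(distance)  # ~0.173; lower means closer to the crowd judgments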

Submission procedure. The task evaluation will be executed with the support of SEALS. Participants are requested to adapt their matching tool so that it can be invoked on the SEALS platform (see the OAEI 2014 evaluation details).

Additionally, participants are required to provide the results in TSV format, i.e.,
uri_of_source_instance\turi_of_target_instance\tsimilarity_value\n
and to send a text file containing the mappings to Alfio Ferrara (alfio.ferrara@unimi.it).

Contact: alfio.ferrara@unimi.it

Original page: http://islab.di.unimi.it/im_oaei_2014/index.html [cached: 13/05/2016]