Ontology Alignment Evaluation Initiative - OAEI-2011 Campaign

Instance Matching at OAEI 2011 (IM@OAEI2011)

General description

IM@OAEI2011 is an initiative for the evaluation of instance matching techniques and tools. IM@OAEI2011 is a track of the Ontology Alignment Evaluation Initiative (OAEI - http://oaei.ontologymatching.org/2011), held every year in collaboration with the Ontology Matching Workshop at ISWC (http://om2011.ontologymatching.org).

IM@OAEI2011 is focused on RDF and OWL data in the context of the Semantic Web. Participants will be asked to execute their algorithms against various datasets and their results will be evaluated by comparing them with a pre-defined reference alignment provided by IM@OAEI2011. Results will be evaluated according to standard precision and recall metrics.
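
As an illustration of the evaluation described above, here is a minimal sketch of how precision and recall could be computed for a set of produced links against a reference alignment. The representation of a link as a pair of URI strings and the function name `evaluate` are assumptions made for this example; they are not part of the official evaluation tooling.

```python
from typing import Dict, Iterable, Set, Tuple

# A link is modelled here as a pair (source URI, target URI); this is an assumption.
Link = Tuple[str, str]


def evaluate(found: Iterable[Link], reference: Iterable[Link]) -> Dict[str, float]:
    """Compare produced links with a reference alignment."""
    found_set: Set[Link] = set(found)
    reference_set: Set[Link] = set(reference)
    correct = found_set & reference_set  # links that are both produced and expected
    # Precision: share of produced links that are correct.
    precision = len(correct) / len(found_set) if found_set else 0.0
    # Recall: share of reference links that were produced.
    recall = len(correct) / len(reference_set) if reference_set else 0.0
    return {"precision": precision, "recall": recall}
```

For example, a system that returns three links of which two appear in a four-link reference alignment gets a precision of 2/3 and a recall of 1/2.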

For each task described below, we provide the datasets to interlink as well as the reference alignments. Participants can thus tune their algorithms to work as well as possible and send us their evaluation results.

Datasets

Use the following datasets as input for your matching system. You can already test with this data and report problems (send reports to the contact email). The datasets will be frozen by July 1st.

Interlinking New-York Times Data - DOWNLOAD

Participants are requested to re-build the links among the NYT dataset itself (see data.nytimes.com), and to the external data sources DBPedia, Geonames and Freebase. Reference alignments are provided for each resource as RDF alignments. These alignments are extracted from the links provided and curated by NYT. The whole NYT dataset is available under Creative Commons license - CC BY 3.0.

Here are a few stats on the datasets and their interlinks. The NYT dataset contains three areas: people, organizations and locations.

Stat                        People   Organizations   Locations
Nr of NYT resources           9958            6088        3840
Total nr of sameAs links     14884            8003        8786 (1)
Links to Freebase             4979            3044        1920
Links to DBPedia              4977            1949        1920
Links to NYT                  4979            3044        1920
Links to Geonames                0               0        1789

(1) 12 links actually do not belong to the interconnected datasets and are not considered for this task.

(2) We are interested in matching all DBpedia resources, with the exception of resources that redirect to another resource (e.g. dbpedia:Gordon_Sumner dbpedia:ontology/wikiPageRedirects dbpedia:Sting_(musician) .) and resources that represent Wikipedia disambiguation pages (e.g. http://dbpedia.org/page/Lille_(disambiguation) dbpedia-owl:wikiPageDisambiguates dbpedia:Lille .).
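
The exclusion rule in note (2) can be sketched as a simple filter. This is only an illustration: it assumes the DBpedia triples are already available as (subject, predicate, object) tuples of prefixed names, and the two predicate constants are placeholders modelled on the examples above rather than verified property URIs.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Assumed predicate identifiers, modelled on the examples in note (2).
REDIRECTS = "dbpedia-owl:wikiPageRedirects"
DISAMBIGUATES = "dbpedia-owl:wikiPageDisambiguates"


def matchable_resources(resources: Iterable[str], triples: Iterable[Triple]) -> List[str]:
    """Drop resources that are redirects or disambiguation pages."""
    # The subject of either predicate is the resource to exclude.
    excluded = {s for s, p, _ in triples if p in (REDIRECTS, DISAMBIGUATES)}
    return [r for r in resources if r not in excluded]
```

With the two example triples from note (2), `dbpedia:Gordon_Sumner` and `dbpedia:Lille_(disambiguation)` are dropped, while `dbpedia:Sting_(musician)` and `dbpedia:Lille` are kept.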

Instructions to access the other datasets are given below:

Synthetic data generated from Freebase data

 

Modalities

Subtasks

IM@OAEI2011 is organized in two sub-tracks, namely:

Data interlinking track (DI). This year the Data interlinking track focuses on the following aspect: retrieving New York Times interlinks with DBPedia, Freebase and Geonames. The dataset and the reference alignments are given in the Datasets section above. The New York Times dataset includes 4 sub-datasets (persons, locations, organizations and descriptors) that should be matched to themselves to detect duplicates, and to DBPedia, Freebase and Geonames. Note that links to Geonames exist only for the Locations dataset of NYT.

Synthetic data track (IIMB). The synthetic data track is focused on two main goals: i) to provide an evaluation dataset for various kinds of data transformations, including value transformations, structural transformations, and logical transformations; ii) to cover a wide spectrum of possible techniques and tools. To this end, the IIMB benchmark is generated by starting from an initial OWL knowledge base that is transformed into a set of modified knowledge bases by applying several automatic transformations of data. Participants are requested to find the correct correspondences among individuals of the first knowledge base and individuals of the others.
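
To make the idea of generating modified knowledge bases more concrete, here is a toy sketch. It is not the IIMB generator: the dictionary representation of a knowledge base, the character swap used as a value transformation, the property removal used as a structural transformation, and all names are assumptions chosen for illustration.

```python
from typing import Dict, Tuple

# Toy model: individual id -> {property name: literal value}.
KnowledgeBase = Dict[str, Dict[str, str]]


def swap_chars(value: str, i: int = 0) -> str:
    """Value transformation: swap characters i and i+1 (a simulated typo)."""
    if i < 0 or i + 1 >= len(value):
        return value  # nothing to swap
    chars = list(value)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def transform(kb: KnowledgeBase, drop_property: str) -> Tuple[KnowledgeBase, Dict[str, str]]:
    """Return a modified copy of kb and the reference alignment (old id -> new id)."""
    modified: KnowledgeBase = {}
    reference: Dict[str, str] = {}
    for n, (old_id, props) in enumerate(kb.items()):
        new_id = f"item{n}"  # fresh identifier, so ids cannot be matched trivially
        # Structural transformation: drop one property; value transformation: typo.
        modified[new_id] = {p: swap_chars(v) for p, v in props.items() if p != drop_property}
        reference[old_id] = new_id
    return modified, reference
```

A matching system would receive the original and the modified knowledge base, and its output would be compared with the returned reference mapping.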

Participation Conditions

Participating systems are free to use any combination of matching techniques and background knowledge.

Format of submission

For each track you participate in, your submission should contain the following folders and files.

+- imei
|  +- [trackname]
|  |  +- participant.rdf

The files participant.rdf (replace 'participant' by the name of your system) contain the mapping generated by your system. These files have to follow the RDF alignment format, which is the standard format for submissions to the OAEI.
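
As a rough illustration of what such a file may look like, the sketch below serializes a list of instance correspondences. The element names and the namespace are written from memory of the Alignment API format and should be treated as assumptions to be checked against the official format description; the function name `alignment_rdf` is hypothetical.

```python
from xml.sax.saxutils import quoteattr

# Assumed skeleton of the alignment format; verify against the official description.
HEADER = """<?xml version='1.0' encoding='utf-8'?>
<rdf:RDF xmlns='http://knowledgeweb.semanticweb.org/heterogeneity/alignment#'
         xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:xsd='http://www.w3.org/2001/XMLSchema#'>
<Alignment>
  <xml>yes</xml>
  <level>0</level>
  <type>??</type>
"""

CELL = """  <map>
    <Cell>
      <entity1 rdf:resource={e1}/>
      <entity2 rdf:resource={e2}/>
      <relation>=</relation>
      <measure rdf:datatype='http://www.w3.org/2001/XMLSchema#float'>{m}</measure>
    </Cell>
  </map>
"""

FOOTER = "</Alignment>\n</rdf:RDF>\n"


def alignment_rdf(links):
    """Serialize an iterable of (uri1, uri2, confidence) triples."""
    # quoteattr escapes special characters and adds the surrounding quotes.
    cells = [CELL.format(e1=quoteattr(a), e2=quoteattr(b), m=float(c)) for a, b, c in links]
    return HEADER + "".join(cells) + FOOTER
```

Each correspondence becomes one cell relating two instance URIs with an equivalence relation and a confidence value.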

The reference mapping contains only correspondences between instances of the ontologies. No correspondences between concepts and properties (roles) are specified in the reference alignment.

Please submit the files (preliminary and final results) directly to the contact email address. Send the results (g)zipped in a file participant.zip or participant.tgz, and include the name of your matching system in the subject line of the mail.
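
The folder layout and archive naming described above could be produced along the following lines. This is a sketch only: the function name `package_submission` and the track names used in the usage note are hypothetical, and the zip variant is chosen arbitrarily over tgz.

```python
import zipfile
from pathlib import Path
from typing import Dict


def package_submission(system: str, results: Dict[str, str], out_dir: str = ".") -> Path:
    """Write <system>.zip containing imei/<track>/<system>.rdf for each track.

    `results` maps a track name to the alignment text produced for that track.
    """
    archive = Path(out_dir) / f"{system}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for track, rdf_text in results.items():
            # One folder per track, one alignment file named after the system.
            zf.writestr(f"imei/{track}/{system}.rdf", rdf_text)
    return archive
```

Calling `package_submission("mysystem", {"di": di_text, "iimb": iimb_text})` would create mysystem.zip with one alignment file per track, where `di_text` and `iimb_text` stand for the alignment documents generated by the system.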

Schedule

Acknowledgements

We would like to thank all of the participants of the previous OAEI instance matching track editions for their hints and discussions on the realization and evaluation of the track over the past years.
We would like to thank Evan Sandhaus from the New York Times for his support.

Contact

Alfio Ferrara, Università degli Studi di Milano, Italy

Laura Hollink, TU Delft, Netherlands

Andriy Nikolov, Knowledge Media Institute, The Open University, UK

Jan Noessner, University of Mannheim, Germany

Willem Robert van Hage, VU Amsterdam, Netherlands

François Scharffe, LIRMM, University of Montpellier, France

Raphael Troncy, Eurecom, France

Original page: http://www.instancematching.org/oaei/imei2011.html [cached: 06/12/2011]